Ask Slashdot: Practical Bitrot Detection For Backups?

PAR2 by Anonymous Coward · 2013-12-10 05:18 · Score: 5, Informative

http://www.quickpar.org.uk/
http://chuchusoft.com/par2_tbb/

Re:PAR2 by Anonymous Coward · 2013-12-10 05:40 · Score: 1

dvdisaster also uses Reed Solomon Codes, but lets you burn the data to a CD/DVD/BD in a single image:
http://dvdisaster.net/en/index.html
Re: PAR2 by Qzukk · 2013-12-10 06:46 · Score: 1

> I'm glad one person remembers optical media and its lack of this side effect
Funny, I remember optical media being unreadable just months after it was burned. Sure, you can say don't use cheap media, but how do you know your media is good?

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re: PAR2 by djsmiley · 2013-12-10 06:48 · Score: 1

yes, because rewritable disks have never gone wrong, right?

--
- http://www.milkme.co.uk
Re: PAR2 by Miamicanes · 2013-12-10 07:32 · Score: 4, Informative

Use non-LTH BD-R media. It's seriously the best media we've ever had for long-term archival storage, hands-down, no contest. Unlike DVD+/-R, it's phase-change magneto-optical WORM... the laser liquefies the plastic, the magnet orients little shiny planar mirrors, the plastic solidifies, and the bits are about as close to 'carved in stone' as you're likely to ever get. As a technology, it's not cheap... but it definitely minimizes the number of things that can go wrong over a ~25-year timeframe:
* decouples media from its player... the achilles heel of hard drive-based backup schemes. A broken hard drive means a spectacularly expensive data-recovery job. A broken BD drive means buying a new one.
* phase-change MO media doesn't bleach or darken with age... and if it's going to delaminate or anything (like early optical discs often did), it's overwhelmingly likely to happen sooner rather than later (while you still have the originals available to re-archive if necessary).
* I think we can safely accept that future evolution to optical discs will remain downwards-compatible with reading older media. Seriously, CDs are THIRTY YEARS OLD, and any Blu-Ray player from China can still play them just fine (plus everything that's ever been commonly burned/stamped into them). A 2037 Apple Eve might have the masses drooling over its legacy-free minimalist purity, but the rest of us will have a 600 petabyte optical drive manufactured by a sweatshop in Uganda or Haiti that can read old BD-R discs just fine (at least, after opening it up and soldering a wire across two pads on the circuit board to make it think it's supposed to be their $6,000 enterprise version instead).
Re: PAR2 by egarland · 2013-12-10 08:28 · Score: 1

Optical has had a good run, but I'm betting that in 2037, optical will be dying or dead.
There's a lot of theoretical improvements left in optical disk technology, but they're unlikely to become common or cheap. I see possibly one generation after Blu-ray before the consumer standards stop and the access to cheap technology to drive advancements in optical storage disappears. Spinning disk is largely thought of as the primary competitor, but what's going to give optical the biggest headache is flash.
Flash storage's non-existent power requirements, extremely high density, naturally long read-only lifespan, re-usability, and flexible expansion options make it poised to take over the world of archival storage if it can come anywhere near cost-parity. My bet is that it will make it.

--
set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
Re:PAR2 by JoshRosenbaum · 2013-12-10 08:56 · Score: 1

MultiPar has superseded QuickPar. It can handle multiple directories and, unlike most of the other old PAR programs, is actually still being developed.
http://multipar.eu/
http://hp.vector.co.jp/authors/VA021385/
Re: PAR2 by Anonymous Coward · 2013-12-10 09:02 · Score: 1

Flash isn't an archival medium. Once the electrons leave the gates, Elvis has left the building: the data is gone, and gone for good. Maybe that changes if someone makes a flash drive that can constantly check and repair itself (how much ECC is enough can be debated) and can alert users that the drive is about to tank, so they can plug in another one and copy the data over before the lights go out.
Optical isn't going anywhere. Yes, one can stream Netflix, but bandwidth isn't increasing in a lot of areas around the globe. CDs, DVDs, and BD media will always have/need players. Plus, there is a good chance that a 10 year old CD will play and rip. Flash media has not been out long enough for us to know if in 10 years that the SD card with pictures sitting on the shelf will be usable or if the data will be completely gone.
Optical has a good ways to go before it is dead. Holographic storage has been a flash in the pan from the days of Tamarack to InPhase Technologies. However, it is only a matter of time before we see the technology in the second generation after Blu-Ray (the generation after Blu-Ray has been finalized by Sony and Panasonic, with 300GB disks initially). From there, who knows... holographic storage can go into the terabytes without issue in theory, but I've yet to see an HVD in the wild.
Re: PAR2 by egarland · 2013-12-10 09:23 · Score: 1

> Flash isn't an archival medium.
Anything can be an archival medium; it's just a question of whether it's good at it. Flash has been in large-scale use for 20 years now. It became the primary way computers stored their BIOS back in '95, so I think we have a fairly good understanding of its long-term storage characteristics. Optical won't die today, and I expect the market to stay strong for 5 or 10 years, but in 20, it will be all but gone.

--
set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
Re: PAR2 by Joce640k · 2013-12-10 11:06 · Score: 1

> We'll see in 1,000 years if that claim holds up. ;-)
> *as some would see it.
Nobody will ever know because there won't be a drive capable of reading it.

--
No sig today...
Re: PAR2 by Miamicanes · 2013-12-10 11:40 · Score: 3, Informative

EEPROM also happens to be the ancestor of SLC flash, not MLC, TLC or worse.
Flash is like a leaky bucket that starts out full of water, and gets drained to some level when a cell's value is set:
SLC == "The bucket is either totally empty (0), or has some water in it (1)"
MLC == "The bucket can be totally empty (00), non-empty to ~33% full (01), ~33%-~66% full (10), or ~66%-100% full (11)." After about 1/3 of the water leaks out, the cell's value is corrupt.
TLC == same idea as MLC, but the bucket has EIGHT levels instead of four. Do the math to figure out how much metaphorical water can leak out before the cell's value becomes corrupted.
BIOS eeproms are also a larger process than high-density flash, so the buckets themselves are larger while the leaks remain relatively constant in size. In other words, you're comparing a metaphorical 55 gallon drum with a slow drip that has to be completely empty to change from 1 to 0 to a thimble with 8 tick marks on the side and a leak of the same size.
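To put rough numbers on the bucket analogy, here is a minimal sketch. It assumes the charge range is split into one "empty" level plus equal-width bands, which is a simplification (real parts use uneven thresholds and ECC), and the function name is made up for illustration.

```python
def leak_margin(bits_per_cell: int) -> float:
    """Fraction of a full bucket that can leak before a stored value is misread.

    Assumes one "empty" level plus (levels - 1) equal-width bands, as in the
    bucket analogy. Illustrative model only, not a datasheet figure.
    """
    levels = 2 ** bits_per_cell      # SLC = 2, MLC = 4, TLC = 8
    return 1.0 / (levels - 1)        # width of one non-empty band


for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3)):
    margin = leak_margin(bits)
    print(f"{name}: {2 ** bits} levels, about {margin:.0%} of the bucket per level")
```

Under that assumption an MLC cell tolerates roughly a third of the bucket leaking away and a TLC cell roughly a seventh, while SLC has to drain completely before it flips.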
Re: PAR2 by kesuki · 2013-12-10 14:41 · Score: 1

they said that about paper every time it's been invented, but the problem with paper is its inability to handle too little humidity (dry rot) or too much humidity (mildew etc.), and its appeal as a nesting site for insects that routinely eat tree leaves. oh, and its bitrate per unit of energy put into it is atrocious, especially if you throw in modern hermetically sealed, deoxygenated, humidity-controlled environs. but it is easy to copy: any schooled child can copy letters from one piece of paper to another. computers are even more awesome for data sharing and copying, even if laws against it exist. but i digress. having an optical backup is fine, and there are times when optical is necessary, but it doesn't prevent accidental damage to discs or make sorting them easier. bitrot detection is an underserved market. raid has it, bluray doesn't, and some filesystems actively deduplicate, which leaves even fewer redundant copies around. anyway, the best way to check for bitrot is to verify the md5sums with a script that runs automatically on the server that sends the files offsite; if bitrot is detected, it simply requests the data from the offsite storage before it is lost.

--
https://www.gnu.org/philosophy/free-sw.html
Re: PAR2 by V+for+Vendetta · 2013-12-11 05:01 · Score: 1

In thousand years? Just ask your replicator (aka "3D printer" these days) to create one ...
Re: PAR2 by Gen_Music · 2013-12-11 11:21 · Score: 1

Flash's high density still can't hold a candle to holographic storage.

ZFS filesystem by Anonymous Coward · 2013-12-10 05:19 · Score: 5, Informative

A single command will do that:

zpool scrub <pool>

Re:ZFS filesystem by ravenswood1000 · 2013-12-10 05:20 · Score: 1

Yep, ZFS
Re:ZFS filesystem by vecctor · 2013-12-10 05:41 · Score: 5, Informative

Agreed, ZFS does exactly this, though without the remote file retrieval portion.
To elaborate:
http://en.wikipedia.org/wiki/ZFS#ZFS_data_integrity
End-to-end file system checksumming is built in, but by itself this will only tell you the files are corrupt. To get the automatic correction, you also need to use one of the RAID-Z modes (multiple drives in a software raid). OP said they wanted to avoid that, but for this kind of data I think it should be done. Having both RAID and an offsite copy is the best course.
You could combine it with some scripts inside a storage appliance (or old PC) using something like Nas4Free (http://www.nas4free.org/), but I'm not sure what it has "out of the box" for doing something like the remote file retrieval. What it would give is the drive health checks that OP was talking about; this can be done with both S.M.A.R.T. info and emailing error reports every time the system does a scrub of the data (which can be scheduled).
Building something like this may cost a bit more than for just an external drive, but for this kind of irreplaceable data it is worth it. A small atom server board with 3-4 drives attached would be plenty, would take minimal power, and would allow access to the data from anywhere (for automated offsite backup pushes, viewing files from other devices in the house, etc).
I run a nas4free box at home with RAID-Z3 and have been very happy with the capabilities. In this configuration you can lose 3 drives completely and not lose any data.

--
Why, yes I have been touched by His noodly appendage. And I plan to sue.
Re:ZFS filesystem by Guspaz · 2013-12-10 05:52 · Score: 5, Informative

You don't need raidz or multiple drives to get protection against corrupt blocks with ZFS. It supports ditto blocks, which basically just means mirrored copies of blocks. It tries to keep ditto blocks as far apart from each other on the disk as possible.
By default, ZFS only uses ditto blocks for important filesystem metadata (the more important the data, the more copies). But you can tell it that you want to use ditto blocks on user data too. All you do is set the "copies" property:
# zfs set copies=2 tank
Re:ZFS filesystem by Mike+Kirk · 2013-12-10 06:04 · Score: 2, Informative

I'm another fan of backups to disks stitched together with ZFS. In the last year I've had two cases where "zpool scrub" started to report and correct errors in files one to two months in advance of a physical hard drive failure (I have it scheduled to run weekly). Eventually the drives faulted and were replaced, but I had plenty of warning, and RAIDZ2 kept everything humming along perfectly while I sourced replacements.
For offsite backups I currently rotate offline HDD's, but I should move to Cloud storage. Give a bit of my surplus space and bandwidth to someone like Symform, and in turn they give me a free little slice of the Cloud to have TrueCrypt archives mirrored into. Win-win!
Re:ZFS filesystem by x_t0ken_407 · 2013-12-10 08:36 · Score: 1

ZFS immediately came to mind when I read the summary.
Re:ZFS filesystem by cas2000 · 2013-12-10 12:58 · Score: 2

true, but you do need multiple disks (mirrored or raidz) to protect against drive failure.
two or more copies of your data on the one disk won't help at all if that disk dies.
fortunately, zfs can give you both raid-like multiple disk storage (mirroring and/or raidz) as well as error detection and correction.
That ZFS_data_integrity link in the post you were replying to gives a pretty good summary of how it works.
The paragraphs immediately above that (titled 'Data integrity', 'Error rates in hard disks', and 'Silent data corruption') also give a good summary of why error-correcting filesystems like ZFS (and btrfs) are necessary, especially with the huge sizes of modern drives.
In fact, anyone interested should read the entire wikipedia article.
ps: neither raid nor ZFS is a substitute for backups. you still need backups of your data (preferably with off-site copies) to protect against accidental deletion or overwrite (snapshots can help with this if used intelligently prior to the event) or burglary or catastrophic damage like fire or flood.
Re:ZFS filesystem by Guspaz · 2013-12-11 10:31 · Score: 1

I agree, which is why I'm using raidz2. But that's not the problem I was suggesting a solution to. I was suggesting a solution to the problem of "data on single hard drive eventually goes corrupt, and I don't want to buy a second hard drive."

Checksums? by nine-times · 2013-12-10 05:21 · Score: 1

I don't know if there's a better solution, but you could store checksums of each archived file, and then periodically check the file against its checksum. It'd be a bit resource intensive to do, but it should work. I think some advanced filesystems can do automatic checksums (e.g. ZFS, BTRFS), but those may not be an option, and I'm not entirely sure how it works in practice.

Re:Checksums? by QuietLagoon · 2013-12-10 05:33 · Score: 2

I use checksums to check for bitrot.
Once a week, I use openssl to calculate a checksum for each file; and I write that checksum, along with the path/filename, to a file. The next week, I do the same thing, and I compare (diff) the prior checksum file with the current checksum file.
With about a terabyte of data, I've not seen any bitrot yet.
Long term, I plan to move to ZFS, as the server's disk capacity will be rising significantly.
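A minimal sketch of the week-over-week comparison described above, assuming each manifest line looks like "<hex digest>  <path>" (the layout md5sum prints; output from openssl would need its own parser). The function names and sample paths are made up for illustration.

```python
def load_manifest(text: str) -> dict[str, str]:
    """Parse lines of the form '<hex digest>  <path>' into {path: digest}."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(maxsplit=1)
        manifest[path.strip()] = digest
    return manifest


def compare_manifests(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files whose digest changed, plus files that vanished or appeared."""
    return {
        "changed": sorted(p for p in previous if p in current and previous[p] != current[p]),
        "missing": sorted(p for p in previous if p not in current),
        "new": sorted(p for p in current if p not in previous),
    }


last_week = load_manifest("aaa111  photos/a.jpg\nbbb222  photos/b.jpg\n")
this_week = load_manifest("aaa111  photos/a.jpg\nccc333  photos/b.jpg\nddd444  photos/c.jpg\n")
print(compare_manifests(last_week, this_week))
```

A "changed" entry is only suspicious for files that were not supposed to be edited, so this works best on archive directories that are written once and then left alone.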
Re:Checksums? by Anonymous Coward · 2013-12-10 05:39 · Score: 2, Interesting

Periodically checking them is the important part that no one seems to want to do.
A few years back we had a massive system failure and once we recovered the underlying problems and began recovery we found that most of the server image backup tapes for 6 months+ could not be loaded. The ops guys took a severe beating for it.
You think this stuff will never happen but it always does. We had triple redundancy with our own power backups but even that wasn't on a regular test cycle. Some maintenance guy left the switch open between floors for some reno job over a year prior and while the generators were running the power didn't make it to infrastructure.... it was as if hundreds of UPSs screamed at once and were silenced when failover didn't happen.
You really can't beat Murphy's Law, but with regular testing you can soften the effects.
Re:Checksums? by Waffle+Iron · 2013-12-10 05:50 · Score: 5, Informative

I never archive any significant amount of data without first running this script at the top:
find -type f -not -name md5sum.txt -print0|xargs -0 md5sum >> md5sum.txt
It's always good to run md5sum --check md5sum.txt right after copying or burning the data. In the past, at least a couple of percent of all the DVDs that I've burned had some kind of immediate data error.
(A while back, I rescanned a couple of hundred old DVDs that I burned ranging up to 10 years old, and I didn't find a single additional data error. I think that a lot of cases where people report that DVDs deteriorate over time, they never had good data on them in the first place and only discover it later.)
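For anyone without GNU find and md5sum handy, here is a rough Python equivalent of the manifest-then-check routine above. It is a sketch, not a drop-in replacement: the function names are made up, and unlike the `>>` in the one-liner it rewrites md5sum.txt instead of appending to it.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "md5sum.txt"  # same file name as in the one-liner above


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path) -> Path:
    """Record '<md5>  <relative path>' for every file under root except manifests."""
    lines = [
        f"{md5_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    ]
    manifest = root / MANIFEST_NAME
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return manifest


def check_manifest(root: Path) -> list[str]:
    """Return relative paths that are missing or whose MD5 no longer matches."""
    failures = []
    for line in (root / MANIFEST_NAME).read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or md5_of(target) != expected:
            failures.append(rel)
    return failures
```

Run `write_manifest(Path("/some/archive"))` before burning or copying (the path is a placeholder), then `check_manifest` against the copy; an empty list means every file matched.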
Re:Checksums? by QuietLagoon · 2013-12-10 06:00 · Score: 1

> You are assuming you started with good files.
No assumption on my part. I did start with good files. :)

> In the submitter's case, he started with some good files, some unknown number of bad files, etc.
That's not how I read the comment. From the OP:
> With the quantity of data (~2 TB at present), it's not really practical for us to examine every one of these periodically so we can manually restore them from a different copy.
That sounds to me as if he wants to check the files from time to time and locate ones that have gone bad.
Re:Checksums? by failedlogic · 2013-12-10 06:45 · Score: 1

I don't have a large amount of critical data to back up (mostly documents for research). I've been using PAR (or rather relying on it) to verify and correct errors when recovering data.
That said, I realize I should probably also have a checksum. Should one consider a different algorithm than MD5, for example to prevent collisions of the hashes?
Re:Checksums? by Waffle+Iron · 2013-12-10 07:11 · Score: 1

While MD5 isn't really secure against intentional attacks any more, the probability of a random collision is still negligible.
I originally started using MD5 for this purpose because in a test I did many years ago on some machine, md5sum actually ran faster than cksum. The shorter cksum data also does have a chance to generate hash collisions on reasonable sized data sets, although that probably doesn't matter too much for just disk error checking. I don't use the newer algorithms because they're overkill and their hash strings just look too long.
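A back-of-the-envelope sketch of the collision odds, using the standard birthday-bound approximation. It assumes the hash behaves like a uniform random function with no attacker involved; the one-million-file count is an arbitrary example, and the function name is made up.

```python
import math


def random_collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate that any two of num_files distinct inputs collide.

    Uses p ~= 1 - exp(-n(n-1) / 2^(bits+1)), which assumes uniformly random
    hash outputs (accidental collisions only, no deliberate attack).
    """
    scaled_pairs = num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-scaled_pairs)


for label, bits in (("32-bit CRC (cksum)", 32), ("128-bit MD5", 128)):
    print(f"{label}: {random_collision_probability(1_000_000, bits):.3g}")
```

With a million files, a 32-bit checksum is all but certain to contain some colliding pair, while for 128 bits the estimate is on the order of 1e-27. For plain bitrot detection each file is only compared against its own stored checksum, so even the 32-bit case misses a given corruption only about once in four billion.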
Re:Checksums? by NatasRevol · 2013-12-10 07:22 · Score: 1

weekly zpool scrub does the checks for you.

--
There are two types of people in the world: Those who crave closure
Re:Checksums? by Anonymous Coward · 2013-12-10 08:44 · Score: 1

(A while back, I rescanned a couple of hundred old DVDs that I burned ranging up to 10 years old, and I didn't find a single additional data error. I think that a lot of cases where people report that DVDs deteriorate over time, they never had good data on them in the first place and only discover it later.)
I burnt a bunch of MD5 hashes on my cd-rs with the data nine or so years back _and_ checked them back right after burning, on a different drive, too. (I'm paranoid about data integrity.) Each passed the MD5 check. Today, I get an unreadable sector about once every 500-600 MB. Most of my data was on Verbatim cd-rs; another brand fared a bit better (one corruption every 900 MB or so).
(Later on, I started burning par2 files to accompany dvds but soon gave up since the calculation took way too long.)
Re:Checksums? by hippo · 2013-12-10 08:46 · Score: 1

I run a weekly cron job that calculates md5sums for all the files on the media drive. Then it compares them to the previous week's and emails the diff. If anything goes wrong I restore the file from one of my backups and check the MD5 again. I did have one drive that was slowly losing data. Turned out to be a dodgy sata port/cable but I've not lost a file yet.
Re:Checksums? by nctritech · 2013-12-10 08:52 · Score: 1

Or use sha1deep from the md5deep package. It's made specifically for hashing and comparing file trees and has heaps of behavior-modifying options.

How are you getting bitrot? by drussell · 2013-12-10 05:24 · Score: 1

If your physical media is dying, you'll get hardware errors so restore from a(nother) backup and replace the media.

If your files are being corrupted, what kind of crappy filesystem are you using to store these precious memories?!!

Re:How are you getting bitrot? by Anonymous Coward · 2013-12-10 05:39 · Score: 1

Cosmic rays, magnetic data corruption. If you do not re-write the bits they decay.

ZFS by Electricity+Likes+Me · 2013-12-10 05:27 · Score: 4, Interesting

ZFS without RAID will still detect corrupt files, and more importantly tell you exactly which files are corrupt. So a distributed group of ZFS drives could be used to rebuild a complete backup by copying only the uncorrupted files from each.

You still need redundancy, but you can get away without the RAID in each case.
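The rebuild idea can be sketched outside ZFS too. The version below assumes you already hold a table of known-good SHA-256 digests and that each replica is an ordinary directory tree. ZFS would tell you which files are damaged itself, so the explicit hashing here is only a stand-in, and all names are made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rebuild_from_replicas(expected: dict[str, str], replicas: list[Path], dest: Path) -> list[str]:
    """Copy each file from the first replica whose copy matches its known-good hash.

    Returns the relative paths for which no replica held an intact copy.
    """
    unrecoverable = []
    for rel, digest in sorted(expected.items()):
        for replica in replicas:
            candidate = replica / rel
            if candidate.is_file() and sha256_of(candidate) == digest:
                target = dest / rel
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(candidate, target)
                break
        else:
            # The inner loop finished without finding a good copy anywhere.
            unrecoverable.append(rel)
    return unrecoverable
```

As long as the same file is not damaged on every replica, the destination ends up complete, which is the sense in which you still need redundancy but not RAID on each box.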

Re:Excellent question by sandytaru · 2013-12-10 05:27 · Score: 2

There are, but you'll be paying a lot of $$$ for that kind of storage in the cloud. I get 4GB for free from DropBox. SkyDrive from Microsoft will set you back $1000/month for 2TB - DropBox is about twice that much. It's not really practical for media files.

A much better solution would be archival-quality Blu-rays. They can hold 25 GB apiece and they're supposed to last 100 years, but they really just need to last long enough until a new, even denser storage medium comes along.

--
Occasionally living proof of the Ballmer peak.

Re:Excellent question by SirMasterboy · 2013-12-10 05:32 · Score: 5, Informative

Not all cloud storage is expensive. It's only $4 a month for unlimited backups to CrashPlan.

They also do checksums and versioning and can be set to never remove deleted files from the backup.

I have 12.8TB backed up to them and it's been working great.

Other than that, ZFS can't be beat. I use that as well.

Par2 and Reed-Solomon by mpol · 2013-12-10 05:33 · Score: 1

Bitrot does happen.
When a disk has a bad block and detects that, it will try to read the data from it and put it on a block from the reserve-pool. However, the data might be bad and corrupt, so you lose data.
Disks do have a Reed-Solomon (aka par-files) index, so it can repair some damage, but it doesn't always succeed.

Anyway, what I do for important things, is have par2 blocks that go along with the data. All my photo-archives have par2 files attached to them.

I reckon you could even automate it: have a script that traverses all directories and tries to repair the data if it's broken. If it fails, you get notified.
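A rough sketch of that automation, assuming the par2cmdline tool is installed as `par2` and that a non-zero exit status from `par2 verify` or `par2 repair` means the set is not healthy. Skipping files with ".vol" in the name is a heuristic for finding the main index file, and the function names are made up.

```python
import subprocess
from pathlib import Path


def find_par2_indexes(root: Path) -> list[Path]:
    """Find main .par2 index files, skipping '.volNNN+NN.par2' recovery volumes."""
    return sorted(p for p in root.rglob("*.par2") if ".vol" not in p.name.lower())


def verify_and_repair(root: Path, run=subprocess.run) -> list[Path]:
    """Verify every PAR2 set under root and attempt a repair when verify fails.

    Returns the index files whose sets could not be repaired, so the caller
    can send a notification about them.
    """
    failed = []
    for index in find_par2_indexes(root):
        if run(["par2", "verify", str(index)], cwd=index.parent).returncode == 0:
            continue  # set is intact, nothing to do
        if run(["par2", "repair", str(index)], cwd=index.parent).returncode != 0:
            failed.append(index)
    return failed
```

Called from cron, a non-empty return value is the cue to send the notification mail; the `run` parameter exists only so the logic can be exercised without par2 installed.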

--

Well, don't worry about that. We can get you back before you leave. (Dr. Who)

zfs or btrfs by Anonymous Coward · 2013-12-10 05:34 · Score: 1, Interesting

First off, make sure you have a separate backup storage volume that doesn't get touched by normal applications and which keeps history. Backup doesn't protect you very much if accidental deletes or application bugs corrupt all your copies within one backup cycle. Use an appropriate backup tool to manage this, where appropriateness depends on your skill and willingness to tinker. You could use something as simple as an rsync --link-dest job, or rsync --inplace in combination with filesystem snapshots, or some backup suite that will store history in its own format.

For bit-rot protection of the stored backup data, make a backup volume using zfs or btrfs with at least two disks in a mirroring configuration (where the filesystem manages the duplicate data, not a separate raid layer). Set it to periodically scrub itself, perhaps weekly. It will validate checksums on individual file extents. If one copy of a file extent cannot be read successfully, it will rewrite it using the other valid mirror. This rewrite will allow the disk's block remapping to relocate a bad block and keep going. The ability to validate checksums is the value add beyond normal raid, where the typical raid system only notices a problem when the disk starts reporting errors.

Monitor overall disk health and preemptively replace drives that start to show many errors, just as with regular raid. Some people consider the first block remapping event to be a failure sign, but you may replace a lot of disks this way. Others will wait to see if it starts having many such events within days or weeks before considering the disk bad.

Re:That's what RAID is /for/ by Anonymous Coward · 2013-12-10 05:35 · Score: 1

I don't think you understand what RAID is or what it does.

Re: Rewritten for /. by techprophet · 2013-12-10 05:35 · Score: 1

And thus the saga of that damned Frenchman continues

Re:uhuh by Anonymous Coward · 2013-12-10 05:36 · Score: 2, Informative

Warning for all UNIX newbies: that command will reset the file to 0 bytes. Just so you know.

(I've seen some cases when a rookie is setting up a Linux system and people jokingly throw him these "rm -rf /" commands and the poor guy actually ends up wrecking his system.)

Re:Excellent question by lgw · 2013-12-10 05:40 · Score: 3, Insightful

Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

The key therefore is to verify as you write. Usually, verifying a sample of a few GB will let you know if everything went OK. Do your backups with checksums of some sort. A modern tape drive and backup software will do that automatically, and let you schedule a verify automatically as part of backups (2 TB? That's 1 tape - might want to consider that), though ideally you should verify a tape on a different drive than the one you wrote it on.

For disk-based backups, local or cloud, I strongly recommend archiving to a format with checksums (RAR etc) over some sort of raw file copy. Especially for anything going over the network: RAR a volume/file set locally first, then upload, then test the archive.

If you have a superstitious fear of bitrot, you can always do some random sampling of archive integrity, and keep multiple historical copies of files just in case (e.g., don't just delete backup N-1 when you do backup N, do a rotation scheme).
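A small sketch of the random-sampling idea, assuming you kept a table of known-good SHA-256 digests keyed by relative path when the backup was written. The function names are made up, and the sample size is whatever you are comfortable re-reading each run.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(expected: dict[str, str], root: Path, sample_size: int, seed=None) -> list[str]:
    """Re-hash a random sample of files and return the ones that fail."""
    rng = random.Random(seed)  # seed=None draws a fresh sample on every run
    chosen = rng.sample(sorted(expected), min(sample_size, len(expected)))
    return sorted(
        rel for rel in chosen
        if not (root / rel).is_file() or sha256_of(root / rel) != expected[rel]
    )
```

Because a different sample is drawn each run, repeated spot checks gradually cover the whole archive without ever re-reading all of it at once.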

--
Socialism: a lie told by totalitarians and believed by fools.

Rar by rava · 2013-12-10 05:41 · Score: 1

I'm glad you're bringing this up. I haven't seen any backup software that addresses bitrot. And bitrot does happen, I lost a few pics to it. What I do: I have a monthly script that makes a RAR archive from my pictures directory. RAR checks file integrity but also has "recovery" options that allow you to recover files from a damaged archive (to a point)

--
{Science sans conscience n'est que ruine de l'âme}

A paranoid setup by brokenin2 · 2013-12-10 05:42 · Score: 4, Interesting

If you really want hassle free and safe, it would be expensive, but this is what I would do:

ZFS for the main storage - Either using double parity via ZFS or on a raid 6 via hardware raid.

Second location - Same setup, but maybe with a little more space

Use rsync between them using the --backup switch (together with --backup-dir) so that any changed or deleted files get put into a different folder.

What you get:

Pretty disaster tolerant
Easy to maintain/manage
A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
Upgradable - just change drives
Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)

What you don't get: Lost baby pictures/videos. I've been there, and I'd pay a lot more than this to get them back at this point, and my wife would pay a lot more than I would..

Your current setup is going to be time consuming, and you're going to lose things here and there anyway.. If you just try to do the same thing but make it a little better, you're still going to have the same situation, just not as bad. In this setup you have to have like 5 catastrophic failures to lose anything, sometimes even more..
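The capacity and cost figures above check out with simple arithmetic. A minimal sketch, assuming two drives' worth of parity (RAID 6 or RAID-Z2) and ignoring filesystem overhead; the function name is made up, and the prices are the ones quoted in the post.

```python
def usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity of a parity array, ignoring filesystem overhead."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * drive_tb


per_machine_usd = 800 + 500 + 500   # drives + RAID card + basic server, as quoted above
print(usable_tb(4, 2, 2), "TB usable per machine")
print(per_machine_usd, "USD per machine,", 2 * per_machine_usd, "USD for both sites")
```

The same function covers triple parity: five 2 TB drives in RAID-Z3 would leave `usable_tb(5, 2, 3)`, which is also 4 TB.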

Re:A paranoid setup by Minwee · 2013-12-10 05:55 · Score: 1

Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)
Either use a RAID controller or use ZFS. It's not a good idea to use both at the same time.
Re:A paranoid setup by brokenin2 · 2013-12-10 06:05 · Score: 1

I've used them together. Seems to work just fine.. Just don't let ZFS know that there's more than 1 drive. You can't have them both trying to manage the redundant storage.
ZFS has some great features besides its redundant storage. You can get them from other filesystems too, I suppose, but I like snapshots built into the filesystem. It *is* overkill to have the filesystem doing checksums and the raid card detecting errors as well, but that's why this is the paranoia setup... Not really looking for the performance king..
ZFS certainly isn't necessary though, if you've got hardware raid.
Re:A paranoid setup by fnj · 2013-12-10 06:35 · Score: 1

Never use a RAID controller, period. ZFS builtin RAIDZ is far superior in every way.
Re:A paranoid setup by Anonymous Coward · 2013-12-10 06:49 · Score: 1

If you really want hassle free and safe, it would be expensive, but this is what I would do:
ZFS for the main storage - Either using double parity via ZFS or on a raid 6 via hardware raid.
Second location - Same setup, but maybe with a little more space
Use rsync between them using the --backup switch so that any changes get put into a different folder.
What you get:
Pretty disaster tolerant
Easy to maintain/manage
A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
Upgradable - just change drives
Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)
What you don't get: Lost baby pictures/videos. I've been there, and I'd pay a lot more than this to get them back at this point, and my wife would pay a lot more than I would..
Your current setup is going to be time consuming, and you're going to lose things here and there anyway.. If you just try to do the same thing but make it a little better, you're still going to have the same situation, just not as bad. In this setup you have to have like 5 catastrophic failures to lose anything, sometimes even more..
$100/TB is pretty expensive. $40 or $50 per TB if you wait for something good on Slickdeals. Enterprise/highspeed drives are a waste of $ since this is cold storage, and you will want to upgrade after 3 years anyway. You can also get a 2 bay hardware NAS that lets you do whatever you want (linux based OS) for pretty cheap. In short, you're right and you're wrong. A careful spender could get exactly what you describe for about half the cost.
Re:A paranoid setup by bill_mcgonigle · 2013-12-10 07:07 · Score: 1

Use rsync between them using the --backup switch so that any changes get put into a different folder. ...
A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
+1 Clever.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:A paranoid setup by safetyinnumbers · 2013-12-10 11:25 · Score: 1

You may need to use -c to force rsync to compare checksums.

I use something like this as part of my backup:
DATE=$(date +%C%y%m%d%H%M)
rsync --del --backup --backup-dir=../changedfiles_$DATE

The whole backup also goes to S3 glacier.

As an added step - I don't delete pictures from my camera unless they match the checksum of files in the _backup_ - not the original copy (via a script).

That way, once they're first copied from the camera, a single failure in the original, PC copy or backup copy will all result in the camera version remaining and I can check what has gone wrong.
Re:A paranoid setup by Anonymous Coward · 2013-12-10 12:34 · Score: 1

If you use ZFS on a hardware raid you can only detect bitrot. Data will be lost!
If you use RAIDZ[23] on individual drives you can repair bitrot. Data can be saved!
Re:A paranoid setup by brokenin2 · 2013-12-10 12:49 · Score: 1

Even its ability to chirp loudly when a drive fails?
I think you've used some pretty lame raid controllers.
How about its ability to not waste CPU?
ZFS is good... great even, but (irony intentional) absolute statements are always wrong!
Re:A paranoid setup by cas2000 · 2013-12-10 13:50 · Score: 2

> Just don't let ZFS know that there's more than 1 drive.
That is *precisely* the wrong thing to do. As in, the exact opposite of how you should do it.
Instead, configure the RAID card to be JBOD and let ZFS handle the multiple-drive redundancy (raidz and/or mirroring), as well as the error detection and correction.
Otherwise, there is little or no benefit in using ZFS. ZFS can't correct many problems if it doesn't have direct control over the individual disks, and RAID simply can't do the things that ZFS can do.
Of course, this means that you're actually better off with a cheap dumb non-raid HBA card (or even just the SATA ports on your motherboard if there's enough of them) than an expensive HW RAID card. This is another advantage of ZFS.
(a good option is to use an LSI SAS2008 card or similar, and make sure it's re-flashed to "IT" mode firmware if you're using consumer-grade SATA drives with it to avoid TLER issues. readily available brand new for under $100 for 8 SAS/SATA ports)
> You can't have them both trying to manage the redundant storage.
yes. and it's ZFS that should be managing it, not the raid card.
> ZFS certainly isn't necessary though, if you've got hardware raid.
wrong. RAID does not provide error detection or correction. RAID protects against drive failures only, not silent corruption.
Re:A paranoid setup by cas2000 · 2013-12-10 14:06 · Score: 3, Informative

good post, except for three details:
1. if you're using ZFS on both systems, you're *much* better off using 'zfs send' and 'zfs recv' than rsync.
do the initial full copy, and from then on you can just send the incremental snapshot differences.
one advantage of zfs send over rsync is that rsync has to check each file for changes (either file timestamp or block checksum or both) every time you rsync a filesystem or directory tree. With an incremental 'zfs send', it only sends the incremental difference between the last snapshot sent and the current snapshot.
you've also got the full zfs snapshot history on the remote copy as well as on the local copy.
(and, like rsync, you can still run the copy over ssh so that the transfer is encrypted over the network)
2. your price estimates seem very expensive. with just a little smart shopping, it wouldn't be hard to do what you're suggesting for less than half your estimate.
3. if you've got a choice between hardware raid and ZFS then choose ZFS. Even if you've already spent the money on an expensive hardware raid controller, just use it as JBOD and let ZFS handle the raid function.
Re:A paranoid setup by rdnetto · 2013-12-12 00:10 · Score: 1

Hardware RAID is a bad idea for backups, as the card is a single point of failure, and anything not from the exact same batch may use a different (proprietary) RAID format. At least with Linux softraid (either mdadm or btrfs/ZFS), you can always download a copy of the source and check out the old version, if necessary.

--
Most human behaviour can be explained in terms of identity.

WinRAR... by mlts · 2013-12-10 05:42 · Score: 1

WinRAR isn't perfect, but it works on a number of platforms, be it OS X, Windows, Linux, or BSD. This provides not just CRC checking, but one can add recovery records for being able to repair damage. If storing data on a number of volumes (like optical media), one can make recovery volumes as well, so only four CDs out of a five CD set are needed to get everything back.

It isn't as easy as ZFS, but it does work fairly well for long term archiving, and one can tell if the archive has been damaged years to decades down the road.
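
To see why any four discs out of five can be enough, here is a minimal sketch using plain XOR parity. This is an assumption made for illustration: it is not RAR's actual recovery-volume format, and all names are hypothetical. It also assumes all volumes are the same length.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_recovery_volume(volumes: list[bytes]) -> bytes:
    """Build one parity volume from the data volumes."""
    return xor_blocks(volumes)


def rebuild_missing(surviving: list[bytes], recovery: bytes) -> bytes:
    """Recreate the single missing data volume from the survivors plus parity."""
    return xor_blocks(surviving + [recovery])
```

One parity volume only covers the loss of one whole volume; surviving more losses needs more recovery volumes and a stronger code than XOR.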

Re:uhuh by CanHasDIY · 2013-12-10 05:42 · Score: 1

Warning for all UNIX newbies: that command will reset the file to 0 bytes. Just that you know.

(I've seen some cases when a rookie is setting up a Linux system and people jokingly throw him these "rm -rf /" commands and the poor guy actually ends up wrecking his system.)

I think the general consensus is that if you're stupid enough to run a command you got from SomeRandomInternetAsshole420 without verifying what it will do first, you deserve to have your system wiped.

--
An enigma, wrapped in a riddle, shrouded in bacon and cheese

Re:Excellent question by clickclickdrone · 2013-12-10 05:43 · Score: 2

Are there cloud storage providers that can do this for the above example of an approx. 2 TB data set, and provide complete security?

Cloud and complete security together is an oxymoron.

--
I want a list of atrocities done in your name - Recoil

BTRFS or ZFS by mcelrath · 2013-12-10 05:46 · Score: 1

BTRFS and ZFS both do checksumming and can detect bit-rot. If you create a RAID array with them (using their native RAID capabilities) they can automatically correct it too. Using rsync and unison I once found a file with a nice track of modified bytes in it -- spinning rust makes a great cosmic ray or nuclear recoil detector. Or maybe the cosmic ray hit the RAM and it got written to disk. So, use ECC RAM.

But "bit-rot" occurs far less frequently than this: I find that on a semi-regular basis my entire filesystem gets trashed (about once every year or three). This happened to me just last week...my RAID1 BTRFS partitions (both of them) got trashed because one of my memory modules went bad. In the past I've had power supplies go bad causing this, or brown outs, and in other cases I never identified the cause. I've seen this happen across ext3, jfs, xfs, and btrfs so it's (probably) not the file system's fault. In such cases, fsck will often make the problem worse. (Use LVM and its "snapshot" feature to perform fsck on a snapshot without destroying the original). You'd think these advanced filesystems would have a way to rewind to a working copy (for instance in BTRFS -- mount a previous "generation") but this seems to not be the case.

Anyway, btrfs guys, your recovery tools could be a lot better. The COW enables some pretty fancy recovery techniques that you guys don't seem to be doing yet. If you've got a great btrfs or zfs recovery technique, please reply and tell us.

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.

Re:BTRFS or ZFS by fnj · 2013-12-10 06:37 · Score: 1

Just say no to BTRFS. Use ZFS with RAIDZ.
Re:BTRFS or ZFS by rrohbeck · 2013-12-10 10:00 · Score: 1

If it was integrated into the Linux kernel I'd use it. But with a chance that the next kernel update will make my FS driver unusable I won't touch it with a long pole.
So that leaves btrfs.

--
thegodmovie.com - watch it
Re:BTRFS or ZFS by rrohbeck · 2013-12-11 09:12 · Score: 1

Yes it is a Linux issue - specifically that ZFS licensing is incompatible with the GPL so ZFS can't be integrated into the kernel.

--
thegodmovie.com - watch it
Re:BTRFS or ZFS by rrohbeck · 2013-12-12 11:18 · Score: 1

There are some very good reasons for the GPL. I like it.

--
thegodmovie.com - watch it

Re:uhuh by Sarten-X · 2013-12-10 05:50 · Score: 2

And yet, one of FLOSS's selling points is our great community support...

--
You do not have a moral or legal right to do absolutely anything you want.

Re:Excellent question by heypete · 2013-12-10 05:51 · Score: 1

It depends on your storage needs. For things that you need to regularly access, Amazon S3 will cost you about $175/month for 2TB storage plus transfer fees, but is readily accessible at any time.

Amazon Glacier would only cost you $20/month for that amount of storage, but has various limitations on retrieval time (~4 hour minimum) and higher costs if you need to retrieve more data in a shorter amount of time. As the name suggests, it's designed for "cold storage".

Both offer extremely high degrees of reliability.

RAID + redundancy by sl4shd0rk · 2013-12-10 05:53 · Score: 1

There's really no way around it. Storage media is not permanent. You can store your important stuff on RAID but keep the array backed-up often. RAID is there to keep a disk*N failure from borking your production storage and that's it. If you can afford cloud storage, encrypt your array contents (encfs is good) and mirror the contents with rsnapshot or rsync to amazon, dropbox, a friend's raid array, whatever. SATA drives are cheap enough to keep a couple sitting around to just plug in and mirror to every weekend but you'll probably find a friend's cable modem and rsync+ssh a very handy alternative (hint: check out --bwlimit option) when run from cron.

--
Join the Slashcott! Feb 10 thru Feb 17!

Re:Excellent question by mlts · 2013-12-10 05:59 · Score: 3, Interesting

In reality, Dropbox, Skydrive, and other cloud services should be treated as a type of media, just like BD-ROMs, tape, SSD, HDD, and even hard copy.

The trick is to use different media to protect against different things. My Blu-Ray disks protect an archive against tampering or CryptoLocker (barring a hack that flashes the BD burner's ROM to allow the laser to overwrite written sectors.) However, they have to be maintained in a good environment with a good indexing system. My files stashed on Dropbox bring me accessibility virtually anywhere... but malware that erases files could wipe that volume out in no time.

Similar with external HDDs. Those are great for dealing with a complete bare metal restore, but provide little to no protection against malware. Tape, OTOH, is expensive for the drive and requires a fast computer, but once the read-only tab is flipped or the WORM session is closed, the data is there until the tape is physically destroyed.

Of course, there is not just media... there are backup programs. This is why I use the KISS principle when it comes to backups. I use an archiving utility to break up a large backup into segments (with recovery segments to allow the archive to be repaired should media go bad), then burn the segments onto optical media.

I've found that using a backup utility can work well... until one has to restore, the company is out of business, and one can't find the CD key or serial number so the software will install. One major program I used for years worked excellently... then just refused to support new optical drives (as in ignoring them completely.) So, unless I can find a DVD drive on its antiquated hardware list on eBay, all my backups are inaccessible. I was lucky enough to find that and copy the data to a HDD, but using the lowest common denominator is a good thing.

Backups are the often neglected underbelly of the IT world. While storage, security, availability and other technologies have advanced significantly, backups on the non-enterprise level are still languishing behind in almost every way possible. It was only a few years ago that encryption became standard with backup utilities [1].

[1]: With encryption comes key management, and some backup programs make that easy, some make it incredibly hard.

Re:Excellent question by lxs · 2013-12-10 06:00 · Score: 1

So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?
This is why people don't like you.

Freenas (ZFS based) or BTRFS by hibble · 2013-12-10 06:01 · Score: 1

"We'd love it if the file-system could detect this and try correcting first, and if it couldn't correct the problem, it could trigger the restoration. But that only seems to be an option for RAID type systems, where the drives are colocated."

If you have ~2TB of irreplaceable memories, set up a NAS with a RAID array. Whilst bit-rot can be detected, it can only be corrected if the file system knows what the bits should have been. To this end BTRFS and (my recommendation) ZFS can be set to scan all data once a week/month etc. and, using the redundant data in the RAID array, correct the 'Bit-Rot'.

I have an Intel Atom board in an old case with 4 drives (2x 500GB mirror and 2x 1TB mirror). I have FreeNAS on this; it is powered on every night by wake-on-LAN, backs up any new data and gets shut down. Once a week it backs up new data, then runs the command 'zpool scrub', which checks for bit-rot or inconsistencies in the file-system and corrects them if any are found (it can email you a warning if you want as well). This way if any files get damaged on a home PC/laptop etc., any user can turn on the NAS and recover their files from the shared folder.

One point of warning: ZFS is RAM hungry, so 4GB is the minimum; something to keep in mind when eBaying for an old PC to use. Others will also point out that file transfers are ~20-30MB/s with a low-powered Atom, so use something with more grunt if it's to be a 24/7 NAS.

That's what some RAID levels _could_ be for by Sloppy · 2013-12-10 06:01 · Score: 1

A two-disk RAID1, or a RAID5, theoretically ought to be able to detect when there's corruption, but shouldn't be able to correct it. If you've got two different data values, you don't know which one is right.

But it occurs to me: RAID6 (or three-or-more disk RAID1) really ought to be able to correct. Imagine a three-disk RAID1: if two disks say a byte is 03 and one disk says 02, then 03 is probably right. RAID6, similarly, has enough information to be able to do the kinds of repairs that you could do with par2.
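
The three-disk voting idea can be sketched as below. It is only an illustration of the argument, not what the md driver does, and the function name is made up.

```python
from collections import Counter


def majority_vote(copies: list[bytes]) -> bytes:
    """Rebuild data byte by byte, keeping the value most copies agree on."""
    repaired = bytearray()
    for column in zip(*copies):
        value, votes = Counter(column).most_common(1)[0]
        if votes * 2 <= len(copies):
            # e.g. a two-disk mirror that disagrees: we know something is
            # wrong, but have no way to tell which copy is right
            raise ValueError("no majority: corruption detected but not correctable")
        repaired.append(value)
    return bytes(repaired)
```

With two copies a mismatch raises (detection only); with three, a single bad copy is outvoted.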

It'd be cool to find out this is already in the kernel's md device. Probably not so yet, though. ?

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:That's what some RAID levels _could_ be for by cmurf · 2013-12-10 12:20 · Score: 1

For raid5/raid6 this is called scrub, or in md parlance writing either check or repair to md/sync_action for the md device. Check records mismatches, it doesn't fix them. Critically though, if there are drive read errors reported, the normal read error handling will cause the underlying sectors on the drive to be overwritten with rebuilt data.

But as for constantly doing a parity check, that's not how any RAID I'm aware of works because it would be as slow as running a degraded array. No optimizations for small file reads would be possible, it would always have to do full stripe reads, compute parity and then compare to the parity chunk on disk. And for RAID6 this would effectively bring the write performance penalty to reads.

For RAID1, normally different LBA requests are made for each device, which is why RAID1 reads are faster than single device. If instead the same LBAs are read and then compared, again this is slow. And so the correct way to do it is scheduled scrubs.
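
A scheduled scrub of an md array boils down to writing "check" (or "repair") into the array's sync_action file and later reading mismatch_cnt. A minimal sketch, assuming an array named md0 and root privileges; the sysfs_root parameter is a hypothetical addition so the functions can be tried against a scratch directory:

```python
from pathlib import Path


def start_md_scrub(device: str, action: str = "check",
                   sysfs_root: Path = Path("/sys/block")) -> None:
    """Ask md to scrub an array by writing to its sync_action file (needs root)."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    (sysfs_root / device / "md" / "sync_action").write_text(action + "\n")


def mismatch_count(device: str, sysfs_root: Path = Path("/sys/block")) -> int:
    """Read the mismatch counter that md maintains for the last scrub."""
    return int((sysfs_root / device / "md" / "mismatch_cnt").read_text())
```

Run from cron, something like start_md_scrub("md0") once a month would match the "scheduled scrubs" approach described above.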

Re:Look to the past by Venotar · 2013-12-10 06:02 · Score: 2

The tapes may be stable (I'm suspicious of that claim: their temperature tolerances aren't as high as modern hard drives, they actually care about dust, and I would expect them to be more susceptible to magnetic interference); but the tape drives are not. Over time drive heads become misaligned. They continue to write fine and can read what they write; but sufficient misalignment prevents other drives of the same type from reading the tape. That tape then becomes only as useful as the drive that wrote it. Lose the drive, you lose the use of the data on the tape. Unless you test reading the tape in a different drive than it was written from (while the writing drive is still available for pulling the data out), this condition's effectively undetectable until you actually need the data.

There's a reason so many shops have moved to disk based backups. Tape simply isn't reliable. Tape is cheap; but definitely NOT reliable.

Re:Excellent question by QuietLagoon · 2013-12-10 06:02 · Score: 1

Bitrot is a myth in modern times.

You state this without any substantiation as if it were a fact.

Re:Excellent question by rabtech · 2013-12-10 06:03 · Score: 5, Interesting

Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

This isn't just wrong, it's laughably wrong. ZFS has proven that a wide variety of chipset bugs, firmware bugs, actual mechanical failure, etc are still present and actively corrupting our data. It applies to HDDs and flash. Worse, this corruption in most cases appears randomly over time so your proposal to verify the written data immediately is useless.

Prior to the widespread deployment of this new generation of check-summing filesystems, I made the same faulty assumption you made: that data isn't subject to bit rot and will reproduce what was written.

ZFS or BTRFS will disabuse you of these notions very quickly. (Be sure to turn on idle scrubbing).

It also appears that the error rate is roughly constant but storage densities are increasing, so the bit errors per GB stored per month are increasing as well.

Microsoft needs to move ReFS down to consumer products ASAP. BTRFS needs to become the Linux default FS. Apple needs to get with the program already and adopt a modern filesystem.

--
Natural != (nontoxic || beneficial)

Just get a carbonite account by gravis777 · 2013-12-10 06:04 · Score: 1

I have been going through this issue myself. In a single weekend of photo and video taking, I can easily fill up a 16 gig memory card, sometimes a 32 gig. About 10 years ago I lost about two years worth of pictures due to bitrot (ie my primary failed, and the backup DVD-Rs were unreadable after only a year - I was able to recover only a handful of photos using disc-recovery software). Since then, I have kept at least three backups, reburning discs every couple of years. But if I can fill up two BD-Rs in a weekend, and given the high price of media, that wasn't an option. Extra harddrives?

I finally realized the best way was just to get a Carbonite account. They are about $70 a year for unlimited encrypted storage space (if you are really anal, I guess you could always put things into TrueCrypt encrypted file containers and upload them). The worst part is how long it takes to do a backup on a residential broadband line (it would also suck if your ISP has data caps). It has taken me about 2 weeks to do half a terabyte.

The deal is, the peace of mind that comes from this is huge, and it is cheaper than buying another harddrive.

Yes, I know that is not the question you asked, but I feel like it is a much more practical alternative. I mean, as I continue backing stuff up, I am sure I will pass a terabyte. How much are you going to pay for discs, for harddrives? Then trying to keep them safe and secure, and having to worry about bitrot?

Seriously, I've lost family pictures and videos before even though I had backups, and it sucked. Do yourself a favor and get a cloud backup. Yeah, it may take a while to do your backups and restorations, but it is worth it.

Re:Just get a carbonite account by Anonymous Coward · 2013-12-10 06:18 · Score: 1

Regarding bitrot as above commenters -- how can you be sure that your cloud provider is not suffering from bitrot on your stored files? PAR2 files help, but a cloud provider that would checksum the files you uploaded with a checksum file you provide (and not just say "Yep, everything is fine, nothing to see here.") would ease my mind somewhat.
Assuming, of course, the cloud provider is using a modern ZFS or BTRFS to store your cloud data.
Re:Just get a carbonite account by gravis777 · 2013-12-10 07:26 · Score: 2

how can you be sure that your cloud provider is not suffering from bitrot on your stored files?
http://en.wikipedia.org/wiki/Carbonite_(online_backup)#Product_details
Works for me - better than what I have going on at home, and cheaper than I could set up something like this. And anyways, I still have my External HDD backups as well. It's just another level of backup to keep me from data loss.
Re:Just get a carbonite account by AmiMoJo · 2013-12-10 20:16 · Score: 1

It's not clear but it sounds like the files are encrypted but probably still available to the company that owns the servers. At the very least their client software is closed source and the data is stored in the NSA^h^h^h USA so I wouldn't recommend it.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Just get a carbonite account by gravis777 · 2013-12-11 01:41 · Score: 1

http://www.jimkarpen.com/cloudsecurity.html
They also allow you to manage your own encryption keys
http://www.carbonite.com/online-backup/home/safe-and-secure
Re:Just get a carbonite account by AmiMoJo · 2013-12-11 02:17 · Score: 1

Assuming you trust that their client software doesn't make a key available to the NSA when they want it, of course. Sorry but all US providers are suspect now. They could get a letter that forces them to do pretty much anything to their customers and can't even ask a lawyer about it.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC

Re:Splendid by Sloppy · 2013-12-10 06:07 · Score: 1

You really gotta be careful with that attitude. The photos seem worthless at the time you take them, and most of them remain worthless forever. Most of them. Then you see that old picture of when your now-grown-up dog used to be a cute little puppy, and awww!!!

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:Excellent question by Anonymous Coward · 2013-12-10 06:07 · Score: 2, Insightful

it doesn't seem that way... http://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449/

How about Stone? by Fussen · 2013-12-10 06:07 · Score: 1

M-DISC:
DVD format presently, BLU-RAY format in the future. Someday an electronic eye will just be able to look at the disc surface and see it all in one snapshot.

They aim for 1000 years. I expect 100. It may be reasonable. Just keep drives around.

http://www.mdisc.com/proving-ground/

Re:How about Stone? by Miamicanes · 2013-12-10 10:49 · Score: 1

If you're storing anything besides DVDs that need to be capable of direct casual playback on a DVD player, you're better off just burning the files (or even the .iso file of a DVD) to a non-LTH BD-R disc.
M-Disc is just a non-LTH BD-R with DVD geometry. It's an elegant solution for preserving DVDs in a way that gives you the best of both non-LTH BD-R and casual playability of a DVD, but it's stupid to spend M-disc prices for bulk data backup, including digital photos, when you can buy a brand new BD-R drive and two 25-gig non-LTH discs for what you'll spend on a 10-pack of 5-gig M-discs alone.
There's nothing exotic about BD-R anymore. DL and 3L BD-R discs are pretty expensive, but single-layer 25-gig non-LTH BD-R discs are cheap online, and an OEM-wrapped bare drive with software bundle costs maybe $50 more than a DVD+/-RW drive. And if you have a laptop that doesn't officially have a BD-R drive, you can probably buy a bare drive on eBay and swap it out yourself as long as your computer isn't a Macbook or weird ultra-ultra-thin PC notebook. For more normal laptops, there are basically two optical-drive form factors with two loading-forms (tray or slot). As long as you don't mind cannibalizing the bezel from the laptop's original drive, the hardest part of the whole thing is the bezel swap.
One warning: 95% or more of the BD-R discs you'll find at any retail store (Best Buy, Tiger Direct, etc) are going to be LTH, and manufacturers don't exactly bend over backwards to make it obvious that the discs in a package ARE LTH type. Make sure you consult Google -- or at least Newegg -- before buying blanks, and if the discs are less than a buck apiece, they're almost GUARANTEED to be LTH.
If you use LTH discs, all longevity bets are off. LTH discs are inferior junk made with cheap organic dye, just like DVD+/-R discs are. LTH discs exist for exactly one reason -- cost reduction. Genuine phase-change discs aren't cheap to manufacture, disc manufacturers spent lots of money tooling up to make blank DVD media based on organic dyes, and LTH lets them repurpose it for making cheap BD-R media. If you're burning a disc that only has to last until next week, go ahead & use LTH. If you're burning a disc that you want to be readable (at least, without expensive data recovery and bit rot) 25 years from now, spend a few bucks more on phase-change media.
Re:How about Stone? by Fussen · 2013-12-10 14:36 · Score: 1

What I'm confused about is the reference of M-Discs to LTH media.. M-Discs don't use dyes.. they require LG drives with modified lasers that actually burn pits into synthetic stone.

So, as time rolls forward, the only concerns are the preservation of the disc and the ability to read that disc with a drive that is functional. The trade-off is that the disc is only so large, and may require many discs.. but the trade-off of having a stack of discs / records that take space but hold the data seems reasonable if the sole purpose is just to make that data survive in a non-editable format.

Please correct me if I am wrong, but that's the benefit of M-Disc and the US Navy investing in using M-Disc as a media choice for hardened / critical situations. M-Disc is an ancient approach to the digital age; etch your story in stone and people will read it one day when you are dead.
Re:How about Stone? by Fussen · 2013-12-10 14:42 · Score: 1

I think M-Disc is worth a look. Yes M-Disc is not compatible with writers unless it is an M-Disc certified writer, such as drives made by LG.

The big mistake you have made immediately is that M-Disc IS compatible with DVD drives. That's what makes it such a good choice, because long after it's hard to find a M-Disc burner drive, one just has to find ANY optical drive that understands DVD or Blu-Ray media (if M-Disc Blu-Ray is chosen / available.)

This is a totally different deal than Low To High disc writing as there are no dyes used. M-Disc writers etch physical pits into synthetic stone (requiring a special disc drive laser and increased power at point of writing), preventing the concept of bit rot, since it's stone. The only thing that can happen to the substrate is that the surrounding medium collapses and makes it impossible to view the stone substrate.
Re:How about Stone? by Miamicanes · 2013-12-10 15:53 · Score: 1

I think you just accidentally misread it... I said that M-discs are basically non-LTH BD-R discs with DVD track geometry.
LTH discs are the ones made with organic dyes, just like DVD+/-R.
M-Disc is NOT made with organic dyes. It's a phase-change magneto-optical recordable DVD that's readable by normal drives/players, but requires a BD-R drive with the right firmware to burn.

Re:Splendid by F.+Lynx+Pardinus · 2013-12-10 06:07 · Score: 1

I understood him to be commenting on the number, not the existence, of the photos. I'm the designated archivist for the family's (7 members in 2 households) photos. At last check, I have about 20k photos in the archive. It's hard to imagine having "hundreds of thousands" without having enormous amounts of redundant or irrelevant photos, which is what the parent post is poking fun at.

Re:uhuh by behrooz0az · 2013-12-10 06:07 · Score: 1

WARNING: DO NOT RUN ANY COMMAND IN THE PARENT, THIS COMMENT OR ANY OF THE SIBLING COMMENTS.
You really suck at being an asshole too, the right command for destroying files and being innocently obfuscated is:
dd if=/dev/zero|pv|dd bs=1024 count=$(ls -s 'filename'|awk '{print $1}' of='filename'|openssl sha1

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)

Re:BTRFS filesystem by mlts · 2013-12-10 06:09 · Score: 4, Informative

I'll be the heretic here, but on Windows 8.1 and Windows Server 2012 R2, there is a feature called Storage Spaces. It works similar to ZFS where you toss drives into a pool, then create a volume that is either simple, mirror, or with parity, and Windows does the rest. If a volume needs more space, toss some more drives in the pool.

To boot, it even offers autotiering so data can be stored on a SSD that is frequently used, or remain on the HDDs if it isn't. Deduplication is handled on the filesystem level [1].

No, this isn't a replacement for a SAN with RAID 6 and real-time deduplication, but it does get Windows at least in the same ballgame as Oracle with ZFS.

[1]: Not active deduplication. The data is initially stored duplicated, but a background task finds identical blocks and adds pointers. Of course, the made from scratch filesystem, ReFS (which has the ability to check for bit rot on reads like ZFS), doesn't have this, so one is still stuck with NTFS for this feature.

Re:Excellent question by yakatz · 2013-12-10 06:14 · Score: 1

Second shoutout for Crashplan! I have eight computers backing up to one account with "unlimited" storage and versioning.

Re:Excellent question by mlts · 2013-12-10 06:14 · Score: 1

I'm curious how that is doable. Even Amazon Glacier would be about $10.24 per terabyte stored per month, so I'd be looking at about $130/month for that much info.

I am not passing judgement... just have not heard much about CrashPlan, good/bad other than a quick search on it.

Re:Excellent question by drussell · 2013-12-10 06:16 · Score: 1

Actually, that was a reply to THIS post, not the original question posted by timothy...

I really hope this discussion provides good answers, with practical solutions for Windows, IOS, and Linux... I think that this is the sort of thing that everyone could really use!

Are there cloud storage providers that can do this for the above example of an approx. 2 TB data set, and provide complete security?

I still think questions about basic data integrity, checksums, parity, ECC on disks etc. should be completely unnecessary and most certainly already be second nature to the slashdot crowd, but I guess I'm just living in the past.

Thanks for immediately jumping down my throat, though ;)

Re:Look to the past by mlts · 2013-12-10 06:19 · Score: 1

I just wish LTO drives were cheaper. Otherwise, they would be ideal for backups because they support encryption on the drives themselves. All LTO-4 tapes and newer support this, so any LTO-4 drive given the right key can decrypt another drive's tape.

Of course, WORM media is always nice, especially with malware being a constant threat.

Re:Excellent question by drussell · 2013-12-10 06:20 · Score: 1

So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?

The Anonymous Cowards? Yes.

Please continue the technical discussion. Sorry for the noise.

Re:Excellent question by DigiShaman · 2013-12-10 06:21 · Score: 1

CDRs suffer nasty bitrot. Usually most CDs made in the past 10 years. I suppose you could have vacuum sealed them, but how many people knew to do that?!! You can get medical grade gold disks, but those you have to special order (not found in your local computer store).

One of my clients had geoscience data projects archived on CDRs. It was only when they went to pull them that they discovered the bitrot problem. We used Nero DiskSpeed to perform a surface scan. You can see entire segments where it goes green (good), transitions into yellow (correctable), to red (damaged, unreadable) and then back out to yellow and green again. It's the material that oxidizes. Since then, they pulled all the data they could back onto disk and tape. God only knows how long that will last too.

--
Life is not for the lazy.

Re:Excellent question by SirMasterboy · 2013-12-10 06:24 · Score: 1

Well, BackBlaze is another similar backup company who is far more public about their costs and operations. I think they have said their customer break-even point is around 3-4TB. So if most customers have far less than that, then a few can have far more and it all works out.

http://www.wired.com/insights/wp-content/uploads/2011/10/backblaze-cost.png

Bacula by dshk · 2013-12-10 06:24 · Score: 1

It might be overkill, but the open source backup software Bacula has a verify task, which you can schedule to run regularly. It can compare the contents of files to their saved state in backup volumes, or it can compare the MD5 or SHA1 hashes which were saved in the previous run. I assume other backup software has similar features.
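
The general idea behind a hash-based verify pass can be sketched as below. This is not Bacula's implementation: the function names and the use of SHA-256 are assumptions, and the manifest is just a dict that you would save somewhere outside the tree being checked.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its current hash."""
    return {
        str(p.relative_to(root)): file_hash(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the tree against a manifest saved on an earlier run."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(n for n in manifest
                          if n in current and current[n] != manifest[n]),
        "new": sorted(set(current) - set(manifest)),
    }
```

A hash mismatch alone cannot tell bitrot from a deliberate edit; for an archive of photos that should never change, any "changed" entry is worth investigating.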

Re:Excellent question by fnj · 2013-12-10 06:25 · Score: 1

Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

Oh, really? Is that why drive manufacturers specify a non-recoverable read error rate - typically on the order of 1 bit per 100 terabits? Let's see now. A single 4TB drive contains 32 terabits of data. So if you have three of them, either in a RAID or separately, and you try to read the entire contents, you can expect an average of one bit to be rotted permanently and lost forever. Or that bad bit could happen a lot earlier. Conceivably the first bit you try to read. Or the one millionth. And that is not considered a failed drive. You can't magically guard against these by verifying the recorded data one time, either a nominal portion or even in its entirety.
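To put numbers on that, here's a rough sketch of the arithmetic in Python. It assumes the quoted spec of 1 unrecoverable error per 10^14 bits read, decimal terabytes, and independent errors per bit (a simplification; real drives tend to fail in clusters). The function names are just illustrative.

```python
import math

URE_RATE = 1e-14      # assumed spec: 1 unrecoverable error per 10^14 bits read
BITS_PER_TB = 8e12    # decimal terabyte: 10^12 bytes * 8 bits

def expected_errors(terabytes_read, rate=URE_RATE):
    """Mean number of unrecoverable read errors when reading this much data."""
    return terabytes_read * BITS_PER_TB * rate

def prob_at_least_one(terabytes_read, rate=URE_RATE):
    """Chance of one or more errors, treating every bit as independent."""
    bits = terabytes_read * BITS_PER_TB
    # 1 - (1 - rate)^bits, written so it stays accurate for tiny rates
    return -math.expm1(bits * math.log1p(-rate))

if __name__ == "__main__":
    tb = 3 * 4  # three 4TB drives read end to end
    print(f"expected errors: {expected_errors(tb):.2f}")
    print(f"P(at least one): {prob_at_least_one(tb):.2f}")
```

Under those assumptions, reading three 4TB drives end to end works out to about 0.96 expected errors, and roughly a 62% chance of hitting at least one.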

RAR's checksums will only detect errors that happen to occur when you test read the RAR archive. They won't repair it, and testing OK is no guarantee that it won't have an error the next time you read it. PAR2, on the other hand, does provide for repair.

ZFS can at least detect, and optionally repair (if you use the redundancy options) these isolated bad bits, without the necessity for any special file metadata like PAR2. Of course, there's nothing to say you can't use both ZFS and PAR2.

Re:Excellent question by linear+a · 2013-12-10 06:28 · Score: 1

Bitrot is a myth in modern times.

You state this without any substantiation as if it were a fact.

And I'll counter the above. The last bitrot event I had to deal with - on current server grade (Windoze, tho) hardware was waaaay back last Friday.

Re:Excellent question by drussell · 2013-12-10 06:28 · Score: 1

Thank you. A thoughtful, concise Anonymous post... You've just restored some of my faith in the AC. ;)

Re:Excellent question by heypete · 2013-12-10 06:31 · Score: 1

Users that utilize large amounts of storage are relatively uncommon and are subsidized, in part, by users who utilize less storage. If everyone used terabytes of storage at $4/month, that wouldn't really be sustainable.

Although just a personal anecdote, I've used CrashPlan for ~4 years now (with 11 computers belonging to various family members all backing up to their service with a total of around 500GB being stored with them). Zero complaints. It's done everything I expected, always worked, and never had issues. When I had a laptop stolen and purchased a replacement, I was able to restore all the files from CrashPlan in about a day or two of downloading. I highly recommend it.

Have mercy! by c0d3g33k · 2013-12-10 06:32 · Score: 4, Funny

We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present),

As the proud owner of dozens of family photo albums, a stack of PhotoCDs etc which rarely see the light of day, the bigger challenge is whether anyone will ever voluntarily look at those terabytes of photos. Having been the victim of excruciating vacation slide shows that only consisted of 40-50 images on a number of occasions (not to mention the more modern version involving a phone/tablet waving in my face), I can only imagine the pain you could inflict on someone with the arsenal you are amassing.

Re:Have mercy! by Anonymous Coward · 2013-12-10 15:07 · Score: 1

I agree. My wife will kill at the thought - BUT - Just go through the damn photos and pick 3-5 per year per person max. Nuke the rest.
1- YOU will never look at them all.
2- YOUR FAMILY will never look at them all.
3- YOUR EXTENDED FAMILY sure as hell won't look at them all.
4- YOUR DESCENDANTS absolutely sure as hell won't look at almost any of them.
5- Within 200 years no one alive will give a rat's ass about you or your family beyond a general vague genealogy spreadsheet with 1-5 pictures of you max.
6- Within 1000 years no one will even care about that
7- Within 4 billion years the solar system will die along with the Earth that will already be dead at that point for a billion or 2 itself as far as life is concerned. So really what is the point in hundreds of thousands of family photos.

Re:Excellent question by bluefoxlucid · 2013-12-10 06:33 · Score: 1

"We're experiencing data going bad and not being restorable from back-ups because it just CORRUPTS itself for no visible reason" "That's a myth and doesn't actually happen."

HIV was created by racist bigots to slander blacks and homosexuals.

--
Support my political activism on Patreon.

Re:Excellent question by bluefoxlucid · 2013-12-10 06:34 · Score: 1

Outsourced information services in general have known security concerns. That they come under a new buzzword doesn't make them less secure. Even contractors who come in and touch your systems can walk out with massive amounts of private data.

--
Support my political activism on Patreon.

Re:Look to the past by linear+a · 2013-12-10 06:34 · Score: 1

Tape MUST be sufficiently stable. Reading the reliability specs off the box in front of me and running a few calculations shows that of all the tape operations ever done (at least for my brand of tape) there should be zero or at most one (1.3% chance) tape error in the history of all tape storage by humanity.

Re:Excellent question by mlts · 2013-12-10 06:38 · Score: 1

You hit the nail on the head. Apple should either get with Oracle and put ZFS back in the OS X kernel as the default filesystem, or get with Microsoft and license ReFS. HFS+ was a good filesystem when OS X hit the market, but it has been over a decade, and everyone else has moved on.

One reason why the IT industry moved from RAID 5 to RAID 6 as a standard is that disk capacities are growing, but I/O is not keeping pace. So, it takes longer and longer to rebuild a drive. RAID 6 is now a must because a rebuild takes so long that there is a good chance of another drive failing while the RAID array is in degraded mode. Of course, this is for tier 3 storage, but tier 2 storage is having similar issues as well.

The old-fashioned method by TheloniousToady · 2013-12-10 06:44 · Score: 4, Interesting

Don't forget the old-fashioned method: make archival prints of your photos and spread copies among your relatives. Although that isn't practical for "hundreds of thousands", it is practical for the hundreds of photos you or your descendants might really care about. The advantage of this method is that it is a simple technology that will make your photos accessible into the far future. And it has a proven track record.

Every other solution I've seen described here better addresses your specific question, but doesn't really address your basic problem. In fact, the more specific and exotic the technology (file systems, services, RAID, etc.) the less likely your data is to be accessible in the far future. At best, those sorts of solutions provide you a migration path to the next storage technology. One can imagine that such a large amount of data would need to be transported across systems and technologies multiple times to last even a few decades. But will someone care enough to do that when you're gone? Compare that to the humble black-and-white paper print, which if created and stored properly can last for well over a hundred years with no maintenance whatsoever.

Culling down to a few hundred photos may seem like a sacrifice, but those who receive your pictures in the future will thank you for it. In my experience, just a few photos of an ancestor, each taken at a different age or at a different stage of life, is all I really want anyway. It's also important to carefully label them on the back, where the information can't get lost, because a photo without context information is nearly meaningless. Names are especially important: a photo of an unknown person is of virtually no interest.

Sorry I don't have a low-tech answer for video, but video (or "home movies", as we used to call it) will be far less important to your descendants anyway.

Re:The old-fashioned method by Grizzley9 · 2013-12-10 07:50 · Score: 3, Interesting

Agreed. Looking through a family picture album from the late 1800's I realized my hundreds of GB's of current family pics will likely die with me. There are a ton of family images and a select few family pics may be copied by progeny but unlike their printed counterparts, there are no names or locations on many (and sometimes dates if the exif gets corrupted or overwritten).

So what good is a bunch of pics or videos of long past events except to the person involved? Digital images today, unless meticulously managed and edited, do little good for historical purposes compared with the photo album of yesterday. That goes double if they are locked away in some online archive, which will only stay accessible if the owner keeps up with format and company changes over the decades and the descendants know where it is.

Prepare for maintainer-rot, too by Rob+the+Bold · 2013-12-10 06:46 · Score: 3, Interesting

A family archive maintained by the "tech guy/gal" in the family is also subject to failure from death or disability of the aforementioned maintainer. Any storage/backup solution should therefore be sufficiently documented (probably on paper, too) that the grieving loved ones can get things back after a year or two of zero maintenance and care of the system. That would also imply eschewing home-brew type systems in favor of using standard tools so a knowledgeable tech person not familiar with the creator's original design can salvage things in this tragic but possible scenario. Document the system so even if the family can't do it themselves, and an IT guy has to be contracted to resurrect the data, he'll have the information needed to do so.

Any system sufficiently dependent on regular maintenance by just one particular person is indistinguishable from a dead-man time-bomb.

--
I am not a crackpot.

You need an editing plan more than a backup plan by neo-mkrey · 2013-12-10 06:48 · Score: 4, Interesting

100,000s -- like 300,000? More? How many of them will you actually ever look at again? Less than 1%, I'm guessing. Here's my advice (and it's what I do). Step 1) When transferring pics to your computer, delete the ones that are out of focus, badly lit, framed poorly, etc. This is about 15%. Step 2) Once a month, go through the photos you have taken the previous month and delete those that just don't mean as much anymore (if they have decreased in emotional value in 30 days, just think how utterly worthless they would be in 5 years). This takes care of another 30%. Step 3) Once every 3 months, my wife and I pick the cream of the crop for physical prints. This is about 10%. These are stuck into photo albums, labeled and kept in a fireproof safe in our basement. So 200 photos a month get reduced to ~100, and then 10 per month are printed. YMMV

Re:Excellent question by entrigant · 2013-12-10 06:48 · Score: 3, Interesting

I've been surprised by the lack of reference of proper error checked data paths so far in these comments. I'm continually saddened by ever increasing aggressiveness in clocks and density of RAM in consumer level systems while stubbornly refusing to implement ECC. Many people are even hostile to the idea as if ECC RAM is somehow tainted.

This article points out something else I'd not even considered. A scenario where lack of ECC on a self healing file system can amplify a RAM failure to a catastrophic degree making such filesystems even riskier to run on consumer grade systems.

Thank you for sharing.

Photos = Lightroom plus DNG on a Drobo by carlcmc · 2013-12-10 06:51 · Score: 2

Convert photos to DNG in Adobe Lightroom and use the ability for it to check for file changes. Store on a Drobo with dual disk redundancy.

Re:There are pills which can help you by Kevoco · 2013-12-10 06:52 · Score: 1

I work next to a moving and storage company. Occasionally the dumpster out back can be found unceremoniously overflowing with the contents of a forgotten storage locker. Anything of value has been teased out - you know what gets tossed? Everything else, especially photo albums, trophies, diplomas, etc.

“What is most personal is most general”— Carl Rogers

ZFS, of course by rainer_d · 2013-12-10 06:56 · Score: 2

but there is a catch: to reliably detect bit-rot and other problems, you also need server-grade hardware with ECC.
ZFS (especially when your dataset-size increases and you add more RAM) is picky about that, too.
Bit-rot does not only occur in hard-disks or flash.
You should really, really take a hard look at every set of photos and select one or two from each "set", then have these printed (black and white, for extra longevity).
If this results in still too many images, only print a selection of the selection and let the rest die.

--
Windows 2000 - from the guys who brought us edlin

Re:ZFS, of course by rrohbeck · 2013-12-10 10:02 · Score: 1

I run btrfs on RAID6 (with weekly scrubbing) on a system with ECC RAM. That should reduce the incidence of bit rot to a negligible level.

--
thegodmovie.com - watch it

Back up more frequently and to more places by brunes69 · 2013-12-10 06:57 · Score: 1

The solution to Bitrot and reading of old media is very simple and honestly I don't know why it comes up so much. Storage is DIRT CHEAP. 2TB of Data is NOTHING, you can get a 3TB+ external drive for $100 or even less on sale. Buy 3 drives, keep 1 in SAFELOCATION*, Back up to 1 drive every even week, and the second one every odd week, and once a month swap the one in the SAFELOCATION out for a local one and repeat the cycle. Increase or decrease frequency of SAFELOCATION swapping depending on level of paranoia.

There, the problem is simply and very cheaply solved and there is no level of bit rot that is going to cause all 3 of these backups to be destroyed within a 1 month time window.

* where SAFELOCATION is a off-premise location, either a close friend's house or a locked office desk or a family member's house or a safe deposit box

Re:Back up more frequently and to more places by cmurf · 2013-12-10 13:40 · Score: 1

This is asking too much for most people. For one, they aren't going to back up this consistently, especially off site. And then they are unlikely to turn backup drives into shelved archives once they're full; instead they tend to reformat them and reuse them. And that means any corrupt files on the source end up being replicated to all backups, eventually. So rather than considering one particular strategy as golden and spending too much time on it, using multiple strategies is more effective.

Possibly the idea I like best is printing photos, on acid free paper with pigment inks tested in combination for print permanence of course, and giving copies to family members. It's a lot of material to create, store, move, protect, but its encoding is really simple, and requires no software, hardware, or electricity to decode.

Re:uhuh by isorox · 2013-12-10 06:57 · Score: 1

WARNING: DO NOT RUN ANY COMMAND IN THE PARENT, THIS COMMENT OR ANY OF THE SIBLING COMMENTS.

Unless you are working on the NSA's main database. Then you should run these commands several times, just to be sure the backup is complete. Then take a sledge hammer to the original files, for security. And restore from the backup, to guarantee the backup worked.

Book a flight to Moscow first though

Re:uhuh by CanHasDIY · 2013-12-10 07:03 · Score: 1

And yet, one of FLOSS's selling points is our great community support...

Every community with a notable population size is going to have its share of bad actors.

Besides, ever since you were a kid you've been taught to not trust strangers based on their word alone.

--
An enigma, wrapped in a riddle, shrouded in bacon and cheese

Checksumming + sufficient redundancy by MetricT · 2013-12-10 07:06 · Score: 1

We wrote our own parallel filesystem to handle just that. It stores a checksum of the file in the metadata. We can (optionally) verify the checksum when a file is read, or run a weekly "scrubber" to detect errors.

We also have Reed-Solomon 6+3 redundancy, so fixing bitrot is usually pretty easy.

Re:Excellent question by s13g3 · 2013-12-10 07:07 · Score: 1

This doesn't even count the fact that optical media is still subject to the same degradation and bitrot that tape is.

And anyone who thinks electromagnetic tape is "dead" is naive or just ignorant. People have been predicting the death of tape for decades, and it's no more true today than it was in the 70's. Modern EM tape is typically rated for 15 to 30 years of retention, and as long as it is not over-exposed to moisture during storage, it has proven to be able to last that long: otherwise, the manufacturers would be out of business because the Fortune 500 and S&P 500 companies - the majority of whom backup to tape and send it off-site - would have sued them to extinction.

On the other hand, according to archives.gov:

"CD/DVD experiential life expectancy is 2 to 5 years even though published life expectancies are often cited as 10 years, 25 years, or longer. However, a variety of factors discussed in the sources cited in FAQ 15, below, may result in a much shorter life span for CDs/DVDs."

--
"Inveniemus Viam Aut Faciemus" 'We will find a way... Or we will make one!' --Hannibal of Carthage

Re:Excellent question by bill_mcgonigle · 2013-12-10 07:09 · Score: 1

It's only $4 a month for unlimited backups to CrashPlan.

Do they throttle? I looked into the one that advertises unlimited backups for $60/yr and they rate limit the connection down as you increase your data. I estimated 9 years for the first backup to complete based on published rates.

"Unlimited" - IDTIMWYTIM.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Re:Excellent question by SirMasterboy · 2013-12-10 07:15 · Score: 1

Not that I have seen. It maxes my 5Mbit upload and my downloads are 15-20Mbit.

Do not defrag ? Definitely do not over clock. by perpenso · 2013-12-10 07:16 · Score: 2

ZFS has proven that a wide variety of chipset bugs, firmware bugs, actual mechanical failure, etc are still present and actively corrupting our data.

And I expect that defragging aggravates this. Read a perfectly good block of data from disk into flaky RAM, have a bit flip, and write out that corrupted data to its new location. Even if the software is verifying, it's likely to verify against RAM, and it did successfully write what is in RAM.

And then there is over clocking. If a computer is just used for gaming, no problem. But if it's used for more serious things or archiving things of value to you then you may want to pass on over clocking. Folks who say you can verify an over clocked CPU are mistaken. It's not a crash or no crash thing: at a certain unpredictable point in over clocking, an unpredictable CPU instruction may simply give an incorrect result. This incorrect result could end up in your data or image. I've seen over clocked CPUs mess up a text string that is supplied by the CPU itself, CPUID's vendor string.

Re:Do not defrag ? Definitely do not over clock. by Anonymous Coward · 2013-12-10 07:31 · Score: 1

Every ECC-equipped RAM module I've seen these days corrects single-bit errors and warns about multiple-bit errors.
If your machine doesn't have ECC RAM installed, and it *can* have it installed, strongly think about doing so.
If it cannot, add that feature to the list of things to look for in your next machine. (Protip: These days, *all* motherboards with AMD chipsets support ECC RAM.)

Re:Excellent question by NatasRevol · 2013-12-10 07:16 · Score: 1

"IDTIMWYTIM." should be worked out to be SOMEIDIOT

--
There are two types of people in the world: Those who crave closure

Errors While Copying by organgtool · 2013-12-10 07:21 · Score: 1

As other people have mentioned, a lot of these errors can occur while you are actually copying the files. I have copied files and immediately executed md5sums on the source and dest files only to find differences. Unfortunately, I didn't start this practice until after I had to restore from backup only to find that some of the backup files were corrupted.

And given that this seems to be a common problem, why in the holiest of hells does the cp command not have a verify option? Yeah, it's easy enough to wrap the copy command with md5sums, but a verify option would be even easier. Throw in an auto-retry function on top of that and you'd be really cooking.
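Something like the wrapper described above is only a few lines of Python. This is a minimal sketch, not a hardened tool: `copy_verified` and `file_md5` are made-up names, MD5 is used only because that's what the post mentions, and the read-back of the destination may be served from the OS cache rather than the physical disk, so it mostly catches errors in the copy path.

```python
import hashlib
import shutil

def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_verified(src, dst, retries=3):
    """Copy src to dst, re-copying until the MD5 sums match.

    Returns the number of attempts used; raises OSError if every
    attempt produced a destination that differs from the source.
    """
    expected = file_md5(src)
    for attempt in range(1, retries + 1):
        shutil.copy2(src, dst)          # copies data plus timestamps
        if file_md5(dst) == expected:
            return attempt
    raise OSError(f"{dst} still differs from {src} after {retries} attempts")
```

Calling `copy_verified("IMG_0001.jpg", "/mnt/backup/IMG_0001.jpg")` (paths are placeholders) either returns how many tries it took or raises, which is about what a verify-and-retry flag on cp would give you.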

By the way, the submitter did not mention the current method of backup, but if they are using Linux with the cp command, they would be better served by moving over to something like rsync.

Re:Errors While Copying by EmagGeek · 2013-12-10 08:10 · Score: 1

The question is, why the holy hell are you using cp and not rsync?
Re:Errors While Copying by organgtool · 2013-12-10 08:38 · Score: 1

Because at the time, I didn't know about rsync, let alone understand it well enough to feel comfortable using it for backups. Also, sometimes rsync is overkill when I just need to copy a few files but would like to know that the destination files aren't corrupt.

ZFS is one option, Glacier is worth looking at. by jafo · 2013-12-10 07:23 · Score: 1

I've used ZFS under Linux for 5 years now for exactly this sort of thing. I picked ZFS because I was putting photos and other things on it for storage that I wasn't likely to be looking at actively, so I wouldn't be able to detect bit-rot until it was far too late. ZFS has detected and corrected numerous device-corruption and unreadable-block issues over the years, via monthly "zpool scrub" operations.

I have been backing these files up to another ZFS system off-site. But now I'm starting to look at other options because it's looking like I can begin doing it more cheaply than even my free hosting of a box I bought can provide.

Amazon Glacier reduces the cost of S3 storage by an order of magnitude, making 2TB of storage cost around $20/month. For a backup copy, it's hard to compete with this, even just buying a USB drive to stick somewhere... You do have to be careful about recovery though, they charge based on peak download speed (a very weird pricing).

Re:Excellent question by SecurityTheatre · 2013-12-10 07:23 · Score: 1

What is the most practical way to maintain bitwise accuracy on a diverse set of binary data in an automated way using "diff and md5sum"?

Note that part where he was looking for an automated solution that will run itself without intervention, or a better means than hard drives...

You suggested... "Do some manual stuff using hard drives".

Right.

git annex by rescdsk · 2013-12-10 07:24 · Score: 1

git annex is an open source project that lets you distribute files around various media (including external HDs, Amazon S3, SSH-connected computers, etc.). It has an fsck command for checking that your data still matches its checksums.

There's a GUI interface that makes it a lot like Dropbox, where you just add files to a folder, and they are sync'd.

It works on OS X and Linux, with an alpha for Windows.

--
-- rm -rf / tells you if you have root or not

Re:Excellent question by lgw · 2013-12-10 07:38 · Score: 2

Well, I did backup software and hardware for nearly 20 years. But I can't substantiate that with a link.

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by lgw · 2013-12-10 07:42 · Score: 1

I've investigated hundreds of cases of "bit rot" over the years in my job, and other than very weak magnetic media (or CD-Rs as someone upthread pointed out), corrupt backups were always corrupt when written. Had the poor SOB only verified his backups day 1, he'd not be in a world of shit. Every single time.

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by lgw · 2013-12-10 07:44 · Score: 1

The error rate from other sources (e.g. on the network copy) is far higher. If your backups are corrupt, it's almost certain they were corrupt day 1.

Test your backups after you make them: it's a cheap and easy 99% solution.

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Splendid by aaaaaaargh! · 2013-12-10 07:46 · Score: 1

Jesus Christ, take it easy, man. I was making a harmless joke that anyone who was ever forced to watch boring holiday slideshows would be able to understand. Now I'm being accused of mental health issues, not being able to procreate and whatever else.

If hundreds of thousands of family pictures doesn't seem a bit excessive to you, so be it. After all, it takes only a few weeks to sort through them. But please calm down a little and stop spamming AC troll posts.

Re:Excellent question by lgw · 2013-12-10 07:47 · Score: 1

You make a great point about CD-Rs, I guess I should have broadened my statement to "cheap-ass backup solutions from the 90s", not just floppies and tape.

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by bluefoxlucid · 2013-12-10 08:04 · Score: 3, Interesting

I used to fancy a girl who worked as a data recovery engineer. You wouldn't believe how many people hear the RAID controller alarming and get up to close the case instead of hot swapping a spare drive.. then a week later the second drive goes. She had a fanciful story about how spinning disks used to occasionally fail in such a way that a random sector would go bad, report incorrect data, and a RAID-1 mirror would "fix" it by destroying data on the other drive. She also used to tell me software RAID options had a tendency to actually beat hardware RAID options for data integrity outside of other inline failures--that is, when the system is operating under optimal circumstances, most hardware RAID systems more often self-corrupt than software RAID systems. Just an odd statistic, and I never got overall risk performance stats out of her.

--
Support my political activism on Patreon.

MD5 and a few scripts by MooseTick · 2013-12-10 08:15 · Score: 2

Here's a cheap easy solution (assuming you can write some basic scripts)

1. Start by taking an MD5 of all your pics. Save the results.
2. Back up everything to a 2nd drive. Take MD5s and be sure they match using basic scripts.
3. Periodically scan drives 1 and 2 and compare against their expected MD5 values. If one has changed, copy it from the other (assuming it is still correct)

You could expand this with more drives if you are extra paranoid. You could do this cheap, check regularly, and know when bitrot is happening.
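For anyone who wants a starting point, the three steps above might look roughly like this in Python. Treat it as a sketch under assumptions: the manifest is just an in-memory dict (you'd want to save it somewhere safe), `build_manifest` and `scrub` are made-up names, and a file is only repaired when the other drive's copy still matches the saved MD5.

```python
import hashlib
import os
import shutil

def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Step 1: map each file's path (relative to root) to its MD5."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest

def scrub(manifest, drive1, drive2):
    """Step 3: check both copies against the manifest, repair from the good one.

    Returns (repaired, unrecoverable) as lists of relative paths.
    """
    repaired, unrecoverable = [], []
    for rel, expected in manifest.items():
        paths = [os.path.join(drive1, rel), os.path.join(drive2, rel)]
        good = [p for p in paths if os.path.isfile(p) and file_md5(p) == expected]
        bad = [p for p in paths if p not in good]
        if not bad:
            continue                      # both copies still match
        if not good:
            unrecoverable.append(rel)     # both copies rotted or missing
            continue
        for p in bad:
            os.makedirs(os.path.dirname(p), exist_ok=True)
            shutil.copy2(good[0], p)      # overwrite the bad copy
        repaired.append(rel)
    return repaired, unrecoverable
```

Step 2 is then just copying drive 1 to drive 2 and confirming that `scrub` comes back with two empty lists. Adding more drives means adding more entries to `paths`.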

--
Ninjas don't carry tic tacs

Re:MD5 and a few scripts by Smork · 2013-12-11 00:44 · Score: 1

Instead of writing your own scripts, perhaps you can try http://md5deep.sourceforge.net/

surprising recovery by shokk · 2013-12-10 08:20 · Score: 1

I think that when writable CDs first came out, we thought that they would last forever. And in some sense they do last long enough. The other day I found a CD binder full of games and a few backups from 1996. The most surprising of all was a collection of photos that I thought had been long lost, and with a little rsync running over and over and over, I got all the files off intact and saved them to my Flickr account.

The most important thing to understand, I think, is that we have to look at digital storage as a convenient and temporary medium and that anything longer lasting would need to be hard copied. It’s not a guarantee, but it’s a better likelihood of survival. Pictures can survive by pure chance for a couple hundred years. We’re lucky if our current stuff will handle a few years, much less natural disasters and history itself.

For many, the cloud seems to be a utopia, but corporate and national politics can make all your treasured media disappear without warning, and none of the free services give you a guarantee of safety if something craps out on their systems. And as for paid cloud services, ask yourself if anyone will bother to take care of it after you’re gone, or if anyone will bother to archive it, or if your family will just toss it aside even if they are able to get them as part of your estate. Ask yourself who you’re saving all that for. Are we just digital hoarders?

--
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."

Re:Splendid by flyingfsck · 2013-12-10 08:20 · Score: 1

I was thinking that bitrot is the computer god's way to protect our descendants...

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

Re:Excellent question by doggo · 2013-12-10 08:32 · Score: 1

"Thanks for immediately jumping down my throat, though ;)"

Yeah. 'Cause you're the victim. WTF? Someone calls you out for being dickish, and they're jumping down your throat?

Re:Excellent question by DigiShaman · 2013-12-10 08:38 · Score: 1

I don't know of any long-term backup solutions aside from gold CDs to be quite honest. If they're not prone to bit-rot, the media reader's interface will be obsolete on new equipment. It's doable, but not without first creating bridge solutions and data migration. The way I see it, migrate from media to media as technology progresses, or face an entire migration project later.

I suppose you could archive on flash drives, but I haven't a clue as to what the life expectancy of the flash chips are before bits start flipping randomly (gates change on die).

--
Life is not for the lazy.

It's not bit-rot by dhaen · 2013-12-10 08:43 · Score: 1

If you're noticing data corruption on only 2TB it's probably not what we normally call bit-rot. A bit that changes state for no apparent reason within a very large set of data can be described as bit-rot; otherwise it's general data corruption, which has many causes that are all understood: poor media, poor transmission of data, overwriting of data, etc. Once you've got the system sorted out so you don't get data corruption, start thinking about the nature of your data. How much redundancy is in it? If it's JPEGs then almost none, so a single bit error could be serious to a file. If it's uncompressed TIFFs then there is a lot of data redundancy and the single bit error might only be an error of a single pixel, which you might not even notice. And finally, don't expect optical media to be safe from errors. Only use it as part of a DR plan.
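The compressed-versus-uncompressed point is easy to try for yourself. The sketch below is only an illustration: it uses zlib as a stand-in for a compressed image format (it is not JPEG, and JPEG decoders fail differently), a synthetic byte pattern as the "pixels", and helper names that are made up for the example.

```python
import zlib

def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

def bytes_changed(a, b):
    """Count positions where two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))

def decode_damaged(compressed, bit_index):
    """Flip one bit in a zlib stream; return decoded bytes, or None if it won't decode."""
    try:
        return zlib.decompress(flip_bit(compressed, bit_index))
    except zlib.error:
        return None

if __name__ == "__main__":
    pixels = bytes(range(256)) * 64      # stand-in for uncompressed image data
    packed = zlib.compress(pixels)

    raw_hit = flip_bit(pixels, 8 * 1000)
    print("uncompressed: bytes damaged =", bytes_changed(pixels, raw_hit))

    result = decode_damaged(packed, 8 * (len(packed) // 2))
    if result is None:
        print("compressed: stream no longer decodes")
    else:
        print("compressed: decoded, matches original =", result == pixels)
```

In the uncompressed copy one flipped bit damages exactly one byte. In the compressed copy the same single flip typically makes the whole stream fail its integrity check, which is the "could be serious to a file" case.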

Re:Excellent question by Mysticalfruit · 2013-12-10 08:52 · Score: 1

As someone who has 100's of TB's of data stored in ZFS I couldn't agree more. In most cases if ZFS spits out a drive because it's convinced it's writing bad blocks, I believe it. In most cases (if it's a Seagate drive) SeaTools backs me up on this... in several cases SeaTools doing a quick check says the drive is fine... but it never fails: if I do a "full" scan of the drive, it'll eventually throw an error.

I've found damaged SAS cables, JBOD enclosures with dodgy bridges, etc. because of ZFS.

With that all said, now that you've gone out and bought a small PC, stuffed 4, 4TB drives into it and set it up as a raid10 using ZFS you now need to ask the next question... what's more likely... I'm going to have two drives fail simultaneously or that my house is going to get hit with a {flood, lightning, fire, thieves, etc}

Honestly, I'd build two of these devices, one for local backups, and I'd put the other at a buddy's house and do remote backups to it from your local device.

--
Yes Francis, the world has gone crazy.

snapraid by JoshRosenbaum · 2013-12-10 09:04 · Score: 1

Snapraid (free!) might be an option: http://snapraid.sourceforge.net/

It snapshots your data to some parity files on a separate drive. All you would have to do is occasionally copy those files offsite. Snapraid includes commands that allows you to check and fix bitrot as well.

CrashPlan PRO Enterprise by AdamInParadise · 2013-12-10 09:08 · Score: 1

CrashPlan could help you a lot. First, CrashPlan is a backup system, so it makes and manages a copy of your data, including every version of every file. CrashPlan addresses the bitrot problem on their side by running their own checksums on the stored files : if they detect an issue with a stored file, they will replace it with the original version, still stored on their computer. If some files get corrupted on your computer, you can restore them from CrashPlan, but you will need something on your side to tell you that something went wrong. Now, even if you realize that the file is corrupted years after it happens, you can still recover the previous non-corrupted version from CrashPlan.

Now, 2TB is a bit much to store on CrashPlan's cloud: unless you have a very fast connection (at least 100MB) it's going to take you a while to upload your data. The solution is to run your own CrashPlan PRO Enterprise server onsite (with periodic offsite backups of course). Don't be fooled by the name, it's pretty easy to set up and administer, and the licenses are fairly affordable ($75/user/year).

I've been supporting CrashPlan PRO Enterprise in my company for 3 years, with 25 clients and about 1TB of data. While I'm not super-happy with the way the Code42 people run their CrashPlan business, the tech is solid. I'm kind of thinking that other backup systems work in similar ways.

Now, I hope that you'll excuse me for asking this question, but which kind of crappy file systems and hard drives are you using that generate significant levels of "bitrot" in files which are basically just sitting there?

--
Nobox: Only simple products.

Re:Excellent question by ThatsMyNick · 2013-12-10 09:14 · Score: 1

You are missing a key ingredient: encryption.

Re:BTRFS filesystem by RR · 2013-12-10 09:27 · Score: 3, Informative

The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them.

There you go again. Acting like you know what you're talking about, but you don't.

ZFS and BTRFS have a much more efficient way to ensure correctness: CRC of everything written. That is what is checked when you do a zpool scrub or a btrfs scrub. Random errors are very unlikely to produce the same checksum, so then you only need a second copy that doesn't produce CRC errors.
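A rough sketch of that idea, not how either filesystem is actually implemented: every block carries a stored checksum, and a "scrub" re-reads both mirror copies and rewrites whichever one no longer matches. The `scrub` function, the two in-memory "mirrors", and the use of `zlib.crc32` as the checksum are all assumptions made for illustration; real filesystems use their own on-disk formats and checksum algorithms.

```python
import zlib


def scrub(mirror_a, mirror_b, checksums):
    """Verify every block on both mirrors against its stored CRC32.

    A block that fails the check is overwritten from the other mirror
    if that copy still matches. Returns the indices that were repaired
    and the indices where both copies are bad (unrecoverable).
    """
    repaired, lost = [], []
    for i, expected in enumerate(checksums):
        ok_a = zlib.crc32(mirror_a[i]) == expected
        ok_b = zlib.crc32(mirror_b[i]) == expected
        if ok_a and ok_b:
            continue
        if ok_a:
            mirror_b[i] = mirror_a[i]   # heal B from the good copy
            repaired.append(i)
        elif ok_b:
            mirror_a[i] = mirror_b[i]   # heal A from the good copy
            repaired.append(i)
        else:
            lost.append(i)              # no copy matches its checksum
    return repaired, lost


# Hypothetical data: two blocks stored identically on two mirrors.
blocks = [b"family photo, part 1", b"family photo, part 2"]
checksums = [zlib.crc32(b) for b in blocks]
mirror_a = list(blocks)
mirror_b = list(blocks)

# Simulate bit rot: flip one bit in block 1 of the first mirror.
rotted = bytearray(mirror_a[1])
rotted[0] ^= 0x01
mirror_a[1] = bytes(rotted)

repaired, lost = scrub(mirror_a, mirror_b, checksums)
print("repaired:", repaired, "lost:", lost)
```

The point of the checksum is that it tells you which of the two copies is the bad one, which a plain two-way comparison cannot do.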

Hard drives are nowhere near as reliable as their manufacturers claim. Modern drives don't store the bits that you feed them exactly as you give them. Instead, they use CRC and error correcting codes, so they only need most of the data to be correct. Usually, if the data doesn't match the CRC, and it cannot be corrected by ECC, then you get a read error instead of corrupted data. Which, I guess, is better than getting a corrupted picture. Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.

But I've seen enough errors that I suspect something else is going on. It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory. Your computer can be corrupting your data, and you have no warning that it's happening. In addition, hard drives lie. I'm not optimistic about the long-term storage of electronic data.

--
Have a nice time.

Reed Solomon FEC by flyingfsck · 2013-12-10 09:34 · Score: 1

There is also rsbep, which uses Reed Solomon FEC. This is a classic filter, so you can use it together with tar, gzip and gpg to protect archives against NSA snooping and bit rot simultaneously.

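Something like this (encrypt first, then add the FEC last, so the error correction protects the bytes that actually sit on disk):
$ tar -cz indirectory | gpg -e | rsbep > out.tar.gz.gpg.rs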

La voila!

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

Stone age by flyingfsck · 2013-12-10 09:36 · Score: 1

Got to carve those pics in stone, in Egypt, else nobody will care about them later.

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

ZFS is Not a Panacea by ewhac · 2013-12-10 09:39 · Score: 1

FreeNAS and ZFS are indeed awesome. But before y'all go installing FreeNAS on some spare hardware and think your problem is solved, you need to be aware that ZFS is not a panacea. You can't just drop it on Any Old Box with default settings and expect it to magically keep your data safe unto perpetuity. You need to pay attention to what you're doing.

Some highlights:

* ZFS's design requires RAM to be perfectly reliable, or at least report imperfections. Undetected bitrot in RAM can and will destroy your entire ZFS pool. Thus, a machine with ECC RAM installed is a requirement.
* As if that weren't enough, ZFS eats huge amounts of RAM. The current guideline is 1 GiB of RAM per TB of disk spindles, with 8 GiB as a practical minimum.
* ZFS assumes it has perfect knowledge of disk writes in-flight, and as such doesn't play well with RAID controllers, which can silently re-order writes. If your machine has a RAID controller, the RAID features should be turned off. Don't worry, ZFS has its own RAID features. However:
* Because drive densities are now approaching drive error rates (10**13 bits of storage, with manufacturers quoting uncorrectable errors every 10**14 bits read), ZFS RAID-Z1 is no longer considered sufficient to ensure storage integrity, and you should plan for RAID-Z2 (two parity drives).
* For the same reason as turning off RAID, a "production" FreeNAS/ZFS installation should not be run in a virtual machine. It's okay if you're just test-driving it to get a sense of what it can do, but a live system should run on actual hardware.
* Using ZFS's de-duplication feature is officially discouraged. It may seem like a great idea, but it will gobble all your RAM and return very little benefit. On average, you're better off using compression.

When ZFS dies, it dies in a big and fairly comprehensive way, and ZFS will die if you under-provision it. In any event, you should RTFM before contemplating a build, and know the trade-offs you're getting into.
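The error-rate arithmetic behind the RAID-Z2 advice can be sketched as follows. This is a back-of-the-envelope model only: it assumes read errors are independent and that the quoted one-in-10**14 figure is the real rate (it is a spec-sheet bound, so actual drives may do better). The function name and the four-drive, 4 TB layout are made up for the example.

```python
import math


def p_unrecoverable_read(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error (URE)
    while reading `bytes_read` bytes, assuming independent errors."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12

# Hypothetical RAID-Z1 rebuild: four 4 TB drives, one has failed, so the
# three survivors (12 TB in total) must be read in full without a single URE.
p = p_unrecoverable_read(3 * 4 * TB)
print(f"chance of hitting a URE during the rebuild: {p:.0%}")
```

With single parity there is nothing left to correct that error once a drive is already missing; a second parity drive is what covers it.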

Schwab

--
Editor, A1-AAA AmeriCaptions

git-annex by halfnerd · 2013-12-10 09:43 · Score: 1

http://git-annex.branchable.com/

Re:Excellent question by lgw · 2013-12-10 09:46 · Score: 1

Sounds right to me - and there are sadly still people who need to be told "RAID is not backup".

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by lgw · 2013-12-10 09:52 · Score: 1

LTO has 30-year media easily available, and there's a lot of basis for tape for judging the real lifetime, since the technology has been around forever. For modern archive-quality tape, the backing will fail before the magnetic media. For normal LTO tape different manufacturers make different claims, but more than 10 years is normal. Ensuring you can still read the tape is of course a different challenge, but the drives try to be backwards compatible for a while (and the drives are fairly robust when in limited use). Fortunately, connection interfaces seem to be slowing their rate of change - a PCIe card will likely find a slot in servers for years to come, and SAS will also likely be around for quite some time, though the cards may get pricey if they become legacy-only.

--
Socialism: a lie told by totalitarians and believed by fools.

Re:BTRFS filesystem by girlintraining · 2013-12-10 09:59 · Score: 2

There you go again. Acting like you know what you're talking about, but you don't. ZFS and BTRFS have ...

Exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.

Modern drives don't store the bits that you feed them exactly as you give them. Instead, they use CRC and error correcting codes, so they

... Which again counts for exactly dick. I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.

Which, I guess, is better than getting a corrupted picture. Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.

That's because you have no experience as a network administrator in a professional environment. Because then you'd know that's the very thing RAID was designed to do: Recover from hardware failure, which includes sectors becoming unreadable. You are clearly confused about both what level of abstraction is being discussed (architecture versus hardware) and the different types of failure modes each of these solutions presents. Bit rot is a physical process that occurs in all magnetic media, and at sufficiently small-scale, can also affect non-persistent storage such as RAM.

It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory.

That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial uses, ECC memory just isn't needed. But for a select few use scenarios where data integrity is absolutely critical -- such as, say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems, the added cost is justified because they need high availability/high reliability of those systems. It's also used in certain aerospace applications because the physical mechanism that causes bitrot -- high energy radiation, increases quite a bit at higher altitudes, and in space increases several orders of magnitude -- and if you're going to put something in geostationary orbit, it then takes the full brunt of solar radiation with no mitigation. Correcting for memory problems in these situations is better done at the hardware level; hence ECC memory.

Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap. And, big surprise -- super cheap doesn't mean super reliable. But we don't need super reliability -- when our system shows obvious signs of a failing memory stick, we just drive to the store, plunk down a $20 and abscond with a new one. Problem solved.

I'm not optimistic about the long-term storage of electronic data.

That's because, as previously pointed out, your experience comes from consumer-grade hardware whose design considerations you don't fully understand. NASA has had great success in the long-term storage of magnetic media -- in fact there was an article not long ago about how they had to reverse-engineer equipment designed during the 1960s for the Apollo program to recover data on tape reels, when they lacked the original equipment it was recorded from. They discussed how the tapes themselves had become brittle and the ferrous oxide would actually peel off in chunks while reading, much like how paint peels off a house, but they were able to recover this data anyway. The technology we have today is far more sophisticated and unlike old tape-technology doesn't require physical contact with the source media to read it. There are companies like OnTrack that specialize in data recovery from hard drives and boast a rema

--
#fuckbeta #iamslashdot #dicemustdie

Re:PAR2? No, MultiPar. by grep+-v+'.*'+* · 2013-12-10 10:05 · Score: 1

Try again, but this time with subdirectories

PAR2 with subs: Multipar and alternate

I've been using it for well over a year, it works great. Was using this for a while -- it's OK, but Multipar is much better.

Or just continue to use PAR on single directories with subs placed in some type of archive (zip, 7z, tar) file.

None of these holds a candle to ZFS as a live file system, but these all work great when archiving files to DVD/BD.

Heck, I'm currently copying multiple dirs to BD and using Multipar as "only" a checksumming and renaming repair tool -- not even bothering with the file content recovery option. For that matter, I've even created a (single) disc with 300% recovery -- if I lose all of the primary files and over half of the recovery content bits, I can STILL recover the contents. (I've tested this by manually damaging the file contents. I have multiple copies in different places, too -- there are just a few static files that I do *NOT* want to lose.)

--
If the universe is someone's simulation -- does that mean the stars are just stuck pixels?

Re:Excellent question by bluefoxlucid · 2013-12-10 10:11 · Score: 1

Oh my god she said that FIVE TIMES EVERY DAY!

--
Support my political activism on Patreon.

M-Disc by cfulton · 2013-12-10 10:20 · Score: 1

You might try backing up with http://www.mdisc.com/what-is-mdisc/ I've been using them since they came out and all my backups still work. It is supposed to last a thousand years. I don't know about that, but they do seem to be better than backing up to regular DVDs, which I have had go bad in as little as a year.

--
No sigs in BETA. Beta SUCKS.

Re:uhuh by macbeth66 · 2013-12-10 10:24 · Score: 1

A rite of passage? You must be joking! I've never met anyone stupid enough to have actually run that command with those parms. The first time someone tried that on me, I did a 'man rm' and looked at the doc. I always thought that was the lesson: RTFM.

Re:BTRFS filesystem by bidule · 2013-12-10 11:00 · Score: 1

The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them.

Erm, no. Hamming(7,4) doesn't even need double the space, and that was 60 years ago.
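For anyone who hasn't seen it, a minimal Hamming(7,4) sketch: 4 data bits are stored as 7 bits, and any single flipped bit in those 7 can be located and corrected. The function names are made up for this example, and a real archive tool would use something stronger over bigger blocks (Reed-Solomon, as PAR2 does).

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Positions 1, 2 and 4 (1-based) hold parity; 3, 5, 6, 7 hold data.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the bad bit, or 0 if clean.
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[5] ^= 1                      # one bit rots
print(hamming74_decode(word) == data)
```

That is 3 extra bits per 4 bits of data (75% overhead) for single-error correction, versus 200% for keeping three full copies. Two flipped bits in the same 7-bit word are beyond what this code can fix.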

--
ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)

Re:Excellent question by funwithBSD · 2013-12-10 11:05 · Score: 1

And I have my FreeBSD server acting as a local backup with ZFS backed storage.

So if I do need something, I just grab it back local.

--
Never answer an anonymous letter. - Yogi Berra

Re:Excellent question by DarkTempes · 2013-12-10 11:12 · Score: 1

In general ISPs didn't ever have unlimited. They advertised unlimited and then knocked people off if they passed some secret unpublished limit.

The difference now is that they no longer advertise a lie and they have published and trackable limits. The only issue is that the limits are in many cases absurdly low but otherwise it's a better practice than what they were doing before.

Re:BTRFS filesystem by MarkTina · 2013-12-10 11:17 · Score: 2, Informative

RAID10 and similar systems are two RAID5 systems which are independent and regularly compare data; These can detect which system is inconsistent, so you will always have at least one copy of your data in a consistent state.

You were doing quite well up until you said that sentence .....

Re:BTRFS filesystem by DamnStupidElf · 2013-12-10 11:22 · Score: 1

Without parity checking, you simply aren't addressing bit rot. Period. It could be Raid 9 Million(tm) and if all it's doing is copying the data, and not comparing it, bit rot will still proceed apace, silently eating your data. But let's say you're a good administrator that has enabled parity. Great! But there's still a problem: parity cannot restore data that has become corrupted due to bit rot -- it is a detection-only mechanism.

This is incorrect for Reed-Solomon based RAID (levels 6 and higher such as RAID Z3). RAID6 can correct bit rot on a single disk and in general for t parity disks, floor(t/2) random errors per RS code can be corrected. All the RS-based RAID systems I've seen essentially store the RS code across devices using a GF(2^8) code, meaning that up to an entire byte could be corrupted by bit rot at a given logical address across all the stripes and still be corrected. All the details are on Wikipedia. Not all RAID-6+ implementations actually check the parity when reading, and I have no idea how many can solve the error locator polynomial for each RS code to actually identify and correct bit rot in multiple locations in different codes versus just dealing with known bulk errors (e.g. failed disks).

Now that I've explained all the ways that you're wrong, let me say that bit rot is probably not the cause of the OP's problems. In fact, USB devices are well-known for corrupting filesystems because of spontaneous disconnects, power loss events, etc., and this is simply what can be expected in a typical residential environment. Even a RAID configuration in a residential environment isn't invulnerable to the "write hole" problem -- where data is partially committed to disk, but then the array suffers a power loss event.

Any proper file system will have a large enough transaction/intent log that can be replayed to correct partial data/metadata writes due to power failure and the RAID write hole, etc.. Most file systems in use are not proper, of course, but at least a few are available.

Re:Excellent question by MarkTina · 2013-12-10 11:22 · Score: 1

Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

Well fingers crossed you are not the storage admin for anyone I deal with!

Re:BTRFS filesystem by GigaplexNZ · 2013-12-10 11:48 · Score: 1

My understanding is that Storage Spaces is (as he says) MS's version of ZFS - does it not have the same data-checking features/ performance hit that 'regular' ZFS does?

No, it does not have the same data-checking features. Yes, it has a performance hit. Worst of both worlds. I've used it, and junked it as it was literally an order of magnitude slower than RAID5 via mdadm on Linux and didn't actually add any resiliency or flexibility over RAID5: to grow an existing pool, you need to add multiple similarly sized drives since it doesn't rebalance. This is despite their marketing claims that you can add mismatched drives in an ad hoc fashion and have it "just work".

The only way to get Microsoft's unproven resiliency benefits is to use ReFS in conjunction with mirroring (not parity) on the expensive server editions. Windows 8/8.1 does not support ReFS.

Re:Excellent question by GigaplexNZ · 2013-12-10 12:10 · Score: 1

Honestly, I'd build two of these devices, one for local backups and I'd put one at a buddy's house and do remote backups from your local device.

Oh what I'd do for usable upload bandwidth and reasonable data caps...

Re:Excellent question by GigaplexNZ · 2013-12-10 12:16 · Score: 1

Test your backups after you make them

Obviously.

it's a cheap and easy 99% solution

It's not a solution. It's a bare minimum requirement that doesn't solve for bitrot.

Re:Excellent question by lgw · 2013-12-10 12:59 · Score: 1

Well, maybe I don't understand what you mean by "bitrot". GMR media doesn't "rot" in the classic sense of bits flipping over time (well, not in human-scale time), the way that happened with floppies and QIC tape. If you're adding some new meaning to that term, you'll need to explain it.

But if you're talking about odd disk failures: as I said at the top of the thread, if you're using disk, archive stuff in RARs (or other checksummed archives), test those checksums from time to time, and don't purge old backups the moment you make new ones. Or just use tape and you're fine, at least until it gets hard to find a drive old enough to accept the tape (10+ years).

--
Socialism: a lie told by totalitarians and believed by fools.

Re:BTRFS filesystem by rsmith-mac · 2013-12-10 13:36 · Score: 1

Without parity checking, you simply aren't addressing bit rot. Period. It could be Raid 9 Million(tm) and if all it's doing is copying the data, and not comparing it, bit rot will still proceed apace, silently eating your data. But let's say you're a good administrator that has enabled parity. Great! But there's still a problem: parity cannot restore data that has become corrupted due to bit rot -- it is a detection-only mechanism. So if you have two drives in a RAID-1 with parity configuration, as you also suggest... it will detect the file corruption, but as it cannot correct it, it will then promptly seize up and fall over dead. This is because for every N clusters written, a parity cluster is also written; This allows the array to detect if that data chunk was correctly committed; But if the data on any of the clusters within the chunk are altered later, the RAID array will only know that this chunk of data (known as a stripe in RAID), is invalid. It cannot correct it.

One quick note: a mirrored space running ReFS will do automatic checksumming and scrubbing. This isn't done for parity spaces, though I'm not sure why this is.

http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx

Re:BTRFS filesystem by girlintraining · 2013-12-10 13:58 · Score: 1

This is incorrect for Reed-Solomon based RAID (levels 6 and higher such as RAID Z3). RAID6 can correct

... Yes, but earlier systems, which the OP was suggesting could be used for this purpose, lack that functionality. Also, please reset your sarcasm detector, it appears to be out of alignment -- a functional detector would have pinged on "Raid 9 Million(tm)".

Any proper file system will have a large enough transaction/intent log that can be replayed to correct partial data/metadata writes due to power failure and the RAID write hole, etc.. Most file systems in use are not proper, of course, but at least a few are available.

Correct, and those that are aren't immune to human stupidity. No filesystem can save you from a guy who decides to pour beer into the storage array, or who goes to move a directory and misclicks sending it to the trash. Disaster recovery is not a simple matter of choosing the right filesystem and then patting yourself on the back. It requires careful planning and consideration... None of which the majority of the people on this thread seem to be capable of. At least you seem to have some grasp of the underlying technology.

--
#fuckbeta #iamslashdot #dicemustdie

Re:M-Disc by cmurf · 2013-12-10 14:34 · Score: 1

Right. The physical structure and materials used for stamped vs "burned" DVD/BR media are completely different. The photosensitive "burned" media can't be considered to have any useful permanence.

However, the biggest problem we face with any of these discs, is what hardware we will use to gain access to the encoded data on them? PATA is effectively dead, yet not even 10 years since then we'd have some difficulty reading data from a PATA drive just because the connector is uncommon. What about in another 10 years? In 20 years will there be any mainstream computers using USB at all? What about in 50 years? If we need to keep weird ancient junk around just to extract data from disks or discs, then the plan has failed. Pretty much from the outset for mortal consumers, a do it yourself digital archive is a recipe for a data recovery project in the future.

Re:BTRFS filesystem by gmhowell · 2013-12-10 15:29 · Score: 1

The next time you want to slam someone for "acting like you know what you're talking about", don't respond with a bunch of links to Wikipedia. Links, I might add, that are only marginally-relevant to the topic at hand. That shit wouldn't fly in college, so why do you think it's going to hold weight in a professional environment?

Slashdot, a 'professional environment'? As if we needed more proof that you're a fucking lunatic...

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon

Re:Excellent question by baffled · 2013-12-10 17:37 · Score: 1

BTRFS needs to become the Linux default FS.

I just lost my wife's BTRFS partition yesterday after a hard-reset. Consulted Google for btrfs repair options and discovered they are lacking. Kept reporting root->node assertion failed, whatever that's supposed to mean. I don't recall the last time I've lost a partition like this, I assumed fsck would have done the trick.

See https://btrfs.wiki.kernel.org/index.php/Btrfsck :

Note that while this tool should be able to repair broken filesystems, it is still relatively new code, and has not seen widespread testing on a large range of real-life breakage. It is possible that it may cause additional damage in the process of repair.

Re:BTRFS filesystem by Common+Joe · 2013-12-10 18:00 · Score: 1

The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them

Disagree. I've had an idea for a while that I'm surprised backup vendors don't do: two copies with a check sum and automatic restore*. The two copies and a check sum are a variation of the three-copy idea, but without the third copy. (I'd write a backup program myself with this idea except it would take too long to implement all of the ideas I have that I think every home backup program should have. The backup programs on the market are getting better, but they could still stand a few more improvements like this idea.)

My idea: On the first backup, the original copy on the hard drive gets backed up to the USB backup drive along with a check sum. (Despite your concerns about USB, I believe the original poster is talking about home use and can't really avoid this without significant costs.) When the backup is run a second time (like a day or week later), the original on the hard drive is compared to what is on the backup. Check sums are also performed. If something doesn't match, then you know you have bit rot. The check sum will determine whether the backup or the original is invalid and the program will then take appropriate action all without asking the user.
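A minimal sketch of the arbitration step described above, working on in-memory bytes rather than real files. The function names and the verdict strings are invented for the example, SHA-256 is just one reasonable choice of checksum, and the sketch assumes the file was not intentionally edited since the last run (a real tool would have to check the modification time first).

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def arbitrate(original: bytes, backup: bytes, recorded: str) -> str:
    """Decide which copy to trust, given the checksum recorded at backup time.

    Assumption: the file was NOT deliberately modified since `recorded`
    was taken, so any mismatch is treated as corruption.
    """
    orig_ok = sha256_of(original) == recorded
    back_ok = sha256_of(backup) == recorded
    if orig_ok and back_ok:
        return "both copies intact"
    if orig_ok:
        return "backup is corrupt: rewrite it from the original"
    if back_ok:
        return "original is corrupt: restore it from the backup"
    return "both copies fail the checksum: manual recovery needed"


# Hypothetical file contents and a single flipped bit in the original.
photo = b"holiday.jpg contents"
recorded = sha256_of(photo)
rotted = bytes([photo[0] ^ 0x01]) + photo[1:]
print(arbitrate(rotted, photo, recorded))
```

The weak spot is exactly the one in the footnote: a program that rewrites file contents without touching the modification time looks identical to bit rot under this scheme.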

*Of course, Microsoft had to monkey up the works with using this idea. When you merely open an Excel file, it will modify the contents of the file. Very little can be found on this phenomenon, but here is something about it from Microsoft. Through personal experience, I have found it does not change the modified date and time after the file is closed, but it does modify contents. (I discovered this while playing with a prototype of my idea.) This fits with what they say in the link I provide, but it's not exactly the thing that jumps out at you after the first or second read. When only a single user uses the file, this phenomenon is not seen, although I suspect that Microsoft writes to the file then as well -- an idea which I absolutely hate. Truecrypt is also guilty of this, but at least it does it on purpose, it is documented, and you can turn it off. For security reasons, there is a setting that allows changes to a truecrypt container without changing the modified date and time marks of the truecrypt container file.

Re:BTRFS filesystem by RR · 2013-12-10 18:18 · Score: 1

I know, I shouldn't respond to a troll, but I'm feeling generous today.

There you go again. Acting like you know what you're talking about, but you don't. ZFS and BTRFS have ...

Exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.

Um, excuse me? The filesystem absolutely does matter. Traditionally, the filesystem assumes that any data retrieved from the drive has been put there, earlier. Obviously, drives don't do that 100% reliably. It's an important innovation, that these newer filesystems will add their own checksums to the data that they write, so they can detect and sometimes fix corrupted reads.

I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.

Get your head out of the clouds. Everything does come down to hardware. In fact, given your other posts about hardware, I sometimes doubt that you actually interact with the hardware that you talk about.

Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.

That's because you have no experience as a network administrator in a professional environment. Because then you'd know that's the very thing RAID was designed to do: Recover from hardware failure, which includes sectors becoming unreadable.

That's an aspect of software. Of course a RAID with sufficient parity will recover from a total drive failure. It's much harder to find reference to how a particular RAID will respond to intermittent errors. But if you're not just a blowhard, I'd like to see some of your links to documents describing how the RAIDs that you know will handle drive read errors. Not total failures. Just read errors.

Speaking of RAID, ZFS has its own concept of RAID that supports up to triple parity, with a different architecture than a normal storage system. Still, I haven't found any reference to how it handles drive read errors.

It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory.

That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial uses, ECC memory just isn't needed. But for a select few use scenarios where data integrity is absolutely critical -- such as, say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems, the added cost is justified because they need high availability/high reliability of those systems.

What a horrible attitude to data integrity. Computer crashes, I lose data. Computer behaves abnormally, worst case scenario is it calculates some important thing wrong, say the root of an important filesystem B-tree, and the filesystem needs to go through an expensive repair. My data are important to me. I use my computer for my personal financial processing, and I know I'm not alone. My old computer had an extra 128kB of memory to provide parity checks for the other 1MB. I imagine that stupid traditions of cost-cutting are why my new computer does not have 2GB of memory to provide ECC for the other 16GB.

Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap.

And your server memory isn't? Back up a moment... I thought OP was talking about being able to detect bitrot in family photos, and now you're telling him he should buy a server with memory lovingly crafted for high reliability? Which reliability i

--
Have a nice time.

Tool for checking metadata by shani · 2013-12-11 00:41 · Score: 1

I know it's not really an answer to your question since it's not done, but I started a tool to save and check metadata of files:

https://github.com/shane-kerr/fileinfo

Right now it just outputs a file with all of the meta-data (including SHA-224 hash of the file contents). If you think this seems interesting, I can whip up the part that uses that file to check the meta-data this weekend.

Re:Excellent question by semi-extrinsic · 2013-12-11 01:33 · Score: 1

Since you're an experienced ZFS user, do you have any recommendations for how to sync the systems described below?

I have a setup similar to the one you describe. One box at work with 2x3TB with ZFS and mirroring (raid1), similar box at home. The box at home is fairly recent, so I haven't gotten a good system for synchronizing them yet. My internet at home is 50/10 Mbps, work is much faster. The idea is that I back up both my personal photos (originating on the home box, usually ~10 GB a month) and my work data (created on the work box, usually a steady stream of 1 GB per week and bursts of 10-50 GB occasionally). If possible I would like to have some directories on the work box that are not synchronized to the home box.

If the fact that both computers are sources of new data is a problem, I guess it's possible to modify that workflow.

And any other recommendations for ZFS? I scrub the pools weekly, but otherwise treat it as zero-maintenance.

--
for i in `facebook friends "=bday" 2>/dev/null | cut -d " " -f 3-`; do facebook wallpost $i "Happy birthday!"; done

Amazon Glacier by bwroga · 2013-12-11 02:05 · Score: 1

I use the MD5 solution mentioned above, but also back everything up to Amazon Glacier. From what I've read, retrieving your data can be a pain, but storage is only $0.01 a gigabyte per month and they say that they store multiple copies across multiple locations and periodically check for data integrity. If data integrity is lost, they repair it using the other copies. I asked them how often data is checked for integrity and they said:

"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing. So, to address your first question, we performs checks frequently enough to ensure that we meet our design goal of 11 9s of average annual durability for an archive. In the very unlikely event that it is determined that one of your archives is not recoverable, we would contact you promptly."

Re:BTRFS filesystem by nhat11 · 2013-12-11 02:38 · Score: 1

I don't think it's justified to call someone a "dick" when they didn't use a derogatory word at you. You need to calm down a bit there.

Re:Excellent question by QuietLagoon · 2013-12-11 02:54 · Score: 1

Ahhh... a sample size of one. I understand now.

Re:Excellent question by sandytaru · 2013-12-11 03:11 · Score: 1

For someone who is simply storing large volumes of media, however, CrashPlan works out well. I forgot that we selected it for the backup system of the media server we installed for my senior project in my master's degree for our client. They needed to store about 600 GB of pictures and movies. A once daily backup is just fine for them - but I think we still negotiated a full Pro package for the other features.

--
Occasionally living proof of the Ballmer peak.

Re:uhuh by bwroga · 2013-12-11 03:46 · Score: 1

What command are you referring to?

Re:Excellent question by lgw · 2013-12-11 06:22 · Score: 1

If you don't trust the judgment of senior engineers, you won't get very far in life. When you need solutions that work in practice, turn to those who have been practicing for a while.

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by lgw · 2013-12-11 06:25 · Score: 1

Why so? Do you have contrary experience you'd like to share? Care to join in a discussion?

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by lgw · 2013-12-11 06:27 · Score: 1

There are certainly write errors - but that's not bitrot, that data was bad from the beginning (which is the true explanation for almost everything called bitrot). You can always get bus errors and whatnot, but those are transient errors, and the read is quite likely to succeed on the next try.

--
Socialism: a lie told by totalitarians and believed by fools.

rsync -c by patniemeyer · 2013-12-11 06:50 · Score: 1

I have a pair of 4TB disks that I keep cloned with rsync. Periodically I verify the contents using rsync -c, which forces rsync to do a full checksum on the files. A few times a year this will identify a file that is actually corrupt and I'll manually recover it from the good copy.
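
The same kind of periodic check can be sketched in Python for anyone without rsync handy. This is not what rsync -c does internally (rsync compares per-file checksums, while this compares file contents directly), and the function name `find_mismatches` and its return shape are assumptions for illustration.

```python
import filecmp
import os


def find_mismatches(primary, clone):
    """Compare every file under primary with its counterpart under clone.

    Returns (differing, missing): relative paths whose bytes differ, and
    relative paths that could not be compared (for example, absent in clone).
    """
    differing, missing = [], []
    for dirpath, _dirnames, filenames in os.walk(primary):
        rel_dir = os.path.relpath(dirpath, primary)
        other_dir = os.path.join(clone, rel_dir)
        # shallow=False forces a content comparison instead of trusting
        # matching size and modification time.
        _same, diff, errors = filecmp.cmpfiles(
            dirpath, other_dir, filenames, shallow=False
        )
        differing.extend(os.path.normpath(os.path.join(rel_dir, n)) for n in diff)
        missing.extend(os.path.normpath(os.path.join(rel_dir, n)) for n in errors)
    return sorted(differing), sorted(missing)
```

Like rsync -c, this only reports that the two copies disagree. Deciding which copy is the good one still needs a stored checksum or a look at the file.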

Re:BTRFS filesystem by jbo5112 · 2013-12-11 07:27 · Score: 1

I have seen both Dell RAID-5 and Sun RAID-6 arrays fail with 3+ simultaneous disk failures each. Google ran a Petabyte Sort benchmark in 2008 (6 hours to sort 10 trillion 100-byte records) and was not at all surprised that they had at least one hard drive failure on every attempt (4+ drive failures per day). I have seen enterprise tape systems fail to read their data (hopefully there was redundancy, but I don't know). I have seen backup systems have major performance glitches and fail to restore within their needed time frame. Facebook, for example, only has a few seconds to recover from a failed server before customers might get angry, and has built systems to handle it because it's necessary to provide a good service. The major players who are succeeding and profiting at giving away free services to hundreds of millions plan that all data storage will fail regularly, and plan accordingly.

A little primer for those of us who haven't kept up with new storage technologies since the 90's.

Google deals with enough data that they cannot consider any of your technologies reliable enough. Five years ago, they were already processing 20PB of data every single day with MapReduce, and if you have to buy enough systems, even the best RAID6 SAN systems will break regularly. Statistically, a small chance of failure repeated often enough becomes a virtual certainty. Google generally doesn't bother with expensive technologies like SANs and RAID, or even with enterprise drives (spinning disks, that is; they probably use enterprise PCIe flash). You can make what you want of the enterprise drive decision, but I'm pretty sure I've read from at least a couple of sources that enterprise drives are just as prone to failure as regular drives. The major differences are warranty and firmware (e.g. supporting RAID-friendly reads). Numerous sources have substantiated that the manufacturers' MTBF numbers are pure marketing fiction. Enterprise drives probably boast a lower error rate, but I have not seen a comparison, only reports that the published figures are off by several orders of magnitude.
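
The "small chance repeated often enough" point can be put in numbers. This is a rough sketch that assumes every drive fails independently with the same probability over the period in question, which real fleets do not strictly obey. The 3% figure in the usage note is a made-up illustration, not a number from any of the sources mentioned above.

```python
def prob_at_least_one_failure(per_drive_failure_prob, drive_count):
    """Chance that at least one of drive_count independent drives fails.

    The chance that no drive fails is (1 - p) ** n, so the chance of at
    least one failure is the complement of that.
    """
    return 1.0 - (1.0 - per_drive_failure_prob) ** drive_count


if __name__ == "__main__":
    # Hypothetical 3% per-drive failure chance over the period of interest.
    for drives in (1, 10, 100, 1000):
        chance = prob_at_least_one_failure(0.03, drives)
        print(f"{drives:5d} drives: {chance:.4f}")
```

With that assumed 3% per drive, one drive is a small risk, but 100 drives already give roughly a 95% chance that at least one fails, which is why large operators plan for failure instead of trying to buy their way out of it.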

What Google does is avoid any redundancy in their machines and take the "redundant array" to a whole new level: Redundant Array of Inexpensive Servers. Multiple copies of the data are written to different servers in different cabinets, and with each data block a checksum is stored. Every time the data is read, the checksum is verified. This way you know with 1 single read if you have bitrot, and can correct it with 1 good read. Now you no longer have to keep comparing 3 copies of the data to correct bitrot. The Hadoop project copied this with their HDFS, and many other large scale technologies have followed suit.
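
Here is a toy version of that read path, to show why one checksum per block is enough to both detect and repair. It is a sketch only: the replicas are plain dictionaries in memory, CRC32 stands in for whatever checksum a real system uses, and the names `store_block`, `is_intact` and `read_block` are made up for illustration.

```python
import zlib


def store_block(data):
    """Return one replica record: the payload plus a CRC32 of it."""
    return {"data": bytearray(data), "crc": zlib.crc32(data)}


def is_intact(replica):
    """Recompute the checksum and compare it with the stored one."""
    return zlib.crc32(bytes(replica["data"])) == replica["crc"]


def read_block(replicas):
    """Return verified data, repairing corrupt replicas from a good one.

    Raises IOError if no replica passes its checksum.
    """
    good = next((r for r in replicas if is_intact(r)), None)
    if good is None:
        raise IOError("all replicas failed checksum verification")
    for replica in replicas:
        if not is_intact(replica):
            # Overwrite the rotten copy with the verified one.
            replica["data"] = bytearray(good["data"])
            replica["crc"] = good["crc"]
    return bytes(good["data"])
```

Note that no voting between copies is needed: each replica is judged against its own checksum, so two copies are enough to survive rot in one of them.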

At a desktop level, ZFS, BTRFS and (I think) Windows Storage Spaces do something similar, combining RAID technology (0/1/5/6, maybe 1E) with checksums inside the file system. If a drive fails, or even if a checksum just doesn't verify, the file system can automatically attempt to rebuild from the redundant data, giving you a better data guarantee than any RAID card I have seen. If the journaling is done correctly, it shouldn't be susceptible to losing data from a power loss either, but home battery backups aren't too expensive. The OP was asking specifically about bitrot. A lot of UREs (uncorrectable read errors) get labeled and treated as bitrot, but it sounds like data he has previously verified is now corrupt (actual rot); not that the reason for corrupt blocks matters once they are corrupt. Bitrot happens more frequently when you don't have the stringent environmental controls at home that you would in a data center, and I have personally seen it with only 10's of GB of my data.

In my experience, data that is backed up and archived isn't a prime target for user error or gross negligence regarding data backups. The user is definitely experiencing some sort of URE. In this case, a proper file system is quite important for protecting the data. I would recommend setting up a multi-drive NAS using

Re:Excellent question by MarkTina · 2013-12-11 08:18 · Score: 1

What's the point in joining in? To me it's obvious you have no clue about data storage and magnetic media in general, so no matter what I say you won't agree. I only chipped in earlier because I thought your statement was so funny!

Re:Excellent question by lgw · 2013-12-11 09:35 · Score: 1

Come now, do explain the process by which GMR media loses its data integrity over time. I'm all ears.

Write errors happen, transient data transfer errors happen, bad sectors (bad from day 1) happen, mechanical failures happen, sure, but none of that is "bitrot".

--
Socialism: a lie told by totalitarians and believed by fools.

Re:Excellent question by rdnetto · 2013-12-12 00:07 · Score: 1

Which version of the kernel and btrfs-progs are you using? Some distros are still shipping ancient versions of the userspace tools, like 0.19 or 0.20. The latest is 3.12 (they recently started using the kernel version instead), so you may want to try compiling it from the source.
The two most helpful commands I've found are 'mount -o recovery', which can restore the superblock if it's missing/corrupted, and 'btrfs check --repair' (formerly btrfsck). Note that check doesn't actually fix the errors it finds without that flag, unlike fsck. If you have a multi-device file system, trying to mount one of the other drives can help, since copies of the metadata are stored on all of them (RAID1 style).
If that doesn't work, you can often get the data off by mounting it as readonly, or by using 'btrfs restore'.

Btrfs used to be quite buggy, but these days I've found it to be pretty stable and reliable. That only applies if you're using the latest packages though - otherwise, you might as well be using it back in the early days.

--
Most human behaviour can be explained in terms of identity.

BT by hicksw · 2013-12-12 02:36 · Score: 1

Bit torrent?
Set up your very own very private tracker(s).
Create a torrent of the file trees to be duplicated and protected on the original host.
Leech it at all the redundant sites.
Wait for them all to complete the download and become seeds.
From time to time, but not all at the same time, force a recheck on each member of the swarm, to detect corruption.
A failure should trigger a download to correct the corrupted block from the swarm.
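
What a forced recheck amounts to can be sketched in a few lines. BitTorrent (v1) stores a SHA-1 hash per fixed-size piece, so a recheck is just rehashing the local data piece by piece and refetching whatever no longer matches. Everything else here is an assumption for illustration: the data lives in memory as bytes, `peer` is simply a known copy standing in for the swarm, and the names `piece_hashes`, `recheck` and `repair` are hypothetical, not any client's API.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # assumed piece length; real torrents choose their own


def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of each fixed-size piece, as stored in a torrent file."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def recheck(data, expected_hashes, piece_size=PIECE_SIZE):
    """Return the indexes of pieces whose hash no longer matches."""
    actual = piece_hashes(data, piece_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected_hashes)) if a != e]


def repair(local, peer, bad_pieces, piece_size=PIECE_SIZE):
    """Return a copy of local with only the bad pieces replaced from peer."""
    fixed = bytearray(local)
    for index in bad_pieces:
        start = index * piece_size
        fixed[start:start + piece_size] = peer[start:start + piece_size]
    return bytes(fixed)
```

A real client also verifies each downloaded piece against the torrent's hash before accepting it, so a rotten copy elsewhere in the swarm cannot spread. The sketch skips that step and assumes the local data has not changed length.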

You can probably get better advice on how to handle a growing archive.
I would probably try to add another torrent of the added files, then
wait for the swarm to download those files.
Then create a new torrent file that includes the old and the new in a single torrent and use that for the next forced recheck cycle.

You probably want to have a few scripts to automate the rechecks and updates.
--
The world is coming to an end, but don't stop seeding

Re:BTRFS filesystem by DamnStupidElf · 2013-12-12 20:57 · Score: 1

Yes, but earlier systems, which the OP was suggesting could be used for this purpose, lack that functionality. Also, please reset your sarcasm detector, it appears to be out of alignment -- a functional detector would have pinged on "Raid 9 Million(tm)".

Apparently ReFS will have data and metadata checksums which combined with storage spaces could detect and correct bit rot if implemented properly. While I have no idea if the OP researched the actual capabilities of ReFS, with checksums it is possible to detect bit rot without parity, and correct it with an extra (good) copy. Sarcasm is fun, but only if it's accurate. You might argue that checksums are just a form of parity and maybe I'd agree with you since apparently the error-correction codes for RAID-6 are generally referred to as parity despite actually being linear error-correction codes. But the sense I got from your comment was that you didn't believe it was possible to prevent bit rot with just two copies of checksummed data, or by storing a single copy with an error-correcting code.

Correct, and those that are aren't immune to human stupidity. No filesystem can save you from a guy who decides to pour beer into the storage array, or who goes to move a directory and misclicks sending it to the trash. Disaster recovery is not a simple matter of choosing the right filesystem and then patting yourself on the back. It requires careful planning and consideration... None of which the majority of the people on this thread seem to be capable of. At least you seem to have some grasp of the underlying technology.

Most of your other points were spot-on. Relying on single storage systems that aren't geographically distributed is just asking for trouble. Not keeping administratively separate backups or immutable version history (read-only snapshots, revision control, etc.) is also a quick way to lose your data. I don't think there are any foolproof solutions you can get at the moment. Replicated git repos are close, but there was that KDE fiasco with git not explicitly checking the cryptographic hashes during all of its operations and allowing bitrot to be replicated to other repositories. Dumb. I have never been a fan of the Linus/Linux philosophy of trusting the hardware to provide 0 bit errors per yottabyte. It's just not realistic. Of course that means that the next step will be implementing lock-step (or at least consistency-point comparison) processing in software to work around CPU/RAM errors...

Re:Look to the past by Venotar · 2013-12-17 15:08 · Score: 1

Tape MUST be sufficiently stable. Reading the reliability specs off the box in front of me and running a few calculations shows that

You didn't use sarcasm tags and sometimes the subtler jokes are a tad hard to discern in text.
You are joking, aren't you? Because if not, have I got a great deal for you - I just need your bank account to transfer the money my uncle, a Nigerian prince, is trying to export. PM me!
