Ask Slashdot: Practical Bitrot Detection For Backups?
An anonymous reader writes "There is a lot of advice about backing up data, but it seems to boil down to distributing it to several places (other local or network drives, off-site drives, in the cloud, etc.). We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present), it's not really practical for us to examine every one of these periodically so we can manually restore them from a different copy. We'd love it if the filesystem could detect this and try correcting first, and if it couldn't correct the problem, it could trigger the restoration. But that only seems to be an option for RAID type systems, where the drives are colocated. Is there a combination of tools that can automatically detect these failures and restore the data from other remote copies without us having to manually examine each image/video and restore them by hand? (It might also be reasonable to ask for the ability to detect a backup drive with enough errors that it needs replacing altogether.)"
http://www.quickpar.org.uk/
http://chuchusoft.com/par2_tbb/
One single cmd will do that,
zpool scrub <poolname>
I don't know if there's a better solution, but you could store checksums of each archived file, and then periodically check the file against its checksum. It'd be a bit resource intensive to do, but it should work. I think some advanced filesystems can do automatic checksums (e.g. ZFS, BTRFS), but those may not be an option, and I'm not entirely sure how it works in practice.
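For what it's worth, the "store checksums, then periodically re-check" idea can be sketched in a few lines of Python. This is only a rough sketch: the function names (sha256_of, build_manifest, verify) are made up for illustration, SHA-256 is just one reasonable choice of checksum, and it assumes the manifest is kept somewhere other than the archive directory itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return (missing, corrupt) lists of relative paths."""
    root = Path(root)
    missing, corrupt = [], []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            missing.append(rel)
        elif sha256_of(target) != expected:
            corrupt.append(rel)
    return missing, corrupt
```

Run build_manifest once after each backup, save the result (JSON would do), and run verify from cron. Note that a mismatch only tells you the file changed, not whether the change was bitrot or a deliberate edit.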
If your physical media is dying, you'll get hardware errors so restore from a(nother) backup and replace the media.
If your files are being corrupted, what kind of crappy filesystem are you using to store these precious memories?!!
ZFS without RAID will still detect corrupt files, and more importantly tell you exactly which files are corrupt. So a distributed group of ZFS drives could be used to rebuild a complete backup by only copying uncorrupt files from each.
You still need redundancy, but you can get away without the RAID in each case.
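A rough sketch of the "rebuild from whichever copy is good" step, in Python. It leans on one assumption: that a checksumming filesystem such as ZFS refuses to return data that fails its checksum and reports an I/O error instead, so simply reading each file to the end tells you whether that copy is usable. The names readable and rebuild are made up for illustration, and the replica roots are assumed to be locally mounted paths.

```python
import shutil
from pathlib import Path


def readable(path, chunk_size=1 << 20):
    """Read the whole file; treat any OSError (e.g. an I/O error reported
    by the filesystem for a damaged block) as 'this copy is bad'."""
    try:
        with open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except OSError:
        return False


def rebuild(replica_roots, dest_root, relative_paths):
    """Copy each file from the first replica where it reads cleanly.

    Returns the relative paths for which no replica had a good copy.
    """
    unrecoverable = []
    for rel in relative_paths:
        for root in replica_roots:
            candidate = Path(root) / rel
            if candidate.is_file() and readable(candidate):
                target = Path(dest_root) / rel
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(candidate, target)
                break
        else:
            # No replica yielded a readable copy of this file.
            unrecoverable.append(rel)
    return unrecoverable
```

On a filesystem without checksums this only catches outright read errors, so there you would need to compare against stored hashes instead.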
There are, but you'll be paying a lot of $$$ for that kind of storage in the cloud. I get 4GB for free from DropBox. SkyDrive from Microsoft will set you back $1000/month for 2TB - DropBox is about twice that much. It's not really practical for media files.
A much better solution would be archival quality Blu-rays. They can hold 25 GB apiece and they're supposed to last 100 years, but they really just need to last long enough until a new, even denser storage medium comes along.
Occasionally living proof of the Ballmer peak.
Not all cloud storage is expensive. It's only $4 a month for unlimited backups to CrashPlan.
They also do checksums and versioning and can be set to never remove deleted files from the backup.
I have 12.8TB backed up to them and it's been working great.
Other than that, ZFS can't be beat. I use that as well.
Bitrot does happen.
When a disk has a bad block and detects that, it will try to read the data from it and put it on a block from the reserve-pool. However, the data might be bad and corrupt, so you lose data.
Disks do have a Reed-Solomon (aka par-files) index, so it can repair some damage, but it doesn't always succeed.
Anyway, what I do for important things, is have par2 blocks that go along with the data. All my photo-archives have par2 files attached to them.
I reckon you could even automate it. To have a script that traverses all directories and tries to repair the data if it's broken. If it fails, you get notified.
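Something along these lines might do for the automation. It is a sketch in Python that shells out to the par2 command-line tool (par2cmdline) and assumes its "verify" and "repair" subcommands exit with status 0 on success. It also assumes recovery volumes can be recognised by ".vol" in the file name, so that only the index file of each set is checked. The names run_par2 and check_tree are made up.

```python
import subprocess
from pathlib import Path


def run_par2(action, par2_file):
    """Run `par2 verify` or `par2 repair`; True means exit status 0."""
    result = subprocess.run(
        ["par2", action, str(par2_file)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_tree(root, runner=run_par2):
    """Verify every PAR2 set under root, attempting repair on failure.

    Returns the index files whose data could not be repaired.
    """
    failed = []
    for index in sorted(Path(root).rglob("*.par2")):
        if ".vol" in index.name:
            continue  # assumed to be a recovery volume, not the index file
        if runner("verify", index):
            continue
        if not runner("repair", index):
            failed.append(index)
    return failed
```

Whatever check_tree returns is the list to get notified about (mail it from cron, for instance) and to restore from another copy.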
Well, don't worry about that. We can get you back before you leave. (Dr. Who)
First off, make sure you have a separate backup storage volume that doesn't get touched by normal applications and which keeps history. Backup doesn't protect you very much if accidental deletes or application bugs corrupt all your copies within one backup cycle. Use an appropriate backup tool to manage this, where appropriateness depends on your skill and willingness to tinker. You could use something as simple as an rsync --link-dest job, or rsync --inplace in combination with filesystem snapshots, or some backup suite that will store history in its own format.
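To make the rsync --link-dest option concrete, here is a small Python sketch that only builds the command line for one dated snapshot. The function name, the directory layout (one directory per snapshot under a common backup root) and the example paths are assumptions for illustration; -a and --link-dest are standard rsync options.

```python
from pathlib import PurePosixPath


def link_dest_command(source, backup_root, snapshot_name, previous_name=None):
    """Build an rsync command that writes a new snapshot directory.

    Files unchanged since the previous snapshot are hard-linked to it, so
    every snapshot looks complete but only changed files use new space.
    """
    root = PurePosixPath(backup_root)
    command = ["rsync", "-a"]
    if previous_name is not None:
        command.append(f"--link-dest={root / previous_name}")
    # Trailing slash on the source: copy its contents, not the directory itself.
    command += [source.rstrip("/") + "/", str(root / snapshot_name)]
    return command
```

The resulting list can be handed to subprocess.run(command, check=True). Because old snapshots are never rewritten, an accidental delete or a corrupting application only affects the newest one.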
For bit-rot protection of the stored backup data, make a backup volume using zfs or btrfs with at least two disks in a mirroring configuration (where the filesystem manages the duplicate data, not a separate raid layer). Set it to periodically scrub itself, perhaps weekly. It will validate checksums on individual file extents. If one copy of a file extent cannot be read successfully, it will rewrite it using the other valid mirror. This rewrite will allow the disk's block remapping to relocate a bad block and keep going. The ability to validate checksums is the value add beyond normal raid, where the typical raid system only notices a problem when the disk starts reporting errors.
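As a toy illustration of what a scrub on a two-way mirror does (not how zfs or btrfs actually implement it), here is a Python model where each "disk" is a list of blocks and CRC32 stands in for whatever checksum the filesystem really uses:

```python
import zlib


def scrub(mirror_a, mirror_b, checksums):
    """Toy model of a scrub over a two-way mirror.

    mirror_a / mirror_b: lists of bytes blocks (the two copies).
    checksums: CRC32 recorded for each block when it was written.
    Repairs a bad copy in place from the good one and returns a report.
    """
    report = {"repaired": [], "unrecoverable": []}
    for i, expected in enumerate(checksums):
        ok_a = zlib.crc32(mirror_a[i]) == expected
        ok_b = zlib.crc32(mirror_b[i]) == expected
        if ok_a and ok_b:
            continue
        if ok_a:
            mirror_b[i] = mirror_a[i]  # rewrite the bad copy from the good one
            report["repaired"].append(i)
        elif ok_b:
            mirror_a[i] = mirror_b[i]
            report["repaired"].append(i)
        else:
            report["unrecoverable"].append(i)  # restore from another backup
    return report
```

The point is that the stored checksum breaks the tie: a plain mirror with two differing copies cannot tell which one is right.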
Monitor overall disk health and preemptively replace drives that start to show many errors, just as with regular raid. Some people consider the first block remapping event to be a failure sign, but you may replace a lot of disks this way. Others will wait to see if it starts having many such events within days or weeks before considering the disk bad.
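The "wait and see if it starts having many such events within days or weeks" policy could be written down roughly like this. It is a Python sketch with invented thresholds (14 days, 5 new remaps) and an invented function name; the samples would come from whatever you use to read the drive's remapped-block counter.

```python
from datetime import timedelta


def should_replace(samples, window_days=14, max_new_remaps=5):
    """Decide whether a drive is accumulating remapped blocks too quickly.

    samples: list of (date, cumulative_remapped_block_count), oldest first.
    Returns True when the count has grown by more than max_new_remaps
    since the last reading taken at or before the start of the window.
    """
    if not samples:
        return False
    latest_day, latest_count = samples[-1]
    cutoff = latest_day - timedelta(days=window_days)
    # Fall back to the oldest reading when history is shorter than the window.
    baseline = samples[0][1]
    for day, count in samples:
        if day <= cutoff:
            baseline = count
    return latest_count - baseline > max_new_remaps
```

Setting max_new_remaps to 0 gives the stricter "first remap means replace" policy.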
I don't think you understand what RAID is or what it does.
And thus the saga of that damned Frenchman continues
Warning for all UNIX newbies: that command will reset the file to 0 bytes. Just that you know.
(I've seen some cases when a rookie is setting up a Linux system and people jokingly throw him these "rm -rf /" commands and the poor guy actually ends up wrecking his system.)
Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.
The key therefore is to verify as you write. Usually, verifying a sample of a few GB will let you know if everything went OK. DO your backups with checksums of some sort. A modern tape drive and backup software will do that automatically, and let you schedule a verify automatically as part of backups (2 TB? That's 1 tape - might want to consider that), though ideally you should verify a tape on a different drive than the one you wrote it on.
For disk-based backups, local or cloud, I strongly recommend archiving to a format with checksums (RAR etc) over some sort of raw file copy. Especially for anything going over the network: RAR a volume/file set locally first, then upload, then test the archive.
If you have a superstitious fear of bitrot, you can always do some random sampling of archive integrity, and keep multiple historical copies of files just in case (e.g., don't just delete backup N-1 when you do backup N, do a rotation scheme).
Socialism: a lie told by totalitarians and believed by fools.
I'm glad you're bringing this up. I haven't seen any backup software that addresses bitrot. And bitrot does happen, I lost a few pics to it. What I do: I have a monthly script that makes a RAR archive from my pictures directory. RAR checks file integrity but also has "recovery" options that allow you to recover files from a damaged archive (to a point)
{Science sans conscience n'est que ruine de l'âme}
If you really want hassle free and safe, it would be expensive, but this is what I would do:
ZFS for the main storage - Either using double parity via ZFS or on a raid 6 via hardware raid.
Second location - Same setup, but maybe with a little more space
Use rsync between them using the --backup switch so that any changes get put into a different folder.
What you get:
Pretty disaster tolerant
Easy to maintain/manage
A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
Upgradable - just change drives
Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)
What you don't get: Lost baby pictures/videos. I've been there, and I'd pay a lot more than this to get them back at this point, and my wife would pay a lot more than I would..
Your current setup is going to be time consuming, and you're going to lose things here and there anyway.. If you just try to do the same thing but make it a little better, you're still going to have the same situation, just not as bad. In this setup you have to have like 5 catastrophic failures to lose anything, sometimes even more..
WinRAR isn't perfect, but it works on a number of platforms, be it OS X, Windows, Linux, or BSD. This provides not just CRC checking, but one can add recovery records for being able to repair damage. If storing data on a number of volumes (like optical media), one can make recovery volumes as well, so only four CDs out of a five CD set are needed to get everything back.
It isn't as easy as ZFS, but it does work fairly well for long term archiving, and one can tell if the archive has been damaged years to decades down the road.
Warning for all UNIX newbies: that command will reset the file to 0 bytes. Just that you know.
(I've seen some cases when a rookie is setting up a Linux system and people jokingly throw him these "rm -rf /" commands and the poor guy actually ends up wrecking his system.)
I think the general consensus is that if you're stupid enough to run a command you got from SomeRandomInternetAsshole420 without verifying what it will do first, you deserve to have your system wiped.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
Cloud and complete security together is an oxymoron.
I want a list of atrocities done in your name - Recoil
BTRFS and ZFS both do checksumming and can detect bit-rot. If you create a RAID array with them (using their native RAID capabilities) they can automatically correct it too. Using rsync and unison I once found a file with a nice track of modified bytes in it -- spinning rust makes a great cosmic ray or nuclear recoil detector. Or maybe the cosmic ray hit the RAM and it got written to disk. So, use ECC RAM.
But "bit-rot" occurs far less frequently than this: I find that on a semi-regular basis my entire filesystem gets trashed (about once every year or three). This happened to me just last week... my RAID1 BTRFS partitions (both of them) got trashed because one of my memory modules went bad. In the past I've had power supplies go bad causing this, or brownouts, and in other cases I never identified the cause. I've seen this happen across ext3, jfs, xfs, and btrfs so it's (probably) not the file system's fault. In such cases, fsck will often make the problem worse. (Use LVM and its "snapshot" feature to perform fsck on a snapshot without destroying the original.) You'd think these advanced filesystems would have a way to rewind to a working copy (for instance in BTRFS -- mount a previous "generation") but this seems to not be the case.
Anyway, btrfs guys, your recovery tools could be a lot better. The COW enables some pretty fancy recovery techniques that you guys don't seem to be doing yet. If you've got a great btrfs or zfs recovery technique, please reply and tell us.
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
And yet, one of FLOSS's selling points is our great community support...
You do not have a moral or legal right to do absolutely anything you want.
It depends on your storage needs. For things that you need to regularly access, Amazon S3 will cost you about $175/month for 2TB storage plus transfer fees, but is readily accessible at any time.
Amazon Glacier would only cost you $20/month for that amount of storage, but has various limitations on retrieval time (~4 hour minimum) and higher costs if you need to retrieve more data in a shorter amount of time. As the name suggests, it's designed for "cold storage".
Both offer extremely high degrees of reliability.
There's really no way around it. Storage media is not permanent. You can store your important stuff on RAID but keep the array backed-up often. RAID is there to keep a disk*N failure from borking your production storage and that's it. If you can afford cloud storage, encrypt your array contents (encfs is good) and mirror the contents with rsnapshot or rsync to amazon, dropbox, a friends raid array, whatever. SATA drives are cheap enough to keep a couple sitting around to just plug in and mirror to every weekend but you'll probably find a friend's cable modem and rsync+ssh a very handy alternative (hint: check out --bwlimit option) when run from cron.
Join the Slashcott! Feb 10 thru Feb 17!
In reality, Dropbox, Skydrive, and other cloud services should be treated as a type of media, just like BD-ROMs, tape, SSD, HDD, and even hard copy.
The trick is to use different media to protect against different things. My Blu-Ray disks protect an archive against tampering or CryptoLocker (barring a hack that flashes the BD burner's ROM to allow the laser to overwrite written sectors.) However, they have to be maintained in a good environment with a good indexing system. My files stashed on Dropbox bring me accessibility virtually anywhere... but malware that erases files could wipe that volume out in no time.
Similar with external HDDs. Those are great for dealing with a complete bare metal restore, but provide little to no protection against malware. Tape, OTOH, is expensive for the drive and requires a fast computer, but once the read-only tab is flipped or the WORM session is closed, the data is there until the tape is physically destroyed.
Of course, there is not just media... there are backup programs. This is why I use the KISS principle when it comes to backups. I use an archiving utility to break up a large backup into segments (with recovery segments to allow the archive to be repaired should media go bad), then burn the segments onto optical media.
I've found that using a backup utility can work well... until one has to restore, the company is out of business, and one can't find the CD key or serial number so the software will install. One major program I used for years worked excellently... then just refused to support new optical drives (as in ignoring them completely.) So, unless I can find a DVD drive on its antiquated hardware list on eBay, all my backups are inaccessible. I was lucky enough to find that and copy the data to a HDD, but using the lowest common denominator is a good thing.
Backups are the often neglected underbelly of the IT world. While storage, security, availability and other technologies have advanced significantly, backups on the non-enterprise level are still languishing behind in almost every way possible. It was only a few years ago that encryption became standard with backup utilities [1].
[1]: With encryption comes key management, and some backup programs make that easy, some make it incredibly hard.
So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?
This is why people don't like you.
"We'd love it if the file-system could detect this and try correcting first, and if it couldn't correct the problem, it could trigger the restoration. But that only seems to be an option for RAID type systems, where the drives are colocated."
If you have ~2TB of irreplaceable memories, set up a NAS with a RAID array. Whilst bit-rot can be detected on its own, it can only be corrected if the file system knows what the bits should have been. To this end BTRFS and my recommendation, ZFS, can be set to scan all data once a week/month etc. and use the redundant data in the RAID array to correct the 'bit-rot'.
I have an Intel Atom board in an old case with 4 drives (2x 500GB mirror and 2x 1TB mirror). I have FreeNAS on this; it is powered on every night by wake-on-LAN, backs up any new data and gets shut down. Once a week it backs up new data, then runs the command 'zpool scrub', which checks for bit-rot or inconsistencies in the file-system and corrects them if any are found (it can email you a warning if you want as well). This way if any files get damaged on a home PC/laptop etc., any user can turn on the NAS and recover their files from the shared folder.
One point of warning: ZFS is RAM hungry, so 4GB is the minimum. Something to keep in mind when eBaying for an old PC to use. Others will also point out that file transfers are ~20-30MB/s with a low-powered Atom, so use something with more grunt if it's to be a 24/7 NAS.
A two-disk RAID1, or a RAID5, theoretically ought to be able to detect when there's corruption, but shouldn't be able to correct it. If you've got two different data values, you don't know which one is right.
But it occurs to me: RAID6 (or three-or-more disk RAID1) really ought to be able to correct. Imagine a three-disk RAID1: if two disks say a byte is 03 and one disk says 02, then 03 is probably right. RAID6, similarly, has enough information to be able to do the kinds of repairs that you could do with par2.
It'd be cool to find out this is already in the kernel's md device. Probably not so yet, though.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
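The three-disk RAID1 voting idea above is easy to show in miniature. This Python sketch does a byte-wise majority vote over in-memory copies. It is only an illustration of the principle, not a claim about what md or any real RAID implementation does, and the function name is made up.

```python
from collections import Counter


def majority_repair(copies):
    """Byte-wise majority vote across equal-length copies.

    Returns (repaired_bytes, undecided_offsets); an offset is undecided
    when no value is held by more than half of the copies.
    """
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be non-empty and of equal length")
    repaired = bytearray()
    undecided = []
    for offset, column in enumerate(zip(*copies)):
        value, votes = Counter(column).most_common(1)[0]
        if votes * 2 <= len(copies):
            undecided.append(offset)  # a tie, e.g. two copies that disagree
        repaired.append(value)
    return bytes(repaired), undecided
```

With three copies a single bad byte is outvoted; with two copies a disagreement shows up as undecided, which is exactly the two-disk RAID1 problem of not knowing which value is right.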
There's a reason so many shops have moved to disk based backups. Tape simply isn't reliable. Tape is cheap; but definitely NOT reliable.
Bitrot is a myth in modern times.
You state this without any substantiation as if it were a fact.
Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.
This isn't just wrong, it's laughably wrong. ZFS has proven that a wide variety of chipset bugs, firmware bugs, actual mechanical failure, etc are still present and actively corrupting our data. It applies to HDDs and flash. Worse, this corruption in most cases appears randomly over time so your proposal to verify the written data immediately is useless.
Prior to the widespread deployment of this new generation of check-summing filesystems, I made the same faulty assumption you made: that data isn't subject to bit rot and will reproduce what was written.
ZFS or BTRFS will disabuse you of these notions very quickly. (Be sure to turn on idle scrubbing).
It also appears that the error rate is roughly constant but storage densities are increasing, so the bit errors per GB stored per month are increasing as well.
Microsoft needs to move ReFS down to consumer products ASAP. BTRFS needs to become the Linux default FS. Apple needs to get with the program already and adopt a modern filesystem.
Natural != (nontoxic || beneficial)
I have been going through this issue myself. In a single weekend of photo and video taking, I can easily fill up a 16 gig memory card, sometimes a 32 gig. About 10 years ago I lost about two years' worth of pictures due to bitrot (i.e. my primary failed, and the backup DVD-Rs were unreadable after only a year; I was able to recover only a handful of photos using disc-recovery software). Since then, I kept at least three backups, reburning discs every couple of years. But if I can fill up two BD-Rs in a weekend, and given the high price of media, that wasn't an option. Extra hard drives?
I finally realized the best way was just to get a Carbonite account. They are about $70 a year for unlimited encrypted storage space (if you are really anal, I guess you could always put things into TrueCrypt encrypted file containers and upload them). The worst part is how long it takes to do a backup on a residential broadband line (it would also suck if your ISP has data caps). It has taken me about 2 weeks to do half a terabyte.
The deal is, the peace of mind that comes from this is huge, and it is cheaper than buying another harddrive.
Yes, I know that is not the question you asked, but I feel like it is a much more practical alternative. I mean, as I continue backing stuff up, I am sure I will pass a terabyte. How much are you going to pay for discs, for hard drives? Then trying to keep them safe and secure, and having to worry about bitrot?
Seriously, I've lost family pictures and videos before even though I had backups, and it sucked. Do yourself a favor and get a cloud backup. Yeah, it may take a while to do your backups and restorations, but it is worth it.
You really gotta be careful with that attitude. The photos seem worthless at the time you take them, and most of them remain worthless forever. Most of them. Then you see that old picture of when your now-grown-up dog used to be a cute little puppy, and awww!!!
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
it doesn't seem that way... http://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449/
M-DISC:
DVD format presently, BLU-RAY format in the future. Someday an electronic eye will just be able to look at the disc surface and see it all in one snapshot.
They aim for 1000 years. I expect 100. It may be reasonable. Just keep drives around.
http://www.mdisc.com/proving-ground/
I understood him to be commenting on the number, not the existence, of the photos. I'm the designated archivist for the family's (7 members in 2 households) photos. At last check, I have about 20k photos in the archive. It's hard to imagine having "hundreds of thousands" without having enormous amounts of redundant or irrelevant photos, which is what the parent post is poking fun at.
WARNING: DO NOT RUN ANY COMMAND IN THE PARENT, THIS COMMENT OR ANY OF THE SIBLING COMMENTS.
You really suck at being an asshole too, the right command for destroying files and being innocently obfuscated is:
dd if=/dev/zero|pv|dd bs=1024 count=$(ls -s 'filename'|awk '{print $1}' of='filename'|openssl sha1
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)
I'll be the heretic here, but on Windows 8.1 and Windows Server 2012 R2, there is a feature called Storage Spaces. It works similar to ZFS where you toss drives into a pool, then create a volume that is either simple, mirror, or with parity, and Windows does the rest. If a volume needs more space, toss some more drives in the pool.
To boot, it even offers autotiering so data can be stored on a SSD that is frequently used, or remain on the HDDs if it isn't. Deduplication is handled on the filesystem level [1].
No, this isn't a replacement for a SAN with RAID 6 and real-time deduplication, but it does get Windows at least in the same ballgame as Oracle with ZFS.
[1]: Not active deduplication. The data is initially stored duplicated, but a background task finds identical blocks and adds pointers. Of course, the made from scratch filesystem, ReFS (which has the ability to check for bit rot on reads like ZFS), doesn't have this, so one is still stuck with NTFS for this feature.
Second shoutout for Crashplan! I have eight computers backing up to one account with "unlimited" storage and versioning.
I'm curious how that is doable. Even Amazon Glacier would be about $10.24 per terabyte stored per month, so I'd be looking at about $130/month for that much info.
I am not passing judgement... just have not heard much about CrashPlan, good/bad other than a quick search on it.
Actually, that was a reply to THIS post, not the original question posted by timothy...
I really hope this discussion provides good answers, with practical solutions for Windows, iOS, and Linux... I think that this is the sort of thing that everyone could really use!
Are there cloud storage providers that can do this for the above example of an approx. 2 TB data set, and provide complete security?
I still think questions about basic data integrity, checksums, parity, ECC on disks etc. should be completely unnecessary and most certainly already be second nature to the slashdot crowd, but I guess I'm just living in the past.
Thanks for immediately jumping down my throat, though ;)
I just wish LTO drives were cheaper. Otherwise, they would be ideal for backups because they support encryption on the drives themselves. All LTO-4 tapes and newer support this, so any LTO-4 drive given the right key can decrypt another drive's tape.
Of course, WORM media is always nice, especially with malware being a constant threat.
So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?
The Anonymous Cowards? Yes.
Please continue the technical discussion. Sorry for the noise.
CDRs suffer nasty bitrot. Usually most CDs made in the past 10 years. I suppose you could have vacuum sealed them, but how many people knew to do that?!! You can get medical grade gold disks, but those you have to special order (not found in your local computer store).
One of my clients archived their geoscience data projects on CDRs. It was only when they went to pull them that they discovered the bitrot problem. We used Nero DiscSpeed to perform a surface scan. You can see entire segments where it goes from green (good), transitions into yellow (correctable), to red (damaged, unreadable) and then back out to yellow and green again. It's the material that oxidizes. Since then, they pulled all the data they could back onto disk and tape. God only knows how long that will last too.
Life is not for the lazy.
Well, BackBlaze is another similar backup company who is far more public about their costs and operations. I think they have said their customer break-even point is around 3-4TB. So if most customers have far less than that, then a few can have far more and it all works out.
http://www.wired.com/insights/wp-content/uploads/2011/10/backblaze-cost.png
It might be overkill, but the open source backup software Bacula has a verify task, which you can schedule to run regularly. It can compare the contents of files to their saved state in backup volumes, or it can compare the MD5 or SHA1 hashes which were saved in the previous run. I assume other backup software has similar features.
Oh, really? Is that why drive manufacturers specify a non-recoverable read error rate - typically on the order of 1 bit per 100 terabits? Let's see now. A single 4TB drive contains 32 terabits of data. So if you have three of them, either in a RAID or separately, and you try to read the entire contents, you can expect an average of one bit to be rotted permanently and lost forever. Or that bad bit could happen a lot earlier. Conceivably the first bit you try to read. Or the one millionth. And that is not considered a failed drive. You can't magically guard against these by verifying the recorded data one time, either a nominal portion or even in its entirety.
RAR's checksums will only detect errors that happen to occur when you test read the RAR archive. They won't repair it, and testing OK is no guarantee that it won't have an error the next time you read it. PAR2, on the other hand, does provide for repair.
ZFS can at least detect, and optionally repair (if you use the redundancy options) these isolated bad bits, without the necessity for any special file metadata like PAR2. Of course, there's nothing to say you can't use both ZFS and PAR2.
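The arithmetic in the read-error paragraph above is easy to check. A quick Python sketch, assuming decimal terabytes (8e12 bits each), the quoted spec of one bad bit per 1e14 bits read, and independent errors so that a Poisson approximation is reasonable; the function names are made up:

```python
import math


def expected_read_errors(terabytes_read, bits_per_error=1e14):
    """Expected unrecoverable read errors for reading this much data once."""
    bits_read = terabytes_read * 8e12
    return bits_read / bits_per_error


def chance_of_at_least_one(terabytes_read, bits_per_error=1e14):
    """Poisson approximation: P(at least one error) = 1 - exp(-expected)."""
    expected = expected_read_errors(terabytes_read, bits_per_error)
    return 1.0 - math.exp(-expected)
```

For three 4TB drives, expected_read_errors(12) comes to about 0.96, matching the "average of one bit" figure, and chance_of_at_least_one(12) is roughly 0.62. So "expect one" really means "better than even odds of at least one" on a full read.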
Bitrot is a myth in modern times.
You state this without any substantiation as if it were a fact.
And I'll counter the above. The last bitrot event I had to deal with - on current server grade (Windoze, tho) hardware was waaaay back last Friday.
Thank you. A thoughtful, concise Anonymous post... You've just restored some of my faith in the AC. ;)
Users that utilize large amounts of storage are relatively uncommon and are subsidized, in part, by users who utilize less storage. If everyone used terabytes of storage at $4/month, that wouldn't really be sustainable.
Although just a personal anecdote, I've used CrashPlan for ~4 years now (with 11 computers belonging to various family members all backing up to their service with a total of around 500GB being stored with them). Zero complaints. It's done everything I expected, always worked, and never had issues. When I had a laptop stolen and purchased a replacement, I was able to restore all the files from CrashPlan in about a day or two of downloading. I highly recommend it.
We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present),
As the proud owner of dozens of family photo albums, a stack of PhotoCDs etc which rarely see the light of day, the bigger challenge is whether anyone will ever voluntarily look at those terabytes of photos. Having been the victim of excruciating vacation slide shows that only consisted of 40-50 images on a number of occasions (not to mention the more modern version involving a phone/tablet waving in my face), I can only imagine the pain you could inflict on someone with the arsenal you are amassing.
"We're experiencing data going bad and not being restorable from back-ups because it just CORRUPTS itself for no visible reason" "That's a myth and doesn't actually happen."
HIV was created by racist bigots to slander blacks and homosexuals.
Support my political activism on Patreon.
Outsourced information services in general have known security concerns. That they come under a new buzzword doesn't make them less secure. Even contractors who come in and touch your systems can walk out with massive amounts of private data.
Support my political activism on Patreon.
Tape MUST be sufficiently stable. Reading the reliability specs off the box in front of me and running a few calculations shows that of all the tape operations ever done (at least for my brand of tape) there should be zero or at most one (1.3% chance) tape error in the history of all tape storage by humanity.
You hit the nail on the head. Apple should either get with Oracle and put ZFS back in the OS X kernel as the default filesystem, or get with Microsoft and license ReFS. HFS+ was a good filesystem when OS X hit the market, but it has been over a decade, and everyone else has moved on.
One reason why the IT industry moved from RAID 5 to RAID 6 as a standard is that disk capacities are growing but I/O is not keeping pace, so it takes longer and longer to rebuild a drive. RAID 6 is now a must because a rebuild takes so long that there is a good chance of another drive failing while the RAID array is in degraded mode. Of course, this is for tier 3 storage, but tier 2 storage is having similar issues as well.
Don't forget the old-fashioned method: make archival prints of your photos and spread copies among your relatives. Although that isn't practical for "hundreds of thousands", it is practical for the hundreds of photos you or your descendants might really care about. The advantage of this method is that it is a simple technology that will make your photos accessible into the far future. And it has a proven track record.
Every other solution I've seen described here better addresses your specific question, but doesn't really address your basic problem. In fact, the more specific and exotic the technology (file systems, services, RAID, etc.) the less likely your data is to be accessible in the far future. At best, those sorts of solutions provide you a migration path to the next storage technology. One can imagine that such a large amount of data would need to be transported across systems and technologies multiple times to last even a few decades. But will someone care enough to do that when you're gone? Compare that to the humble black-and-white paper print, which if created and stored properly can last for well over a hundred years with no maintenance whatsoever.
Culling down to a few hundred photos may seem like a sacrifice, but those who receive your pictures in the future will thank you for it. In my experience, just a few photos of an ancestor, each taken at a different age or at a different stage of life, is all I really want anyway. It's also important to carefully label them on the back, where the information can't get lost, because a photo without context information is nearly meaningless. Names are especially important: a photo of an unknown person is of virtually no interest.
Sorry I don't have a low-tech answer for video, but video (or "home movies", as we used to call it) will be far less important to your descendants anyway.
A family archive maintained by the "tech guy/gal" in the family is also subject to failure from death or disability of the aforementioned maintainer. Any storage/backup solution should therefore be sufficiently documented (probably on paper, too) that the grieving loved ones can get things back after a year or two of zero maintenance and care of the system. That would also imply eschewing home-brew type systems in favor of using standard tools so a knowledgeable tech person not familiar with the creator's original design can salvage things in this tragic but possible scenario. Document the system so even if the family can't do it themselves, and an IT guy has to be contracted to resurrect the data, he'll have the information needed to do so.
Any system sufficiently dependent on regular maintenance by just one particular person is indistinguishable from a dead-man time-bomb.
I am not a crackpot.
100,000s -- like 300,000? More? How many of them will you actually ever look at again? Less than 1%, I'm guessing. Here's my advice (and it's what I do). Step 1) when transferring pics to your computer, delete the ones that are out of focus, badly lit, poorly framed, etc. This is about 15%. Step 2) once a month, go through the photos you have taken the previous month and delete those that just don't mean as much anymore (if they have decreased in emotional value in 30 days, just think how utterly worthless they would be in 5 years). This takes care of another 30%. Step 3) once every 3 months, my wife and I pick the cream of the crop for physical prints. This is about 10%. These are stuck into photo albums, labeled and kept in a fireproof safe in our basement. So 200 photos a month gets reduced to ~100, and then 10 per month are printed. YMMV
I've been surprised by the lack of reference to properly error-checked data paths so far in these comments. I'm continually saddened by the ever-increasing aggressiveness in clocks and density of RAM in consumer-level systems while vendors stubbornly refuse to implement ECC. Many people are even hostile to the idea, as if ECC RAM is somehow tainted.
This article points out something else I'd not even considered. A scenario where lack of ECC on a self healing file system can amplify a RAM failure to a catastrophic degree making such filesystems even riskier to run on consumer grade systems.
Thank you for sharing.
Convert photos to DNG in Adobe Lightroom and use its ability to check for file changes. Store on a Drobo with dual disk redundancy.
I work next to a moving and storage company. Occasionally the dumpster out back can be found unceremoniously overflowing with the contents of a forgotten storage locker. Anything of value has been teased out - you know what gets tossed? Everything else, especially photo albums, trophies, diplomas, etc.
“What is most personal is most general”— Carl Rogers
but there is a catch: to reliably detect bit-rot and other problems, you also need server-grade hardware with ECC.
ZFS (especially when your dataset-size increases and you add more RAM) is picky about that, too.
Bit-rot does not only occur in hard-disks or flash.
You should really, really take a hard look at every set of photos and select one or two from each "set", then have these printed (black and white, for extra longevity).
If this results in still too many images, only print a selection of the selection and let the rest die.
Windows 2000 - from the guys who brought us edlin
The solution to Bitrot and reading of old media is very simple and honestly I don't know why it comes up so much. Storage is DIRT CHEAP. 2TB of Data is NOTHING, you can get a 3TB+ external drive for $100 or even less on sale. Buy 3 drives, keep 1 in SAFELOCATION*, Back up to 1 drive every even week, and the second one every odd week, and once a month swap the one in the SAFELOCATION out for a local one and repeat the cycle. Increase or decrease frequency of SAFELOCATION swapping depending on level of paranoia.
There, the problem is simply and very cheaply solved and there is no level of bit rot that is going to cause all 3 of these backups to be destroyed within a 1 month time window.
* where SAFELOCATION is an off-premise location, either a close friend's house or a locked office desk or a family member's house or a safe deposit box
WARNING: DO NOT RUN ANY COMMAND IN THE PARENT, THIS COMMENT OR ANY OF THE SIBLING COMMENTS.
Unless you are working on the NSA's main database. Then you should run these commands several times, just to be sure the backup is complete. Then take a sledgehammer to the original files, for security. And restore from the backup, to guarantee the backup worked.
Book a flight to Moscow first though
And yet, one of FLOSS's selling points is our great community support...
Every community with a notable population size is going to have its share of bad actors.
Besides, ever since you were a kid you've been taught to not trust strangers based on their word alone.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
We wrote our own parallel filesystem to handle just that. It stores a checksum of the file in the metadata. We can (optionally) verify the checksum when a file is read, or run a weekly "scrubber" to detect errors.
We also have Reed-Solomon 6+3 redundancy, so fixing bitrot is usually pretty easy.
This doesn't even count the fact that optical media is still subject to the same degradation and bitrot that tape is.
And anyone who thinks electromagnetic tape is "dead" is naive or just ignorant. People have been predicting the death of tape for decades, and it's no more true today than it was in the 70's. Modern EM tape is typically rated for 15 to 30 years of retention, and as long as it is not over-exposed to moisture during storage, it has proven to be able to last that long: otherwise, the manufacturers would be out of business because the Fortune 500 and S&P 500 companies - the majority of whom backup to tape and send it off-site - would have sued them to extinction.
On the other hand, according to archives.gov:
"CD/DVD experiential life expectancy is 2 to 5 years even though published life expectancies are often cited as 10 years, 25 years, or longer. However, a variety of factors discussed in the sources cited in FAQ 15, below, may result in a much shorter life span for CDs/DVDs."
"Inveniemus Viam Aut Faciemus" 'We will find a way... Or we will make one!' --Hannibal of Carthage
It's only $4 a month for unlimited backups to CrashPlan.
Do they throttle? I looked into the one that advertises unlimited backups for $60/yr and they rate limit the connection down as you increase your data. I estimated 9 years for the first backup to complete based on published rates.
"Unlimited" - IDTIMWYTIM.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Not that I have seen. It maxes my 5Mbit upload and my downloads are 15-20Mbit.
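For a rough sense of what a first full upload means at these speeds, here is a back-of-the-envelope sketch. The 2 TB figure comes from the submission and the 5 Mbit/s from the comment above; decimal units and a fully saturated, unthrottled link with no protocol overhead are assumptions, and upload_days is just an illustrative helper name.

```python
def upload_days(size_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to push size_bytes through a link running flat out."""
    seconds = size_bytes * 8 / link_bits_per_sec
    return seconds / 86_400  # seconds per day

two_tb = 2e12          # 2 TB, decimal (assumption)
five_mbit = 5e6        # 5 Mbit/s upload (from the comment above)

print(f"{upload_days(two_tb, five_mbit):.0f} days")
```

That works out to about 37 days of continuous uploading before any throttling, so a provider's rate limits matter a lot more than the sticker price.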
ZFS has proven that a wide variety of chipset bugs, firmware bugs, actual mechanical failure, etc are still present and actively corrupting our data.
And I expect that defragging aggravates this. Read a perfectly good block of data from disk into flaky RAM, have a bit flip, and write out that corrupted data to its new location. Even if the software is verifying, it's likely to verify against RAM, and it did successfully write what is in RAM.
And then there is overclocking. If a computer is just used for gaming, no problem. But if it's used for more serious things or for archiving things of value to you, then you may want to pass on overclocking. Folks who say you can verify an overclocked CPU are mistaken. It's not a crash or no crash thing: at a certain unpredictable point in overclocking, an unpredictable CPU instruction may simply give an incorrect result. This incorrect result could end up in your data or image. I've seen overclocked CPUs mess up a text string that is supplied by the CPU itself, CPUID's vendor string.
"IDTIMWYTIM." should be worked out to be SOMEIDIOT
There are two types of people in the world: Those who crave closure
As other people have mentioned, a lot of these errors can occur while you are actually copying the files. I have copied files and immediately executed md5sums on the source and dest files only to find differences. Unfortunately, I didn't start this practice until after I had to restore from backup only to find that some of the backup files were corrupted.
And given that this seems to be a common problem, why in the holiest of hells does the cp command not have a verify option? Yeah, it's easy enough to wrap the copy command with md5sums, but a verify option would be even easier. Throw in an auto-retry function on top of that and you'd be really cooking.
By the way, the submitter did not mention the current method of backup, but if they are using Linux with the cp command, they would be better served by moving over to something like rsync.
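The "wrap the copy with checksums and retry" idea from this comment can be sketched in a few lines of Python. This is only an illustration, not a replacement for cp or rsync: copy_verified and file_digest are made-up names, and SHA-256 is swapped in for the md5sum mentioned above (either is fine for catching accidental corruption).

```python
import hashlib
import shutil

def file_digest(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so large videos don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def copy_verified(src, dst, retries=3):
    """Copy src to dst, re-read dst, and retry if the digests differ.

    Returns the number of attempts it took; raises OSError if every
    attempt produced a mismatching copy.
    """
    want = file_digest(src)
    for attempt in range(1, retries + 1):
        shutil.copy2(src, dst)          # copies data plus timestamps
        if file_digest(dst) == want:
            return attempt
    raise OSError(f"{dst} still differs from {src} after {retries} attempts")
```

One caveat: a read-back immediately after the write may be served from the OS cache rather than the disk, so this catches errors in the copy path but does not prove the bytes on the platter are good. Re-checking later (or after remounting the drive) is a stronger test.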
I've used ZFS under Linux for 5 years now for exactly this sort of thing. I picked ZFS because I was putting photos and other things on it for storage that I wasn't likely to be looking at actively and wouldn't be able to detect bit-rot until it was far too late. ZFS has detected and corrected numerous cases of device corruption or unreadable data over the years, via monthly "zpool scrub" operations.
I have been backing these files up to another ZFS system off-site. But now I'm starting to look at other options because it's looking like I can begin doing it more cheaply than even my free hosting of a box I bought can provide.
Amazon Glacier reduces the cost of S3 storage by an order of magnitude, making 2TB of storage cost around $20/month. For a backup copy, it's hard to compete with this, even just buying a USB drive to stick somewhere... You do have to be careful about recovery though, they charge based on peak download speed (a very weird pricing).
What is the most practical way to maintain bitwise accuracy on a diverse set of binary data in an automated way using "diff and md5sum"?
Note that part where he was looking for an automated solution that will run itself without intervention, or a better means than hard drives...
You suggested... "Do some manual stuff using hard drives".
Right.
git annex is an open source project that lets you distribute files around various media (including external HDs, Amazon S3, SSH-connected computers, etc.). It has an fsck command for checking that your data still matches its checksums.
There's a GUI interface that makes it a lot like Dropbox, where you just add files to a folder, and they are sync'd.
It works on OS X and Linux, with an alpha for Windows.
-- rm -rf / tells you if you have root or not
Well, I did backup software and hardware for nearly 20 years. But I can't substantiate that with a link.
Socialism: a lie told by totalitarians and believed by fools.
I've investigated hundreds of cases of "bit rot" over the years in my job, and other than very weak magnetic media (or CD-Rs as someone upthread pointed out), corrupt backups were always corrupt when written. Had the poor SOB only verified his backups day 1, he'd not be in a world of shit. Every single time.
Socialism: a lie told by totalitarians and believed by fools.
The error rate from other sources (e.g. on the network copy) is far higher. If your backups are corrupt, it's almost certain they were corrupt day 1.
Test your backups after you make them: it's a cheap and easy 99% solution.
Socialism: a lie told by totalitarians and believed by fools.
Jesus Christ, take it easy, man. I was making a harmless joke that anyone who was ever forced to watch boring holiday slideshows would be able to understand. Now I'm being accused of mental health issues, not being able to procreate and whatever else.
If hundreds of thousands of family pictures doesn't seem a bit excessive to you, so be it. After all, it takes only a few weeks to sort through them. But please calm down a little and stop spamming AC troll posts.
You make a great point about CD-Rs, I guess I should have broadened my statement to "cheap-ass backup solutions from the 90s", not just floppies and tape.
Socialism: a lie told by totalitarians and believed by fools.
I used to fancy a girl who worked as a data recovery engineer. You wouldn't believe how many people hear the RAID controller alarming and get up to close the case instead of hot swapping a spare drive.. then a week later the second drive goes. She had a fanciful story about how spinning disks used to occasionally fail in such a way that a random sector would go bad, report incorrect data, and a RAID-1 mirror would "fix" it by destroying data on the other drive. She also used to tell me software RAID options had a tendency to actually beat hardware RAID options for data integrity outside of other inline failures--that is, when the system is operating under optimal circumstances, most hardware RAID systems more often self-corrupt than software RAID systems. Just an odd statistic, and I never got overall risk performance stats out of her.
Support my political activism on Patreon.
Here's a cheap easy solution (assuming you can write some basic scripts)
1. Start by taking an MD5 of all your pics. Save the results.
2. Back up everything to a 2nd drive. Take MD5s and be sure they match using basic scripts.
3. Periodically scan drives 1 and 2 and compare against their expected MD5 values. If one has changed, copy it from the other (assuming it is still correct).
You could expand this with more drives if you are extra paranoid. You could do this cheap, check regularly, and know when bitrot is happening.
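The three steps above can be sketched as a small Python script. Treat it as a starting point under some assumptions: build_manifest and scrub are hypothetical names, the manifest is kept as a plain dict (in practice you would save it to disk, ideally on more than one drive), and it handles exactly two copies.

```python
import hashlib
import shutil
from pathlib import Path

def digest(path):
    """MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Step 1: map each file's path (relative to root) to its MD5."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def scrub(manifest, drive_a, drive_b):
    """Step 3: check both copies against the manifest and repair a bad
    copy from a good one. Returns a list of (relative path, what happened)."""
    report = []
    for rel, want in manifest.items():
        a, b = Path(drive_a) / rel, Path(drive_b) / rel
        ok_a = a.is_file() and digest(a) == want
        ok_b = b.is_file() and digest(b) == want
        if ok_a and ok_b:
            continue
        if not (ok_a or ok_b):
            report.append((rel, "no good copy left"))
            continue
        good, bad, note = ((a, b, "B repaired from A") if ok_a
                           else (b, a, "A repaired from B"))
        bad.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(good, bad)
        report.append((rel, note))
    return report
```

Run build_manifest once right after import, then call scrub from cron or a scheduled task. A drive that keeps showing up in the report is a good candidate for replacement.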
Ninjas don't carry tic tacs
I think that when writable CDs first came out, we thought that they would last forever. And in some sense they do last long enough. The other day I found a CD binder full of games and a few backups from 1996. The most surprising of all was a collection of photos that I thought had been long lost, and with a little rsync running over and over and over, I got all the files off intact and saved them to my Flickr account.
The most important thing to understand, I think, is that we have to look at digital storage as a convenient and temporary medium and that anything longer lasting would need to be hard copied. It’s not a guarantee, but it’s a better likelihood of survival. Pictures can survive by pure chance for a couple hundred years. We’re lucky if our current stuff will handle a few years, much less natural disasters and history itself.
For many, the cloud seems to be a utopia, but corporate and national politics can make all your treasured media disappear without warning, and none of the free services give you a guarantee of safety if something craps out on their systems. And as for paid cloud services, ask yourself if anyone will bother to take care of it after you’re gone, or if anyone will bother to archive it, or if your family will just toss it aside even if they are able to get them as part of your estate. Ask yourself who you’re saving all that for. Are we just digital hoarders?
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
I was thinking that bitrot is the computer god's way to protect our descendants...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
"Thanks for immediately jumping down my throat, though ;)"
Yeah. 'Cause you're the victim. WTF? Someone calls you out for being dickish, and they're jumping down your throat?
I don't know of any long-term backup solutions aside from gold CDs to be quite honest. If they're not prone to bit-rot, the media reader's interface will be obsolete on new equipment. It's doable, but not without first creating bridge solutions and data migration. The way I see it, migrate from media to media as technology progresses, or face an entire migration project later.
I suppose you could archive on flash drives, but I haven't a clue as to what the life expectancy of the flash chips is before bits start flipping randomly (gates change on die).
Life is not for the lazy.
If you're noticing data corruption on only 2TB it's probably not what we normally call bit-rot. A bit that changes state for no apparent reason within a very large set of data can be described as bit-rot; otherwise it's general data corruption, which has many causes that are all understood: poor media, poor transmission of data, overwriting of data, etc. Once you've got the system sorted out so you don't get data corruption, start thinking about the nature of your data. How much redundancy is in it? If it's JPEGs then almost none, so a single bit error could be serious to a file. If uncompressed TIFFs then there is a lot of data redundancy and the single bit error might only be an error of a single pixel, which you might not even notice. And finally, don't expect optical media to be safe from errors. Only use it as part of a DR plan.
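To make the redundancy point concrete, here is a small experiment. It uses zlib as a stand-in for any compressed format (it is not JPEG, and the byte string is not a real image, so take it as an analogy only); flip_bit and survives are illustrative helper names.

```python
import zlib

def flip_bit(buf, bit_index):
    """Return a copy of buf with one bit inverted."""
    out = bytearray(buf)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def survives(packed, bit_index, original):
    """True if the compressed stream still decodes to the original
    after flipping the given bit."""
    try:
        return zlib.decompress(flip_bit(packed, bit_index)) == original
    except zlib.error:
        return False

# Stand-in for uncompressed data (think raw pixels in a TIFF).
raw = " ".join(str(i * i) for i in range(500)).encode()
packed = zlib.compress(raw)

# One flipped bit in the raw data damages exactly one byte.
damaged_raw = flip_bit(raw, 8 * 1000)
bytes_changed = sum(x != y for x, y in zip(raw, damaged_raw))

# Try every possible single-bit flip in the compressed copy.
total = 8 * len(packed)
fatal = sum(not survives(packed, i, raw) for i in range(total))

print(bytes_changed, "byte differs after one flip in the raw data")
print(fatal, "of", total, "single-bit flips break the compressed copy")
```

With raw data the damage stays local; with the compressed stream a flip usually either fails the decoder outright or changes everything downstream of it. That is why compressed archives benefit so much from external checksums or parity files.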
As someone who has 100's of TB's of data stored in ZFS I couldn't agree more. In most cases if ZFS spits out a drive because it's convinced it's writing bad blocks, I believe it. In most cases (if it's a Seagate drive) SeaTools backs me up on this. In several cases a SeaTools quick check says the drive is fine, but without fail, if I do a "full" scan of the drive, it'll eventually throw an error.
I've found damaged SAS cables, JBOD enclosures with dodgy bridges, etc. because of ZFS.
With that all said, now that you've gone out and bought a small PC, stuffed 4, 4TB drives into it and set it up as a raid10 using ZFS you now need to ask the next question... what's more likely... I'm going to have two drives fail simultaneously or that my house is going to get hit with a {flood, lightning, fire, thieves, etc}
Honestly, I'd build two of these devices, one for local backups and I'd put one at a buddy's house and do remote backups from your local device.
Yes Francis, the world has gone crazy.
Snapraid (free!) might be an option: http://snapraid.sourceforge.net/
It snapshots your data to some parity files on a separate drive. All you would have to do is occasionally copy those files offsite. Snapraid includes commands that allow you to check and fix bitrot as well.
CrashPlan could help you a lot. First, CrashPlan is a backup system, so it makes and manages a copy of your data, including every version of every file. CrashPlan addresses the bitrot problem on their side by running their own checksums on the stored files : if they detect an issue with a stored file, they will replace it with the original version, still stored on their computer. If some files get corrupted on your computer, you can restore them from CrashPlan, but you will need something on your side to tell you that something went wrong. Now, even if you realize that the file is corrupted years after it happens, you can still recover the previous non-corrupted version from CrashPlan.
Now, 2TB is a bit much to store on CrashPlan's cloud : unless you have a very fast connection (at least 100MB) it's going to take you a while to upload your data. The solution is to run your own CrashPlan PRO Enterprise server onsite (with periodical offsite backups of course). Don't be fooled by the name, it's pretty easy to set up and administer, and the licenses are fairly affordable (75$/user/year).
I've been supporting CrashPlan PRO Enterprise in my company for 3 years, with 25 clients and about 1TB of data. While I'm not super-happy with the way the Code42 people run their CrashPlan business, the tech is solid. I'm kind of thinking that other backup systems work in similar ways.
Now, I hope that you'll excuse me for asking this question, but which kind of crappy file systems and hard drives are you using that generate significant levels of "bitrot" in files which are basically just sitting there?
Nobox: Only simple products.
You are missing a key ingredient: encryption.
The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them.
There you go again. Acting like you know what you're talking about, but you don't.
ZFS and BTRFS have a much more efficient way to ensure correctness: CRC of everything written. That is what is checked when you do a zpool scrub or a btrfs scrub. Random errors are very unlikely to produce the same checksum, so then you only need a second copy that doesn't produce CRC errors.
Hard drives are nowhere near as reliable as their manufacturers claim. Modern drives don't store the bits that you feed them exactly as you give them. Instead, they use CRC and error correcting codes, so they only need most of the data to be correct. Usually, if the data doesn't match the CRC, and it cannot be corrected by ECC, then you get a read error instead of corrupted data. Which, I guess, is better than getting a corrupted picture. Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.
But I've seen enough errors that I suspect something else is going on. It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory. Your computer can be corrupting your data, and you have no warning that it's happening. In addition, hard drives lie. I'm not optimistic about the long-term storage of electronic data.
Have a nice time.
There is also rsbep, which uses Reed Solomon FEC. This is a classic filter, so you can use it together with tar, gzip and gpg to protect archives against NSA snooping and bit rot simultaneously.
Something like:
$ tar -cz indirectory | rsbep | gpg -e > out.tar.gz.rs.gpg
La voila!
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Got to carve those pics in stone, in Egypt, else nobody will care about them later.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Some highlights:
When ZFS dies, it dies in a big and fairly comprehensive way, and ZFS will die if you under-provide it. In any event, you should RTFM before contemplating a build, and know the trade-offs you're getting into.
Schwab
Editor, A1-AAA AmeriCaptions
http://git-annex.branchable.com/
Sounds right to me - and there are sadly still people who need to be told "RAID is not backup".
Socialism: a lie told by totalitarians and believed by fools.
LTO has 30-year media easily available, and there's a lot of basis for tape for judging the real lifetime, since the technology has been around forever. For modern archive-quality tape, the backing will fail before the magnetic media. For normal LTO tape different manufacturers make different claims, but more than 10 years is normal. Ensuring you can still read the tape is of course a different challenge, but the drives try to be backwards compatible for a while (and the drives are fairly robust when in limited use). Fortunately, connection interfaces seem to be slowing their rate of change - a PCIe card will likely find a slot in servers for years to come, and SAS will also likely be around for quite some time, though the cards may get pricey if they become legacy-only.
Socialism: a lie told by totalitarians and believed by fools.
There you go again. Acting like you know what you're talking about, but you don't. ZFS and BTRFS have ...
Exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.
Modern drives don't store the bits that you feed them exactly as you give them. Instead, they use CRC and error correcting codes, so they
... Which again counts for exactly dick. I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.
Which, I guess, is better than getting a corrupted picture. Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.
That's because you have no experience as a network administrator in a professional environment. Because then you'd know that's the very thing RAID was designed to do: Recover from hardware failure, which includes sectors becoming unreadable. You are clearly confused both about what level of abstraction is being discussed (architecture versus hardware) and about the different types of failure modes each of these solutions presents. Bit rot is a physical process that occurs in all magnetic media, and at sufficiently small scale, can also affect non-persistent storage such as RAM.
It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory.
That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial uses, ECC memory just isn't needed. But for a select few use scenarios where data integrity is absolutely critical -- such as, say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems, the added cost is justified because they need high availability/high reliability of those systems. It's also used in certain aerospace applications because the physical mechanism that causes bitrot -- high energy radiation, increases quite a bit at higher altitudes, and in space increases several orders of magnitude -- and if you're going to put something in geostationary orbit, it then takes the full brunt of solar radiation with no mitigation. Correcting for memory problems in these situations is better done at the hardware level; hence ECC memory.
Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap. And, big surprise -- super cheap doesn't mean super reliable. But we don't need super reliability -- when our system shows obvious signs of a failing memory stick, we just drive to the store, plunk down a $20 and abscond with a new one. Problem solved.
I'm not optimistic about the long-term storage of electronic data.
That's because, as previously pointed out, your experience comes from consumer-grade hardware whose design considerations you don't fully understand. NASA has had great success in the long-term storage of magnetic media -- in fact there was an article not long ago about how they had to reverse-engineer equipment designed during the 1960s for the Apollo program to recover data on tape reels, when they lacked the original equipment it was recorded from. They discussed how the tapes themselves had become brittle and the ferrous oxide would actually peel off in chunks while reading, much like how paint peels off a house, but they were able to recover this data anyway. The technology we have today is far more sophisticated and unlike old tape-technology doesn't require physical contact with the source media to read it. There are companies like OnTrack that specialize in data recovery from hard drives and boast a rema
#fuckbeta #iamslashdot #dicemustdie
Try again, but this time with subdirectories
PAR2 with subs: Multipar and alternate
I've been using it for well over a year, it works great. Was using this for a while -- it's OK, but Multipar is much better.
Or just continue to use PAR on single directories with subs placed in some type of archive (zip, 7z, tar) file.
None of these holds a candle to ZFS as a live file system, but these all work great when archiving files to DVD/BD.
Heck, I'm currently copying multiple dirs to BD and using Multipar as "only" a checksumming and renaming repair tool -- not even bothering with the file content recovery option. For that matter, I've even created a (single) disc with 300% recovery -- if I lose all of the primary files and over half of the recovery content bits, I can STILL recover the contents. (I've tested this by manually damaging the file contents. I have multiple copies in different places, too -- there are just a few static files that I do *NOT* want to lose.)
If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
Oh my god she said that FIVE TIMES EVERY DAY!
Support my political activism on Patreon.
You might try backing up with http://www.mdisc.com/what-is-mdisc/ I've been using them since they came out and all my backups still work. It is supposed to last a thousand years. I don't know about that, but they do seem to be better than backing up to regular DVD, which I have had go bad in as little as a year.
No sigs in BETA. Beta SUCKS.
A rite of passage? You must be joking! I've never met anyone stupid enough to have actually run that command with those parms. The first time someone tried that on me, I did a 'man rm' and looked at the doc. I always thought that was the lesson: RTFM.
The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them.
Erm, no. Hamming(7,4) doesn't even need double the space, and that was 60 years ago.
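For anyone who hasn't seen it, a minimal sketch of Hamming(7,4): 4 data bits are stored as 7 bits (1.75x the space), and any single flipped bit in a 7-bit word can be located and corrected. The function names are illustrative, and this uses the textbook layout with parity bits at positions 1, 2 and 4; real storage systems use much larger codes.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Return (data bits, position of the corrected bit or 0 if clean)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3      # syndrome = 1-based index of the bad bit
    if pos:
        c[pos - 1] ^= 1             # flip it back
    return [c[2], c[4], c[5], c[6]], pos

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                        # simulate one rotted bit (position 5)
print(hamming74_decode(word))
```

The limit is one error per 7-bit word: two flips in the same word get "corrected" to the wrong data, which is why practical schemes pair larger codes with checksums.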
ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
And I have my FreeBSD server acting as a local backup with ZFS backed storage.
So if I do need something, I just grab it back local.
Never answer an anonymous letter. - Yogi Berra
In general ISPs didn't ever have unlimited. They advertised unlimited and then knocked people off if they passed some secret unpublished limit.
The difference now is that they no longer advertise a lie and they have published and trackable limits. The only issue is that the limits are in many cases absurdly low but otherwise it's a better practice than what they were doing before.
RAID10 and similar systems are two RAID5 systems which are independent and regularly compare data; These can detect which system is inconsistent, so you will always have at least one copy of your data in a consistent state.
You were doing quite well up until you said that sentence .....
Without parity checking, you simply aren't addressing bit rot. Period. It could be Raid 9 Million(tm) and if all it's doing is copying the data, and not comparing it, bit rot will still proceed apace, silently eating your data. But let's say you're a good administrator that has enabled parity. Great! But there's still a problem: parity cannot restore data that has become corrupted due to bit rot -- it is a detection-only mechanism.
This is incorrect for Reed-Solomon based RAID (levels 6 and higher such as RAID Z3). RAID6 can correct bit rot on a single disk and in general for t parity disks, floor(t/2) random errors per RS code can be corrected. All the RS-based RAID systems I've seen essentially store the RS code across devices using a GF(2^8) code, meaning that up to an entire byte could be corrupted by bit rot at a given logical address across all the stripes and still be corrected. All the details are on Wikipedia. Not all RAID-6+ implementations actually check the parity when reading, and I have no idea how many can solve the error locator polynomial for each RS code to actually identify and correct bit rot in multiple locations in different codes versus just dealing with known bulk errors (e.g. failed disks).
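The distinction being argued here is easier to see with the simplest case. The sketch below uses plain single XOR parity (RAID 5 style), not the Reed-Solomon codes described above, and the three tiny "disks" are made-up data. It shows that one parity block can rebuild a block whose location is known to be bad, but for a silent flip it can only say the stripe is inconsistent, not which disk is wrong.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]      # one stripe across three data disks
parity = xor_blocks(data)               # stored on a fourth disk

# Case 1: disk 1 is known to be dead (an "erasure").
# XOR of the survivors and the parity gives the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])

# Case 2: disk 1 silently returns a rotted byte, with no read error.
rotted = [data[0], b"BBxB", data[2]]
mismatch = xor_blocks(rotted + [parity])

print(rebuilt == data[1])   # the known-bad block is recovered
print(any(mismatch))        # the stripe is detectably inconsistent
```

The nonzero mismatch would look exactly the same if the flipped byte had been on disk 0, disk 2 or the parity disk, so single parity alone cannot pick the culprit. Locating it takes extra information: a second independent parity (as in the RS-based levels above) or a per-block checksum of the kind ZFS keeps.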
Now that I've explained all the ways that you're wrong, let me say that bit rot is probably not the cause of the OP's problems. In fact, USB devices are well-known for corrupting filesystems because of spontaneous disconnects, power loss events, etc., and this is simply what can be expected in a typical residential environment. Even a RAID configuration in a residential environment isn't invulnerable to the "write hole" problem -- where data is partially committed to disk, but then the array suffers a power loss event.
Any proper file system will have a large enough transaction/intent log that can be replayed to correct partial data/metadata writes due to power failure, the RAID write hole, etc. Most file systems in use are not proper, of course, but at least a few are available.
Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.
Well fingers crossed you are not the storage admin for anyone I deal with!
My understanding is that Storage Spaces is (as he says) MS's version of ZFS - does it not have the same data-checking features/ performance hit that 'regular' ZFS does?
No, it does not have the same data-checking features. Yes, it has a performance hit. Worst of both worlds. I've used it, and junked it: it was literally an order of magnitude slower than RAID5 via mdadm on Linux and didn't actually add any resiliency over RAID5. It didn't add flexibility either: to grow an existing pool, you need to add multiple similarly sized drives, since it doesn't rebalance. This is despite their marketing claims that you can add mismatched drives in an ad hoc fashion and have it "just work".
The only way to get Microsoft's unproven resiliency benefits is to use ReFS in conjunction with mirroring (not parity) on the expensive server editions. Windows 8/8.1 does not support ReFS.
Honestly, I'd build two of these devices, one for local backups and I'd put one at a buddy's house and do remote backups from your local device.
Oh what I'd do for usable upload bandwidth and reasonable data caps...
Test your backups after you make them
Obviously.
it's a cheap and easy 99% solution
It's not a solution. It's a bare minimum requirement that doesn't solve for bitrot.
Well, maybe I don't understand what you mean by "bitrot". GMR media doesn't "rot" in the classic sense of bits flipping over time (well, not in human-scale time), the way that happened with floppies and QIC tape. If you're adding some new meaning to that term, you'll need to explain it.
But if you're talking about odd disk failures: as I said at the top of the thread, if you're using disk, archive stuff in RARs (or other checksummed archives), test those checksums from time to time, and don't purge old backups the moment you make new ones. Or just use tape and you're fine, at least until it gets hard to find a drive old enough to accept the tape (10+ years).
Socialism: a lie told by totalitarians and believed by fools.
One quick note: a mirrored space running ReFS will do automatic checksumming and scrubbing. This isn't done for parity spaces, though I'm not sure why this is.
http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx
This is incorrect for Reed-Solomon based RAID (levels 6 and higher such as RAID Z3). RAID6 can correct
... Yes, but earlier systems, which the OP was suggesting could be used for this purpose, lack that functionality. Also, please reset your sarcasm detector, it appears to be out of alignment -- a functional detector would have pinged on "Raid 9 Million(tm)".
Any proper file system will have a large enough transaction/intent log that can be replayed to correct partial data/metadata writes due to power failure and the RAID write hole, etc. Most file systems in use are not proper, of course, but at least a few are available.
Correct, and those that are aren't immune to human stupidity. No filesystem can save you from a guy who decides to pour beer into the storage array, or who goes to move a directory and misclicks sending it to the trash. Disaster recovery is not a simple matter of choosing the right filesystem and then patting yourself on the back. It requires careful planning and consideration... None of which the majority of the people on this thread seem to be capable of. At least you seem to have some grasp of the underlying technology.
#fuckbeta #iamslashdot #dicemustdie
Right. The physical structure and materials used for stamped vs "burned" DVD/BR media are completely different. The photosensitive "burned" media can't be considered to have any useful permanence.
However, the biggest problem we face with any of these discs, is what hardware we will use to gain access to the encoded data on them? PATA is effectively dead, yet not even 10 years since then we'd have some difficulty reading data from a PATA drive just because the connector is uncommon. What about in another 10 years? In 20 years will there be any mainstream computers using USB at all? What about in 50 years? If we need to keep weird ancient junk around just to extract data from disks or discs, then the plan has failed. Pretty much from the outset for mortal consumers, a do it yourself digital archive is a recipe for a data recovery project in the future.
The next time you want to slam someone for "acting like you know what you're talking about", don't respond with a bunch of links to Wikipedia. Links, I might add, that are only marginally-relevant to the topic at hand. That shit wouldn't fly in college, so why do you think it's going to hold weight in a professional environment?
Slashdot, a 'professional environment'? As if we needed more proof that you're a fucking lunatic...
Jesus was all right but his disciples were thick and ordinary. -John Lennon
BTRFS needs to become the Linux default FS.
I just lost my wife's BTRFS partition yesterday after a hard-reset. Consulted Google for btrfs repair options and discovered they are lacking. Kept reporting root->node assertion failed, whatever that's supposed to mean. I don't recall the last time I've lost a partition like this, I assumed fsck would have done the trick.
See https://btrfs.wiki.kernel.org/index.php/Btrfsck :
Note that while this tool should be able to repair broken filesystems, it is still relatively new code, and has not seen widespread testing on a large range of real-life breakage. It is possible that it may cause additional damage in the process of repair.
The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them
Disagree. I've had an idea for a while that I'm surprised backup vendors don't do: two copies with a check sum and automatic restore*. The two copies and a check sum are a variation of the three-copy idea, but without the third copy. (I'd write a backup program myself with this idea except it would take too long to implement all of the ideas I have that I think every home backup program should have. The backup programs on the market are getting better, but they could still stand a few more improvements like this idea.)
My idea: On the first backup, the original copy on the hard drive gets backed up to the USB backup drive along with a check sum. (Despite your concerns about USB, I believe the original poster is talking about home use and can't really avoid this without significant costs.) When the backup is run a second time (like a day or week later), the original on the hard drive is compared to what is on the backup. Check sums are also performed. If something doesn't match, then you know you have bit rot. The check sum will determine whether the backup or the original is invalid and the program will then take appropriate action all without asking the user.
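A rough Python sketch of that decision step, in case it helps anyone who wants to prototype it. The function names and return strings are invented for illustration, and SHA-256 is just one reasonable choice of check sum.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def diagnose(original_path, backup_path, stored_digest):
    """Use the check sum recorded at first backup to decide which copy is bad."""
    original = sha256_of(original_path)
    backup = sha256_of(backup_path)
    if original == stored_digest and backup == stored_digest:
        return "ok"
    if original == stored_digest:
        return "backup_corrupt"      # restore the backup from the original
    if backup == stored_digest:
        return "original_differs"    # rot, or a legitimate edit
    return "both_differ"             # needs a third copy or a human
```

The "original_differs" case is the tricky one: the check sum alone cannot tell rot from an intentional edit, so a real program would also have to look at the modification time (and cope with applications that rewrite files without touching it).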
*Of course, Microsoft had to monkey up the works with using this idea. When you merely open an Excel file, it will modify the contents of the file. Very little can be found on this phenomenon, but here is something about it from Microsoft. Through personal experience, I have found it does not change the modified date and time after the file is closed, but it does modify contents. (I discovered this while playing with a prototype of my idea.) This fits with what they say in the link I provide, but it's not exactly the thing that jumps out at you after the first or second read. When only a single user uses the file, this phenomenon is not seen, although I suspect that Microsoft writes to the file then as well -- an idea which I absolutely hate. Truecrypt is also guilty of this, but at least it does it on purpose, it is documented, and you can turn it off. For security reasons, there is a setting that allows changes to a truecrypt container without changing the modified date and time marks of the truecrypt container file.
I know, I shouldn't respond to a troll, but I'm feeling generous today.
There you go again. Acting like you know what you're talking about, but you don't. ZFS and BTRFS have ...
Exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.
Um, excuse me? The filesystem absolutely does matter. Traditionally, the filesystem assumes that any data retrieved from the drive has been put there, earlier. Obviously, drives don't do that 100% reliably. It's an important innovation, that these newer filesystems will add their own checksums to the data that they write, so they can detect and sometimes fix corrupted reads.
I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.
Get your head out of the clouds. Everything does come down to hardware. In fact, given your other posts about hardware, I sometimes doubt that you actually interact with the hardware that you talk about.
Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.
That's because you have no experience as a network administrator in a professional environment. Because then you'd know that's the very thing RAID was designed to do: Recover from hardware failure, which includes sectors becoming unreadable.
That's an aspect of software. Of course a RAID with sufficient parity will recover from a total drive failure. It's much harder to find reference to how a particular RAID will respond to intermittent errors. But if you're not just a blowhard, I'd like to see some of your links to documents describing how the RAIDs that you know will handle drive read errors. Not total failures. Just read errors.
Speaking of RAID, ZFS has its own concept of RAID that supports up to triple parity, with a different architecture than a normal storage system. Still, I haven't found any reference to how it handles drive read errors.
It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory.
That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial uses, ECC memory just isn't needed. But for a select few use scenarios where data integrity is absolutely critical -- such as, say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems, the added cost is justified because they need high availability/high reliability of those systems.
What a horrible attitude to data integrity. Computer crashes, I lose data. Computer behaves abnormally, worst case scenario is it calculates some important thing wrong, say the root of an important filesystem B-tree, and the filesystem needs to go through an expensive repair. My data are important to me. I use my computer for my personal financial processing, and I know I'm not alone. My old computer had an extra 128kB of memory to provide parity checks for the other 1MB. I imagine that stupid traditions of cost-cutting are why my new computer does not have 2GB of memory to provide ECC for the other 16GB.
Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap.
And your server memory isn't? Back up a moment... I thought OP was talking about being able to detect bitrot in family photos, and now you're telling him he should buy a server with memory lovingly crafted for high reliability? Which reliability i
Have a nice time.
I know it's not really an answer to your question since it's not done, but I started a tool to save and check metadata of files:
https://github.com/shane-kerr/fileinfo
Right now it just outputs a file with all of the meta-data (including SHA-224 hash of the file contents). If you think this seems interesting, I can whip up the part that uses that file to check the meta-data this weekend.
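For anyone curious what the checking half could look like, here is a minimal Python sketch of the general idea. It is not the code from the repository above; the function names and the manifest layout are assumptions made up for this example.

```python
import hashlib
import os


def sha224_of(path, chunk_size=1 << 20):
    """Return the SHA-224 hex digest of a file, read in chunks."""
    digest = hashlib.sha224()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its size and hash."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = {
                "size": os.path.getsize(full),
                "sha224": sha224_of(full),
            }
    return manifest


def check_manifest(root, manifest):
    """Return (path, problem) pairs for files that are missing or changed."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            problems.append((rel, "missing"))
        elif sha224_of(full) != expected["sha224"]:
            problems.append((rel, "changed"))
    return problems
```

The manifest is a plain dict, so it can be dumped with the json module and stored next to (or better, away from) the archive it describes.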
Since you're an experienced ZFS user, do you have any recommendations for how to sync the systems described below?
I have a setup similar to the one you describe. One box at work with 2x3TB with ZFS and mirroring (raid1), similar box at home. The box at home is fairly recent, so I haven't gotten a good system for synchronizing them yet. My internet at home is 50/10 Mbps, work is much faster. The idea is that I back up both my personal photos (which originate on the home box, usually ~10 GB a month) and my work data (created on the work box, usually a steady stream of 1 GB per week and bursts of 10-50 GB occasionally). If possible I would like to have some directories on the work box that are not synchronized to the home box.
If the fact that both computers are sources of new data is a problem, I guess it's possible to modify that workflow.
And any other recommendations for ZFS? I scrub the pools weekly, but otherwise treat it as zero-maintenance.
for i in `facebook friends "=bday" 2>/dev/null | cut -d " " -f 3-`; do facebook wallpost $i "Happy birthday!"; done
I use the MD5 solution mentioned above, but also back everything up to Amazon Glacier. From what I've read, retrieving your data can be a pain, but storage is only $0.01 per gigabyte per month and they say that they store multiple copies across multiple locations and periodically check for data integrity. If data integrity is lost, they repair it using the other copies. I asked them how often data is checked for integrity and they said:
"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing. So, to address your first question, we performs checks frequently enough to ensure that we meet our design goal of 11 9s of average annual durability for an archive. In the very unlikely event that it is determined that one of your archives is not recoverable, we would contact you promptly."
I don't think it's justified to call someone a "dick" when they didn't use a derogatory word at you. You need to calm down a bit there.
Ahhh... a sample size of one. I understand now.
For someone who is simply storing large volumes of media, however, CrashPlan works out well. I forgot that we selected it for the backup system of the media server we installed for my senior project in my master's degree for our client. They needed to store about 600 GB of pictures and movies. A once daily backup is just fine for them - but I think we still negotiated a full Pro package for the other features.
Occasionally living proof of the Ballmer peak.
What command are you referring to?
If you don't trust the judgment of senior engineers, you won't get very far in life. When you need solutions that work in practice, turn to those who have been practicing for a while.
Why so? Do you have contrary experience you'd like to share? Care to join in a discussion?
There are certainly write errors - but that's not bitrot, that data was bad from the beginning (which is the true explanation for almost everything called bitrot). You can always get bus errors and whatnot, but those are transient errors, and the read is quite likely to succeed on the next try.
I have a pair of 4TB disks that I keep cloned with rsync. Periodically I verify the contents using rsync -c, which forces rsync to do a full checksum on the files. A few times a year this will identify a file that is actually corrupt and I'll manually recover it from the good copy.
I have seen both Dell RAID-5 and Sun RAID-6 arrays fail with 3+ simultaneous disk failures each. Google ran a Petabyte Sort benchmark in 2008 (6 hours to sort 10 trillion 100-byte records) and was not at all surprised that they had at least one hard drive failure on every attempt (4+ drive failures per day). I have seen enterprise tape systems fail to read their data (hopefully there was redundancy, but I don't know). I have seen backup systems have major performance glitches and fail to restore within their needed time frame. Facebook, for example, only has a few seconds to recover from a failed server before customers might get angry, and has built systems to handle it because it's necessary to provide a good service. The major players who are succeeding and profiting at giving away free services to hundreds of millions plan that all data storage will fail regularly, and plan accordingly.
A little primer for those of us who haven't kept up with new storage technologies since the 90's.
Google deals with enough data that they cannot consider any of your technologies reliable enough. Five years ago, they were already processing 20PB of data every single day with map reduce, and if you have to buy enough systems, even the best RAID6 SAN systems will break regularly. Statistically, a small chance repeated often enough gives you a virtual guarantee of failure. Google generally doesn't bother with expensive technologies like SANs and RAID, or even bother with enterprise drives (spinning disks -- they probably use an enterprise PCIe flash). You can make what you want of the enterprise drive decision, but I'm pretty sure I've read from at least a couple of sources that enterprise drives are just as prone to failure as regular drives. The major differences are warranty and firmware (e.g. supporting RAID friendly reads). Numerous sources have substantiated that the manufacturers' MTBF numbers are pure marketing fiction. They probably boast a lower error rate, but I have not seen a comparison, only reports that they are off by several orders of magnitude.
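The "small chance repeated often enough" point is just 1 - (1 - p)^n. A quick Python sketch, assuming drives fail independently at a constant rate; the 3% annual failure rate and the fleet size in the usage note are made-up illustration numbers, not figures from Google or any vendor.

```python
def prob_at_least_one_failure(annual_failure_rate, drive_count, days=1.0):
    """Chance that at least one of drive_count drives fails within `days`.

    Assumes independent drives and a constant failure rate, which real
    fleets only roughly follow.
    """
    survival_one_drive = (1.0 - annual_failure_rate) ** (days / 365.0)
    return 1.0 - survival_one_drive ** drive_count
```

For example, prob_at_least_one_failure(0.03, 50000) is the chance of seeing at least one dead drive on any given day in a hypothetical 50,000-drive fleet, while prob_at_least_one_failure(0.03, 2) is the same thing for a two-disk home mirror.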
What Google does is avoid any redundancy in their machines and take the "redundant array" to a whole new level: Redundant Array of Inexpensive Servers. Multiple copies of the data are written to different servers in different cabinets, and with each data block a checksum is stored. Every time the data is read, the checksum is verified. This way you know with 1 single read if you have bitrot, and can correct it with 1 good read. Now you no longer have to keep comparing 3 copies of the data to correct bitrot. The Hadoop project copied this with their HDFS, and many other large scale technologies have followed suit.
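A toy Python sketch of the checksum-per-block idea, to show why one read is enough to detect rot and one good replica is enough to fix it. The class and function names are invented, CRC32 stands in for whatever checksum a real system uses, and the in-memory dict stands in for disks on different servers.

```python
import zlib


class ChecksumError(Exception):
    """Raised when a stored block no longer matches its checksum."""


class Replica:
    """One copy of the data; each block is stored next to its CRC32."""

    def __init__(self):
        self.blocks = {}  # block_id -> (payload bytes, crc32 at write time)

    def write(self, block_id, payload):
        self.blocks[block_id] = (payload, zlib.crc32(payload))

    def read(self, block_id):
        payload, stored_crc = self.blocks[block_id]
        if zlib.crc32(payload) != stored_crc:
            raise ChecksumError(block_id)
        return payload


def read_with_repair(replicas, block_id):
    """Return the first intact copy and overwrite the bad copies seen before it."""
    damaged = []
    for replica in replicas:
        try:
            payload = replica.read(block_id)
        except (ChecksumError, KeyError):
            damaged.append(replica)
            continue
        for broken in damaged:
            broken.write(block_id, payload)
        return payload
    raise ChecksumError("no intact replica holds block %r" % (block_id,))
```

No copy-to-copy comparison is needed: each replica can tell on its own whether its block is still good.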
At a desktop level, ZFS, BTRFS and (I think) Windows Storage Spaces do something similar, combining RAID technology (0/1/5/6 maybe 1E) with checksums inside the file system. If a drive fails, or even if a checksum just doesn't verify, the file system can automatically attempt to rebuild from its redundancy, giving you a better data guarantee than any RAID card I have seen. If the journaling is done correctly, it shouldn't be susceptible to losing data from a power loss either, but home battery backups aren't too expensive. The OP was asking specifically about bitrot. A lot of UREs (uncorrectable read errors) get labeled and treated as bitrot, but it sounds like data he has previously verified is now corrupt (actual rot), not that the reason for corrupt blocks matters once they are corrupt. Bitrot happens more frequently when you don't have such stringent environmental controls in your home as you would in a data center, and I have personally seen it with only tens of GB of my data.
In my experience, data that is backed up and archived isn't a prime target for user error nor gross negligence regarding data backups. The user is definitely experiencing some sort of URE. In this case, a proper file system is quite important for protecting the data. I would recommend setting up a multi-drive NAS using
What's the point in joining in? To me it's obvious you have no clue about data storage and magnetic media in general, so no matter what I say you won't agree. I only chipped in earlier because I thought your statement was so funny!
Come now, do explain the process by which GMR media loses its data integrity over time. I'm all ears.
Write errors happen, transient data transfer errors happen, bad sectors (bad from day 1) happen, mechanical failures happen, sure, but none of that is "bitrot".
Which version of the kernel and btrfs-progs are you using? Some distros are still shipping ancient versions of the userspace tools, like 0.19 or 0.20. The latest is 3.12 (they recently started using the kernel version instead), so you may want to try compiling it from the source.
The two most helpful commands I've found are 'mount -o recovery', which can restore the superblock if it's missing/corrupted, and 'btrfs check --repair' (formerly btrfsck). Note that check doesn't actually fix the errors it finds without that flag, unlike fsck. If you have a multi-device file system, trying to mount one of the other drives can help, since copies of the metadata are stored on all of them (RAID1 style).
If that doesn't work, you can often get the data off by mounting it as readonly, or by using 'btrfs restore'.
Btrfs used to be quite buggy, but these days I've found it to be pretty stable and reliable. That only applies if you're using the latest packages though - otherwise, you might as well be using it back in the early days.
Most human behaviour can be explained in terms of identity.
Bit torrent?
Set up your very own very private tracker(s).
Create a torrent of the file trees to be duplicated and protected on the original host.
Leech it at all the redundant sites.
Wait for them all to complete the download and become seeds.
From time to time, but not all at the same time, force a recheck on each member of the swarm, to detect corruption.
A failure should trigger a download to correct the corrupted block from the swarm.
You can probably get better advice on how to handle a growing archive.
I would probably try to add another torrent of the added files, then wait for the swarm to download those files.
Then create a new torrent file that includes the old and the new in a single torrent and use that for the next forced recheck cycle.
You probably want to have a few scripts to automate the rechecks and updates.
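What the forced recheck does under the hood can be sketched in a few lines of Python: hash fixed-size pieces, compare against the hashes recorded when the torrent was made, and re-fetch only the pieces that fail. SHA-1 per piece is what the original BitTorrent format uses; the piece size, function names, and the "good copy" standing in for the swarm are assumptions for this example, not a real client's API.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # hypothetical piece size


def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of each fixed-size piece, as recorded at torrent creation."""
    return [
        hashlib.sha1(data[start:start + piece_size]).digest()
        for start in range(0, len(data), piece_size)
    ]


def recheck(data, expected, piece_size=PIECE_SIZE):
    """Return the indexes of pieces whose hash no longer matches."""
    actual = piece_hashes(data, piece_size)
    return [
        index for index, wanted in enumerate(expected)
        if index >= len(actual) or actual[index] != wanted
    ]


def repair(data, good_copy, bad_pieces, piece_size=PIECE_SIZE):
    """Replace only the failed pieces, taking them from an intact copy.

    Assumes the damaged copy still has the same length as the good one.
    """
    fixed = bytearray(data)
    for index in bad_pieces:
        start = index * piece_size
        fixed[start:start + piece_size] = good_copy[start:start + piece_size]
    return bytes(fixed)
```

The nice property for a 2 TB archive is that one flipped bit costs one piece of transfer, not a whole file.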
--
The world is coming to an end, but don't stop seeding
Yes, but earlier systems, which the OP was suggesting could be used for this purpose, lack that functionality. Also, please reset your sarcasm detector, it appears to be out of alignment -- a functional detector would have pinged on "Raid 9 Million(tm)".
Apparently ReFS will have data and metadata checksums which combined with storage spaces could detect and correct bit rot if implemented properly. While I have no idea if the OP researched the actual capabilities of ReFS, with checksums it is possible to detect bit rot without parity, and correct it with an extra (good) copy. Sarcasm is fun, but only if it's accurate. You might argue that checksums are just a form of parity and maybe I'd agree with you since apparently the error-correction codes for RAID-6 are generally referred to as parity despite actually being linear error-correction codes. But the sense I got from your comment was that you didn't believe it was possible to prevent bit rot with just two copies of checksummed data, or by storing a single copy with an error-correcting code.
Correct, and those that are aren't immune to human stupidity. No filesystem can save you from a guy who decides to pour beer into the storage array, or who goes to move a directory and misclicks sending it to the trash. Disaster recovery is not a simple matter of choosing the right filesystem and then patting yourself on the back. It requires careful planning and consideration... None of which the majority of the people on this thread seem to be capable of. At least you seem to have some grasp of the underlying technology.
Most of your other points were spot-on. Relying on single storage systems that aren't geographically distributed is just asking for trouble. Not keeping administratively separate backups or immutable version history (read-only snapshots, revision control, etc.) is also a quick way to lose your data. I don't think there are any foolproof solutions you can get at the moment. Replicated git repos are close, but there was that KDE fiasco with git not explicitly checking the cryptographic hashes during all of its operations and allowing bitrot to be replicated to other repositories. Dumb. I have never been a fan of the Linus/Linux philosophy of trusting the hardware to provide 0 bit errors per yottabyte. It's just not realistic. Of course that means that the next step will be implementing lock-step (or at least consistency-point comparison) processing in software to work around CPU/RAM errors...
You didn't use sarcasm tags and sometimes the subtler jokes are a tad hard to discern in text.
You are joking, aren't you? Because if not, have I got a great deal for you - I just need your bank account to transfer the money my uncle, a Nigerian prince, is trying to export. PM me!