Slashdot Mirror


Ask Slashdot: Practical Bitrot Detection For Backups?

An anonymous reader writes "There is a lot of advice about backing up data, but it seems to boil down to distributing it to several places (other local or network drives, off-site drives, in the cloud, etc.). We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present), it's not really practical for us to examine every one of these periodically so we can manually restore them from a different copy. We'd love it if the filesystem could detect this and try correcting first, and if it couldn't correct the problem, it could trigger the restoration. But that only seems to be an option for RAID type systems, where the drives are colocated. Is there a combination of tools that can automatically detect these failures and restore the data from other remote copies without us having to manually examine each image/video and restore them by hand? (It might also be reasonable to ask for the ability to detect a backup drive with enough errors that it needs replacing altogether.)"

321 comments

  1. PAR2 by Anonymous Coward · · Score: 5, Informative
    1. Re:PAR2 by Anonymous Coward · · Score: 1

      dvdisaster also uses Reed Solomon Codes, but lets you burn the data to a CD/DVD/BD in a single image:

      http://dvdisaster.net/en/index.html

    2. Re: PAR2 by Anonymous Coward · · Score: 0

      Well, I'm glad one person remembers optical media and its lack of this side effect. I usually archive stuff, such as photos and work. It's easy to use a marker to write a date range on the actual disc.

      Doing so on-the-fly prevents the need for doing this in bulk.

    3. Re: PAR2 by Qzukk · · Score: 1

      I'm glad one person remembers optical media and its lack of this side effect

      Funny, I remember optical media being unreadable just months after it was burned. Sure, you can say don't use cheap media, but how do you know your media is good?

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    4. Re: PAR2 by djsmiley · · Score: 1

      yes, because rewritable disks have never gone wrong, right?

      --
      - http://www.milkme.co.uk
    5. Re: PAR2 by Miamicanes · · Score: 4, Informative

      Use non-LTH BD-R media. It's seriously the best media we've ever had for long-term archival storage, hands-down, no contest. Unlike DVD+/-R, it's phase-change magneto-optical WORM... the laser liquefies the plastic, the magnet orients little shiny planar mirrors, the plastic solidifies, and the bits are about as close to 'carved in stone' as you're likely to ever get. As a technology, it's not cheap... but it definitely minimizes the number of things that can go wrong over a ~25-year timeframe:

      * decouples media from its player... the achilles heel of hard drive-based backup schemes. A broken hard drive means a spectacularly expensive data-recovery job. A broken BD drive means buying a new one.

      * phase-change MO media doesn't bleach or darken with age... and if it's going to delaminate or anything (like early optical discs often do), it's overwhelmingly likely to happen sooner rather than later (while you still have the originals available to re-archive if necessary).

      * I think we can safely accept that future evolution to optical discs will remain downwards-compatible with reading older media. Seriously, CDs are THIRTY YEARS OLD, and any Blu-Ray player from China can still play them just fine (plus everything that's ever been commonly burned/stamped into them). A 2037 Apple Eve might have the masses drooling over its legacy-free minimalist purity, but the rest of us will have a 600 petabyte optical drive manufactured by a sweatshop in Uganda or Haiti that can read old BD-R discs just fine (at least, after opening it up and soldering a wire across two pads on the circuit board to make it think it's supposed to be their $6,000 enterprise version instead).

    6. Re: PAR2 by egarland · · Score: 1

      Optical has had a good run, but I'm betting that in 2037, optical will be dying or dead.

      There's a lot of theoretical improvements left in optical disk technology, but they're unlikely to become common or cheap. I see possibly one generation after Blu-ray before the consumer standards stop and the access to cheap technology to drive advancements in optical storage disappears. Spinning disk is largely thought of as the primary competitor, but what's going to give optical the biggest headache is flash.

      Flash storage's non-existent power requirements, extremely high density, naturally long read-only lifespan, re-usability, and flexible expansion options make it poised to take over the world of archival storage if it can come anywhere near cost-parity. My bet is that it will make it.

      --
      set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
    7. Re:PAR2 by JoshRosenbaum · · Score: 1

      Multipar has superseded Quickpar. It allows multiple directories to be handled and is actually still being developed unlike most of the other old par programs.

      http://multipar.eu/
      http://hp.vector.co.jp/authors/VA021385/

    8. Re: PAR2 by Anonymous Coward · · Score: 1

      Flash isn't an archival medium. Once the electrons leave the gates, Elvis has left the building, the data is gone and gone for good. That might change if someone makes a flash drive that can constantly check and repair itself (how much ECC is enough can be debated) and can alert users that the drive is about to tank, so that another one can be plugged in and the data copied over before the lights go out.

      Optical isn't going anywhere. Yes, one can stream Netflix, but bandwidth isn't increasing in a lot of areas around the globe. CDs, DVDs, and BD media will always have/need players. Plus, there is a good chance that a 10 year old CD will play and rip. Flash media has not been out long enough for us to know if in 10 years that the SD card with pictures sitting on the shelf will be usable or if the data will be completely gone.

      Optical has a good ways to go before it is dead. Holographic storage has been a flash in the pan from the days of Tamarack to InPhase Technologies. However, it is only a matter of time before we see the technology in the second generation after Blu-Ray (the generation after Blu-Ray has been finalized by Sony and Panasonic with 300GB disks initially.) From there, who knows... holographic storage can go into the terabytes without issue in theory, but I've yet to see an HVD in the wild.

    9. Re: PAR2 by egarland · · Score: 1

      > Flash isn't an archival medium.

      Anything can be an archival medium, it's just a question of whether it's good at it. Flash has been in large-scale use for 20 years now. It became the primary way computers stored their BIOS back in '95, so I think we have a fairly good understanding of its long-term storage characteristics. Optical won't die today, and I expect the market to stay strong for 5 or 10 years, but in 20, it will be all but gone.

      --
      set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
    10. Re: PAR2 by Anonymous Coward · · Score: 0

      ^This.
      I go a little overboard* and also build ECC data with DVDisaster into the ISO images before I burn them to disk, but the fact that non-LTH** BD-R media uses non-dye phase change writing tech is a game changer for the utility of optical archiving.

      Even better, Millenniata recently released a BD-R media called M-DISC that has a claimed life expectancy of 1,000 years. We'll see in 1,000 years if that claim holds up. ;-)

      *as some would see it.

      ** LTH dye-based BD media. I want to throttle the person who came up with that idea. "Hey, let's make the media really crappy and we can pass on the 10% savings! Data integrity? Who cares about that!"

    11. Re: PAR2 by Joce640k · · Score: 1

      . We'll see in 1,000 years if that claim holds up. ;-)

      *as some would see it.

      Nobody will ever know because there won't be a drive capable of reading it.

      --
      No sig today...
    12. Re: PAR2 by Miamicanes · · Score: 3, Informative

      EEPROM also happens to be the ancestor of SLC flash, not MLC, TLC or worse.

      Flash is like a leaky bucket that starts out full of water, and gets drained to some level when a cell's value is set:

      SLC == "The bucket is either totally empty (0), or has some water in it (1)"

      MLC == "The bucket can be totally empty (00), non-empty to ~33% full (01), ~33%-~66% full (10), or ~66%-100% full (11). After 1/3 of the water leaks out, the cell's value is corrupt."

      TLC == same idea as MLC, but the bucket has EIGHT levels instead of four. Do the math to figure out how much metaphorical water can leak out before the cell's value becomes corrupted.

      BIOS eeproms are also a larger process than high-density flash, so the buckets themselves are larger while the leaks remain relatively constant in size. In other words, you're comparing a metaphorical 55 gallon drum with a slow drip that has to be completely empty to change from 1 to 0 to a thimble with 8 tick marks on the side and a leak of the same size.
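
      Doing the math for the bucket metaphor above, as a rough sketch: it assumes one "empty" level plus equal-width bands for the remaining levels, which is only the metaphor's arithmetic and not a model of real NAND physics. The function name `leak_margin` is made up for illustration.

```python
from fractions import Fraction


def leak_margin(bits_per_cell: int) -> Fraction:
    """Fraction of a full bucket that can leak before the stored value changes.

    Assumption: 2**bits levels, one of which is 'empty', and the remaining
    levels split the bucket into equal-width bands.
    """
    levels = 2 ** bits_per_cell
    return Fraction(1, levels - 1)


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    margin = leak_margin(bits)
    print(f"{name}: {2 ** bits} levels, margin = {margin} "
          f"(~{float(margin):.0%} of the bucket)")
```

      Under that assumption the margin shrinks from the whole bucket (SLC) to a third (MLC) to a seventh (TLC), which is the point of the drum-versus-thimble comparison.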

    13. Re: PAR2 by kesuki · · Score: 1

      They said that about paper every time it's been invented, but the problem with paper is its inability to handle too little humidity (dry rot) or too much humidity (mildew etc.), and it's a tempting nesting site for insects that routinely eat tree leaves. Oh, and its bitrate per energy put into it is atrocious, especially if you throw in modern hermetically sealed, deoxygenated, humidity-controlled environs. But it is easy to copy: any schooled child can copy letters from one piece of paper to another. Computers are even more awesome for data sharing and copying, even if laws against it exist. But I digress. Having an optical backup is fine, and there are times where optical is necessary, but it doesn't prevent accidental damage to discs or make sorting them easier. Bitrot detection is an underserved market. RAID has it, Blu-ray doesn't, and some filesystems actively deduplicate, so few redundant copies are left. Anyway, the best way to check for bitrot is to verify the files' md5sums with a script that runs automatically on the server that sends the files offsite; if bitrot is detected, it simply requests the data from the offsite storage before it is lost.
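
      A minimal sketch of the checksum-and-restore script described above, under some loud assumptions: it uses SHA-256 from Python's `hashlib` rather than md5sum, it treats the offsite copy as an already-mounted directory (`replica`), and the function names `build_manifest` and `verify_and_restore` are hypothetical. A real setup would also need to store the manifest somewhere safe and fetch remote files over the network.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root (run while data is known good)."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_and_restore(root: Path, manifest: dict[str, str], replica: Path) -> list[str]:
    """Re-hash each file; copy a good version from replica when the hash differs.

    Returns the relative paths that were restored. The replica file is only
    used if its own digest matches the manifest.
    """
    restored = []
    for rel, expected in manifest.items():
        target = root / rel
        if target.is_file() and file_digest(target) == expected:
            continue  # file is intact
        source = replica / rel
        if source.is_file() and file_digest(source) == expected:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            restored.append(rel)
        else:
            print(f"UNRECOVERABLE from this replica: {rel}")
    return restored
```

      Run from cron or a scheduled task, something like this would also give a crude drive-health signal: a drive whose restore list keeps growing is probably due for replacement.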

    14. Re: PAR2 by Anonymous Coward · · Score: 0

      Flash storage's non-existent power requirements, extremely high density, naturally long read-only lifespan,

      As far as I know, flash has absolutely no "naturally long read-only lifespan". The charge in flash is naturally lost over time, *requiring* reads (which trigger rewrites) in order to keep the flash cells charged. And guess what, the re-writes can trigger the usual write problems, leading to the interesting effect where reading a flash cell can make it unusable. There's also the "read disturb" case, where repeatedly reading a flash cell can affect nearby flash cells (see http://www.snia.org/sites/default/files/SSSI_NAND_Reliability_White_Paper_0.pdf), leading again to pre-emptive re-writes.

      So no, I don't see flash memory as it is today useful for long-term archival, because lots of reads mean writes. There's no "pure read-only use case", so to say.

      (Posting AC 'cause I don't have my password handy)

    15. Re: PAR2 by Anonymous Coward · · Score: 0

      FLASH is about to be superseded by memristors next year, which use resistance to save bit state, which is a form of phase change. We should be finding out quite soon about long term memristors storage.

    16. Re: PAR2 by V+for+Vendetta · · Score: 1

      In thousand years? Just ask your replicator (aka "3D printer" these days) to create one ...

    17. Re: PAR2 by Gen_Music · · Score: 1

      Flash's high density still can't hold a candle to holographic storage.

  2. ZFS filesystem by Anonymous Coward · · Score: 5, Informative

    One single cmd will do that,

    zpool scrub <poolname>

    1. Re:ZFS filesystem by ravenswood1000 · · Score: 1

      Yep, ZFS

    2. Re:ZFS filesystem by vecctor · · Score: 5, Informative

      Agreed, ZFS does exactly this, though without the remote file retrieval portion.

      To elaborate:

      http://en.wikipedia.org/wiki/ZFS#ZFS_data_integrity

      End-to-end file system checksumming is built in, but by itself this will only tell you the files are corrupt. To get the automatic correction, you also need to use one of the RAID-Z modes (multiple drives in a software raid). OP said they wanted to avoid that, but for this kind of data I think it should be done. Having both RAID and an offsite copy is the best course.

      You could combine it with some scripts inside a storage appliance (or old PC) using something like Nas4Free (http://www.nas4free.org/), but I'm not sure what it has "out of the box" for doing something like the remote file retrieval. What it would give is the drive health checks that OP was talking about; this can be done with both S.M.A.R.T. info and emailing error reports every time the system does a scrub of the data (which can be scheduled).

      Building something like this may cost a bit more than for just an external drive, but for this kind of irreplaceable data it is worth it. A small atom server board with 3-4 drives attached would be plenty, would take minimal power, and would allow access to the data from anywhere (for automated offsite backup pushes, viewing files from other devices in the house, etc).

      I run a nas4free box at home with RAID-Z3 and have been very happy with the capabilities. In this configuration you can lose 3 drives completely and not lose any data.

      --
      Why, yes I have been touched by His noodly appendage. And I plan to sue.
    3. Re:ZFS filesystem by Guspaz · · Score: 5, Informative

      You don't need raidz or multiple drives to get protection against corrupt blocks with ZFS. It supports ditto blocks, which basically just means mirrored copies of blocks. It tries to keep ditto blocks as far apart from each other on the disk as possible.

      By default, ZFS only uses ditto blocks for important filesystem metadata (the more important the data, the more copies). But you can tell it that you want to use ditto blocks on user data too. All you do is set the "copies" property:

      # zfs set copies=2 tank

    4. Re:ZFS filesystem by Mike+Kirk · · Score: 2, Informative

      I'm another fan of backups to disks stitched together with ZFS. In the last year I've had two cases where "zpool scrub" started to report and correct errors in files one to two months in advance of a physical hard drive failure (I have it scheduled to run weekly). Eventually the drives faulted and were replaced, but I had plenty of warning, and RAIDZ2 kept everything humming along perfectly while I sourced replacements.

      For offsite backups I currently rotate offline HDDs, but I should move to Cloud storage. Give a bit of my surplus space and bandwidth to someone like Symform, and in turn they give me a free little slice of the Cloud to have TrueCrypt archives mirrored into. Win-win!

    5. Re:ZFS filesystem by Anonymous Coward · · Score: 0

      until it says "restore from backup".

    6. Re:ZFS filesystem by x_t0ken_407 · · Score: 1

      ZFS immediately came to mind when I read the summary.

    7. Re:ZFS filesystem by cas2000 · · Score: 2

      true, but you do need multiple disks (mirrored or raidz) to protect against drive failure.

      two or more copies of your data on the one disk won't help at all if that disk dies.

      fortunately, zfs can give you both raid-like multiple disk storage (mirroring and/or raidz) as well as error detection and correction.

      That ZFS_data_integrity link in the post you were replying to gives a pretty good summary of how it works.

      The paragraphs immediately above that (titled 'Data integrity', 'Error rates in hard disks', and 'Silent data corruption') also give a good summary of why error-correcting filesystems like ZFS (and btrfs) are necessary, especially with the huge sizes of modern drives.

      In fact, anyone interested should read the entire wikipedia article.

      ps: neither raid nor ZFS is a substitute for backups. you still need backups of your data (preferably with off-site copies) to protect against accidental deletion or overwrite (snapshots can help with this if used intelligently prior to the event) or burglary or catastrophic damage like fire or flood.

    8. Re:ZFS filesystem by Anonymous Coward · · Score: 0

      The rule of thumb that I have read for backups is at least two back-ups where you use at least one different file system from the original and one different media.

    9. Re:ZFS filesystem by Anonymous Coward · · Score: 0

      Good luck when a drive dies. I'd mirror instead.

    10. Re:ZFS filesystem by Guspaz · · Score: 1

      I agree, which is why I'm using raidz2. But that's not the problem I was suggesting a solution to. I was suggesting a solution to the problem of "data on single hard drive eventually goes corrupt, and I don't want to buy a second hard drive."

  3. Excellent question by Anonymous Coward · · Score: 0

    I really hope this discussion provides good answers, with practical solutions for Windows, IOS, and Linux... I think that this is the sort of thing that everyone could really use!

    Are there cloud storage providers that can do this for the above example of an approx. 2 TB data set, and provide complete security?

    1. Re:Excellent question by sandytaru · · Score: 2

      There are, but you'll be paying a lot of $$$ for that kind of storage in the cloud. I get 4GB for free from DropBox. SkyDrive from Microsoft will set you back $1000/month for 2TB - DropBox is about twice that much. It's not really practical for media files.

      A much better solution would be archival quality Blu-rays. They can hold 25 GB apiece and they're supposed to last 100 years, but they really just need to last long enough until a new, even denser storage medium comes along.

      --
      Occasionally living proof of the Ballmer peak.
    2. Re:Excellent question by SirMasterboy · · Score: 5, Informative

      Not all cloud storage is expensive. It's only $4 a month for unlimited backups to CrashPlan.

      They also do checksums and versioning and can be set to never remove deleted files from the backup.

      I have 12.8TB backed up to them and it's been working great.

      Other than that, ZFS can't be beat. I use that as well.

    3. Re:Excellent question by lgw · · Score: 3, Insightful

      Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

      The key therefore is to verify as you write. Usually, verifying a sample of a few GB will let you know if everything went OK. Do your backups with checksums of some sort. A modern tape drive and backup software will do that automatically, and let you schedule a verify automatically as part of backups (2 TB? That's 1 tape - might want to consider that), though ideally you should verify a tape on a different drive than the one you wrote it on.

      For disk-based backups, local or cloud, I strongly recommend archiving to a format with checksums (RAR etc) over some sort of raw file copy. Especially for anything going over the network: RAR a volume/file set locally first, then upload, then test the archive.

      If you have a superstitious fear of bitrot, you can always do some random sampling of archive integrity, and keep multiple historical copies of files just in case (e.g., don't just delete backup N-1 when you do backup N, do a rotation scheme).

      --
      Socialism: a lie told by totalitarians and believed by fools.
    4. Re:Excellent question by Anonymous Coward · · Score: 0

      He has already expressed knowledge regarding how to STORE his files. He is asking for expert help on how to store them BETTER. Read his post again, and offer some constructive advice, or go back to your basement.

    5. Re:Excellent question by clickclickdrone · · Score: 2

      Are there cloud storage providers that can do this for the above example of an approx. 2 TB data set, and provide complete security?

      Cloud and complete security together is an oxymoron.

      --
      I want a list of atrocities done in your name - Recoil
    6. Re:Excellent question by heypete · · Score: 1

      It depends on your storage needs. For things that you need to regularly access, Amazon S3 will cost you about $175/month for 2TB storage plus transfer fees, but is readily accessible at any time.

      Amazon Glacier would only cost you $20/month for that amount of storage, but has various limitations on retrieval time (~4 hour minimum) and higher costs if you need to retrieve more data in a shorter amount of time. As the name suggests, it's designed for "cold storage".

      Both offer extremely high degrees of reliability.

    7. Re:Excellent question by mlts · · Score: 3, Interesting

      In reality, Dropbox, SkyDrive, and other cloud services should be treated as a type of media, just like BD-ROMs, tape, SSD, HDD, and even hard copy.

      The trick is to use different media to protect against different things. My Blu-Ray disks protect an archive against tampering or CryptoLocker (barring a hack that flashes the BD burner's ROM to allow the laser to overwrite written sectors.) However, they have to be maintained in a good environment with a good indexing system. My files stashed on Dropbox bring me accessibility virtually anywhere... but malware that erases files could wipe that volume out in no time.

      Similar with external HDDs. Those are great for dealing with a complete bare metal restore, but provide little to no protection against malware. Tape, OTOH, is expensive for the drive and requires a fast computer, but once the read-only tab is flipped or the WORM session is closed, the data is there until the tape is physically destroyed.

      Of course, there is not just media... there are backup programs. This is why I use the KISS principle when it comes to backups. I use an archiving utility to break up a large backup into segments (with recovery segments to allow the archive to be repaired should media go bad), then burn the segments onto optical media.

      I've found that using a backup utility can work well... until one has to restore, the company is out of business, and one can't find the CD key or serial number so the software will install. One major program I used for years worked excellently... then just refused to support new optical drives (as in ignoring them completely.) So, unless I can find a DVD drive on its antiquated hardware list on eBay, all my backups are inaccessible. I was lucky enough to find that and copy the data to a HDD, but using the lowest common denominator is a good thing.

      Backups are the often neglected underbelly of the IT world. While storage, security, availability and other technologies have advanced significantly, backups on the non-enterprise level are still languishing behind in almost every way possible. It was only a few years ago that encryption became standard with backup utilities [1].

      [1]: With encryption comes key management, and some backup programs make that easy, some make it incredibly hard.

    8. Re:Excellent question by lxs · · Score: 1

      So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?
      This is why people don't like you.

    9. Re:Excellent question by QuietLagoon · · Score: 1

      Bitrot is a myth in modern times.

      You state this without any substantiation as if it were a fact.

    10. Re:Excellent question by rabtech · · Score: 5, Interesting

      Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

      This isn't just wrong, it's laughably wrong. ZFS has proven that a wide variety of chipset bugs, firmware bugs, actual mechanical failure, etc are still present and actively corrupting our data. It applies to HDDs and flash. Worse, this corruption in most cases appears randomly over time so your proposal to verify the written data immediately is useless.

      Prior to the widespread deployment of this new generation of check-summing filesystems, I made the same faulty assumption you made: that data isn't subject to bit rot and will reproduce what was written.

      ZFS or BTRFS will disabuse you of these notions very quickly. (Be sure to turn on idle scrubbing).

      It also appears that the error rate is roughly constant but storage densities are increasing, so the bit errors per GB stored per month are increasing as well.

      Microsoft needs to move ReFS down to consumer products ASAP. BTRFS needs to become the Linux default FS. Apple needs to get with the program already and adopt a modern filesystem.

      --
      Natural != (nontoxic || beneficial)
    11. Re:Excellent question by Anonymous Coward · · Score: 2, Insightful

      it doesn't seem that way... http://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449/

    12. Re:Excellent question by Anonymous Coward · · Score: 0

      So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?
      This is why people don't like you.

      In fairness the real context of the question is "what is the cheapest way for me to safely store my files without doing any work" since a basic solution involving local hard drives, md5sum, and diff _should_ be obvious to any slashdotter, or else they should GTFO. The way the submitter worded it makes it sound like they genuinely don't know, and the reality is they either know and want an easier/cheaper way, or are dumb and should GTFO. We really should be demanding more from approved submissions.

    13. Re:Excellent question by yakatz · · Score: 1

      Second shoutout for Crashplan! I have eight computers backing up to one account with "unlimited" storage and versioning.

    14. Re:Excellent question by mlts · · Score: 1

      I'm curious how that is doable. Even Amazon Glacier would be about $10.24 per terabyte stored per month, so I'd be looking at about $130/month for that much info.

      I am not passing judgement... just have not heard much about CrashPlan, good/bad other than a quick search on it.

    15. Re:Excellent question by drussell · · Score: 1

      Actually, that was a reply to THIS post, not the original question posted by timothy...

      I really hope this discussion provides good answers, with practical solutions for Windows, IOS, and Linux... I think that this is the sort of thing that everyone could really use!

      Are there cloud storage providers that can do this for the above example of an approx. 2 TB data set, and provide complete security?

      I still think questions about basic data integrity, checksums, parity, ECC on disks etc. should be completely unnecessary and most certainly already be second nature to the slashdot crowd, but I guess I'm just living in the past.

      Thanks for immediately jumping down my throat, though ;)

    16. Re:Excellent question by drussell · · Score: 1

      So if someone doesn't have your level of expertise on a single isolated topic you automatically dismiss this person as unworthy of your company?

      The Anonymous Cowards? Yes.

      Please continue the technical discussion. Sorry for the noise.

    17. Re:Excellent question by DigiShaman · · Score: 1

      CDRs suffer nasty bitrot. Usually most CDs made in the past 10 years. I suppose you could have vacuum sealed them, but how many people knew to do that?!! You can get medical grade gold disks, but those you have to special order (not found in your local computer store).

      One of my clients archived geoscience data projects on CD-Rs. It was only when they went to pull them that they discovered the bitrot problem. We used Nero DiscSpeed to perform a surface scan. You can see entire segments where it goes from green (good), transitions into yellow (correctable), to red (damaged, unreadable), and then back out to yellow and green again. It's the material that oxidizes. Since then, they have pulled all the data they could back onto disk and tape. God only knows how long that will last too.

      --
      Life is not for the lazy.
    18. Re:Excellent question by Anonymous Coward · · Score: 0

      GMR refers to the technology of the read head. It has nothing to do with the quality of the long term storage of the media. Perhaps you could educate yourself?

    19. Re:Excellent question by SirMasterboy · · Score: 1

      Well, BackBlaze is another similar backup company who is far more public about their costs and operations. I think they have said their customer break-even point is around 3-4TB. So if most customers have far less than that, then a few can have far more and it all works out.

      http://www.wired.com/insights/wp-content/uploads/2011/10/backblaze-cost.png

    20. Re:Excellent question by fnj · · Score: 1

      Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

      Oh, really? Is that why drive manufacturers specify a non-recoverable read error rate - typically on the order of 1 bit per 100 terabits? Let's see now. A single 4TB drive contains 32 terabits of data. So if you have three of them, either in a RAID or separately, and you try to read the entire contents, you can expect an average of one bit to be rotted permanently and lost forever. Or that bad bit could happen a lot earlier. Conceivably the first bit you try to read. Or the one millionth. And that is not considered a failed drive. You can't magically guard against these by verifying the recorded data one time, either a nominal portion or even in its entirety.
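
      The arithmetic above can be checked with a short sketch. It assumes decimal terabytes, a spec-sheet rate of one unrecoverable error per 10^14 bits read, and independent errors; real drives may do better or worse than the quoted figure, and the names below are illustrative only.

```python
import math

URE_RATE = 1e-14      # unrecoverable read errors per bit read (1 per 100 terabits)
DRIVE_BYTES = 4e12    # one "4 TB" drive, decimal terabytes
DRIVES = 3


def expected_errors(bits_read: float, rate: float = URE_RATE) -> float:
    """Average number of unrecoverable bits when reading bits_read bits."""
    return bits_read * rate


def prob_at_least_one(bits_read: float, rate: float = URE_RATE) -> float:
    """1 - (1 - rate)**bits_read, computed stably for tiny rates."""
    return -math.expm1(bits_read * math.log1p(-rate))


bits = DRIVE_BYTES * 8 * DRIVES  # 96 terabits across the three drives
print(f"bits read:             {bits:.3g}")
print(f"expected bad bits:     {expected_errors(bits):.2f}")
print(f"P(at least one error): {prob_at_least_one(bits):.1%}")
```

      So "an average of one bit" holds as an expectation (about 0.96), while under these assumptions the chance of hitting at least one error in a full read is somewhat above one half.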

      RAR's checksums will only detect errors that happen to occur when you test read the RAR archive. They won't repair it, and testing OK is no guarantee that it won't have an error the next time you read it. PAR2, on the other hand, does provide for repair.

      ZFS can at least detect, and optionally repair (if you use the redundancy options) these isolated bad bits, without the necessity for any special file metadata like PAR2. Of course, there's nothing to say you can't use both ZFS and PAR2.
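
      To illustrate the detect-versus-repair distinction, here is a toy sketch. It is not how PAR2 or ZFS work internally: PAR2 uses Reed-Solomon codes and can rebuild several damaged blocks, while this example uses a single XOR parity block and can rebuild only one. All names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def protect(blocks: list[bytes]) -> tuple[list[bytes], bytes]:
    """Return per-block digests (detection) and one parity block (repair)."""
    digests = [hashlib.sha256(b).digest() for b in blocks]
    return digests, xor_blocks(blocks)


def repair(blocks: list[bytes], digests: list[bytes], parity: bytes) -> list[bytes]:
    """Find damaged blocks by digest and rebuild at most one from parity."""
    bad = [i for i, b in enumerate(blocks)
           if hashlib.sha256(b).digest() != digests[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("single parity can rebuild only one damaged block")
    i = bad[0]
    others = [b for j, b in enumerate(blocks) if j != i]
    rebuilt = xor_blocks(others + [parity])
    if hashlib.sha256(rebuilt).digest() != digests[i]:
        raise ValueError("parity block is damaged too; cannot repair")
    fixed = list(blocks)
    fixed[i] = rebuilt
    return fixed


original = [b"AAAA", b"BBBB", b"CCCC"]
digests, parity = protect(original)
damaged = [b"AAAA", b"BxBB", b"CCCC"]   # one flipped byte in the middle block
print(repair(damaged, digests, parity) == original)
```

      The digests alone play the role of an archive checksum: they say which block is bad but cannot fix it. The extra parity data is what makes repair possible.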

    21. Re:Excellent question by linear+a · · Score: 1

      Bitrot is a myth in modern times.

      You state this without any substantiation as if it were a fact.

      And I'll counter the above. The last bitrot event I had to deal with - on current server grade (Windoze, tho) hardware was waaaay back last Friday.

    22. Re:Excellent question by drussell · · Score: 1

      Thank you. A thoughtful, concise Anonymous post... You've just restored some of my faith in the AC. ;)

    23. Re:Excellent question by heypete · · Score: 1

      Users that utilize large amounts of storage are relatively uncommon and are subsidized, in part, by users who utilize less storage. If everyone used terabytes of storage at $4/month, that wouldn't really be sustainable.

      Although just a personal anecdote, I've used CrashPlan for ~4 years now (with 11 computers belonging to various family members all backing up to their service with a total of around 500GB being stored with them). Zero complaints. It's done everything I expected, always worked, and never had issues. When I had a laptop stolen and purchased a replacement, I was able to restore all the files from CrashPlan in about a day or two of downloading. I highly recommend it.

    24. Re:Excellent question by bluefoxlucid · · Score: 1

      "We're experiencing data going bad and not being restorable from back-ups because it just CORRUPTS itself for no visible reason" "That's a myth and doesn't actually happen."

      HIV was created by racist bigots to slander blacks and homosexuals.

    25. Re:Excellent question by bluefoxlucid · · Score: 1

      Outsourced information services in general have known security concerns. That they come under a new buzzword doesn't make them less secure. Even contractors who come in and touch your systems can walk out with massive amounts of private data.

    26. Re:Excellent question by Anonymous Coward · · Score: 0

      Just wait until they get bought out and the new parent company drops you.

    27. Re:Excellent question by mlts · · Score: 1

      You hit the nail on the head. Apple should either get with Oracle and put ZFS back in the OS X kernel as the default filesystem, or get with Microsoft and license ReFS. HFS+ was a good filesystem when OS X hit the market, but it has been over a decade, and everyone else has moved on.

      One reason why the IT industry moved from RAID 5 to RAID 6 as a standard is that disk capacities are growing but I/O is not keeping pace. So, it takes longer and longer to rebuild a drive. RAID 6 is now a must because a rebuild takes so long that there is a good chance of another drive failing while the RAID array is in degraded mode. Of course, this is for tier 3 storage, but tier 2 storage is having similar issues.

    28. Re:Excellent question by entrigant · · Score: 3, Interesting

      I've been surprised by the lack of reference to proper error-checked data paths so far in these comments. I'm continually saddened by ever increasing aggressiveness in clocks and density of RAM in consumer level systems while stubbornly refusing to implement ECC. Many people are even hostile to the idea as if ECC RAM is somehow tainted.

      This article points out something else I'd not even considered. A scenario where lack of ECC on a self healing file system can amplify a RAM failure to a catastrophic degree making such filesystems even riskier to run on consumer grade systems.

      Thank you for sharing.

    29. Re:Excellent question by s13g3 · · Score: 1

      This doesn't even count the fact that optical media is still subject to the same degradation and bitrot that tape is.

      And anyone who thinks electromagnetic tape is "dead" is naive or just ignorant. People have been predicting the death of tape for decades, and it's no more true today than it was in the 70's. Modern EM tape is typically rated for 15 to 30 years of retention, and as long as it is not over-exposed to moisture during storage, it has proven to be able to last that long: otherwise, the manufacturers would be out of business because the Fortune 500 and S&P 500 companies - the majority of whom backup to tape and send it off-site - would have sued them to extinction.

      On the other hand, according to archives.gov:

      "CD/DVD experiential life expectancy is 2 to 5 years even though published life expectancies are often cited as 10 years, 25 years, or longer. However, a variety of factors discussed in the sources cited in FAQ 15, below, may result in a much shorter life span for CDs/DVDs."

      --
      "Inveniemus Viam Aut Faciemus" 'We will find a way... Or we will make one!' --Hannibal of Carthage
    30. Re:Excellent question by Anonymous Coward · · Score: 0

      Just wait until they get bought out and the new parent company drops you.

      When's the last time that happened? Has it ever happened? There have been a few rare cases of photo upload sites blacking out but I haven't heard of any clouds going dark without any warning. Worst case is, if you go with the cheapest provider and they send you a letter saying your plan isn't up for renewal, just move your files to the second cheapest solution.

    31. Re:Excellent question by bill_mcgonigle · · Score: 1

      It's only $4 a month for unlimited backups to CrashPlan.

      Do they throttle? I looked into the one that advertises unlimited backups for $60/yr and they rate limit the connection down as you increase your data. I estimated 9 years for the first backup to complete based on published rates.

      "Unlimited" - IDTIMWYTIM.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    32. Re:Excellent question by SirMasterboy · · Score: 1

      Not that I have seen. It maxes my 5Mbit upload and my downloads are 15-20Mbit.

    33. Re:Excellent question by NatasRevol · · Score: 1

      "IDTIMWYTIM." should be worked out to be SOMEIDIOT

      --
      There are two types of people in the world: Those who crave closure
    34. Re:Excellent question by SecurityTheatre · · Score: 1

      What is the most practical way to maintain bitwise accuracy on a diverse set of binary data in an automated way using "diff and md5sum"?

      Note that part where he was looking for an automated solution that will run itself without intervention, or a better means than hard drives...

      You suggested... "Do some manual stuff using hard drives".

      Right.

    35. Re:Excellent question by Anonymous Coward · · Score: 0

      I have 12.8TB backed up to them and it's been working great.

      God, I hope you're not on their $4 a month (or whatever it is now) plan. If so I guess I don't feel bad about the 1TB I've uploaded....

    36. Re:Excellent question by lgw · · Score: 2

      Well, I did backup software and hardware for nearly 20 years. But I can't substantiate that with a link.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    37. Re:Excellent question by Anonymous Coward · · Score: 0

      What idiot modded this +1 informative? BITROT is a MYTH, says lgw! Wow. Well, I guess all of us are just wasting our time not listening to you!

    38. Re:Excellent question by lgw · · Score: 1

      I've investigated hundreds of cases of "bit rot" over the years in my job, and other than very weak magnetic media (or CD-Rs as someone upthread pointed out), corrupt backups were always corrupt when written. Had the poor SOB only verified his backups day 1, he'd not be in a world of shit. Every single time.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    39. Re:Excellent question by Anonymous Coward · · Score: 0

      I think what is different with modern drives is that they tell you that the data is bad, rather than returning bogus data. Plus, with your favorite form of block protection, drive firmware errors, bus errors, etc. can be detected. Frankly, filesystem level protection is helpful, but I would trade it for applications that wrote some form of error detection (CRC at a minimum) to the end of their files. Bad RAM and lots of other machine level problems can cause silent data corruption even with ZFS.
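As a rough illustration of the "CRC at the end of the file" idea, here is a minimal sketch. The 4-byte little-endian trailer layout and the names `seal` and `unseal` are assumptions made up for this example, not any real file format. A CRC only detects damage; it cannot repair it.

```python
import struct
import zlib

# 4-byte little-endian CRC-32 trailer; this layout is this sketch's own choice.
TRAILER = struct.Struct("<I")

def seal(payload: bytes) -> bytes:
    """Return the payload with its CRC-32 appended."""
    return payload + TRAILER.pack(zlib.crc32(payload) & 0xFFFFFFFF)

def unseal(blob: bytes) -> bytes:
    """Check the trailing CRC-32 and return the payload, or raise ValueError."""
    if len(blob) < TRAILER.size:
        raise ValueError("blob too short to contain a CRC trailer")
    payload, trailer = blob[:-TRAILER.size], blob[-TRAILER.size:]
    (stored,) = TRAILER.unpack(trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("CRC mismatch: data is corrupt")
    return payload
```

An application would call `seal` just before writing and `unseal` right after reading, so corruption anywhere between the two calls shows up as an error instead of silently bad data.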

    40. Re:Excellent question by lgw · · Score: 1

      The error rate from other sources (e.g. on the network copy) is far higher. If your backups are corrupt, it's almost certain they were corrupt day 1.

      Test your backups after you make them: it's a cheap and easy 99% solution.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    41. Re:Excellent question by lgw · · Score: 1

      You make a great point about CD-Rs, I guess I should have broadened my statement to "cheap-ass backup solutions from the 90s", not just floppies and tape.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    42. Re:Excellent question by Anonymous Coward · · Score: 0

      I don't know if they throttle, but DO READ their terms and conditions carefully.

      In case others find this informative, the CrashPlan program has enforced over the air updates. This means that you have to be running their latest version. Over the past two years they have removed some features from their basic version. I may have that wrong, as when you first install you get their "Pro" version as a taster, and after a month it rolls back to basic. But, the point is, you are 100% beholden to the CrashPlan company. If they decide to shut down the program or remove features then you are stuck. This scares me a little.

      Second to this, their "Pro" version is not available as a single purchase. You can only get the "Pro" version by paying a monthly fee. Now, for that fee you will get online storage. But I wrote to them and asked if I could simply buy the "Pro" version and I was told it was only available with a subscription.

      These two concerns make me very hesitant about CrashPlan, which is otherwise a very good program. It has some quirks (backing up to network shares is painful, but the workaround is to run the client on the network server itself, or mount the drive as the SYSTEM user via a script at startup) and the interface sometimes gets confused, but in general these are minor.

      I do believe that the major limitation of the basic version, "single daily snapshot and backup" is much too restrictive and means it can't really be considered by professionals who create data all day long. Which is the whole point, they want you to buy that "Pro" subscription and keep paying month after month to use their program.

      So far I don't know of any open source backup solutions that offer a similar mix of features (firewall punching, backing up to other clients as well as friend's computers, complete encryption and de-duplication) but if there are I would love to hear about them.

    43. Re:Excellent question by bluefoxlucid · · Score: 3, Interesting

      I used to fancy a girl who worked as a data recovery engineer. You wouldn't believe how many people hear the RAID controller alarming and get up to close the case instead of hot swapping a spare drive.. then a week later the second drive goes. She had a fanciful story about how spinning disks used to occasionally fail in such a way that a random sector would go bad, report incorrect data, and a RAID-1 mirror would "fix" it by destroying data on the other drive. She also used to tell me software RAID options had a tendency to actually beat hardware RAID options for data integrity outside of other inline failures--that is, when the system is operating under optimal circumstances, most hardware RAID systems more often self-corrupt than software RAID systems. Just an odd statistic, and I never got overall risk performance stats out of her.

    44. Re:Excellent question by doggo · · Score: 1

      "Thanks for immediately jumping down my throat, though ;)"

      Yeah. 'Cause you're the victim. WTF? Someone calls you out for being dickish, and they're jumping down your throat?

    45. Re:Excellent question by DigiShaman · · Score: 1

      I don't know of any long-term backup solutions aside from gold CDs to be quite honest. If they're not prone to bit-rot, the media reader's interface will be obsolete on new equipment. It's doable, but not without first creating bridge solutions and data migration. The way I see it, migrate from media to media as technology progresses, or face an entire migration project later.

      I suppose you could archive on flash drives, but I haven't a clue as to what the life expectancy of the flash chips is before bits start flipping randomly (gates change on die).

      --
      Life is not for the lazy.
    46. Re:Excellent question by Mysticalfruit · · Score: 1

      As someone who has hundreds of TB of data stored in ZFS I couldn't agree more. In most cases if ZFS spits out a drive because it's convinced it's writing bad blocks, I believe it. In most cases (if it's a Seagate drive) SeaTools backs me up on this. In several cases a SeaTools quick check says the drive is fine, but without fail a "full" scan of the drive eventually throws an error.

      I've found damaged SAS cables, JBOD enclosures with dodgy bridges, etc. because of ZFS.

      With that all said, now that you've gone out and bought a small PC, stuffed 4, 4TB drives into it and set it up as a raid10 using ZFS you now need to ask the next question... what's more likely... I'm going to have two drives fail simultaneously or that my house is going to get hit with a {flood, lightning, fire, thieves, etc}

      Honestly, I'd build two of these devices, one for local backups and I'd put one at a buddy's house and do remote backups from your local device.

      --
      Yes Francis, the world has gone crazy.
    47. Re:Excellent question by ThatsMyNick · · Score: 1

      You are missing a key ingredient: encryption.

    48. Re:Excellent question by Anonymous Coward · · Score: 0

      Thanks for immediately jumping down my throat, though ;)

      Pot, meet kettle.

    49. Re:Excellent question by Anonymous Coward · · Score: 0

      Replying as AC and then complimenting yourself is a little desperate.

    50. Re:Excellent question by lgw · · Score: 1

      Sounds right to me - and there are sadly still people who need to be told "RAID is not backup".

      --
      Socialism: a lie told by totalitarians and believed by fools.
    51. Re:Excellent question by lgw · · Score: 1

      LTO has 30-year media easily available, and there's a lot of basis for tape for judging the real lifetime, since the technology has been around forever. For modern archive-quality tape, the backing will fail before the magnetic media. For normal LTO tape different manufacturers make different claims, but more than 10 years is normal. Ensuring you can still read the tape is of course a different challenge, but the drives try to be backwards compatible for a while (and the drives are fairly robust when in limited use). Fortunately, connection interfaces seem to be slowing their rate of change - a PCIe card will likely find a slot in servers for years to come, and SAS will also likely be around for quite some time, though the cards may get pricey if they become legacy-only.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    52. Re:Excellent question by bluefoxlucid · · Score: 1

      Oh my god she said that FIVE TIMES EVERY DAY!

    53. Re:Excellent question by Anonymous Coward · · Score: 0

      Whens the last time that happened? Has it ever happened?

      I'd say ISPs backpedaling on "unlimited" counts.

    54. Re:Excellent question by funwithBSD · · Score: 1

      And I have my FreeBSD server acting as a local backup with ZFS backed storage.

      So if I do need something, I just grab it back local.

      --
      Never answer an anonymous letter. - Yogi Berra
    55. Re:Excellent question by DarkTempes · · Score: 1

      In general ISPs didn't ever have unlimited. They advertised unlimited and then knocked people off if they passed some secret unpublished limit.

      The difference now is that they no longer advertise a lie and they have published and trackable limits. The only issue is that the limits are in many cases absurdly low but otherwise it's a better practice than what they were doing before.

    56. Re:Excellent question by MarkTina · · Score: 1

      Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

      Well fingers crossed you are not the storage admin for anyone I deal with!

    57. Re:Excellent question by GigaplexNZ · · Score: 1

      Honestly, I'd build two of these devices, one for local backups and I'd put one at a buddy's house and do remote backups from your local device.

      Oh what I'd do for usable upload bandwidth and reasonable data caps...

    58. Re:Excellent question by GigaplexNZ · · Score: 1

      Test your backups after you make them

      Obviously.

      it's a cheap and easy 99% solution

      It's not a solution. It's a bare minimum requirement that doesn't solve for bitrot.

    59. Re:Excellent question by lgw · · Score: 1

      Well, maybe I don't understand what you mean by "bitrot". GMR media doesn't "rot" in the classic sense of bits flipping over time (well, not in human-scale time), the way that happened with floppies and QIC tape. If you're adding some new meaning to that term, you'll need to explain it.

      But if you're talking about odd disk failures: as I said at the top of the thread, if you're using disk, archive stuff in RARs (or other checksummed archives), test those checksums from time to time, and don't purge old backups the moment you make new ones. Or just use tape and you're fine, at least until it gets hard to find a drive old enough to accept the tape (10+ years).

      --
      Socialism: a lie told by totalitarians and believed by fools.
    60. Re:Excellent question by baffled · · Score: 1

      BTRFS needs to become the Linux default FS.

      I just lost my wife's BTRFS partition yesterday after a hard-reset. Consulted Google for btrfs repair options and discovered they are lacking. Kept reporting root->node assertion failed, whatever that's supposed to mean. I don't recall the last time I've lost a partition like this, I assumed fsck would have done the trick.

      See https://btrfs.wiki.kernel.org/index.php/Btrfsck :

      Note that while this tool should be able to repair broken filesystems, it is still relatively new code, and has not seen widespread testing on a large range of real-life breakage. It is possible that it may cause additional damage in the process of repair.

    61. Re:Excellent question by Anonymous Coward · · Score: 0

      Many people are even hostile to the idea as if ECC RAM is somehow tainted.

      If they had ECC then they could detect their taint.

    62. Re:Excellent question by semi-extrinsic · · Score: 1

      Since you're an experienced ZFS user, do you have any recommendations for how to sync the systems described below?

      I have a setup similar to the one you describe. One box at work with 2x3TB with ZFS and mirroring (raid1), similar box at home. The box at home is fairly recent, so I haven't gotten a good system for synchronizing them yet. My internet at home is 50/10 Mbps, work is much faster. The idea is that I back up both my personal photos (originating on the home box, usually ~10 GB a month) and my work data (created on the work box, usually a steady stream of 1 GB per week and bursts of 10-50 GB occasionally). If possible I would like to have some directories on the work box that are not synchronized to the home box.

      If the fact that both computers are sources of new data is a problem, I guess it's possible to modify that workflow.

      And any other recommendations for ZFS? I scrub the pools weekly, but otherwise treat it as zero-maintenance.

      --
      for i in `facebook friends "=bday" 2>/dev/null | cut -d " " -f 3-`; do facebook wallpost $i "Happy birthday!"; done
    63. Re:Excellent question by QuietLagoon · · Score: 1

      Ahhh... a sample size of one. I understand now.

    64. Re:Excellent question by sandytaru · · Score: 1

      For someone who is simply storing large volumes of media, however, CrashPlan works out well. I forgot that we selected it for the backup system of the media server we installed for my senior project in my master's degree for our client. They needed to store about 600 GB of pictures and movies. A once daily backup is just fine for them - but I think we still negotiated a full Pro package for the other features.

      --
      Occasionally living proof of the Ballmer peak.
    65. Re:Excellent question by Anonymous Coward · · Score: 0

      Bitrot is a myth in modern times. Floppies and cheap-ass tape drives from the 90s had this problem, but anything reasonably modern (GMR) will read what you wrote until mechanical failure.

      The key therefore is to verify as you write. Usually, verifying a sample of a few GB will let you know if everything went OK. DO your backups with checksums of some sort. A modern tape drive and backup software will do that automatically, and let you schedule a verify automatically as part of backups (2 TB? That's 1 tape - might want to consider that), though ideally you should verify a tape on a different drive than the one you wrote it on.

      For disk-based backups, local or cloud, I strongly recommend archiving to a format with checksums (RAR etc) over some sort of raw file copy. Especially for anything going over the network: RAR a volume/file set locally first, then upload, then test the archive.

      If you have a superstitious fear of bitrot, you can always do some random sampling of archive integrity, and keep multiple historical copies of files just in case (e.g., don't just delete backup N-1 when you do backup N, do a rotation scheme).

      Bitrot is very real and can happen in most hardware. You should look up bitsquatting: http://dinaburg.org/bitsquatting.html

    66. Re:Excellent question by Anonymous Coward · · Score: 0

      Bit rot itself is low for magnetic media, but there are plenty of other errors, like phantom writes, misplaced writes (I forgot the proper term), and errors at one of the many layers between the CPU and HD. The rate of these errors is quite static.

    67. Re:Excellent question by Anonymous Coward · · Score: 0

      Apple has joined up with OpenZFS. The group has made mention that they are interested to see what angle Apple is coming from and assume this means Apple is looking to incorporate ZFS into OS X.

    68. Re:Excellent question by lgw · · Score: 1

      If you don't trust the judgment of senior engineers, you won't get very far in life. When you need solutions that work in practice, turn to those who have been practicing for a while.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    69. Re:Excellent question by lgw · · Score: 1

      Why so? Do you have contrary experience you'd like to share? Care to join in a discussion?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    70. Re:Excellent question by lgw · · Score: 1

      There are certainly write errors - but that's not bitrot, that data was bad from the beginning (which is the true explanation for almost everything called bitrot). You can always get bus errors and whatnot, but those are transient errors, and the read is quite likely to succeed on the next try.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    71. Re:Excellent question by MarkTina · · Score: 1

      What's the point in joining in ? To me it's obvious you have no clue about data storage and magnetic media in general, so no matter what I say you won't agree, I only chipped in earlier because I thought your statement was so funny!

    72. Re:Excellent question by Anonymous Coward · · Score: 0

      Linux software RAID-1 has this feature. The subsystem maintainer doesn't consider it reasonable to fix even for the multi-mirror case where voting will give you reasonable odds of restoring the right data. Because this is below the FS level you can't correct it in the filesystem. One more reason for a checksumming storage layer like ZFS.
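For what "voting" across mirrors could look like in principle, here is a toy sketch. It is not how Linux md or any real RAID layer is implemented; it compares in-memory copies of one block, and the name `vote` is made up for this example.

```python
from collections import Counter

def vote(copies):
    """Return the block held by a strict majority of mirrors, or None if there is none."""
    if not copies:
        return None
    block, count = Counter(copies).most_common(1)[0]
    return block if count * 2 > len(copies) else None

if __name__ == "__main__":
    good, bad = b"\x00\x01\x02\x03", b"\x00\x01\xff\x03"
    print(vote([good, good, bad]))  # three mirrors: the two agreeing copies win
    print(vote([good, bad]))        # two mirrors: no majority, so None
```

With only two mirrors there is never a majority when the copies disagree, which is where a per-block checksum (the ZFS approach) becomes the tie-breaker.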

    73. Re:Excellent question by lgw · · Score: 1

      Come now, do explain the process by which GMR media loses its data integrity over time. I'm all ears.

      Write errors happen, transient data transfer errors happen, bad sectors (bad from day 1) happen, mechanical failures happen, sure, but none of that is "bitrot".

      --
      Socialism: a lie told by totalitarians and believed by fools.
    74. Re:Excellent question by rdnetto · · Score: 1

      Which version of the kernel and btrfs-progs are you using? Some distros are still shipping ancient versions of the userspace tools, like 0.19 or 0.20. The latest is 3.12 (they recently started using the kernel version instead), so you may want to try compiling it from the source.
      The two most helpful commands I've found are 'mount -o recovery', which can restore the superblock if it's missing/corrupted, and 'btrfs check --repair' (formerly btrfsck). Note that check doesn't actually fix the errors it finds without that flag, unlike fsck. If you have a multi-device file system, trying to mount one of the other drives can help, since copies of the metadata are stored on all of them (RAID1 style).
      If that doesn't work, you can often get the data off by mounting it as readonly, or by using 'btrfs restore'.

      Btrfs used to be quite buggy, but these days I've found it to be pretty stable and reliable. That only applies if you're using the latest packages though - otherwise, you might as well be using it back in the early days.

      --
      Most human behaviour can be explained in terms of identity.
  4. Checksums? by nine-times · · Score: 1

    I don't know if there's a better solution, but you could store checksums of each archived file, and then periodically check the file against its checksum. It'd be a bit resource intensive to do, but it should work. I think some advanced filesystems can do automatic checksums (e.g. ZFS, BTRFS), but those may not be an option, and I'm not entirely sure how it works in practice.

    1. Re:Checksums? by QuietLagoon · · Score: 2
      I use checksums to check for bitrot.

      .
      Once a week, I use openssl to calculate a checksum for each file; and I write that checksum, along with the path/filename, to a file. The next week, I do the same thing, and I compare (diff) the prior checksum file with the current checksum file.
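A rough sketch of that weekly routine, using Python's hashlib in place of openssl (an assumption; the poster's actual script isn't shown). The names `build_manifest` and `compare` are made up for this example. A changed checksum only means the bytes differ, so on files that get edited legitimately it will also flag ordinary edits.

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest

def compare(old, new):
    """Return (changed, missing, added) path lists between two manifests."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

Saving each week's manifest (for example as JSON) and running `compare` against the previous one gives the same information as diffing two checksum files.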

      With about a terabyte of data, I've not seen any bitrot yet.

      Long term, I plan to move to ZFS, as the server's disk capacity will be rising significantly.

    2. Re:Checksums? by Anonymous Coward · · Score: 2, Interesting

      Periodically checking them is the important part that no one seems to want to do.

      A few years back we had a massive system failure and once we recovered the underlying problems and began recovery we found that most of the server image backup tapes for 6 months+ could not be loaded. The ops guys took a severe beating for it.

      You think this stuff will never happen but it always does. We had triple redundancy with our own power backups but even that wasn't on a regular test cycle. Some maintenance guy left the switch open between floors for some reno job over a year prior and while the generators were running the power didn't make it to infrastructure.... it was as if hundreds of UPSs screamed at once and were silenced when failover didn't happen.

      You really can't beat Murphy's Law, but with regular testing you can soften the effects.

    3. Re:Checksums? by Waffle+Iron · · Score: 5, Informative

      I never archive any significant amount of data without first running this script at the top:

      find -type f -not -name md5sum.txt -print0|xargs -0 md5sum >> md5sum.txt

      It's always good to run md5sum --check right after copying or burning the data. In the past, at least a couple of percent of all the DVDs that I've burned had some kind of immediate data error.
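For machines without md5sum handy, the check step can be approximated in Python. This is a minimal sketch that assumes the manifest sits at the top of the tree (as the find command above leaves it) and uses md5sum's plain "hash, two separator characters, path" line format; it does not handle md5sum's backslash-escaped lines for unusual filenames. The name `check_manifest` is made up for this example.

```python
import hashlib
import os

def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def check_manifest(manifest_path):
    """Return the names in an md5sum-style manifest that are unreadable or don't match."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    failures = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # 32 hex digits, a space, a mode character (' ' or '*'), then the path.
            expected, name = line[:32], line[34:]
            try:
                ok = md5_of(os.path.join(base, name)) == expected.lower()
            except OSError:
                ok = False  # a missing or unreadable file counts as a failure
            if not ok:
                failures.append(name)
    return failures
```

An empty list back from `check_manifest("md5sum.txt")` corresponds to md5sum --check reporting OK for every file.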

      (A while back, I rescanned a couple of hundred old DVDs that I burned ranging up to 10 years old, and I didn't find a single additional data error. I think that a lot of cases where people report that DVDs deteriorate over time, they never had good data on them in the first place and only discover it later.)

    4. Re:Checksums? by Anonymous Coward · · Score: 0

      You are assuming you started with good files. In the submitter's case, he started with some good files, some unknown number of bad files, etc. So this would just confirm that the bad file hasn't gotten worse. He wants to find the bad files and fix them as well.

    5. Re:Checksums? by QuietLagoon · · Score: 1

      You are assuming you started with good files.

      No assumption on my part. I did start with good files. :)

      In the submitter's case, he started with some good files, some unknown number of bad files, etc.

      That's not how I read the comment. From the OP:

      With the quantity of data (~2 TB at present), it's not really practical for us to examine every one of these periodically so we can manually restore them from a different copy.

      That sounds to me as if he wants to check the files from time to time and locate ones that have gone bad.

    6. Re:Checksums? by Anonymous Coward · · Score: 0

      Or maybe you just didn't buy the absolute cheapest DVDs available. Also the quality of the burner may play a role.

    7. Re:Checksums? by failedlogic · · Score: 1

      I don't have a large amount of critical data to backup (mostly documents for research). I've been using PAR (or rather relying on it) to verify and correct errors when recovering data.

      That said, I realize I should probably also have a checksum. Should one consider a different algorithm than MD5, for example to prevent collisions of the hashes?

    8. Re:Checksums? by Waffle+Iron · · Score: 1

      While MD5 isn't really secure against intentional attacks any more, the probability of a random collision is still negligible.

      I originally started using MD5 for this purpose because in a test I did many years ago on some machine, md5sum actually ran faster than cksum. The shorter cksum data also does have a chance to generate hash collisions on reasonable sized data sets, although that probably doesn't matter too much for just disk error checking. I don't use the newer algorithms because they're overkill and their hash strings just look too long.
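To see how "negligible" compares between a 32-bit checksum like cksum's CRC and 128-bit MD5, here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes hash outputs behave like uniform random values, which holds for accidental collisions but says nothing about deliberate attacks. The function names are made up for this example.

```python
import math

def collision_probability(n_items, hash_bits):
    """Birthday bound: chance that any two of n random hashes collide,
    approximated as 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    return -math.expm1(exponent)  # expm1 stays accurate for tiny probabilities

def undetected_corruption_probability(hash_bits):
    """Chance that one randomly corrupted file still matches its stored hash."""
    return 2.0 ** -hash_bits

if __name__ == "__main__":
    n = 500_000  # "hundreds of thousands" of files, per the submitter
    for bits in (32, 128):
        print(f"{bits:>3}-bit hash: P(any two files collide) = "
              f"{collision_probability(n, bits):.3g}, "
              f"P(corruption goes unnoticed) = "
              f"{undetected_corruption_probability(bits):.3g}")
```

For bitrot detection the second number is the one that matters: two different files sharing a hash is harmless, while a damaged file matching its own old hash is the real miss, and even 32 bits makes that roughly a one-in-four-billion event per file.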

    9. Re:Checksums? by NatasRevol · · Score: 1

      weekly zfs scrub does the checks for you.

      --
      There are two types of people in the world: Those who crave closure
    10. Re:Checksums? by Anonymous Coward · · Score: 0

      You might want to do a memory check to see if there are any failures, this can cause the occasional random error when copying.

    11. Re:Checksums? by Anonymous Coward · · Score: 0

      Or just use md5deep: http://md5deep.sourceforge.net/
      It basically goes into subdirectories and calculates the md5sum of all the files.

      example:

      cd stuffToBackup/
      md5deep -r -l * > ../stuffToBackup.sums.txt

    12. Re:Checksums? by Anonymous Coward · · Score: 1

      (A while back, I rescanned a couple of hundred old DVDs that I burned ranging up to 10 years old, and I didn't find a single additional data error. I think that a lot of cases where people report that DVDs deteriorate over time, they never had good data on them in the first place and only discover it later.)

      I burnt a bunch of MD5 hashes on my cd-rs with the data nine or so years back _and_ checked them back right after burning, on a different drive, too. (I'm paranoid about data integrity.) Each passed the MD5 check. Today, I get an unreadable sector about once every 500-600 MB. Most of my data was on Verbatim cd-rs; another brand fared a bit better (one corruption every 900 MB or so).

      (Later on, I started burning par2 files to accompany dvds but soon gave up since the calculation took way too long.)

    13. Re:Checksums? by hippo · · Score: 1

      I run a weekly cron job that calculates md5sums for all the files on the media drive. Then it compares it to the previous weeks and emails the diff. If anything goes wrong I restore the file from one of my backups and check the MD5 again. I did have one drive that was slowly losing data. Turned out to be a dodgy sata port/cable but I've not lost a file yet.
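The "restore the file from a backup and check again" step can be automated along these lines. This is a minimal sketch, not the poster's actual cron job: the function names and return strings are made up, replicas are assumed to be plain directory trees mirroring the primary (local or mounted), and MD5 is used only because the comment above does.

```python
import hashlib
import os
import shutil

def matches(path, expected_md5, chunk_size=1 << 20):
    """True if the file exists, is readable, and has the expected MD5."""
    try:
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest() == expected_md5
    except OSError:
        return False

def restore_from_replicas(relpath, expected_md5, primary_root, replica_roots):
    """Return 'ok', 'restored', or 'unrecoverable' for one file."""
    target = os.path.join(primary_root, relpath)
    if matches(target, expected_md5):
        return "ok"
    for root in replica_roots:
        candidate = os.path.join(root, relpath)
        if matches(candidate, expected_md5):
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            shutil.copyfile(candidate, target)
            # Re-check after copying: the copy itself can go wrong.
            if matches(target, expected_md5):
                return "restored"
    return "unrecoverable"
```

Looping this over every entry in the checksum list, and counting how many files each drive fails on, would also give a crude signal for when a backup drive (or its cable) needs replacing.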

    14. Re:Checksums? by nctritech · · Score: 1

      Or use sha1deep from the md5deep package. It's made specifically for hashing and comparing file trees and has heaps of behavior-modifying options.

    15. Re:Checksums? by Anonymous Coward · · Score: 0

      PAR & PAR2 already checksum your files (or blocks in PAR2) and it's even possible to store just the checksums with no redundancy. I don't see much point in adding another checksum unless you're truly paranoid and even less point storing a PAR checksum without recovery blocks. Use PAR2 and scatter the files across media.

    16. Re:Checksums? by Anonymous Coward · · Score: 0

      I do the same - I wanted something that'd work cross platform, lightweight, and easy to remember. md5sum does the trick.

      I built a convenient wrapper script around md5sum that's crucial to my backup scheme. I run it on the source, backup to destination, and run check on the destination - where destination could be a usb drive, a remote rsync or scp host, a local drive, a windows machine, a os x machine, etc.

      My wrapper script is called md5tool.sh - it's not rocket science, but it provides a few functions that simplify things for me.

      md5tool.sh CREATE . - creates a checksum.md5 file for all files below this directory

      md5tool.sh CREATEFOREACH . - creates a checksum file in each sub directory of this directory (useful for organizing a handful of archived/independent directories)

      md5tool.sh CHECKALL . - checks all checksum files found below this directory (useful for checking a whole drive)

      Here's the tool: https://github.com/codercowboy/scripts/blob/master/md5tool.sh

    17. Re:Checksums? by Anonymous Coward · · Score: 0

      We have a few users who insist on doing this sort of thing on our SAN regularly. It really disrupts data progression on a Compellent SAN.

    18. Re:Checksums? by Anonymous Coward · · Score: 0

      I don't know if there's a better solution, but you could store checksums of each archived file, and then periodically check the file against its checksum.

      I agree, weren't transmission and storage errors what CRC was created to detect? I also routinely erase and re-write all my backups every six months or so; not only does this include new data, but it refreshes the media for the older data. I am using MicroSD cards to back up family photos and such, but I only have about 64gb stored on 8, 16gb chips (double redundant backup). I'm not sure if this would be practical for 2tb of data.

    19. Re:Checksums? by Anonymous Coward · · Score: 0

      You have a good chance that the data you originally wrote was bad. You're only checking the checksums after the data has passed through several layers of hardware to the HD. Good job, you have a checksum of bad data.

      But really, what you're doing is still good, I'm just blowing up out of proportion to exaggerate the potential issues. ZFS can do this transparently and much faster.

  5. That's what RAID is /for/ by Anonymous Coward · · Score: 0

    It seems to me that you've already identified an easy solution: RAID. A simple mirror of 2x 2-4TB drives is pretty cheap these days, so it would seem to be an ideal solution for one of your copies. Keep one "live" copy on your normal desktop, one backup on an off-site RAID, and if you feel like it, another copy on a cloud service or other media--Tape backups aren't sexy, but they're pretty cheap and very effective at long-term cold-storage. BluRay disks aren't terribly expensive anymore, though the jury is still out on their long term (decade+) durability.

    1. Re:That's what RAID is /for/ by Anonymous Coward · · Score: 1

      I don't think you understand what RAID is or what it does.

    2. Re: That's what RAID is /for/ by Anonymous Coward · · Score: 0

      There's cheaper storage for archiving and different requirements.

  6. Perforce by Anonymous Coward · · Score: 0

    You can install Perforce which is a CM system like SVN etc. but it's very good w/large binary data. You can have it run a verify command nightly (or as often as you like) and it will compare the MD5SUM of every version of every file (which was computed when that version of the file was committed to the CM system) with the current MD5SUM and let you know which ones have changed (bitrot).

    If you don't CM your data, you can just do an MD5SUM recursively and store it off, then periodically repeat the procedure and diff the 2.

    If you like GUIs Beyond Compare is an excellent program and it does snapshots (CRCs of directory trees) and then lets you compare the snapshot with an updated / recomputed version.

  7. How are you getting bitrot? by drussell · · Score: 1

    If your physical media is dying, you'll get hardware errors so restore from a(nother) backup and replace the media.

    If your files are being corrupted, what kind of crappy filesystem are you using to store these precious memories?!!

    1. Re:How are you getting bitrot? by Anonymous Coward · · Score: 1

      Cosmic rays, magnetic data corruption. If you do not re-write the bits they decay.

    2. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      Hard disks use very strong error correcting codes. The odds of randomly flipped bits resulting in undetected corruption are virtually nil. If bad data is stored on a disk, it's far more likely that bad data was written to the disk in the first place (OS bugs, or random corruption since most people use cheap non-ECC memory).

    3. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      I'd like to know how this is happening also. I have many TBs of data on a variety of mediums and I cannot remember ever seeing a corrupt file over the past twenty years. I also have a Baby Croc account with many TBs and never had a problem. I still use recovery, though (par and rar with recovery record). If you cannot periodically examine family photos and videos, why do you have them? I have shitloads of family photos and I look at them all the time. That's why I took the photos.

    4. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      I don't think using expensive non-ECC memory will help much.

    5. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      Hard disks use very strong error correcting codes.

      Because they have to, to reach even the low reliability standards of huge consumer-grade drives.

    6. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      You probably simply aren't noticing them, they are relatively rare and detection by checksums doesn't always work.

    7. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      Cosmic rays, magnetic data corruption. If you do not re-write the bits they decay.

      That means every computer will eventually fail to boot up because the operating system files (which most likely never get re-written) will get corrupted.

    8. Re:How are you getting bitrot? by Anonymous Coward · · Score: 0

      The odds of randomly flipped bits resulting in undetected corruption are virtually nil.

      Happens about once a day if you have enough storage. There are lots of other errors that can occur. In-flight errors prior to reaching the disk, the disk writing to the wrong sector, yes this happens. Firmware is a fun one. My cousin found an issue with an Intel NIC driver because every so often the driver would corrupt the data in the packet, but it would do so before it checksummed the packet. It's good he was using ZFS, otherwise this would not have been caught and the HD would be writing and checksumming wrong data.

  8. Pars by Anonymous Coward · · Score: 0, Insightful

    Should have parred your data to begin with.

  9. Look to the past by Anonymous Coward · · Score: 0

    Tape. LTO is error-correcting and extremely stable and reliable.

    Is it expensive? Yes, for small datasets - for large datasets it can be much cheaper.
    Is it a pain in the ass? Yes, but what's your data really worth?

    1. Re:Look to the past by Venotar · · Score: 2
      The tapes may be stable (I'm suspicious of that claim: their temperature tolerances aren't as high as modern hard drives, they actually care about dust, and I would expect them to be more susceptible to magnetic interference); but the tape drives are not. Over time drive heads become misaligned. They continue to write fine and can read what they write; but sufficient misalignment prevents other drives of the same type from reading the tape. That tape then becomes only as useful as the drive that wrote it. Lose the drive, you lose the use of the data on the tape. Unless you test reading the tape in a different drive than it was written from (while the writing drive is still available for pulling the data out), this condition's effectively undetectable until you actually need the data.

      There's a reason so many shops have moved to disk based backups. Tape simply isn't reliable. Tape is cheap; but definitely NOT reliable.

    2. Re:Look to the past by Anonymous Coward · · Score: 0

      For values of "large" much, much greater than the size of a home movie/music/photo collection. OP could build 2 more servers full of 2TB consumer hard drives for the price of an LTO-5 drive.

    3. Re:Look to the past by mlts · · Score: 1

      I just wish LTO drives were cheaper. Otherwise, they would be ideal for backups because they support encryption on the drives themselves. All LTO-4 tapes and newer support this, so any LTO-4 drive given the right key can decrypt another drive's tape.

      Of course, WORM media is always nice, especially with malware being a constant threat.

    4. Re:Look to the past by linear+a · · Score: 1

      Tape MUST be sufficiently stable. Reading the reliability specs off the box in front of me and running a few calculations shows that of all the tape operations ever done (at least for my brand of tape) there should be zero or at most one (1.3% chance) tape error in the history of all tape storage by humanity.

    5. Re:Look to the past by Venotar · · Score: 1

      Tape MUST be sufficiently stable. Reading the reliability specs off the box in front of me and running a few calculations shows that

      You didn't use sarcasm tags and sometimes the subtler jokes are a tad hard to discern in text.
      You are joking, aren't you? Because if not, have I got a great deal for you - I just need your bank account to transfer the money my uncle, a Nigerian prince, is trying to export. PM me!

  10. Rsync & ZFS by Anonymous Coward · · Score: 0

    rsync --checksum for the remote copies.

    ZFS is a good filesystem for bitrot protection, you don't want to propagate the errors.

  11. ZFS by Electricity+Likes+Me · · Score: 4, Interesting

    ZFS without RAID will still detect corrupt files, and more importantly tell you exactly which files are corrupt. So a distributed group of ZFS drives could be used to rebuild a complete backup by only copying uncorrupt files from each.

    You still need redundancy, but you can get away without the RAID in each case.

  12. geez what brand of drives are you using by Anonymous Coward · · Score: 0

    Windows 8.1 and 2012 support this if you set up a storage space in a mirror (could be 2 standard external USB disks) and format with the Resilient File System (ReFS) rather than NTFS. It will do background scans that correct for bitrot. It's new and not well proven, but that's Microsoft's claim anyway.

  13. Rewritten for /. by Anonymous Coward · · Score: 0

    [seen in the margins of a book on data backup owned by someone claiming to be Fermat reincarnated]

    "I have found an elegant solution to the problem of self-healing distributed backups which are neither co-located nor in constantly aware of each other's state. The details are too long to fit in this space."

    1. Re: Rewritten for /. by techprophet · · Score: 1

      And thus the saga of that damned Frenchman continues

  14. BTRFS filesystem by Anonymous Coward · · Score: 0

    Or, similarly for BTRFS:

    btrfs scrub start /btrfs

    1. Re:BTRFS filesystem by mlts · · Score: 4, Informative

      I'll be the heretic here, but on Windows 8.1 and Windows Server 2012 R2, there is a feature called Storage Spaces. It works similar to ZFS where you toss drives into a pool, then create a volume that is either simple, mirror, or with parity, and Windows does the rest. If a volume needs more space, toss some more drives in the pool.

      To boot, it even offers autotiering so data that is frequently used can be stored on an SSD, or remain on the HDDs if it isn't. Deduplication is handled on the filesystem level [1].

      No, this isn't a replacement for a SAN with RAID 6 and real-time deduplication, but it does get Windows at least in the same ballgame as Oracle with ZFS.

      [1]: Not active deduplication. The data is initially stored duplicated, but a background task finds identical blocks and adds pointers. Of course, the made from scratch filesystem, ReFS (which has the ability to check for bit rot on reads like ZFS), doesn't have this, so one is still stuck with NTFS for this feature.

    2. Re:BTRFS filesystem by girlintraining · · Score: 0, Flamebait

      I'll be the heretic here, but on Windows 8.1 and Windows Server 2012 R2, there is a feature called Storage Spaces. It works similar to ZFS where you toss drives into a pool, then create a volume that is either simple, mirror, or with parity, and Windows does the rest. If a volume needs more space, toss some more drives in the pool.

      You have no idea what you're talking about, sir. A mirror only duplicates the data. The writes are made synchronously to both sources, the reads are interleaved between devices to improve speed. In RAID-0, if either drive fails, the array is lost. In RAID-1, mirroring, data is written to two drives at the same time, and read back in an interleaved format. Unless the device itself reports a hardware error, the cluster will continue to read data back from every device on the chain. Mirroring can introduce silent bit rot because the data is read back from only one source at a time. RAID-1 (mirroring) is meant to prevent data loss due to hardware failure. It does not prevent corruption of the filesystem or your data via bit rot, and in fact under most usage scenarios, increases it.

      Without parity checking, you simply aren't addressing bit rot. Period. It could be Raid 9 Million(tm) and if all it's doing is copying the data, and not comparing it, bit rot will still proceed apace, silently eating your data. But let's say you're a good administrator that has enabled parity. Great! But there's still a problem: parity cannot restore data that has become corrupted due to bit rot -- it is a detection-only mechanism. So if you have two drives in a RAID-1 with parity configuration, as you also suggest... it will detect the file corruption, but as it cannot correct it, it will then promptly seize up and fall over dead. This is because for every N clusters written, a parity cluster is also written; This allows the array to detect if that data chunk was correctly committed; But if the data on any of the clusters within the chunk are altered later, the RAID array will only know that this chunk of data (known as a stripe in RAID), is invalid. It cannot correct it.
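
      To make the parity argument concrete, here is a small Python sketch of single XOR parity across one stripe, in the style of RAID5. It is an illustration with made-up helper names (xor_blocks, stripe_is_consistent, rebuild_block), not any real RAID implementation. It shows two separate cases: a silently changed block makes the stripe inconsistent without saying which block is wrong, while a block whose position is already known to be bad can be recomputed from the rest.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """True when the stored parity matches the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(data_blocks, parity, lost_index):
    """Recompute one block whose position is already known to be bad."""
    survivors = [b for i, b in enumerate(data_blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# A silently altered block 1: the check fails, but nothing in the parity
# alone says which of the three blocks (or the parity itself) is wrong.
rotted = [stripe[0], b"BBBC", stripe[2]]
detected = not stripe_is_consistent(rotted, parity)

# If the drive reports the bad block itself (a read error), the position is
# known and single parity is enough to recompute the original contents.
repaired = rebuild_block(rotted, parity, lost_index=1)
```

      So with single parity the limiting factor is locating the bad block, not recomputing it. A per-block checksum or a drive-reported read error supplies the location.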

      The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them. If one of them shows an inconsistency, the other two should still remain in agreement and that data chunk is then discarded and rewritten to the inconsistent device. This is how the Space Shuttle was designed with its landing computer -- three fully independent computers, and each with three complete sets of sensors independently connected along main buses. Because bit rot is a major problem in space due to radiation, the system is designed at every level with 3x redundancy (or more; there are 7 gyrostabilizer systems on the ISS, for example).
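
      The three-copy scheme in this paragraph amounts to majority voting. A hedged Python sketch follows, with each replica modelled as a list of byte blocks; the names vote and repair and the site_a/site_b/site_c lists are hypothetical and exist only for this illustration.

```python
from collections import Counter


def vote(copies):
    """Return (value, outvoted_indices) for one block held in several copies.

    Raises ValueError when no value is held by a strict majority.
    """
    value, count = Counter(copies).most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority; cannot tell which copy is right")
    outvoted = [i for i, block in enumerate(copies) if block != value]
    return value, outvoted


def repair(replicas):
    """Vote block by block and overwrite outvoted blocks in place.

    Returns the number of blocks that were rewritten.
    """
    rewritten = 0
    for index in range(len(replicas[0])):
        value, outvoted = vote([replica[index] for replica in replicas])
        for replica_number in outvoted:
            replicas[replica_number][index] = value
            rewritten += 1
    return rewritten


site_a = [b"block0", b"block1", b"block2"]
site_b = [b"block0", b"blocK1", b"block2"]  # one block has rotted
site_c = [b"block0", b"block1", b"block2"]
rewritten = repair([site_a, site_b, site_c])
```

      With only two copies and no other information, vote raises ValueError on a disagreement, which is the situation the paragraph warns about.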

      RAID10 and similar systems are two RAID5 systems which are independent and regularly compare data; These can detect which system is inconsistent, so you will always have at least one copy of your data in a consistent state. But if the RAID ever becomes non-operational and has to be rebuilt, there will be a period of time where only one known good copy is available -- bit rot could occur during this time, and all you could do is detect it, not repair it. This is why you want triple redundancy -- so you can remove one of the systems for maintenance and still have two remaining copies, thus maintaining the ability to detect bit rot.

      Now that I've explained all the ways that you're wrong, let me say that bit rot is probably not the cause of the OPs problems. In fact, USB devices are well-known for corrupting filesystems because of spontaneous disconnects, power loss events, etc., and this is simply what can be expected in a typical residential environment. Even a RAID configuration in a residential environment isn't invulnerable to the "write hole" problem -- where data is partially committed to disk, but then the array suffers a power loss event.

      This is what usually causes da

      --
      #fuckbeta #iamslashdot #dicemustdie
    3. Re:BTRFS filesystem by Anonymous Coward · · Score: 0

      I thought he was concentrating on the Storage Spaces element rather than simply stating 'RAID will do it'. My understanding is that Storage Spaces is (as he says) MS's version of ZFS - does it not have the same data-checking features/ performance hit that 'regular' ZFS does?

      (Also, there's no need to be rude to people)

    4. Re:BTRFS filesystem by Anonymous Coward · · Score: 0

      ZFS and ReFS both have forward error correction, as does every physical storage medium at the low level. On a single drive it's just an extra waste of space; the reason they still do it is that it can be spread across multiple drives to make the data recoverable when one fails.

    5. Re:BTRFS filesystem by RR · · Score: 3, Informative

      The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them.

      There you go again. Acting like you know what you're talking about, but you don't.

      ZFS and BTRFS have a much more efficient way to ensure correctness: CRC of everything written. That is what is checked when you do a zpool scrub or a btrfs scrub. Random errors are very unlikely to produce the same checksum, so then you only need a second copy that doesn't produce CRC errors.
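
      The idea in this paragraph (a checksum stored at write time, plus one more copy that still passes its checksum) can be sketched in a few lines of Python. This is a toy model, not how ZFS or BTRFS are implemented: zlib.crc32 stands in for whatever checksum the filesystem really uses, and write_block, is_intact and scrub are names invented for the example.

```python
import zlib


def write_block(data):
    """Store a block together with the CRC32 computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def is_intact(block):
    """True when the block's data still matches its stored checksum."""
    return zlib.crc32(block["data"]) == block["crc"]


def scrub(primary, mirror):
    """Check every block on both devices; repair a bad one from its twin.

    Returns the indices where both copies failed their checksum, so
    nothing could be repaired.
    """
    unrecoverable = []
    for index, (a, b) in enumerate(zip(primary, mirror)):
        a_ok, b_ok = is_intact(a), is_intact(b)
        if a_ok and not b_ok:
            mirror[index] = dict(a)
        elif b_ok and not a_ok:
            primary[index] = dict(b)
        elif not a_ok and not b_ok:
            unrecoverable.append(index)
    return unrecoverable


primary = [write_block(b"holiday"), write_block(b"wedding")]
mirror = [dict(block) for block in primary]
primary[1]["data"] = b"wedcing"  # simulated rot; the stored CRC is unchanged
lost = scrub(primary, mirror)
```

      The checksum is what tells the scrub which of the two copies to trust, which is why two copies are enough here where plain mirroring would need a third to break the tie.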

      Hard drives are nowhere near as reliable as their manufacturers claim. Modern drives don't store the bits that you feed them exactly as you give them. Instead, they use CRC and error correcting codes, so they only need most of the data to be correct. Usually, if the data doesn't match the CRC, and it cannot be corrected by ECC, then you get a read error instead of corrupted data. Which, I guess, is better than getting a corrupted picture. Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.

      But I've seen enough errors that I suspect something else is going on. It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory. Your computer can be corrupting your data, and you have no warning that it's happening. In addition, hard drives lie. I'm not optimistic about the long-term storage of electronic data.

      --
      Have a nice time.
    6. Re:BTRFS filesystem by girlintraining · · Score: 2

      There you go again. Acting like you know what you're talking about, but you don't. ZFS and BTRFS have ...

      Exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.

      Modern drives don't store the bits that you feed them exactly as you give them. Instead, they use CRC and error correcting codes, so they

      ... Which again counts for exactly dick. I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.

      Which, I guess, is better than getting a corrupted picture. Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.

      That's because you have no experience as a network administrator in a professional environment. Because then you'd know that's the very thing RAID was designed to do: Recover from hardware failure, which includes sectors becoming unreadable. You are clearly confused both about what level of abstraction is being discussed (architecture versus hardware), as well as the different types of failure modes each of these solutions presents. Bit rot is a physical process that occurs in all magnetic media, and at sufficiently small-scale, can also affect non-persistent storage such as RAM.

      It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory.

      That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial uses, ECC memory just isn't needed. But for a select few use scenarios where data integrity is absolutely critical -- such as, say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems, the added cost is justified because they need high availability/high reliability of those systems. It's also used in certain aerospace applications because the physical mechanism that causes bitrot -- high energy radiation, increases quite a bit at higher altitudes, and in space increases several orders of magnitude -- and if you're going to put something in geostationary orbit, it then takes the full brunt of solar radiation with no mitigation. Correcting for memory problems in these situations is better done at the hardware level; hence ECC memory.

      Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap. And, big surprise -- super cheap doesn't mean super reliable. But we don't need super reliability -- when our system shows obvious signs of a failing memory stick, we just drive to the store, plunk down a $20 and abscond with a new one. Problem solved.

      I'm not optimistic about the long-term storage of electronic data.

      That's because, as previously pointed out, your experience comes from consumer-grade hardware that you don't fully understand the design considerations made. NASA has had great success in the long-term storage of magnetic media -- in fact there was an article not long ago about how they had to reverse-engineer equipment designed during the 1960s for the Apollo program to recover data on tape reels, when they lacked the original equipment it was recorded from. They discussed how the tapes themselves had become brittle and the ferrous oxide would actually peel off in chunks while reading, much like how paint peels off a house, but they were able to recover this data anyway. The technology we have today is far more sophisticated and unlike old tape-technology doesn't require physical contact with the source media to read it. There are companies like OnTrack that specialize in data recovery from harddrives and boast a rema

      --
      #fuckbeta #iamslashdot #dicemustdie
    7. Re:BTRFS filesystem by girlintraining · · Score: 0

      P.S., and included separately because I didn't want to detract from the informative nature of my post;

      The next time you want to slam someone for "acting like you know what you're talking about", don't respond with a bunch of links to Wikipedia. Links, I might add, that are only marginally-relevant to the topic at hand. That shit wouldn't fly in college, so why do you think it's going to hold weight in a professional environment? As well, making personal attacks on someone in such an inept fashion doesn't earn you any points in the workplace either. That only works here on the internet, and even then only when someone tells the fanboys their favorite band sucks.

      --
      #fuckbeta #iamslashdot #dicemustdie
    8. Re:BTRFS filesystem by Anonymous Coward · · Score: 0

      ...Didn't he say "or with Parity"...

      Isn't your whole rant on parity?

      You sound like a real douche

      Someone please take the keyboard from that Nazi.

      Have a great day

      - The internet

    9. Re:BTRFS filesystem by bidule · · Score: 1

      The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them.

      Erm, no. Hamming(7,4) doesn't even need double the space, and that was 60 years ago.
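
      For reference, a compact Python sketch of Hamming(7,4): 4 data bits are stored as 7 bits (1.75x the space), and any single flipped bit in the 7 can be located and corrected. The encode and decode names and the bit ordering (parity bits at positions 1, 2 and 4) follow the usual textbook layout and are chosen here for illustration.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position); position 0 means no error seen.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome read as a 1-based index
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


word = encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # flip the bit at position 5
recovered, where = decode(damaged)
```

      The guarantee is one error per 7-bit codeword. Two flipped bits in the same codeword are miscorrected rather than fixed, so this is an argument about space overhead, not a complete backup scheme.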

      --
      ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
    10. Re:BTRFS filesystem by MarkTina · · Score: 2, Informative

      RAID10 and similar systems are two RAID5 systems which are independent and regularly compare data; These can detect which system is inconsistent, so you will always have at least one copy of your data in a consistent state.

      You were doing quite well up until you said that sentence .....

    11. Re:BTRFS filesystem by DamnStupidElf · · Score: 1

      Without parity checking, you simply aren't addressing bit rot. Period. It could be Raid 9 Million(tm) and if all it's doing is copying the data, and not comparing it, bit rot will still proceed apace, silently eating your data. But let's say you're a good administrator that has enabled parity. Great! But there's still a problem: parity cannot restore data that has become corrupted due to bit rot -- it is a detection-only mechanism.

      This is incorrect for Reed-Solomon based RAID (levels 6 and higher such as RAID Z3). RAID6 can correct bit rot on a single disk and in general for t parity disks, floor(t/2) random errors per RS code can be corrected. All the RS-based RAID systems I've seen essentially store the RS code across devices using a GF(2^8) code, meaning that up to an entire byte could be corrupted by bit rot at a given logical address across all the stripes and still be corrected. All the details are on Wikipedia. Not all RAID-6+ implementations actually check the parity when reading, and I have no idea how many can solve the error locator polynomial for each RS code to actually identify and correct bit rot in multiple locations in different codes versus just dealing with known bulk errors (e.g. failed disks).

      Now that I've explained all the ways that you're wrong, let me say that bit rot is probably not the cause of the OPs problems. In fact, USB devices are well-known for corrupting filesystems because of spontaneous disconnects, power loss events, etc., and this is simply what can be expected in a typical residential environment. Even a RAID configuration in a residential environment isn't invulnerable to the "write hole" problem -- where data is partially committed to disk, but then the array suffers a power loss event.

      Any proper file system will have a large enough transaction/intent log that can be replayed to correct partial data/metadata writes due to power failure and the RAID write hole, etc.. Most file systems in use are not proper, of course, but at least a few are available.

    12. Re:BTRFS filesystem by GigaplexNZ · · Score: 1

      My understanding is that Storage Spaces is (as he says) MS's version of ZFS - does it not have the same data-checking features/ performance hit that 'regular' ZFS does?

      No, it does not have the same data-checking features. Yes, it has a performance hit. Worst of both worlds. I've used it, and junked it as it was literally an order of magnitude slower than RAID5 via mdadm on Linux and didn't actually add any resiliency or flexibility over RAID5: to grow an existing pool, you need to add multiple similarly sized drives since it doesn't rebalance. This is despite their marketing claims that you can add mismatched drives in an ad hoc fashion and have it "just work".

      The only way to get Microsoft's unproven resiliency benefits is to use ReFS in conjunction with mirroring (not parity) on the expensive server editions. Windows 8/8.1 does not support ReFS.

    13. Re: BTRFS filesystem by Anonymous Coward · · Score: 0

      Close! RAID10 is a pair of mirrored striped arrays (RAID1+RAID0), not mirrored RAID5 arrays. If there were such a RAID level it would be RAID15 (or 51?). There is such a thing as RAID0+1, being of course a striped array of individually mirrored drives, but I'm not sure why you'd go for it over RAID10. RAID0+1 is rare.

    14. Re:BTRFS filesystem by rsmith-mac · · Score: 1

      Without parity checking, you simply aren't addressing bit rot. Period. It could be Raid 9 Million(tm) and if all it's doing is copying the data, and not comparing it, bit rot will still proceed apace, silently eating your data. But let's say you're a good administrator that has enabled parity. Great! But there's still a problem: parity cannot restore data that has become corrupted due to bit rot -- it is a detection-only mechanism. So if you have two drives in a RAID-1 with parity configuration, as you also suggest... it will detect the file corruption, but as it cannot correct it, it will then promptly seize up and fall over dead. This is because for every N clusters written, a parity cluster is also written; This allows the array to detect if that data chunk was correctly committed; But if the data on any of the clusters within the chunk are altered later, the RAID array will only know that this chunk of data (known as a stripe in RAID), is invalid. It cannot correct it.

      One quick note: a mirrored space running ReFS will do automatic checksumming and scrubbing. This isn't done for parity spaces, though I'm not sure why this is.

      http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx

    15. Re:BTRFS filesystem by girlintraining · · Score: 1

      This is incorrect for Reed-Solomon based RAID (levels 6 and higher such as RAID Z3). RAID6 can correct

      ... Yes, but earlier systems, which the OP was suggesting could be used for this purpose, lack that functionality. Also, please reset your sarcasm detector, it appears to be out of alignment -- a functional detector would have pinged on "Raid 9 Million(tm)".

      Any proper file system will have a large enough transaction/intent log that can be replayed to correct partial data/metadata writes due to power failure and the RAID write hole, etc.. Most file systems in use are not proper, of course, but at least a few are available.

      Correct, and those that are aren't immune to human stupidity. No filesystem can save you from a guy who decides to pour beer into the storage array, or who goes to move a directory and misclicks sending it to the trash. Disaster recovery is not a simple matter of choosing the right filesystem and then patting yourself on the back. It requires careful planning and consideration... None of which the majority of the people on this thread seem to be capable of. At least you seem to have some grasp of the underlying technology.

      --
      #fuckbeta #iamslashdot #dicemustdie
    16. Re:BTRFS filesystem by gmhowell · · Score: 1

      The next time you want to slam someone for "acting like you know what you're talking about", don't respond with a bunch of links to Wikipedia. Links, I might add, that are only marginally-relevant to the topic at hand. That shit wouldn't fly in college, so why do you think it's going to hold weight in a professional environment?

      Slashdot, a 'professional environment'? As if we needed more proof that you're a fucking lunatic...

      --
      Jesus was all right but his disciples were thick and ordinary. -John Lennon
    17. Re:BTRFS filesystem by Common+Joe · · Score: 1

      The only way to truly prevent bitrot is by maintaining at least three complete copies of the data, and regularly compare between them

      Disagree. I've had an idea for a while that I'm surprised backup vendors don't do: two copies with a check sum and automatic restore*. The two copies and a check sum are a variation of the three-copy idea, but without the third copy. (I'd write a backup program myself with this idea except it would take too long to implement all of the ideas I have that I think every home backup program should have. The backup programs on the market are getting better, but they could still stand a few more improvements like this idea.)

      My idea: On the first backup, the original copy on the hard drive gets backed up to the USB backup drive along with a check sum. (Despite your concerns about USB, I believe the original poster is talking about home use and can't really avoid this without significant costs.) When the backup is run a second time (like a day or week later), the original on the hard drive is compared to what is on the backup. Check sums are also performed. If something doesn't match, then you know you have bit rot. The check sum will determine whether the backup or the original is invalid and the program will then take appropriate action all without asking the user.
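
      A minimal sketch of that two-copies-plus-checksum check, assuming SHA-256 as the check sum and assuming the file has not been deliberately edited since the check sum was recorded (a real tool would have to rule that out first, for example by looking at the modification time). The names `sha256_of` and `reconcile` are made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reconcile(original: Path, backup: Path, recorded: str) -> str:
    """Compare both copies against the check sum recorded at first backup.

    Assumption: any mismatch is rot, not an intentional edit.
    Returns "ok", "restored-original", "restored-backup",
    or "modified-or-both-bad".
    """
    original_ok = sha256_of(original) == recorded
    backup_ok = sha256_of(backup) == recorded
    if original_ok and backup_ok:
        return "ok"
    if backup_ok:
        # The backup still matches, so the original is the damaged copy.
        shutil.copyfile(backup, original)
        return "restored-original"
    if original_ok:
        # The original still matches, so the backup is the damaged copy.
        shutil.copyfile(original, backup)
        return "restored-backup"
    # Neither copy matches: nothing can be repaired automatically.
    return "modified-or-both-bad"
```

      The check sum is what breaks the tie: with two copies alone you only know they differ, not which one is still good.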

      *Of course, Microsoft had to monkey up the works with using this idea. When you merely open an Excel file, it will modify the contents of the file. Very little can be found on this phenomenon, but here is something about it from Microsoft. Through personal experience, I have found it does not change the modified date and time after the file is closed, but it does modify contents. (I discovered this while playing with a prototype of my idea.) This fits with what they say in the link I provide, but it's not exactly the thing that jumps out at you after the first or second read. When only a single user uses the file, this phenomenon is not seen, although I suspect that Microsoft writes to the file then as well -- an idea which I absolutely hate. Truecrypt is also guilty of this, but at least it does it on purpose, it is documented, and you can turn it off. For security reasons, there is a setting that allows changes to a truecrypt container without changing the modified date and time marks of the truecrypt container file.

    18. Re:BTRFS filesystem by RR · · Score: 1

      I know, I shouldn't respond to a troll, but I'm feeling generous today.

      There you go again. Acting like you know what you're talking about, but you don't. ZFS and BTRFS have ...

      Exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.

      Um, excuse me? The filesystem absolutely does matter. Traditionally, the filesystem assumes that any data retrieved from the drive has been put there, earlier. Obviously, drives don't do that 100% reliably. It's an important innovation, that these newer filesystems will add their own checksums to the data that they write, so they can detect and sometimes fix corrupted reads.

      I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.

      Get your head out of the clouds. Everything does come down to hardware. In fact, given your other posts about hardware, I sometimes doubt that you actually interact with the hardware that you talk about.

      Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.

      That's because you have no experience as a network administrator in a professional environment. Because then you'd know that's the very thing RAID was designed to do: Recover from hardware failure, which includes sectors becoming unreadable.

      That's an aspect of software. Of course a RAID with sufficient parity will recover from a total drive failure. It's much harder to find reference to how a particular RAID will respond to intermittent errors. But if you're not just a blowhard, I'd like to see some of your links to documents describing how the RAIDs that you know will handle drive read errors. Not total failures. Just read errors.

      Speaking of RAID, ZFS has its own concept of RAID that supports up to triple parity, with a different architecture than a normal storage system. Still, I haven't found any reference to how it handles drive read errors.

      It surely doesn't help that modern computers have many gigabytes of memory, but almost none have ECC on that memory.

      That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial uses, ECC memory just isn't needed. But for a select few use scenarios where data integrity is absolutely critical -- such as, say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems, the added cost is justified because they need high availability/high reliability of those systems.

      What a horrible attitude to data integrity. Computer crashes, I lose data. Computer behaves abnormally, worst case scenario is it calculates some important thing wrong, say the root of an important filesystem B-tree, and the filesystem needs to go through an expensive repair. My data are important to me. I use my computer for my personal financial processing, and I know I'm not alone. My old computer had an extra 128kB of memory to provide parity checks for the other 1MB. I imagine that stupid traditions of cost-cutting are why my new computer does not have 2GB of memory to provide ECC for the other 16GB.

      Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap.

      And your server memory isn't? Back up a moment... I thought OP was talking about being able to detect bitrot in family photos, and now you're telling him he should buy a server with memory lovingly crafted for high reliability? Which reliability i

      --
      Have a nice time.
    19. Re:BTRFS filesystem by nhat11 · · Score: 1

      I don't think it's justified to call someone a "dick" when they didn't use a derogatory word at you. You need to calm down a bit there.

    20. Re:BTRFS filesystem by Anonymous Coward · · Score: 0

      This is wrong. Parity checking is not bit rot detection. XOR parity calculation is for rebuilding a broken drive in a RAID. It is not designed to detect and counter bit rot. For that, you need checksums (not parity) designed to detect and react to changes in the data. Parity does not do that; it only restores data, it does not detect changes in it.
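
      A toy illustration of the difference, assuming three four-byte "drives", one XOR parity block, and CRC32 as a stand-in checksum (none of these specifics come from a real RAID implementation). Strictly speaking, recomputing parity can show that a stripe is inconsistent, but only if something recomputes it, and it cannot say which block is the wrong one.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID-style parity does."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
checksums = [zlib.crc32(block) for block in data]

# Known failure: drive 1 is gone, so parity plus the survivors rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])

# Silent corruption: one byte on drive 1 flips and no drive reports an error.
stored = [data[0], b"BBXB", data[2]]

# Parity only says the stripe no longer adds up, not where the damage is.
parity_mismatch = xor_blocks(stored) != parity

# Per-block checksums point at the damaged block...
suspects = [i for i, block in enumerate(stored) if zlib.crc32(block) != checksums[i]]

# ...and once the bad block is known, parity can be used to heal it.
healed = xor_blocks([stored[0], stored[2], parity])
```

      In this sketch the checksum does the detecting and the parity does the restoring, which is the split the comment describes.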

    21. Re:BTRFS filesystem by jbo5112 · · Score: 1

      I have seen both Dell RAID-5 and Sun RAID-6 arrays fail with 3+ simultaneous disk failures each. Google ran a Petabyte Sort benchmark in 2008 (6 hours to sort 10 trillion 100-byte records) and was not at all surprised that they had at least one hard drive failure on every attempt (4+ drive failures per day). I have seen enterprise tape systems fail to read their data (hopefully there was redundancy, but I don't know). I have seen backup systems have major performance glitches and fail to restore within their needed time frame. Facebook, for example, only has a few seconds to recover from a failed server before customers might get angry, and has built systems to handle it because it's necessary to provide a good service. The major players who are succeeding and profiting at giving away free services to hundreds of millions plan that all data storage will fail regularly, and plan accordingly.

      A little primer for those of us who haven't kept up with new storage technologies since the 90's.

      Google deals with enough data that they cannot consider any of your technologies reliable enough. Five years ago, they were already processing 20PB of data every single day with map reduce, and if you have to buy enough systems, even the best RAID6 SAN systems will break regularly. Statistically, a small chance repeated often enough becomes a virtual certainty. Google generally doesn't bother with expensive technologies like SANs and RAID, or even bother with enterprise drives (spinning disks -- they probably use an enterprise PCIe flash). You can make what you want of the enterprise drive decision, but I'm pretty sure I've read from at least a couple of sources that enterprise drives are just as prone to failure as regular drives. The major differences are warranty and firmware (e.g. supporting RAID friendly reads). Numerous sources have substantiated that the manufacturers' MTBF numbers are pure marketing fiction. Enterprise drives probably boast a lower error rate, but I have not seen a comparison, only reports that the MTBF numbers are off by several orders of magnitude.

      What Google does is avoid any redundancy in their machines and take the "redundant array" to a whole new level: Redundant Array of Inexpensive Servers. Multiple copies of the data are written to different servers in different cabinets, and with each data block a checksum is stored. Every time the data is read, the checksum is verified. This way you know with 1 single read if you have bitrot, and can correct it with 1 good read. Now you no longer have to keep comparing 3 copies of the data to correct bitrot. The Hadoop project copied this with their HDFS, and many other large scale technologies have followed suit.
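
      A rough sketch of that read path, assuming each "server" is just a Python dict and SHA-256 is the checksum; `put` and `get` are hypothetical names, not the API of GFS or HDFS.

```python
import hashlib


def put(replicas, key, payload):
    """Write the payload and its checksum to every replica (plain dicts here)."""
    record = (payload, hashlib.sha256(payload).hexdigest())
    for replica in replicas:
        replica[key] = record


def get(replicas, key):
    """Return the first copy whose checksum verifies, healing any bad ones."""
    bad = []
    for replica in replicas:
        payload, checksum = replica[key]
        if hashlib.sha256(payload).hexdigest() == checksum:
            # One good read is enough to overwrite every copy that failed.
            for damaged in bad:
                damaged[key] = (payload, checksum)
            return payload
        bad.append(replica)
    raise IOError(f"no replica of {key!r} passed its checksum")
```

      Because every copy carries its own checksum, a single read is enough to tell whether that copy is good, and no replica-to-replica comparison is needed.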

      At a desktop level, ZFS, BTRFS and (I think) Windows Storage Spaces do something similar, combining RAID technology (0/1/5/6 maybe 1E) with checksums inside the file system. If a drive fails, or even if a checksum just doesn't verify, the file system can automatically attempt a rebuild from the redundant data, giving you a better data guarantee than any RAID card I have seen. If the journaling is done correctly, it shouldn't be susceptible to losing data from a power loss either, but home battery backups aren't too expensive. The OP was asking specifically about bitrot. A lot of UREs (uncorrectable read errors) get labeled and treated as bitrot, but it sounds like data he has previously verified is now corrupt (actual rot), not that the reason for corrupt blocks matters once they are corrupt. Bitrot happens more frequently when you don't have such stringent environmental controls in your home as you would in a data center, and I have personally seen it with only tens of GB of my data.

      In my experience, data that is backed up and archived isn't a prime target for user error or gross negligence regarding data backups. The user is definitely experiencing some sort of URE. In this case, a proper file system is quite important for protecting the data. I would recommend setting up a multi-drive NAS using

    22. Re:BTRFS filesystem by DamnStupidElf · · Score: 1

      Yes, but earlier systems, which the OP was suggesting could be used for this purpose, lacks that functionality. Also, please reset your sarcasm detector, it appears to be out of alignment -- a functional detector would have pinged on "Raid 9 Million(tm)".

      Apparently ReFS will have data and metadata checksums which combined with storage spaces could detect and correct bit rot if implemented properly. While I have no idea if the OP researched the actual capabilities of ReFS, with checksums it is possible to detect bit rot without parity, and correct it with an extra (good) copy. Sarcasm is fun, but only if it's accurate. You might argue that checksums are just a form of parity and maybe I'd agree with you since apparently the error-correction codes for RAID-6 are generally referred to as parity despite actually being linear error-correction codes. But the sense I got from your comment was that you didn't believe it was possible to prevent bit rot with just two copies of checksummed data, or by storing a single copy with an error-correcting code.

      Correct, and those that are aren't immune to human stupidity. No filesystem can save you from a guy who decides to pour beer into the storage array, or who goes to move a directory and misclicks sending it to the trash. Disaster recovery is not a simple matter of choosing the right filesystem and then patting yourself on the back. It requires careful planning and consideration... None of which the majority of the people on this thread seem to be capable of. At least you seem to have some grasp of the underlying technology.

      Most of your other points were spot-on. Relying on single storage systems that aren't geographically distributed is just asking for trouble. Not keeping administratively separate backups or immutable version history (read-only snapshots, revision control, etc.) is also a quick way to lose your data. I don't think there are any foolproof solutions you can get at the moment. Replicated git repos are close, but there was that KDE fiasco with git not explicitly checking the cryptographic hashes during all of its operations and allowing bitrot to be replicated to other repositories. Dumb. I have never been a fan of the Linus/Linux philosophy of trusting the hardware to provide 0 bit errors per yottabyte. It's just not realistic. Of course that means that the next step will be implementing lock-step (or at least consistency-point comparison) processing in software to work around CPU/RAM errors...

  15. It has been done by Anonymous Coward · · Score: 0

    The functionality isn't new. Large robotic tape libraries would pull out tapes periodically and verify health of the media, copying to new if unwell.

    About 5 years ago the RAID chip vendor folks were touting RAID 6, as required for sets of 1+ TB drives as the potential for experiencing a read fault on recovery of a failed RAID 5 becomes much more likely as drive volumes increase.
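
    A back-of-the-envelope sketch of why that risk grows with drive size. It assumes an unrecoverable read error rate of one per 10^14 bits (a commonly quoted datasheet-style figure, not a number from this thread), independent errors, and a hypothetical 4-drive RAID 5 where a rebuild reads the three surviving drives in full.

```python
import math


def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error over bytes_read.

    Assumes independent bit errors at a constant rate, which is a
    simplification; the default rate is an assumed datasheet-style figure.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed stably for tiny p and huge n.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12
for drive_tb in (0.5, 1, 2, 4):
    # Rebuilding a 4-drive RAID 5 means reading the 3 surviving drives in full.
    chance = p_read_error(3 * drive_tb * TB)
    print(f"{drive_tb:>4} TB drives: {chance:.0%} chance of a read error during rebuild")
```

    Under these assumptions the chance of hitting a read error mid-rebuild climbs quickly as drives get larger, which is the argument for a second parity drive.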

    So the first line of defense is RAID storage with health monitoring, then backups (offsite as well).

  16. Par2 and Reed-Solomon by mpol · · Score: 1

    Bitrot does happen.
    When a disk has a bad block and detects that, it will try to read the data from it and put it on a block from the reserve-pool. However, the data might be bad and corrupt, so you lose data.
    Disks do have a Reed-Solomon (aka par-files) index, so it can repair some damage, but it doesn't always succeed.

    Anyway, what I do for important things, is have par2 blocks that go along with the data. All my photo-archives have par2 files attached to them.

    I reckon you could even automate it. To have a script that traverses all directories and tries to repair the data if it's broken. If it fails, you get notified.
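
    Such a script might look roughly like this. It is a sketch, assuming the par2cmdline tool is installed as `par2`, that its `verify` and `repair` subcommands exit with status 0 on success, and that recovery volumes carry `.vol` in their file names; `run_par2` and `check_tree` are made-up names. The `run` parameter exists only so the traversal can be exercised without par2 installed.

```python
import subprocess
from pathlib import Path


def run_par2(action, par2_file):
    """Run `par2 verify` or `par2 repair` and return the exit status."""
    return subprocess.run(
        ["par2", action, str(par2_file)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode


def check_tree(root, run=run_par2):
    """Verify every .par2 set under root, attempting repair on failure.

    Returns (repaired, unrecoverable) lists of par2 index paths.
    """
    repaired, unrecoverable = [], []
    for par2_file in sorted(Path(root).rglob("*.par2")):
        if ".vol" in par2_file.name:
            continue  # skip recovery volumes; only the index file is checked
        if run("verify", par2_file) == 0:
            continue  # data matches its recovery set, nothing to do
        if run("repair", par2_file) == 0:
            repaired.append(par2_file)
        else:
            unrecoverable.append(par2_file)
    return repaired, unrecoverable
```

    Returning the repaired sets as well as the failures is a design choice: a successful repair is still a sign that the media underneath may be going bad.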

    --

    Well, don't worry about that. We can get you back before you leave. (Dr. Who)
    1. Re:Par2 and Reed-Solomon by Anonymous Coward · · Score: 0

      If anything, it should notify you if it has to repair, thereby alerting you to potential issues with the storage media. If you get notified when the repair fails, it would have been too late.

  17. zfs or btrfs by Anonymous Coward · · Score: 1, Interesting

    First off, make sure you have a separate backup storage volume that doesn't get touched by normal applications and which keeps history. Backup doesn't protect you very much if accidental deletes or application bugs corrupt all your copies within one backup cycle. Use an appropriate backup tool to manage this, where appropriateness depends on your skill and willingness to tinker. You could use something as simple as an rsync --link-dest job, or rsync --inplace in combination with filesystem snapshots, or some backup suite that will store history in its own format.

    For bit-rot protection of the stored backup data, make a backup volume using zfs or btrfs with at least two disks in a mirroring configuration (where the filesystem manages the duplicate data, not a separate raid layer). Set it to periodically scrub itself, perhaps weekly. It will validate checksums on individual file extents. If one copy of a file extent cannot be read successfully, it will rewrite it using the other valid mirror. This rewrite will allow the disk's block remapping to relocate a bad block and keep going. The ability to validate checksums is the value add beyond normal raid, where the typical raid system only notices a problem when the disk starts reporting errors.

    Monitor overall disk health and preemptively replace drives that start to show many errors, just as with regular raid. Some people consider the first block remapping event to be a failure sign, but you may replace a lot of disks this way. Others will wait to see if it starts having many such events within days or weeks before considering the disk bad.

  18. Re:uhuh by Anonymous Coward · · Score: 2, Informative

    Warning for all UNIX newbies: that command will reset the file to 0 bytes. Just that you know.

    (I've seen some cases when a rookie is setting up a Linux system and people jokingly throw him these "rm -rf /" commands and the poor guy actually ends up wrecking his system.)

  19. Splendid by aaaaaaargh! · · Score: 0

    We have hundreds of thousands of family pictures and videos we're trying to save

    Yes, you've got to save them! Your children will be so thankful for countless extended family diashow evenings!

    "Look, here is little Tim vomiting when he was 12 years old! How sweet! -- Another vomiting picture. -- Another one. -- I'll skip the next 11 images, still 12,371 to go after all..."

    1. Re:Splendid by Anonymous Coward · · Score: 0

      "If you can't say something nice, don't say nothing at all."
      Who are you to belittle the things which others enjoy?

    2. Re:Splendid by Anonymous Coward · · Score: 0

      If you are ever able to procreate, you will understand the desire to preserve your photos.

    3. Re:Splendid by Sloppy · · Score: 1

      You really gotta be careful with that attitude. The photos seem worthless at the time you take them, and most of them remain worthless forever. Most of them. Then you see that old picture of when your now-grown-up dog used to be a cute little puppy, and awww!!!

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    4. Re:Splendid by F.+Lynx+Pardinus · · Score: 1

      I understood him to be commenting on the number, not the existence, of the photos. I'm the designated archivist for the family's (7 members in 2 households) photos. At last check, I have about 20k photos in the archive. It's hard to imagine having "hundreds of thousands" without having enormous amounts of redundant or irrelevant photos, which is what the parent post is poking fun at.

    5. Re:Splendid by Anonymous Coward · · Score: 0

      If you are ever able to procreate, you will understand the desire to preserve your photos.

      I'd think even a person with ancestors would grasp this, but the GP has proved me wrong. I suspect we're dealing not with naivete or lack of experience, but a more fundamental personality/mental health issue.

    6. Re:Splendid by Anonymous Coward · · Score: 0

      The point is that they do NOT enjoy it.

    7. Re:Splendid by Anonymous Coward · · Score: 0

      Someone who keeps hundreds of thousands of family pictures has mental health issues.

    8. Re:Splendid by Anonymous Coward · · Score: 0

      Just because you do things one way does not mean that everyone else must do so, nor does it justify ridicule.

      Where does the OP say that this is for their main family photo albums? They mentioned archives, which according to Merriam-Webster means:

      1 : a place in which public records or historical documents are preserved; also : the material preserved —often used in plural
      2 : a repository or collection especially of information

      You might have taken an entire roll of film, but only stuck 3 images in a photo album for display. The rest go into your archive.

      Supposing that the OP is archiving every exposure, even those that didn't make it into the family photo album, both the raw image and an easy-to-read processed version of the raw file, and perhaps a metadata file as well, it's very simple to have reached the number of images cited. Add in video and reaching the disk space used is also quite feasible.

      There is merit to this behavior. While an image might not be interesting in the time immediately following its capture, events may occur which cause it to be of interest in the future, and having it available in the archive is then beneficial.

    9. Re:Splendid by aaaaaaargh! · · Score: 1

      Jesus Christ, take it easy, man. I was making a harmless joke that anyone who was ever forced to watch boring holiday slideshows would be able to understand. Now I'm being accused of mental health issues, not being able to procreate and whatever else.

      If hundreds of thousands of family pictures doesn't seem a bit excessive to you, so be it. After all, it takes only a few weeks to sort through them. But please calm down a little and stop spamming AC troll posts.

    10. Re:Splendid by Anonymous Coward · · Score: 0

      I must have missed the part where the OP said they didn't like keeping archives of their images. I thought the point was that they did like it, and are interested in ensuring that the results of their hobby were preserved.

    11. Re:Splendid by flyingfsck · · Score: 1

      I was thinking that bitrot is the computer god's way to protect our descendants...

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
    12. Re:Splendid by cas2000 · · Score: 0

      think of the children!

      they're the ones who'll be forced to watch the slideshows.

      and eventually suffer the indignity of having "cute" baby photos of them brought out to embarrass them if they're ever dumb enough to bring a gf or bf home.

      that photo archive isn't an innocent collection of memories, it's malicious forward-planning by parents.

  20. parity archives by Anonymous Coward · · Score: 0

    Add 20% par2 files.

  21. freenas with zfs partition by Anonymous Coward · · Score: 0

    zfs will detect and correct the bitrot. freenas is probably the easier solution for providing that file system to a household.

  22. Rar by rava · · Score: 1

    I'm glad you're bringing this up. I haven't seen any backup software that addresses bitrot. And bitrot does happen; I lost a few pics to it. What I do: I have a monthly script that makes a RAR archive from my pictures directory. RAR checks file integrity but also has "recovery" options that allow you to recover files from a damaged archive (to a point).

    --
    {Science sans conscience n'est que ruine de l'âme}
  23. For single drives by Anonymous Coward · · Score: 0

    For single disk setups use ZFS with copies=2. You will lose half of your storage, but you will gain error correction. With the default copies=1, you get error detection, but not correction.

  24. A paranoid setup by brokenin2 · · Score: 4, Interesting

    If you really want hassle free and safe, it would be expensive, but this is what I would do:

    ZFS for the main storage - Either using double parity via ZFS or on a raid 6 via hardware raid.

    Second location - Same setup, but maybe with a little more space

    Use rsync between them using the --backup switch so that any changes get put into a different folder.

    What you get:

    Pretty disaster tolerant
    Easy to maintain/manage
    A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
    Upgradable - just change drives
    Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)

    What you don't get: Lost baby pictures/videos. I've been there, and I'd pay a lot more than this to get them back at this point, and my wife would pay a lot more than I would..

    Your current setup is going to be time consuming, and you're going to lose things here and there anyway.. If you just try to do the same thing but make it a little better, you're still going to have the same situation, just not as bad. In this setup you have to have like 5 catastrophic failures to lose anything, sometimes even more..

    1. Re:A paranoid setup by Minwee · · Score: 1

      Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)

      Either use a RAID controller or use ZFS. It's not a good idea to use both at the same time.

    2. Re:A paranoid setup by brokenin2 · · Score: 1

      I've used them together. Seems to work just fine.. Just don't let ZFS know that there's more than 1 drive. You can't have them both trying to manage the redundant storage.

      ZFS has some great features besides its redundant storage. You can get them from other filesystems too though I suppose, but I like snapshots built into the filesystem. It *is* overkill to have the filesystem doing checksums and the raid card detecting errors as well, but that's why this is the paranoia setup... Not really looking for the performance king..

      ZFS certainly isn't necessary though, if you've got hardware raid.

    3. Re:A paranoid setup by fnj · · Score: 1

      Never use a RAID controller, period. ZFS builtin RAIDZ is far superior in every way.

    4. Re:A paranoid setup by Anonymous Coward · · Score: 1

      If you really want hassle free and safe, it would be expensive, but this is what I would do:

      ZFS for the main storage - Either using double parity via ZFS or on a raid 6 via hardware raid.

      Second location - Same setup, but maybe with a little more space

      Use rsync between them using the --backup switch so that any changes get put into a different folder.

      What you get:

      Pretty disaster tolerant
      Easy to maintain/manage
      A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
      Upgradable - just change drives
      Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)

      What you don't get: Lost baby pictures/videos. I've been there, and I'd pay a lot more than this to get them back at this point, and my wife would pay a lot more than I would..

      Your current setup is going to be time consuming, and you're going to lose things here and there anyway.. If you just try to do the same thing but make it a little better, you're still going to have the same situation, just not as bad. In this setup you have to have like 5 catastrophic failures to lose anything, sometimes even more..

      $100/tb is pretty expensive. $40 or $50 per TB if you wait for something good on Slickdeals. Enterprise/highspeed drives are a waste of $ since this is cold storage, and you will want to upgrade after 3 years anyway. You can also get a 2 bay hardware NAS that lets you do whatever you want (linux based OS) for pretty cheap. In short, you're right and you're wrong. A careful spender could get exactly what you describe for about half the cost.

    5. Re:A paranoid setup by bill_mcgonigle · · Score: 1

      Use rsync between them using the --backup switch so that any changes get put into a different folder. ...
      A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)

      +1 Clever.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    6. Re:A paranoid setup by safetyinnumbers · · Score: 1

      You may need to use -c to force rsync to compare checksums.

      I use something like this as part of my backup:

      DATE=$(date +%C%y%m%d%H%M)
      rsync --del --backup --backup-dir=../changedfiles_$DATE

      The whole backup also goes to S3 glacier.

      As an added step - I don't delete pictures from my camera unless they match the checksum of files in the _backup_ - not the original copy (via a script).

      That way, once they're first copied from the camera, a single failure in the original, PC copy or backup copy will all result in the camera version remaining and I can check what has gone wrong.
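
      The camera check could be sketched like this, assuming SHA-256 as the checksum and plain directory paths for the camera card and the backup; `file_digest` and `safe_to_delete` are hypothetical names, and the sketch only lists candidates rather than deleting anything.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_delete(camera_dir, backup_dir):
    """Return camera files whose exact contents are present in the backup."""
    backed_up = {
        file_digest(p) for p in Path(backup_dir).rglob("*") if p.is_file()
    }
    return [
        p
        for p in sorted(Path(camera_dir).rglob("*"))
        if p.is_file() and file_digest(p) in backed_up
    ]
```

      Matching on content rather than file name means a picture that was damaged anywhere along the way to the backup never shows up in the list, so it stays on the camera.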

    7. Re:A paranoid setup by Anonymous Coward · · Score: 1

      If you use ZFS on a hardware raid you can only detect bitrot. Data will be lost!
      If you use RAIDZ[23] on individual drives you can repair bitrot. Data can be saved!

    8. Re:A paranoid setup by brokenin2 · · Score: 1

      Even its ability to chirp loudly when a drive fails?

      I think you've used some pretty lame raid controllers.

      How about its ability to not waste CPU?

      ZFS is good... great even, but (irony intentional) absolute statements are always wrong!

    9. Re:A paranoid setup by cas2000 · · Score: 2

      > Just don't let ZFS know that there's more than 1 drive.

      That is *precisely* the wrong thing to do. As in, the exact opposite of how you should do it.

      Instead, configure the RAID card to be JBOD and let ZFS handle the multiple-drive redundancy (raidz and/or mirroring), as well as the error detection and correction.

      Otherwise, there is little or no benefit in using ZFS. ZFS can't correct many problems if it doesn't have direct control over the individual disks, and RAID simply can't do the things that ZFS can do.

      Of course, this means that you're actually better off with a cheap dumb non-raid HBA card (or even just the SATA ports on your motherboard if there's enough of them) than an expensive HW RAID card. This is another advantage of ZFS.

      (a good option is to use an LSI SAS2008 card or similar, and make sure it's re-flashed to "IT" mode firmware if you're using consumer-grade SATA drives with it to avoid TLER issues. readily available brand new for under $100 for 8 SAS/SATA ports)

      > You can't have them both trying to manage the redundant storage.

      yes. and it's ZFS that should be managing it, not the raid card.

      > ZFS certainly isn't necessary though, if you've got hardware raid.

      wrong. RAID does not provide error detection or correction. RAID protects against drive failures only, not silent corruption.

    10. Re:A paranoid setup by cas2000 · · Score: 3, Informative

      good post, except for three details:

      1. if you're using ZFS on both systems, you're *much* better off using 'zfs send' and 'zfs recv' than rsync.

      do the initial full copy, and from then you can just send the incremental snapshot differences from then on.

      one advantage of zfs send over rsync is that rsync has to check each file for changes (either file timestamp or block checksum or both) every time you rsync a filesystem or directory tree. With an incremental 'zfs send', it only sends the incremental difference between the last snapshot sent and the current snapshot.

      you've also got the full zfs snapshot history on the remote copy as well as on the local copy.

      (and, like rsync, you can still run the copy over ssh so that the transfer is encrypted over the network)

      2. your price estimates seem very expensive. with just a little smart shopping, it wouldn't be hard to do what you're suggesting for less than half your estimate.

      3. if you've got a choice between hardware raid and ZFS then choose ZFS. Even if you've already spent the money on an expensive hardware raid controller, just use it as JBOD and let ZFS handle the raid function.

    11. Re:A paranoid setup by Anonymous Coward · · Score: 0

      Rubbish, obviously a ZFS noob.

      You can set up mechanisms to have errors reported to you immediately, through any number of methods, the moment a drive fails. Some chassis also have an HDD fail alarm built into the backplane.

      CPU is a moot point, things have been constantly improving both on hardware (cpu/chipset) and software (kernel/drivers/daemons).

      Raid cards also waste more electricity than dumb HBAs and especially waste more electricity than on board sata chips, see motherboard with 12 sata ports.

      There is no reason to use raid cards unless you are too jaded to adapt to ZFS.

    12. Re:A paranoid setup by Anonymous Coward · · Score: 0

      It is logically impossible to do what ZFS does without the RAID understanding the file system. If you're using hardware RAID, you're losing out. As for "wasting CPU", most of these servers are mostly idle. Making use of spare CPU, which is a small portion of the power cost, increases speed and reliability and enables other awesome sysadmin features.

      Soon you will start preaching about how Protected Mode is a waste of CPU resources. Everything should run in the kernel, get rid of user land!

    13. Re:A paranoid setup by rdnetto · · Score: 1

      Hardware RAID is a bad idea for backups, as the card is a single point of failure, and anything not from the exact same batch may use a different (proprietary) RAID format. At least with Linux softraid (either mdadm or btrfs/ZFS), you can always download a copy of the source and checkout the old version, if necessary.

      --
      Most human behaviour can be explained in terms of identity.
  25. Re:uhuh by Anonymous Coward · · Score: 0

    Boy you must be a real hit at parties.

  26. WinRAR... by mlts · · Score: 1

    WinRAR isn't perfect, but it works on a number of platforms, be it OS X, Windows, Linux, or BSD. This provides not just CRC checking, but one can add recovery records for being able to repair damage. If storing data on a number of volumes (like optical media), one can make recovery volumes as well, so only four CDs out of a five CD set are needed to get everything back.

    It isn't as easy as ZFS, but it does work fairly well for long term archiving, and one can tell if the archive has been damaged years to decades down the road.
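
    The "four CDs out of a five CD set" idea can be illustrated with plain XOR parity. This is a deliberately simplified sketch and not WinRAR's actual recovery-volume format: four equal-sized data volumes plus one parity volume, where any single missing volume can be rebuilt from the other four. The volume contents are made-up placeholders.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_volumes):
    """The parity volume is simply the XOR of all data volumes."""
    return xor_blocks(data_volumes)


def recover_missing(surviving_volumes, parity):
    """XOR of the survivors and the parity yields the one missing volume."""
    return xor_blocks(surviving_volumes + [parity])


# Four hypothetical data volumes and their parity volume.
volumes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(volumes)

# Pretend the third disc became unreadable and rebuild it from the rest.
lost = volumes[2]
rebuilt = recover_missing(volumes[:2] + volumes[3:], parity)
```

    Single XOR parity only survives the loss of one volume; real recovery schemes use stronger codes to tolerate more damage.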

  27. Re:uhuh by CanHasDIY · · Score: 1

    Warning for all UNIX newbies: that command will reset the file to 0 bytes. Just that you know.

    (I've seen some cases when a rookie is setting up a Linux system and people jokingly throw him these "rm -rf /" commands and the poor guy actually ends up wrecking his system.)

    I think the general consensus is that if you're stupid enough to run a command you got from SomeRandomInternetAsshole420 without verifying what it will do first, you deserve to have your system wiped.

    --
    An enigma, wrapped in a riddle, shrouded in bacon and cheese
  28. BTRFS or ZFS by mcelrath · · Score: 1

    BTRFS and ZFS both do checksumming and can detect bit-rot. If you create a RAID array with them (using their native RAID capabilities) they can automatically correct it too. Using rsync and unison I once found a file with a nice track of modified bytes in it -- spinning rust makes a great cosmic ray or nuclear recoil detector. Or maybe the cosmic ray hit the RAM and it got written to disk. So, use ECC RAM.

    But "bit-rot" occurs far less frequently than this: I find that on a semi-regular basis my entire filesystem gets trashed (about once every year or three). This happened to me just last week...my RAID1 BTRFS partitions (both of them) got trashed because one of my memory modules went bad. In the past I've had power supplies go bad causing this, or brown outs, and in other cases I never identified the cause. I've seen this happen across ext3, jfs, xfs, and btrfs so it's (probably) not the file system's fault. In such cases, fsck will often make the problem worse. (Use LVM and its "snapshot" feature to perform fsck on a snapshot without destroying the original). You'd think these advanced filesystems would have a way to rewind to a working copy (for instance in BTRFS -- mount a previous "generation") but this seems to not be the case.

    Anyway, btrfs guys, your recovery tools could be a lot better. The COW enables some pretty fancy recovery techniques that you guys don't seem to be doing yet. If you've got a great btrfs or zfs recovery technique, please reply and tell us.

    --
    1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
    1. Re:BTRFS or ZFS by Anonymous Coward · · Score: 0

      btrfs/zfs send to another machine (prevention) is the best approach, as the OS will always be a single point of failure even in the best systems (as well as fire/flood/lightning...).

      for btrfs recovery there is https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg27613.html and https://btrfs.wiki.kernel.org/index.php/Restore

    2. Re:BTRFS or ZFS by fnj · · Score: 1

      Just say no to BTRFS. Use ZFS with RAIDZ.

    3. Re:BTRFS or ZFS by rrohbeck · · Score: 1

      If it was integrated into the Linux kernel I'd use it. But with a chance that the next kernel update will make my FS driver unusable I won't touch it with a long pole.

      So that leaves btrfs.

    4. Re:BTRFS or ZFS by Anonymous Coward · · Score: 0

      Sounds like a Linux issue. FS is one of the most important parts of a server. If you can't trust your FS, you can't trust anything else.

    5. Re:BTRFS or ZFS by rrohbeck · · Score: 1

      Yes it is a Linux issue - specifically that ZFS licensing is incompatible with the GPL so ZFS can't be integrated into the kernel.

    6. Re:BTRFS or ZFS by Anonymous Coward · · Score: 0

      Sorry that GPL is incompatible with free. May want to fix that first.

    7. Re:BTRFS or ZFS by rrohbeck · · Score: 1

      There are some very good reasons for the GPL. I like it.

  29. Re:uhuh by Sarten-X · · Score: 2

    And yet, one of FLOSS's selling points is our great community support...

    --
    You do not have a moral or legal right to do absolutely anything you want.
  30. RAID + redundancy by sl4shd0rk · · Score: 1

    There's really no way around it. Storage media is not permanent. You can store your important stuff on RAID but keep the array backed-up often. RAID is there to keep a disk*N failure from borking your production storage and that's it. If you can afford cloud storage, encrypt your array contents (encfs is good) and mirror the contents with rsnapshot or rsync to amazon, dropbox, a friend's RAID array, whatever. SATA drives are cheap enough to keep a couple sitting around to just plug in and mirror to every weekend but you'll probably find a friend's cable modem and rsync+ssh a very handy alternative (hint: check out --bwlimit option) when run from cron.
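
    A rough sketch of the rsync+ssh idea with the --bwlimit hint, as a Python helper that only assembles the command line for a cron job. The paths, the remote host and the 500 KB/s cap are hypothetical; --delete is deliberately left out so that a file removed at home is not also removed from the friend's copy.

```python
import shlex


def rsync_mirror_command(source_dir, remote_target, bwlimit_kbps=500):
    """Build an rsync-over-ssh command that caps its upload bandwidth.

    source_dir, remote_target and the default limit are example values only.
    """
    args = [
        "rsync",
        "-a",                          # archive mode: recurse, keep times and permissions
        f"--bwlimit={bwlimit_kbps}",   # throttle so the friend's connection stays usable
        "-e", "ssh",                   # tunnel the transfer through ssh
        source_dir,
        remote_target,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    print(rsync_mirror_command("/srv/photos/", "friend@example.org:/backups/photos/"))
```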

    --
    Join the Slashcott! Feb 10 thru Feb 17!
  31. Freenas (ZFS based) or BTRFS by hibble · · Score: 1

    "We'd love it if the file-system could detect this and try correcting first, and if it couldn't correct the problem, it could trigger the restoration. But that only seems to be an option for RAID type systems, where the drives are colocated."

    If you have ~2TB of irreplaceable memories, set up a NAS with a RAID array. Whilst bit-rot can be detected, it can only be corrected if the file system knows what the bits should have been. To this end BTRFS and (my recommendation) ZFS can be set to scan all data once a week/month etc. and, using the redundant data in the RAID array, correct the 'bit-rot'.

    I have an Intel Atom board in an old case with 4 drives (2x 500GB mirror and 2x 1TB mirror). I have FreeNAS on this; it is powered on every night by wake-on-LAN, backs up any new data and gets shut down. Once a week it backs up new data then runs the command 'zpool scrub', which checks for bit-rot or inconsistencies in the file-system and corrects them if any are found (it can email you a warning if you want as well). This way if any files get damaged on a home PC/laptop etc., any user can turn on the NAS and recover their files from the shared folder.

    One point of warning: ZFS is RAM hungry, so 4GB is the minimum. Something to keep in mind when ebaying for an old PC to use. Others will also point out that file transfers are ~20-30MB/s with a low powered Atom, so use something with more grunt if it's to be a 24/7 NAS.

    1. Re:Freenas (ZFS based) or BTRFS by Anonymous Coward · · Score: 0

      ZFS is only RAM-hungry if you turn on data deduplication, in which case you need about 1GB of RAM for every 1TB of disk space.

  32. Long term Data Preservation by Anonymous Coward · · Score: 0

    This is a big deal in digital movie preservation. There will be a cloud solution based on Swift open source available in the next couple of months.

  33. That's what some RAID levels _could_ be for by Sloppy · · Score: 1

    A two-disk RAID1, or a RAID5, theoretically ought to be able to detect when there's corruption, but shouldn't be able to correct it. If you've got two different data values, you don't know which one is right.

    But it occurs to me: RAID6 (or three-or-more disk RAID1) really ought to be able to correct. Imagine a three-disk RAID1: if two disks say a byte is 03 and one disk says 02, then 03 is probably right. RAID6, similarly, has enough information to be able to do the kinds of repairs that you could do with par2.
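
    The three-disk vote described above can be sketched in a few lines of Python. This is only an illustration of the majority-vote idea on in-memory byte strings, not how md or any real RAID implementation works; the function name and sample data are made up.

```python
from collections import Counter


def majority_repair(copies):
    """Return (repaired_bytes, offsets_where_copies_disagreed).

    Raises ValueError when no value holds a strict majority at some offset,
    which is always the case when only two copies exist and they differ.
    """
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("all copies must have the same length")
    repaired = bytearray()
    disagreements = []
    for offset, column in enumerate(zip(*copies)):
        value, votes = Counter(column).most_common(1)[0]
        if votes <= len(copies) // 2:
            raise ValueError(f"no majority at offset {offset}")
        if votes != len(copies):
            disagreements.append(offset)
        repaired.append(value)
    return bytes(repaired), disagreements


if __name__ == "__main__":
    disks = [b"\x01\x03\x05", b"\x01\x02\x05", b"\x01\x03\x05"]
    print(majority_repair(disks))
```

    With two copies that differ, the function can only report the conflict, which matches the point about two-disk RAID1 detecting but not correcting.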

    It'd be cool to find out this is already in the kernel's md device. Probably not so yet, though. ?

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    1. Re:That's what some RAID levels _could_ be for by cmurf · · Score: 1

      For raid5/raid6 this is called scrub, or in md parlance writing either check or repair to md/sync_action for the md device. Check records mismatches, it doesn't fix them. Critically though, if there are drive read errors reported, the normal read error handling will cause the underlying sectors on the drive to be overwritten with rebuilt data.

      But as for constantly doing a parity check, that's not how any RAID I'm aware of works because it would be as slow as running a degraded array. No optimizations for small file reads would be possible, it would always have to do full stripe reads, compute parity and then compare to the parity chunk on disk. And for RAID6 this would effectively bring the write performance penalty to reads.

      For RAID1, normally different LBA requests are made for each device, which is why RAID1 reads are faster than single device. If instead the same LBAs are read and then compared, again this is slow. And so the correct way to do it is scheduled scrubs.
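
      A small sketch of the scheduled scrub described above, assuming the usual Linux sysfs layout (/sys/block/<md device>/md/sync_action and mismatch_cnt). The device name md0 is a placeholder, writing to sysfs needs root, and the sysfs_root parameter exists only so the functions can be pointed at a scratch directory for testing.

```python
from pathlib import Path


def start_md_check(md_name="md0", sysfs_root="/sys"):
    """Ask the md driver for a 'check' pass, which records mismatches without fixing them."""
    action = Path(sysfs_root) / "block" / md_name / "md" / "sync_action"
    action.write_text("check\n")


def read_mismatch_count(md_name="md0", sysfs_root="/sys"):
    """Read the mismatch counter.

    The value is only meaningful once the check has finished, i.e. when
    sync_action reads 'idle' again.
    """
    counter = Path(sysfs_root) / "block" / md_name / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())
```

      Run from cron, the first function starts the scrub and the second can feed a later report or alert.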

  34. use a torrent by Anonymous Coward · · Score: 0

    make a torrent of your stuff, spread the copies around, use a private tracker, force a recheck on any file you think is corrupt and let the swarm do its thing.
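
    What a "recheck" does can be sketched as follows: the data is cut into fixed-size pieces, each piece has a stored SHA-1 hash, and any piece whose hash no longer matches gets fetched again from a peer. This is a simplified illustration rather than a real BitTorrent client; the piece size is an arbitrary example and the function names are made up.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # example piece length; real torrents choose their own


def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of every fixed-size piece of the data."""
    return [
        hashlib.sha1(data[start:start + piece_size]).digest()
        for start in range(0, len(data), piece_size)
    ]


def bad_pieces(data, expected_hashes, piece_size=PIECE_SIZE):
    """Indexes of pieces whose current hash differs from the recorded one.

    Assumes the data still has its original length; truncated files would
    need an extra length check.
    """
    actual = piece_hashes(data, piece_size)
    return [
        index
        for index, (have, want) in enumerate(zip(actual, expected_hashes))
        if have != want
    ]
```

    Only the damaged pieces need to be transferred again, which is what makes this cheap for large video files.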

  35. Just get a carbonite account by gravis777 · · Score: 1

    I have been going through this issue myself. In a single weekend of photo and video taking, I can easily fill up a 16 gig memory card, sometimes a 32 gig. About 10 years ago I lost about two years worth of pictures due to bitrot (ie my primary failed, and the backup DVD-Rs were unreadable after only a year - I was able to recover only a handful of photos using disc-recovery software). Since then, I kept at least three backups, reburning discs every couple of years. But if I can fill up two BD-Rs in a weekend, and given the high price of media, that wasn't an option. Extra harddrives?

    I finally realized the best way was just to get a Carbonite account. They are about $70 a year for unlimited encrypted storage space (if you are really anal, I guess you could always put things into TrueCrypt encrypted file containers and upload them). The worst part is how long it takes to do a backup on a residential broadband line (it would also suck if your ISP has data caps). It has taken me about 2 weeks to do half a terabyte.

    The deal is, the peace of mind that comes from this is huge, and it is cheaper than buying another harddrive.

    Yes, I know that is not the question you asked, but I feel like it is a much more practical alternative. I mean, as I continue backing stuff up, I am sure I will pass a terabyte. How much are you going to pay for discs, for harddrives? Then trying to keep them safe and secure, and having to worry about bitrot?

    Seriously, I've lost family pictures and videos before even though I had backups, and it sucked. Do yourself a favor and get a cloud backup. Yeah, it may take a while to do your backups and restorations, but it is worth it.

    1. Re:Just get a carbonite account by Anonymous Coward · · Score: 1

      Regarding bitrot as above commenters -- how can you be sure that your cloud provider is not suffering from bitrot on your stored files? PAR2 files help, but a cloud provider that would checksum the files you uploaded with a checksum file you provide (and not just say "Yep, everything is fine, nothing to see here.") would ease my mind somewhat.
      Assuming, of course, the cloud provider is using a modern ZFS or BTRFS to store your cloud data.

    2. Re:Just get a carbonite account by Anonymous Coward · · Score: 0

      This is also a good time to pitch in a vote for CrashPlan. It does both cloud backup *and* backup to other computers running CrashPlan, so you can pretty easily manage a multi-site backup of your system. The program is cross-platform, so the other systems can be Win/Mac/Linux.

      My only complaint is the lack of documentation of the on-disk storage format, so you have to rely on the CrashPlan app to read your data. I use a second (but not as robust) backup method for this reason, but I have had no complaints about CrashPlan.

    3. Re:Just get a carbonite account by gravis777 · · Score: 2

      how can you be sure that your cloud provider is not suffering from bitrot on your stored files?

      http://en.wikipedia.org/wiki/Carbonite_(online_backup)#Product_details

      Works for me - better than what I have going on at home, and cheaper than I could set up something like this. And anyways, I still have my External HDD backups as well. It's just another level of backup to keep me from data loss.

    4. Re:Just get a carbonite account by AmiMoJo · · Score: 1

      It's not clear but it sounds like the files are encrypted but probably still available to the company that owns the servers. At the very least their client software is closed source and the data is stored in the NSA^h^h^h USA so I wouldn't recommend it.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    5. Re:Just get a carbonite account by gravis777 · · Score: 1
    6. Re:Just get a carbonite account by AmiMoJo · · Score: 1

      Assuming you trust that their client software doesn't make a key available to the NSA when they want it, of course. Sorry but all US providers are suspect now. They could get a letter that forces them to do pretty much anything to their customers and can't even ask a lawyer about it.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  36. How about Stone? by Fussen · · Score: 1

    M-DISC:
    DVD format presently, BLU-RAY format in the future. Someday an electronic eye will just be able to look at the disc surface and see it all in one snapshot.

    They aim for 1000 years. I expect 100. It may be reasonable. Just keep drives around.

    http://www.mdisc.com/proving-ground/

    1. Re:How about Stone? by Anonymous Coward · · Score: 0

      M-disc is good but incompatible with DVD readers/writers.

      Blu-ray, however, by default uses a formulation extremely similar to M-disc. However, because of the high cost of this formulation, lower-cost "LTH" variants of BD-R were developed that use a formulation similar to that of DVD-R.

      If you want the best possible combination of optical media reliability, availability, longevity and cost, then this is a no-brainer - it is (non LTH) BD-R.

    2. Re:How about Stone? by Miamicanes · · Score: 1

      If you're storing anything besides DVDs that need to be capable of direct casual playback on a DVD player, you're better off just burning the files (or even the .iso file of a DVD) to a non-LTH BD-R disc.

      M-Disc is just a non-LTH BD-R with DVD geometry. It's an elegant solution for preserving DVDs in a way that gives you the best of both non-LTH BD-R and casual playability of a DVD, but it's stupid to spend M-disc prices for bulk data backup, including digital photos, when you can buy a brand new BD-R drive and two 25-gig non-LTH discs for what you'll spend on a 10-pack of 5-gig M-discs alone.

      There's nothing exotic about BD-R anymore. DL and 3L BD-R discs are pretty expensive, but single-layer 25-gig non-LTH BD-R discs are cheap online, and an OEM-wrapped bare drive with software bundle costs maybe $50 more than a DVD+/-RW drive. And if you have a laptop that doesn't officially have a BD-R drive, you can probably buy a bare drive on eBay and swap it out yourself as long as your computer isn't a Macbook or weird ultra-ultra-thin PC notebook. For more normal laptops, there are basically two optical-drive form factors with two loading-forms (tray or slot). As long as you don't mind cannibalizing the bezel from the laptop's original drive, the hardest part of the whole thing is the bezel swap.

      One warning: 95% or more of the BD-R discs you'll find at any retail store (Best Buy, Tiger Direct, etc) are going to be LTH, and manufacturers don't exactly bend over backwards to make it obvious that the discs in a package ARE LTH type. Make sure you consult Google -- or at least Newegg -- before buying blanks, and if the discs are less than a buck apiece, they're almost GUARANTEED to be LTH.

      If you use LTH discs, all longevity bets are off. LTH discs are inferior junk made with cheap organic dye, just like DVD+/-R discs are. LTH discs exist for exactly one reason -- cost reduction. Genuine phase-change discs aren't cheap to manufacture, disc manufacturers spent lots of money tooling up to make blank DVD media based on organic dyes, and LTH lets them repurpose it for making cheap BD-R media. If you're burning a disc that only has to last until next week, go ahead & use LTH. If you're burning a disc that you want to be readable (at least, without expensive data recovery and bit rot) 25 years from now, spend a few bucks more on phase-change media.

    3. Re:How about Stone? by Fussen · · Score: 1

      What I'm confused about is the reference of M-Discs to LTH media.. M-Discs don't use dyes.. they require LG drives with modified lasers that actually burn pits into synthetic stone.

      So, as time rolls forward, the only thing that needs to be concerned is the preservation of the disc and the ability to read that disc with a drive that is functional. The trade off is that the disc is only so large, and may require many discs.. but the trade-off of having a stack of discs / records that take space but hold the data seems reasonable if the sole purpose is just to make that data survive in a non-editable format.

      Please correct me if I am wrong, but that's the benefit of M-Disc and the US Navy investing in using M-Disc as a media choice for hardened / critical situations. M-Disc is an ancient approach to the digital age; etch your story in stone and people will read it one day when you are dead.

    4. Re:How about Stone? by Fussen · · Score: 1

      I think M-Disc is worth a look. Yes M-Disc is not compatible with writers unless it is an M-Disc certified writer, such as drives made by LG.

      The big mistake you have made immediately is that M-Disc IS compatible with DVD drives. That's what makes it such a good choice, because long after it's hard to find a M-Disc burner drive, one just has to find ANY optical drive that understands DVD or Blu-Ray media (if M-Disc Blu-Ray is chosen / available.)

      This is a totally different deal than Low To High disc writing as there are no dyes used. M-Disc writers etch physical pits into synthetic stone (requiring a special disc drive laser and increased power at point of writing), preventing the concept of bit rot, since it's stone. The only thing that can happen to the substrate is the surrounding medium collapses and makes it impossible to view the stone substrate.

    5. Re:How about Stone? by Miamicanes · · Score: 1

      I think you just accidentally misread it... I said that M-discs are basically non-LTH BD-R discs with DVD track geometry.

      LTH discs are the ones made with organic dyes, just like DVD+/-R.

      M-Disc is NOT made with organic dyes. It's a phase-change magneto-optical recordable DVD that's readable by normal drives/players, but requires a BD-R drive with the right firmware to burn.

  37. Re:uhuh by behrooz0az · · Score: 1

    WARNING: DO NOT RUN ANY COMMAND IN THE PARENT, THIS COMMENT OR ANY OF THE SIBLING COMMENTS.
    You really suck at being an asshole too, the right command for destroying files and being innocently obfuscated is:
    dd if=/dev/zero|pv|dd bs=1024 count=$(ls -s 'filename'|awk '{print $1}' of='filename'|openssl sha1

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)
  38. M-Disc by Anonymous Coward · · Score: 0

    Obviously. Since your data isn't constantly changing, and is videos and photos, M-Disc is the ideal solution.

    1. Re:M-Disc by cmurf · · Score: 1

      Right. The physical structure and materials used for stamped vs "burned" DVD/BR media are completely different. The photosensitive "burned" media can't be considered to have any useful permanence.

      However, the biggest problem we face with any of these discs, is what hardware we will use to gain access to the encoded data on them? PATA is effectively dead, yet not even 10 years since then we'd have some difficulty reading data from a PATA drive just because the connector is uncommon. What about in another 10 years? In 20 years will there be any mainstream computers using USB at all? What about in 50 years? If we need to keep weird ancient junk around just to extract data from disks or discs, then the plan has failed. Pretty much from the outset for mortal consumers, a do it yourself digital archive is a recipe for a data recovery project in the future.

  39. Bacula by dshk · · Score: 1

    It might be overkill, but the open source backup software Bacula has a verify task, which you can schedule to run regularly. It can compare the contents of files to their saved state in backup volumes, or it can compare the MD5 or SHA1 hashes which were saved in the previous run. I assume other backup software has similar features.
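
    The "compare against hashes saved in the previous run" idea can also be done by hand. The sketch below is not Bacula's implementation and uses SHA-256 from the Python standard library instead of MD5/SHA1; the function names are made up. It records one hash per file and later lists every file whose hash changed or that went missing.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_against_manifest(root, manifest):
    """Return relative paths that are missing or whose content hash changed."""
    root = Path(root)
    damaged = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file() or file_sha256(target) != expected:
            damaged.append(relative)
    return damaged
```

    A mismatch only says the bytes changed, so deliberate edits show up too; for an archive of finished photos and videos that is usually what you want. The list of damaged paths is what you would restore from another copy.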

  40. Have mercy! by c0d3g33k · · Score: 4, Funny

    We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present),

    As the proud owner of dozens of family photo albums, a stack of PhotoCDs etc which rarely see the light of day, the bigger challenge is whether anyone will ever voluntarily look at those terabytes of photos. Having been the victim of excruciating vacation slide shows that only consisted of 40-50 images on a number of occasions (not to mention the more modern version involving a phone/tablet waving in my face), I can only imagine the pain you could inflict on someone with the arsenal you are amassing.

    1. Re:Have mercy! by Anonymous Coward · · Score: 1

      I agree. My wife will kill at the thought - BUT - Just go through the damn photos and pick 3-5 per year per person max. Nuke the rest.

      1- YOU will never look at them all.
      2- YOUR FAMILY will never look at them all.
      3- YOUR EXTENDED FAMILY sure as hell won't look at them all.
      4- YOUR DECEDENTS absolutely sure as hell won't look at almost any of them.
      5- Within 200 years no one alive will give a rats ass about you or your family beyond a general vague genealogy spread sheet with 1-5 pictures of you max.
      6- Within 1000 years no one will even care about that
      7- Within 4 billion years the solar system will die along with the Earth that will already be dead at that point for a billion or 2 itself as far as life is concerned. So really what is the point in hundreds of thousands of family photos.

    2. Re:Have mercy! by Anonymous Coward · · Score: 0

      I have a feeling the OP had a pinky in the mouth while typing it out...

    3. Re:Have mercy! by Anonymous Coward · · Score: 0

      "4- YOUR DECEDENTS absolutely sure as hell won't look at almost any of them."

      Actually, I can GUARANTEE, that none of your decedents (or anyone else's, for that matter) will look at ANY of them, or at anything else at all, EVER. Decedents are dead.

    4. Re:Have mercy! by Anonymous Coward · · Score: 0

      Having been the victim of excruciating vacation slide shows that only consisted of 40-50 images on a number of occasions (not to mention the more modern version involving a phone/tablet waving in my face), I can only imagine the pain you could inflict on someone with the arsenal you are amassing.

      I have around 100k of pictures on my computer. I never purposefully show them to anyone, but they are on my screensaver that is running pretty much all the time. I love having pictures randomly showing up from last year, then one of my grandfather as a child, then one from my army days, then one from me as a child, and so on and so on and so on. I can't remember ever showing my pictures to anyone in one set chunk, but lots of people over the years have seen my screen savers, (they run on my computers at work as well), and laughed at pictures that have come up on them.

      My pictures aren't to show anyone else, they are for myself. I would hate to lose any of them.

  41. One question... by Anonymous Coward · · Score: 0

    How are you cataloging 2TB of media?

    1. Re: One question... by Anonymous Coward · · Score: 0

      Mostly by dates and events. After videoing/photographing a sports event, for example, we create an appropriately named directory (131210-JohnBasketball) and load the videos and images. Sometimes we also add a text file indicating a particularly interesting clip or image (John hits his first 3-pointer).

  42. git-annex by Anonymous Coward · · Score: 0

    I'd suggest git-annex for this. It can do pretty much exactly what you ask: periodically scan all files in a repository, determine if any are corrupt, attempt to repair them, and (if necessary) restore from a remote backup.

  43. The old-fashioned method by TheloniousToady · · Score: 4, Interesting

    Don't forget the old-fashioned method: make archival prints of your photos and spread copies among your relatives. Although that isn't practical for "hundreds of thousands", it is practical for the hundreds of photos you or your descendants might really care about. The advantage of this method is that it is a simple technology that will make your photos accessible into the far future. And it has a proven track record.

    Every other solution I've seen described here better addresses your specific question, but doesn't really address your basic problem. In fact, the more specific and exotic the technology (file systems, services, RAID, etc.) the less likely your data is to be accessible in the far future. At best, those sorts of solutions provide you a migration path to the next storage technology. One can imagine that such a large amount of data would need to be transported across systems and technologies multiple times to last even a few decades. But will someone care enough to do that when you're gone? Compare that to the humble black-and-white paper print, which if created and stored properly can last for well over a hundred years with no maintenance whatsoever.

    Culling down to a few hundred photos may seem like a sacrifice, but those who receive your pictures in the future will thank you for it. In my experience, just a few photos of an ancestor, each taken at a different age or at a different stage of life, is all I really want anyway. It's also important to carefully label them on the back, where the information can't get lost, because a photo without context information is nearly meaningless. Names are especially important: a photo of an unknown person is of virtually no interest.

    Sorry I don't have a low-tech answer for video, but video (or "home movies", as we used to call it) will be far less important to your descendants anyway.

    1. Re:The old-fashioned method by Grizzley9 · · Score: 3, Interesting

      Agreed. Looking through a family picture album from the late 1800's I realized my hundreds of GB's of current family pics will likely die with me. There are a ton of family images and a select few family pics may be copied by progeny but unlike their printed counterparts, there are no names or locations on many (and sometimes dates if the exif gets corrupted or overwritten).

      So what good is a bunch of pics or videos of long past events except to the person involved? Digital images today, unless meticulously managed and edited do little good for historical purposes like the photo album of yesterday. Especially if those are locked away in some online archive that may or may not be easily accessed if the owner can keep up with format and company changes over the decades they will have them and descendants know where they are.

  44. Prepare for maintainer-rot, too by Rob+the+Bold · · Score: 3, Interesting

    A family archive maintained by the "tech guy/gal" in the family is also subject to failure from death or disability of the aforementioned maintainer. Any storage/backup solution should therefore be sufficiently documented (probably on paper, too) that the grieving loved ones can get things back after a year or two of zero maintenance and care of the system. That would also imply eschewing home-brew type systems in favor of using standard tools so a knowledgeable tech person not familiar with the creator's original design can salvage things in this tragic but possible scenario. Document the system so even if the family can't do it themselves, and an IT guy has to be contracted to resurrect the data, he'll have the information needed to do so.

    Any system sufficiently dependent on regular maintenance by just one particular person is indistinguishable from a dead-man time-bomb.

    --
    I am not a crackpot.
  45. You need an editing plan more than a backup plan by neo-mkrey · · Score: 4, Interesting

    100,000s -- like 300,000? More? How many of them will you actually ever look at again? Less than 1% I'm guessing. Here's my advice (and it's what I do), step 1) when transferring pics to your computer, delete the ones that are out of focus, bad lighting, framed poorly, etc. This is about 15%. Step 2) once a month, go through the photos you have taken the previous month and delete those that just don't mean as much anymore (if they have decreased in emotional value in 30 days, just think how utterly worthless they would be in 5 years?). This takes care of another 30%. Step 3) once every 3 months, my wife and I pick the cream of the crop for physical prints. This is about 10%. These are stuck into photo albums, labeled and kept in a fire proof safe in our basement. So 200 photos a month get reduced to ~100, and then 10 per month are printed. YMMV

  46. Photos = Lightroom plus DNG on a Drobo by carlcmc · · Score: 2

    Convert photos to DNG in Adobe Lightroom and use the ability for it to check for file changes. Store on a Drobo with dual disk redundancy.

  47. Re:There are pills which can help you by Kevoco · · Score: 1

    I work next to a moving and storage company. Occasionally the dumpster out back can be found unceremoniously overflowing with the contents of a forgotten storage locker. Anything of value has been teased out - you know what gets tossed? Everything else, especially photo albums, trophies, diplomas, etc.

    “What is most personal is most general”— Carl Rogers

  48. ZFS, of course by rainer_d · · Score: 2

    but there is a catch: to reliably detect bit-rot and other problems, you also need server-grade hardware with ECC.
    ZFS (especially when your dataset-size increases and you add more RAM) is picky about that, too.
    Bit-rot does not only occur in hard-disks or flash.
    You should really, really take a hard look at every set of photos and select one or two from each "set", then have these printed (black and white, for extra longevity).
    If this results in still too many images, only print a selection of the selection and let the rest die.

    --
    Windows 2000 - from the guys who brought us edlin
    1. Re:ZFS, of course by rrohbeck · · Score: 1

      I run btrfs on RAID6 (with weekly scrubbing) on a system with ECC RAM. That should reduce the incidence of bit rot to a negligible level.

  49. Back up more frequently and to more places by brunes69 · · Score: 1

    The solution to Bitrot and reading of old media is very simple and honestly I don't know why it comes up so much. Storage is DIRT CHEAP. 2TB of Data is NOTHING, you can get a 3TB+ external drive for $100 or even less on sale. Buy 3 drives, keep 1 in SAFELOCATION*, Back up to 1 drive every even week, and the second one every odd week, and once a month swap the one in the SAFELOCATION out for a local one and repeat the cycle. Increase or decrease frequency of SAFELOCATION swapping depending on level of paranoia.

    There, the problem is simply and very cheaply solved and there is no level of bit rot that is going to cause all 3 of these backups to be destroyed within a 1 month time window.

    * where SAFELOCATION is an off-premise location, either a close friend's house or a locked office desk or a family member's house or a safe deposit box

    1. Re:Back up more frequently and to more places by Anonymous Coward · · Score: 0

      Yeah, because that's so efficient when you are maintaining thousands of drives.

    2. Re:Back up more frequently and to more places by cmurf · · Score: 1

      This is asking too much for most people. For one, they aren't going to back up this consistently, especially off site. And then they are unlikely to turn backup drives into shelved archives once they're full, instead they tend to reformat them and reuse them. And that means any corrupt files on the source end up being replicated to all backups, eventually. So rather than considering one particular strategy as golden and spending too much time on it, using multiple strategies is more effective.

      I like the idea of printing photos, on acid free paper with pigment inks tested in combination for print permanence of course, and giving copies to family members possibly the best. It's a lot of material to create, store, move, protect, but its encoding is really simple, and requires no software, hardware, electricity, to decode.

  50. Re:uhuh by isorox · · Score: 1

    WARNING: DO NOT RUN ANY COMMAND IN THE PARENT, THIS COMMENT OR ANY OF THE SIBLING COMMENTS.

    Unless you are working on the NSA's main database. Then you should run these commands several times, just to be sure the backup is complete. Then take a sledge hammer to the original files, for security. And restore from the backup, to guarantee the backup worked.

    Book a flight to Moscow first though

  51. Re:uhuh by CanHasDIY · · Score: 1

    And yet, one of FLOSS's selling points is our great community support...

    Every community with a notable population size is going to have its share of bad actors.

    Besides, ever since you were a kid you've been taught to not trust strangers based on their word alone.

    --
    An enigma, wrapped in a riddle, shrouded in bacon and cheese
  52. Re:There are pills which can help you by Anonymous Coward · · Score: 0

    Your comment is stupid, ignorant, and presumptuous. It's almost beyond belief.

    1: Why do you think the OP is "obsessing" about the pictures?

    2: RE: "once the people in the pictures are dead"
              It sounds like you're assuming they're all alive.
              You are quite mistaken if you think that people lose interest in their parents' (or children's) pictures after their death.
              Also, are you assuming that no one has relatives with historical significance?
              Suppose the original poster is a relative of Franklin D Roosevelt, or Elvis Presley.

    3: Quit being so ...
              How do you know how much time they are spending on their archive?

    4: "enjoy life while you are still ALIVE"
              How do you know what our level of enjoyment is?
              I'll give you a hint, child, some of us are well off and don't have to work.
              What you don't know is that we're a very happy lot, and when we're not traveling in Europe, Asia, Australia, etc, we may spend some time fooling around with our photos.

    again,
    Your comment is stupid, ignorant, and presumptuous. It's almost beyond belief.

  53. Checksumming + sufficient redundancy by MetricT · · Score: 1

    We wrote our own parallel filesystem to handle just that. It stores a checksum of the file in the metadata. We can (optionally) verify the checksum when a file is read, or run a weekly "scrubber" to detect errors.

    We also have Reed-Solomon 6+3 redundancy, so fixing bitrot is usually pretty easy.

  54. Do not defrag ? Definitely do not over clock. by perpenso · · Score: 2

    ZFS has proven that a wide variety of chipset bugs, firmware bugs, actual mechanical failure, etc are still present and actively corrupting our data.

    And I expect that defragging aggravates this. Read a perfectly good block of data from disk into flaky RAM, have a bit flip, and write out that corrupted data to its new location. Even if the software is verifying, it's likely to verify against RAM, and it did successfully write what is in RAM.

    And then there is over clocking. If a computer is just used for gaming, no problem. But if it's used for more serious things or archiving things of value to you then you may want to pass on over clocking. Folks who say you can verify an over clocked CPU are mistaken. It's not a crash or no crash thing, at a certain unpredictable point in over clocking an unpredictable CPU instruction may simply give an incorrect result. This incorrect result could end up in your data or image. I've seen over clocked CPUs mess up a text string that is supplied by the CPU itself, CPUID's vendor string.

    1. Re:Do not defrag ? Definitely do not over clock. by Anonymous Coward · · Score: 1

      Every ECC-equipped RAM module I've seen these days corrects single-bit errors and warns about multiple-bit errors.

      If your machine doesn't have ECC RAM installed, and it *can* have it installed, strongly think about doing so.

      If it cannot, add that feature to the list of things to look for in your next machine. (Protip: These days, *all* motherboards with AMD chipsets support ECC RAM.)

  55. Errors While Copying by organgtool · · Score: 1

    As other people have mentioned, a lot of these errors can occur while you are actually copying the files. I have copied files and immediately executed md5sums on the source and dest files only to find differences. Unfortunately, I didn't start this practice until after I had to restore from backup only to find that some of the backup files were corrupted.

    And given that this seems to be a common problem, why in the holiest of hells does the cp command not have a verify option? Yeah, it's easy enough to wrap the copy command with md5sums, but a verify option would be even easier. Throw in an auto-retry function on top of that and you'd be really cooking.

    By the way, the submitter did not mention the current method of backup, but if they are using Linux with the cp command, they would be better served by moving over to something like rsync.
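
    The wrapper described above (copy, compare checksums, retry on mismatch) might look roughly like this in Python. It is only a sketch: `md5_of`, `copy_verified` and the `retries` parameter are names made up for illustration, and a read-back done straight after the write may be served from the OS cache rather than from the disk itself.

```python
import hashlib
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst, retries=3):
    """Copy src to dst, re-copying until the checksums match.

    Hypothetical helper: raises OSError if every attempt mismatches.
    """
    expected = md5_of(src)
    for _ in range(retries):
        shutil.copyfile(src, dst)
        if md5_of(dst) == expected:
            return expected
    raise OSError(f"checksum mismatch after {retries} attempts: {dst}")
```

    Calling `copy_verified("IMG_0001.jpg", "/mnt/backup/IMG_0001.jpg")` (paths invented) returns the digest on success, so the caller can also append it to a checksum list for later checks.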

    1. Re:Errors While Copying by EmagGeek · · Score: 1

      The question is, why the holy hell are you using cp and not rsync?

    2. Re:Errors While Copying by organgtool · · Score: 1

      Because at the time, I didn't know about rsync, let alone understand it well enough to feel comfortable using it for backups. Also, sometimes rsync is overkill when I just need to copy a few files but would like to know that the destination files aren't corrupt.

  56. ZFS is one option, Glacier is worth looking at. by jafo · · Score: 1

    I've used ZFS under Linux for 5 years now for exactly this sort of thing. I picked ZFS because I was putting photos and other things on it for storage that I wasn't likely to be looking at actively and wouldn't be able to detect bit-rot until it was far too late. ZFS has detected and corrected numerous device corruption or unreadable-sector issues over the years, via monthly "zpool scrub" operations.

    I have been backing these files up to another ZFS system off-site. But now I'm starting to look at other options because it's looking like I can begin doing it more cheaply than even my free hosting of a box I bought can provide.

    Amazon Glacier reduces the cost of S3 storage by an order of magnitude, making 2TB of storage cost around $20/month. For a backup copy, it's hard to compete with this, even just buying a USB drive to stick somewhere... You do have to be careful about recovery though, they charge based on peak download speed (a very weird pricing).

  57. Complex mathematical problem by Anonymous Coward · · Score: 0

    The 'simplest' things in life frequently turn out to be the most complicated- at least in terms of the knowledge required to manage the system properly. And often, the complicated mathematical analysis provides very simple solutions- which seems like a paradox.

    Look to how Google handles data for ultimate answers on current state-of-the-art storage systems. Google uses commodity equipment, with custom engineering approaches to managing aspects like expected errors and failures. Data loss occurs in various predictably unpredictable ways.

    -check that the data is stored CORRECTLY in the first place. Sadly many systems will 'write' data, never ensuring the 'write' process occurred without fault.
    -use systems like PAR2 to add maybe 10% redundancy information to allow small bit and block errors that occur later to be repaired. And NO, NO, NO- you do not need to strain your brain to wonder how systems like PAR2 work- just USE IT.
    -store data you CANNOT afford to lose in more than one place, even if you cannot ensure that multiple copies are of the same generation. Recovering MOST of your data, in an older version, is far better than recovering none. You can worry about 'synchronisation' issues later, if at all.
    -know that the more complex and painful your data protection method, the more likely you are NOT to use it properly, or at all. KISS (keep it simple, stupid) will ensure you take the steps to protect your data.
    -if you believe 'bit rot' to be real (it isn't in the sense you suggest), you have no choice but to periodically check all your data, hopefully using a system like PAR2 to correct errors and rebuild the PAR sets. If you don't check it, someone else would have to (a service YOU would end up paying for one way or another), since there is NO way for data to passively check itself without the need to read and 'process' computationally (checksum test, PAR test, etc).

    Treating all your data as of equal value will ensure lowest common denominator thinking- and you don't want this. Using PAR2, and multiple storage locations is a good enough passive defence for most data. The stuff you are paranoid about, you should periodically check and renew.

    But like I said at the top, the statistical maths behind data protection is far more complicated than you might imagine. So, simply use the best working practices available from those entities with a REAL interest in doing the job properly. Anyone who says "use tape" or "only buy enterprise storage hardware" or "you must use RAID" can be safely excluded from your list of advice givers.

  58. git annex by rescdsk · · Score: 1

    git annex is an open source project that lets you distribute files around various media (including external HDs, Amazon S3, SSH-connected computers, etc.). It has an fsck command for checking that your data still matches its checksums.

    There's a GUI interface that makes it a lot like Dropbox, where you just add files to a folder, and they are sync'd.

    It works on OS X and Linux, with an alpha for Windows.

    --
    -- rm -rf / tells you if you have root or not
  59. parity archives by Anonymous Coward · · Score: 0

    I never archive any significant amount of data without first running this script at the top:

    find . -type f -not -name md5sum.txt -print0 | xargs -0 md5sum >> md5sum.txt

    Which is useful for finding errors, but not for fixing them. If the information is relatively important, you may want to check out parity archives:

    https://en.wikipedia.org/wiki/Parchive
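
    For the checking side, `md5sum -c md5sum.txt` reads such a list back. The same check can be sketched in Python as below. It assumes the plain format md5sum writes (32 hex characters, two separator characters, then the path, with no escaped file names); `file_md5` and `verify_manifest` are illustrative names, not an existing tool.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in chunks so large videos fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Return paths listed in the manifest that are missing or changed.

    Assumes lines of the form '<32 hex chars><2 separator chars><path>',
    with paths relative to the directory holding the manifest.
    """
    base = os.path.dirname(os.path.abspath(manifest_path))
    bad = []
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\n")
            if len(line) < 35:
                continue  # skip blank or malformed lines
            expected, name = line[:32], line[34:]
            target = os.path.join(base, name)
            if not os.path.isfile(target) or file_md5(target) != expected:
                bad.append(name)
    return bad
```

    An empty list back means every listed file still matches; anything else is a candidate for restoring from another copy.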

  60. Search for "Distributed Fault-Tolerant Filesystem" by Anonymous Coward · · Score: 0

    Research into Distributed Fault-Tolerant Filesystems has been going on for at least 40 years, with implementations flourishing since the advent of Ethernet and similar technologies. There are lots of options out there!

    There are some fundamental things to consider:
    1. All fault-tolerance requires redundancy. I'd recommend biting the bullet and going for full replication (redundant copies).
    2. The copies should not be co-located (the real meaning of "distributed" in this context).
    3. You should not trust the network: Not all copies will always be accessible simultaneously.
    4. Updates should not be permitted unless a quorum (50%+1 or more) of replicated systems checks in and agree on the data content.
    5. Updates should permeate all copies in the background.
    6. Read-only access may be much more permissive, requiring as few as 2 or 3 replicates to be accessible.
    7. History (repository-like) performance (including "undo") is often a desirable option.
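
    Point 4 can be sketched in a few lines of Python. This only illustrates the 50%+1 rule and is not the protocol of any particular filesystem; `update_allowed` and its arguments are hypothetical.

```python
from collections import Counter


def update_allowed(total_replicas, reported_digests):
    """True if a strict majority of ALL replicas report the same content.

    reported_digests holds one content digest per replica that checked
    in; unreachable replicas simply do not appear in it.
    """
    if not reported_digests:
        return False
    _, agreeing = Counter(reported_digests).most_common(1)[0]
    return agreeing >= total_replicas // 2 + 1
```

    With 6 replicas, 4 matching digests allow the update and 3 do not, no matter how many replicas are merely unreachable.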

    The above is "Armageddon-grade" if there are at least 6 replicates with at least 3 wholly redundant networks. For basic reliable archiving, 3-4 copies on the Internet should be plenty good, depending on the system chosen.

  61. how do you know your media is good? by Anonymous Coward · · Score: 0

    dd if=/dev/cdrom of=/dev/null bs=512 (or a convenient multiple thereof)

    Works for tapes, too.

    I've been using this to verify my media for 30 years.

    If only I had a lawn to show for it!

    ~childo

    CAPTCHA: 'accrue'

  62. MD5 and a few scripts by MooseTick · · Score: 2

    Here's a cheap easy solution (assuming you can write some basic scripts)

    1. Start by taking an MD5 of all your pics. Save the results.
    2. Backup everything to a 2nd drive. Take MD5s and be sure they match using basic scripts.
    3. Periodically scan drive 1 and 2 and compare against their expected MD5 value. If one has changed, copy it from the other (assuming it is still correct)

    You could expand this with more drives if you are extra paranoid. You could do this cheap, check regularly, and know when bitrot is happening.
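
    Step 3 might be sketched like this in Python. Everything here is an assumption for illustration: the function names, the `expected` mapping of relative path to known-good MD5 (the results saved in step 1), and the two directory roots standing in for drive 1 and drive 2.

```python
import hashlib
import os
import shutil


def md5_file(path):
    """Hex MD5 of a file, or None if it cannot be read."""
    digest = hashlib.md5()
    try:
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def repair_pair(expected, root_a, root_b):
    """Compare both copies of each file with its saved MD5.

    Returns (repaired, needs_attention): files restored from the other
    copy, and files that could not be brought back to two good copies.
    """
    repaired, needs_attention = [], []
    for rel_path, good_md5 in expected.items():
        path_a = os.path.join(root_a, rel_path)
        path_b = os.path.join(root_b, rel_path)
        ok_a = md5_file(path_a) == good_md5
        ok_b = md5_file(path_b) == good_md5
        if ok_a and ok_b:
            continue
        if not ok_a and not ok_b:
            needs_attention.append(rel_path)  # restore from a third backup
            continue
        source, target = (path_a, path_b) if ok_a else (path_b, path_a)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(source, target)
        # Re-check the copy: the repair itself could have gone wrong.
        if md5_file(target) == good_md5:
            repaired.append(rel_path)
        else:
            needs_attention.append(rel_path)
    return repaired, needs_attention
```

    A drive that keeps turning up in the repaired list run after run is probably the one to replace.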

    1. Re:MD5 and a few scripts by Smork · · Score: 1

      Instead of writing your own scripts, perhaps you can try http://md5deep.sourceforge.net/

    2. Re:MD5 and a few scripts by Anonymous Coward · · Score: 0

      Exactly what I've already done. Step 2 is really "verify MD5 sums on 2nd drive."

      Then I iterate to drive 3, and then occasionally drive 4 that normally lives in a safe deposit box.

      Scripts have lots of options for incremental update etc.

      But I'm glad to have learned about ZFS on this thread.

  63. surprising recovery by shokk · · Score: 1

    I think that when writable CDs first came out, we thought that they would last forever. And in some sense they do last long enough. The other day I found a CD binder full of games and a few backups from 1996. The most surprising of all was a collection of photos that I thought had been long lost, and with a little rsync running over and over and over, I got all the files off intact and saved them to my Flickr account.

    The most important thing to understand, I think, is that we have to look at digital storage as a convenient and temporary medium and that anything longer lasting would need to be hard copied. It’s not a guarantee, but it’s a better likelihood of survival. Pictures can survive by pure chance for a couple hundred years. We’re lucky if our current stuff will handle a few years, much less natural disasters and history itself.

    For many, the cloud seems to be a utopia, but corporate and national politics can make all your treasured media disappear without warning, and none of the free services give you a guarantee of safety if something craps out on their systems. And as for paid cloud services, ask yourself if anyone will bother to take care of it after you’re gone, or if anyone will bother to archive it, or if your family will just toss it aside even if they are able to get them as part of your estate. Ask yourself who you’re saving all that for. Are we just digital hoarders?

    --
    "Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
  64. Re:uhuh by Anonymous Coward · · Score: 0

    I think running "rm -rf /" is a rite of passage.

    I did it on a community web server (Suse on a Solaris SPARCstation 5) about ten years ago, hit CTRL+C within a few seconds once I realised what was happening. But it was too late, the filesystem was toast. Luckily the htdocs was still complete!!!!

    Learning Linux the "hard way" is sometimes essential, and I sure as hell learnt my lesson about the dangers of being root, and the importance of backups.

  65. Re:You need an editing plan more than a backup pla by Anonymous Coward · · Score: 0

    That might actually take more work than just backing them all up properly.

  66. It's not bit-rot by dhaen · · Score: 1

    If you're noticing data corruption on only 2TB it's probably not what we normally call bit-rot. A bit that changes state for no apparent reason within a very large set of data can be described as bit-rot, otherwise it's general data corruption which has many causes, all of which are understood: Poor media, poor transmission of data, overwriting of data etc. Once you've got the system sorted out so you don't get data corruption, start thinking about the nature of your data. How much redundancy is in it? If it's JPEGs then almost none, so a single bit error could be serious to a file. If uncompressed TIFFs then there is a lot of data redundancy and the single bit error might only be an error of a single pixel, which you might not even notice. And finally, don't expect optical media to be safe from errors. Only use it as part of a DR plan.

  67. snapraid by JoshRosenbaum · · Score: 1

    Snapraid (free!) might be an option: http://snapraid.sourceforge.net/

    It snapshots your data to some parity files on a separate drive. All you would have to do is occasionally copy those files offsite. Snapraid includes commands that allow you to check and fix bitrot as well.

  68. CrashPlan PRO Enterprise by AdamInParadise · · Score: 1

    CrashPlan could help you a lot. First, CrashPlan is a backup system, so it makes and manages a copy of your data, including every version of every file. CrashPlan addresses the bitrot problem on their side by running their own checksums on the stored files : if they detect an issue with a stored file, they will replace it with the original version, still stored on their computer. If some files get corrupted on your computer, you can restore them from CrashPlan, but you will need something on your side to tell you that something went wrong. Now, even if you realize that the file is corrupted years after it happens, you can still recover the previous non-corrupted version from CrashPlan.

    Now, 2TB is a bit much to store on CrashPlan's cloud : unless you have a very fast connection (at least 100MB) it's going to take you a while to upload your data. The solution is to run your own CrashPlan PRO Enterprise server onsite (with periodical offsite backups of course). Don't be fooled by the name, it's pretty easy to set up and administer, and the licenses are fairly affordable (75$/user/year).

    I've been supporting CrashPlan PRO Enterprise in my company for 3 years, with 25 clients and about 1TB of data. While I'm not super-happy with the way the Code42 people run their CrashPlan business, the tech is solid. I'm kind of thinking that other backup systems work in similar ways.

    Now, I hope that you'll excuse me for asking this question, but which kind of crappy file systems and hard drives are you using that generate significant levels of "bitrot" in files which are basically just sitting there?

    --
    Nobox: Only simple products.
  69. No magic wand yet invented by Anonymous Coward · · Score: 0

    Coincidentally, I have been studying this subject for a while...

    Bottomline: All methods will fail eventually. Your best bet is to have multiple layers of protection.

    It's not really a matter of which media holds its data longest without rotting. All media will fail eventually. Rather, the question is whether you can check your data often enough and copy it to new media before the media/data renders itself unusable.

    You have small amount of data so spinning disk is your best and really the cheapest option here!

    For small amounts of data, e.g. 2TB, RAID and an error-checking file system are still just fine. For much bigger amounts of data, you'll want to get rid of those RAIDs and step up to JBOD setups stacked with an error-checking distributed filesystem. And even then you will need to verify your file checksums regularly.

    Keeping three copies is always the minimum for successful voting on the good version. In addition, it never hurts to keep extra metadata, like checksums, on separate media, as it takes virtually no space.
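
    A sketch of that voting in Python, assuming the copies are reachable as ordinary file paths. `vote_on_copies` is a made-up name, and SHA-256 is just one reasonable choice of hash. With three copies, any two that agree outvote the third, even when no stored checksum exists.

```python
import hashlib
from collections import Counter


def sha256_file(path):
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def vote_on_copies(paths):
    """Split copies into (good, bad) by strict majority of content hashes.

    Raises ValueError when no digest is shared by more than half of the
    copies, since then there is no way to tell which version is right.
    """
    if not paths:
        raise ValueError("no copies given")
    digests = {path: sha256_file(path) for path in paths}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(paths):
        raise ValueError("no majority among copies")
    good = [p for p in paths if digests[p] == winner]
    bad = [p for p in paths if digests[p] != winner]
    return good, bad
```

    The bad copies can then be overwritten from any of the good ones. With only two copies a mismatch raises, which is exactly why two is not enough for voting.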

    And what else is there after bit rot ... you may always "accidentally just delete" all your precious files, or your house (and all media) can burn to ashes, or those cloud providers might shut down their services without warning ... you can never be sure. So always keep copies at several different locations / providers.

    Rough calculations like MTTR (mean time to recovery) may help in estimating the I/O capacity needed to check checksums and refresh your media often enough. With modern hardware 2TB is very easily copied within hours. Faster RAID systems achieve 1 000MB/s throughput, so the transfer would take only about 30 minutes!
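
    The arithmetic behind that estimate, as a quick Python check. The 100 MB/s figure for a single disk is an assumed number for comparison, not one given above, and decimal units (1 TB = 10**12 bytes) are assumed throughout.

```python
def transfer_minutes(size_bytes, bytes_per_second):
    """Minutes needed to read or copy size_bytes at a sustained rate."""
    return size_bytes / bytes_per_second / 60


TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

raid_minutes = transfer_minutes(2 * TB, 1000 * MB)  # fast RAID
disk_minutes = transfer_minutes(2 * TB, 100 * MB)   # assumed single disk

print(f"RAID at 1000 MB/s: {raid_minutes:.0f} min")
print(f"Disk at 100 MB/s: {disk_minutes / 60:.1f} h")
```

    That works out to roughly 33 minutes on the fast array and about 5.6 hours on the assumed single disk, so a full checksum pass over 2TB fits comfortably into an overnight job either way.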

    So just copy your data often to new media and do those checksums. There's really no magic wand...

    And probably nobody has told you about migrating legacy file formats to modern formats... but that's just another story in this preservation case...

  70. Bit rot detection by Anonymous Coward · · Score: 0

    An interesting approach could be some sort of ZFS-based storage appliance. ZFS provides off-the-shelf protection against bit rot using checksums at block and tree level, which are periodically scrubbed and repaired.
    While RAID setups are more common with ZFS, nothing stops you from setting up a filesystem with internal redundancy (copies) inside the same vdev (let's say a single disk). Failure of an entire drive is still a concern, but you can set up a RAID 1 mirror for that. 2 TB would be easily manageable even with consumer disks nowadays.

  71. Reed Solomon FEC by flyingfsck · · Score: 1

    There is also rsbep, which uses Reed Solomon FEC. This is a classic filter, so you can use it together with tar, gzip and gpg to protect archives against NSA snooping and bit rot simultaneously.

    Something like:
    $ tar -cz indirectory | rsbep | gpg -e > out.tar.gz.rs.gpg

    La voila!

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
  72. Stone age by flyingfsck · · Score: 1

    Got to carve those pics in stone, in Egypt, else nobody will care about them later.

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
  73. ZFS is Not a Panacea by ewhac · · Score: 1

    FreeNAS and ZFS are indeed awesome. But before y'all go installing FreeNAS on some spare hardware and think your problem is solved, you need to be aware that ZFS is not a panacea. You can't just drop it on Any Old Box with default settings and expect it to magically keep your data safe unto perpetuity. You need to pay attention to what you're doing.

    Some highlights:

    • ZFS's design requires RAM to be perfectly reliable, or at least report imperfections. Undetected bitrot in RAM can and will destroy your entire ZFS pool. Thus, a machine with ECC RAM installed is a requirement.
    • As if that weren't enough, ZFS eats huge amounts of RAM. The current guideline is 1 GiB of RAM per TB of disk spindles, with 8 GiB as a practical minimum.
    • ZFS assumes it has perfect knowledge of disk writes in-flight, and as such doesn't play well with RAID controllers, which can silently re-order writes. If your machine has a RAID controller, the RAID features should be turned off. Don't worry, ZFS has its own RAID features. However:
    • Because drive densities are now approaching drive error rates (10**13 bits of storage, with manufacturers quoting uncorrectable errors every 10**14 bits read), ZFS RAID-Z1 is no longer considered sufficient to ensure storage integrity, and you should plan for RAID-Z2 (two parity drives).
    • For the same reason as turning off RAID, a "production" FreeNAS/ZFS installation should not be run in a virtual machine. It's okay if you're just test-driving it to get a sense of what it can do, but a live system should run on actual hardware.
    • Using ZFS's de-duplication feature is officially discouraged. It may seem like a great idea, but it will gobble all your RAM and return very little benefit. On average, you're better off using compression.
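
    To put rough numbers on the error-rate point in the list above, here is a back-of-the-envelope Python sketch. It assumes read errors are independent and occur at exactly the quoted rate of one per 10**14 bits read (real drives are not that tidy), and the four-drive RAID-Z1 layout is a hypothetical example.

```python
import math


def p_unrecoverable(bits_read, bits_per_error=10**14):
    """Chance of at least one unrecoverable read error.

    Treats errors as independent events at a constant rate (a Poisson
    approximation), which is a simplifying assumption.
    """
    return 1 - math.exp(-bits_read / bits_per_error)


drive_bits = 10**13                        # one drive, per the figure above
one_drive = p_unrecoverable(drive_bits)    # read one drive end to end
# Rebuilding a full 4-drive RAID-Z1 means reading the 3 surviving drives.
rebuild = p_unrecoverable(3 * drive_bits)

print(f"one full drive read: {one_drive:.1%}")
print(f"4-drive RAID-Z1 rebuild: {rebuild:.1%}")
```

    Under those assumptions a single end-to-end read has about a 9.5% chance of hitting an unreadable sector, and the rebuild about 26%, with no parity left to cover it. A second parity drive is what absorbs that.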

    When ZFS dies, it dies in a big and fairly comprehensive way, and ZFS will die if you under-provide it. In any event, you should RTFM before contemplating a build, and know the trade-offs you're getting into.

    Schwab

  74. Re:PAR2? No, MultiPar. by grep+-v+'.*'+* · · Score: 1

    Try again, but this time with subdirectories

    PAR2 with subs: Multipar and alternate

    I've been using it for well over a year, it works great. Was using this for a while -- it's OK, but Multipar is much better.

    Or just continue to use PAR on single directories with subs placed in some type of archive (zip, 7z, tar) file.

    None of these holds a candle to ZFS as a live file system, but these all work great when archiving files to DVD/BD.

    Heck, I'm currently copying multiple dirs to BD and using Multipar as "only" a checksumming and renaming repair tool -- not even bothering with the file content recovery option. For that matter, I've even created a (single) disc with 300% recovery -- if I lose all of the primary files and over half of the recovery content bits, I can STILL recover the contents. (I've tested this by manually damaging the file contents. I have multiple copies in different places, too -- there are just a few static files that I do *NOT* want to lose.)

    --
    If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
  75. Why make this so hard. by Anonymous Coward · · Score: 0

    www.synology.com/ + some sort of amazon glacier/backblaze-like backup.
    This would give you room for expansion and a pretty neat offsite for a reasonable price.

    I've had a DS1512+ for over a year without it faltering and it took all of 15 minutes to shove some disks in, configure the webpage interface and back it up.

    Worst thing that will happen is the Synology will completely die and you'll have to recover data from online.

  76. M-Disc by cfulton · · Score: 1

    You might try backing up with http://www.mdisc.com/what-is-mdisc/ I've been using them since they came out and all my backups still work. It is supposed to last a thousand years. I don't know about that, but they do seem to be better than backing up to regular DVDs, which I have had go bad in as little as a year.

    --
    No sigs in BETA. Beta SUCKS.
  77. Evault announced long term preservation today by Anonymous Coward · · Score: 0

    $15 TB / Month
    http://lts2.evault.com/how-it-works/

  78. Re:uhuh by macbeth66 · · Score: 1

    A rite of passage? You must be joking! I've never met anyone stupid enough to have actually run that command with those parms. The first time someone tried that on me, I did a 'man rm' and looked at the doc. I always thought that was the lesson; RTFM.

  79. Cake by Anonymous Coward · · Score: 0

    I want my cake sitting in the middle of the table, pretty as can be.

    I want to enjoy the wonderful taste sensations as well.

    It would be really great if someone could eat the cake for me by proxy, but still let me enjoy the taste. With it still sitting in the middle of the table looking as pretty as can be.

  80. Re:uhuh by Anonymous Coward · · Score: 0

    I didn't do it because someone told me to. I did it as a mistake.

    Are you telling me you have never made any mistakes in life? Sheesh!!

  81. ROFL! Hundreds of thousands? by Anonymous Coward · · Score: 0

    And you think anyone is going to care to look at your massive collection of family photos by the time bit rot sets in?
    Did you ever think that perhaps it would be smarter to keep backups of say 10 important life-shattering moments, instead of every 30 seconds of everyone's lives in your extended mega family?

  82. My ZFS-based storage and backup solution by Anonymous Coward · · Score: 0

    Do what I did and build up a FreeNAS server using an HP microserver, at least 8 gigs of RAM and 4 2TB drives configured as RAID-Z. You could put this together for about $800. The HP microserver supports ECC ram which you really do want. Install FreeNAS on a USB stick and boot off that. Set it up for weekly scrubs.

    Then, because multiple redundant *and* offsite backup is the only way to feel truly safe (a RAID array, even with ZFS, does nothing to protect you from fire or theft), I backup as follows... I have 2, 3TB ESATA drives (i.e. external drives) configured as a mirrored pool (so far, I don't have a need for more than 3TB of backup). For an initial backup, copy your files from the RAID-Z to the mirror generating an MD5 checksum as you go. To save time, you can generate the md5 checksum at the same time you do the filecopy (as opposed to reading the file once for the copy, and then reading it again to generate a checksum) doing something like this...

    cat source | tee destination | md5 >> checksums

    Note, if you do the file copy and immediately read back to compute the destination checksum, you will no doubt be reading from the cache instead of disk which means you can't be truly sure the bits made it on the disk correctly. I never could figure out how to purge the ZFS cache after a copy, so my solution is just to make a list of files to copy, copy each file while generating a checksum, then, starting at the beginning of the file list, go ahead and generate a checksum for each destination file. The first file of many that you wrote to the backup pool shouldn't be in the cache anymore at that point, the verification read will have to go to disk which is what you want.
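
    For what it's worth, the same two-pass idea can be sketched in Python: hash each file while copying it (the `tee` trick above), and only re-read the destinations once the whole batch has been written. The function names are made up, and whether the second pass really goes to disk instead of the cache depends on the batch being much larger than RAM.

```python
import hashlib
import os


def copy_and_hash(src, dst, chunk_size=1 << 20):
    """Copy src to dst, hashing the bytes as they stream through."""
    digest = hashlib.md5()
    with open(src, "rb") as reader, open(dst, "wb") as writer:
        for chunk in iter(lambda: reader.read(chunk_size), b""):
            digest.update(chunk)
            writer.write(chunk)
        writer.flush()
        os.fsync(writer.fileno())  # ask the OS to push data to the device
    return digest.hexdigest()


def hash_only(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read back in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as reader:
        for chunk in iter(lambda: reader.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_batch(pairs):
    """Copy every (src, dst) pair first, then verify in the same order.

    Returns the destinations whose re-read digest differs from the
    digest computed while copying.
    """
    written = [(dst, copy_and_hash(src, dst)) for src, dst in pairs]
    return [dst for dst, digest in written if hash_only(dst) != digest]
```

    An empty list from `backup_batch` means every destination read back the same as what was streamed in; the digests in `written` could also be appended to the checksums file.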

    But I digress... After backing up to the mirrored pool, execute a zpool export of the mirrored pool, shutdown the server and disconnect the drives. You now have 3 copies of your data, two of which are very mobile. Now, take one drive and put it in a fire safe, take the other drive and store it off site... at work or safe deposit box or whatever. Now, very bad things will have to happen for you to lose data. If your server is stolen or melts, you have two other backups. If your server AND your fire safe are stolen, you have one remaining backup offsite.

    The only downside here is that backup is not continuous. If a disaster does happen, you will lose any files since your last backup. But for data that is relatively static, like movies, music, photos, the changes between incremental backups (I do mine monthly) are pretty small.

    In sum, I rely on zpool scrub to detect and repair bit-rot in the RAID-Z pool (and indeed on the mirror drives as well), and use MD5 to verify copies whenever I have to move bits between distinct pools.

    1. Re:My ZFS-based storage and backup solution by Anonymous Coward · · Score: 0

      Why do you "validate" your data on ZFS with a 128-bit hash of the entire file when ZFS already does 256-bit hashes of the individual blocks of a file? Why are you concerned with flushing the read cache? It sounds like you're trying to manually do what ZFS already does automatically, and you're doing it in a way that is much less correct.

      If your only concern is to make sure the data is written to ZFS, then use a sync write, which won't return until the data is committed.
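
      For illustration only, a sync write from a script might look like the Python sketch below. The helper name is hypothetical, and whether the data has truly reached stable storage when os.fsync returns depends on the OS, the filesystem and the drive honoring the flush.

```python
import os


def write_durably(path, data):
    """Write bytes to path and ask the OS to flush them to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # block until the kernel reports the data committed
    return len(data)
```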

  83. Integrit by Anonymous Coward · · Score: 0

    Laptop is the master copy, since that's where I do photo editing. That gets backed up via rsync to a NAS at home. Multiple USB SATA drives at work back up the same data via RoboCopy. Once every few months, I run integrit (http://integrit.sourceforge.net/texinfo/integrit.html) on the laptop, then on the remote drives. A shell script compares the integrit db output of each drive. If they match, all is good.

    I haven't seen bitrot yet with this setup since I started using it in 2008. That covers 2 NAS setups, 4 laptop drives (on 2 separate laptops) and 6 different SATA drives (on 3 different USB bridges).

  84. can I use this to make an extra dvd for a bucket by Anonymous Coward · · Score: 0

    Would it be a good idea, for each bucket of DVDs (probably 25 or 50 DVDs in each), to make an error-correction DVD with dvdisaster?

    That way, if one of the DVDs in the bucket goes bad later, I could use the others together with the error-correction DVD to recreate the faulty files. The version of dvdisaster in my Linux distribution is 0.72.1-1. Should I use that, or some PPA?

  85. Re:You need an editing plan more than a backup pla by Anonymous Coward · · Score: 0

    ...which is why so few people bother.

    It's the tragedy of digital photography. Taking photos is cheaper than ever, yet the number of photos actually making it into frames and albums is about as low as it was when most people could only afford to have their relatives photographed after they died.

    It's effectively that way in my house. Someone (human or pet) pretty much has to die (or be exceptionally adorable) before I'll bother to print and frame a picture of them.

    In terms of historical record, this is a really bad state of affairs. It's the crap we don't consider worth keeping that is the real treasure for historians and archaeologists. The day to day stuff, not the "this is how we'd like to be remembered" stuff.

  86. Never Delete by Anonymous Coward · · Score: 0

    Sigh ... I'd love to have out-of-focus or poorly framed shots of my grandparents. *DON'T* delete anything. Move them to a "morgue" directory. If you abandon them, they are gone for good. I remember going through old photos looking for shots of old cars my *wife's* grandparents owned, and places where they lived. That is something I will *never* be able to do for *my* grandparents.

  87. md5deep and hashdeep by Anonymous Coward · · Score: 0

    I keep multiple copies on local drives and in the cloud. For the local copies (in different locations) I use md5deep and hashdeep to detect bitrot:
    http://md5deep.sourceforge.net/
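
    The general idea behind this kind of check (save a manifest of hashes once, then audit the tree against it later) can be sketched in Python as below. This is not how md5deep or hashdeep are implemented; the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import os


def file_digest(path, algo="sha256"):
    """Hash one file in 1 MiB chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def audit(root, known):
    """Compare the tree at root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(known) - set(current)),
        "new": sorted(set(current) - set(known)),
        "changed": sorted(
            p for p in known.keys() & current.keys() if known[p] != current[p]
        ),
    }
```

    For an archive of photos that are never edited, anything reported under "changed" is a candidate for bitrot; the sketch cannot tell corruption from a deliberate edit.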

  88. Tool for checking metadata by shani · · Score: 1

    I know it's not really an answer to your question since it's not done, but I started a tool to save and check metadata of files:

    https://github.com/shane-kerr/fileinfo

    Right now it just outputs a file with all of the metadata (including a SHA-224 hash of the file contents). If you think this seems interesting, I can whip up the part that uses that file to check the metadata this weekend.

  89. Amazon Glacier by bwroga · · Score: 1

    I use the MD5 solution mentioned above, but also back everything up to Amazon Glacier. From what I've read, retrieving your data can be a pain, but storage is only $0.01 per gigabyte per month, and they say that they store multiple copies across multiple locations and periodically check for data integrity. If data integrity is lost, they repair it using the other copies. I asked them how often data is checked for integrity and they said:

    "Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing. So, to address your first question, we performs checks frequently enough to ensure that we meet our design goal of 11 9s of average annual durability for an archive. In the very unlikely event that it is determined that one of your archives is not recoverable, we would contact you promptly."

  90. Re:uhuh by bwroga · · Score: 1

    What command are you referring to?

  91. extend this scripting by Anonymous Coward · · Score: 0

    When a file has one hash on one side and a different hash on the other side, a script can't know which copy is the correct one.

    So add a third location for the files and hash them there too. Now you have two identical hashes and one different, so you know which copy has gone bad and needs replacing.

    Script all that checking and implement automatic corrections.

    Bonus points for doing it on three computers at different locations.

    It's a fun weekend project :)
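
    The two-out-of-three vote could be sketched in Python like this, assuming you have already collected one hash per location for a given file. The function name and the location labels are hypothetical.

```python
from collections import Counter


def find_bad_copies(hashes):
    """hashes: dict of location name -> hex digest for ONE file.

    Returns (good_digest, bad_locations). good_digest is None when no
    digest is held by a strict majority, in which case every location is suspect.
    """
    if not hashes:
        return None, []
    counts = Counter(hashes.values())
    digest, votes = counts.most_common(1)[0]
    if votes * 2 <= len(hashes):
        return None, sorted(hashes)
    return digest, sorted(loc for loc, h in hashes.items() if h != digest)
```

    With three locations, a single bad copy is outvoted and named; if all three disagree, the sketch refuses to pick a winner and the file needs a human.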

  92. rsync -c by patniemeyer · · Score: 1

    I have a pair of 4TB disks that I keep cloned with rsync. Periodically I verify the contents using rsync -c, which forces rsync to do a full checksum on the files. A few times a year this will identify a file that is actually corrupt and I'll manually recover it from the good copy.

  93. md5 hashes by Anonymous Coward · · Score: 0

    I have a home NAS (low electric use) running Ubuntu server. Every night it generates an md5 hash of all files on the drive.

    On my primary system I run the same hashing on my primary drives for 6 hours. I then compare the resulting hash files to look for changes and fix any corruption accordingly.

  94. BT by hicksw · · Score: 1

    BitTorrent?
    Set up your very own very private tracker(s).
    Create a torrent of the file trees to be duplicated and protected on the original host.
    Leech it at all the redundant sites.
    Wait for them all to complete the download and become seeds.
    From time to time, but not all at the same time, force a recheck on each member of the swarm to detect corruption.
    A failed recheck should trigger a download of the corrupted piece from the rest of the swarm.
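
    What a forced recheck does can be sketched roughly in Python: the data is cut into fixed-size pieces, each piece has a recorded SHA-1 hash, and any piece whose hash no longer matches is marked for re-download. The sketch below handles a single file only, and the piece size and function names are assumptions, not taken from any particular client.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # hypothetical piece length; each torrent sets its own


def piece_hashes(path, piece_size=PIECE_SIZE):
    """Return the SHA-1 digest of each fixed-size piece of the file."""
    hashes = []
    with open(path, "rb") as f:
        while piece := f.read(piece_size):
            hashes.append(hashlib.sha1(piece).digest())
    return hashes


def recheck(path, expected, piece_size=PIECE_SIZE):
    """Return the indices of pieces that no longer match the recorded hashes."""
    actual = piece_hashes(path, piece_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Pieces missing from a truncated file also need to be fetched again.
    bad.extend(range(len(actual), len(expected)))
    return bad
```

    Because only the listed pieces need fetching again, a single flipped bit costs one piece of transfer rather than the whole file.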

    You can probably get better advice on how to handle a growing archive.
    I would probably try to add another torrent of the added files, then
    wait for the swarm to download those files.
    Then create a new torrent file that includes the old and the new in a single torrent and use that for the next forced recheck cycle.

    You probably want to have a few scripts to automate the rechecks and updates.
    --
    The world is coming to an end, but don't stop seeding

  95. Re:You need an editing plan more than a backup pla by Anonymous Coward · · Score: 0

    Step 2) once a month, go through the photos you have taken the previous month and delete those that just don't mean as much anymore (if they have decreased in emotional value in 30 days, just think how utterly worthless they would be in 5 years?).
    YMMV

    I disagree with this. You never know what is going to "mean something" 10 or 20 years from now. When I was in the Army my buddies thought I was weird because I would take pictures of everything. Even stuff that was boring and had no emotional value whatsoever.

    These days my old buddies want all my pictures from back then. Many times we have discussed something on FB or wherever and I mention that I have a picture of it. They express surprise and ask me to post that picture. We then talk about that thing and how happy they are that I wasted money back then (it was regular film that you had to pay for and pay to develop) on taking a picture of it.

    I wouldn't be so quick to delete any pictures just because they are taking up some space. Disk space these days is cheaper and getting cheaper all the time. It's worth the time it takes to back up your data, (start it before you go to bed), not to lose any pictures that you might care about later.