Slashdot Mirror


Bit Rot Stalks Your Digital Keepsakes

axlrosen writes "The NYTimes has an article about the problems of digital archiving. How many of your digital memories will still be around 50 years from now, considering lost disks, incompatible formats, hard drive crashes, fading CD-Rs, etc.? Unfortunately Peter Briggs' solution won't work for most of us. The only real way to make sure that your grandkids get to see your digital photos is to make real photographic prints from them. (When I bought my Mom a digital camera I installed Picasa for her, and made sure she knows to order real prints of all the pictures she wants to survive through the ages...)"

24 of 535 comments

  1. Boingboing.net article contents by Anonymous Coward · · Score: 5, Interesting
    Saturday, November 6, 2004

    Alien v Predator script saved by Internet pirates
    An amazing anecdote from Peter Briggs, the author of the screenplay for Alien Versus Predator.

    I wrote "A vs P" originally - oh, God...did you hear that? I actually said "A vs P". I hate that thing...it's like "T2" or "LXG"! Anyway, I wrote it on an Amstrad computer, which was about one step above a Univac Room Filler. In '92 I swapped to an Apple Mac, which I've used ever since. And I ended up losing the Amstrad disk, which was some weird, unreadable proprietary brand anyway. It wasn't until whoever it was transcribed it and pirated it onto the web years later, that I was able to cut-and-paste it into Final Draft and have an electronic copy again. So, thank-you, Internet Leaker, wherever you were!

  2. 50 years?? by oddwick11 · · Score: 3, Interesting

    I am going through similar problems right now. I have about 30 floppies containing drafts of my mother's first novel. She wrote it in the early nineties on an IBM, using some early version of WordPerfect.

    I decided to recover them and save the data on a CD, and I realized I didn't have a floppy drive installed on any of my machines! Somewhere in storage I had a USB floppy drive, but I can't get any software to read her files.

    My solution: buy antiquated hardware.
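    Before hunting for hardware, it can help to confirm what the copied-off files actually are. A minimal Python sketch, assuming the files have already been copied off the floppies: it checks each file's first bytes against a small, illustrative signature table. The `\xffWPC` prefix marks WordPerfect 5.x and later documents; earlier WordPerfect versions may not carry it. The `identify` function and the table are hypothetical helpers, not part of any existing tool.

```python
from pathlib import Path

# Illustrative signature table: leading bytes -> format name.
# Not exhaustive; extend it with the formats you actually have.
SIGNATURES = [
    (b"\xffWPC", "WordPerfect document (5.x or later)"),
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (also DOCX, ODT)"),
]


def identify(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring its name."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(f"{arg}: {identify(Path(arg))}")
```

    Knowing the real format tells you which converter or emulator to look for, regardless of what the file extension claims.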

  3. Re:What I used to think by liminality · · Score: 2, Interesting

    There isn't a problem with digital records so long as you keep your formats up to date and have backups. Generally, it seems like you should revisit data that is two years old to check whether the format needs to be brought up to date before it's too late. Another good idea is to avoid anything proprietary, including weird Microsoft implementations of common standards. Saving digital photos as simple .jpgs is a better idea than saving them as Photoshop documents, for example. Also, don't forget that scripting can be your friend. I use all kinds of AppleScripts to manage and batch-process my collection of 1000s of digital photos so that I don't have to drop them into something proprietary like iPhoto.
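    As an illustration of the scripting point, here is a minimal Python sketch (Python is swapped in for the AppleScripts mentioned above) that walks a photo folder, counts files by extension, and lists those in formats you have decided to migrate. The `MIGRATE_FORMATS` set and the `audit` function are hypothetical; which formats count as "proprietary" is your own policy decision.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical policy: extensions you have decided to convert to an open format.
MIGRATE_FORMATS = {".psd", ".wpd", ".doc"}


def audit(root: str) -> tuple[Counter, list[Path]]:
    """Count files under `root` by extension and list those flagged for migration."""
    counts: Counter = Counter()
    to_migrate: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            ext = path.suffix.lower()  # ".JPG" and ".jpg" count as one format
            counts[ext] += 1
            if ext in MIGRATE_FORMATS:
                to_migrate.append(path)
    return counts, sorted(to_migrate)


if __name__ == "__main__":
    import sys

    counts, to_migrate = audit(sys.argv[1] if len(sys.argv) > 1 else ".")
    for ext, n in counts.most_common():
        print(f"{ext or '(none)'}: {n}")
    for path in to_migrate:
        print("migrate:", path)
```

    Run every couple of years, a report like this shows at a glance how much of a collection is still sitting in formats you no longer trust.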

  4. Re:Umm by pyro101 · · Score: 5, Interesting

    Hundreds of years? Have you seen the fade on photos from 50 years ago, 100 years ago? These are supposed to be the cherished chemical grail that will make photos last forever. Would you like to know what photographers do with photos and film that they want to last for years? They put them inside binders in drawers, in a pitch-black room, and rarely open them. The room is controlled for both humidity and temperature. I'll take buying a new HDD every 6 months over that. Then you can make new prints every 10 years and abuse them to your heart's content, instead of having to move a photo because it is too close to the sunlight, or going rabid if a kid tears up a $0.20 piece of paper.

  5. The case for parity archiving? by DamienMcKenna · · Score: 2, Interesting

    Does this make the case for parity archiving?

    Damien

  6. There's still a single point of failure by Tim+Macinta · · Score: 2, Interesting
    I realised a few years ago that the only sane way to protect my data was to have it all online all the time. I store my data on redundant arrays of disks in two geographical locations (my house and my parents' house, synced nightly via rsync).
    What if somebody hacks your primary machine and erases your data? This would propagate to your backup server as well. I see at least two solutions to this: 1) make a WORM copy every so often, and/or 2) write to the backup server in a journaled manner so that older data isn't automatically deleted. Of course, solution #1 is still prone to bit rot, and solution #2 doesn't protect you if somebody hacks your backup server as well (which should be substantially easier if they made it onto your primary machine). Does anybody have additional suggestions? I've been thinking about this problem for a backup program I'm working on and am curious whether anybody can improve upon the reliability.
    1. Re:There's still a single point of failure by Tet · · Score: 2, Interesting
      What if somebody hacks your primary machine and erases your data? This would propagate to your backup server as well.

      The syncs are delayed, so I have an overnight sync to a local disk in my main machine, weekly backups offsite, and 4 weekly backups from that to another offsite machine. Thus I have 28 days in which to spot the deleted data and restore from backup (actually, I don't need to spot it manually -- AIDE tells me when a file disappears from my machine). Eventually, I'll get around to implementing a backup strategy using rsync with hard links to do incremental backups, which we do at work. See rsnapshot. But for home use, what I have is more than sufficient.
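      The rsync-plus-hard-links approach that rsnapshot automates can be sketched in a few lines of Python. This is not rsnapshot's code, only a minimal illustration assuming all snapshot directories live on one filesystem (hard links cannot cross filesystems); the `snapshot` function is hypothetical. It also shows why a deletion on the primary machine does not propagate: older snapshots keep their own links to the file.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, prev: Optional[Path], dest: Path) -> None:
    """Copy `source` into `dest`, hard-linking files unchanged since `prev`.

    Each snapshot looks like a full copy, but unchanged files share storage
    with the previous snapshot. Deleting a file from `source` only drops it
    from the *next* snapshot; older snapshots keep it.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        out = dest / rel
        if src.is_dir():
            out.mkdir(parents=True, exist_ok=True)
            continue
        out.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, out)  # unchanged: add another name for the existing file
        else:
            shutil.copy2(src, out)  # new or changed: store a fresh copy
```

      Because unchanged files in consecutive snapshots share one inode, editing a file inside a snapshot in place would alter every snapshot linked to it, so snapshot trees should be treated as read-only.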

      --
      "The invisible and the non-existent look very much alike." -- Delos B. McKown
  7. Re:Every 2-3 years by garcia · · Score: 4, Interesting

    That is exactly what I do: two separate types of backups going to three separate machines.

    A daily backup of important files (and stuff that is changed daily) goes to all machines in one shot at ~6am.

    A weekly backup of EVERYTHING goes to three different machines every Sunday at ~5am.

    Now, I realize that all three could be screwed simultaneously, but at least I know that TWO of those machines have automated daily backups to CD-RW.
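    The schedule above boils down to two rules. In crontab syntax they would be `0 6 * * *` (every day at 6am) and `0 5 * * 0` (Sundays at 5am). The Python sketch below expresses the same rules as a function; the host names and job labels are hypothetical stand-ins for whatever actually performs the copies.

```python
from datetime import datetime

# Hypothetical names for the three backup machines.
HOSTS = ["backup-a", "backup-b", "backup-c"]


def jobs_due(now: datetime) -> list[tuple[str, str]]:
    """Return the (job, host) pairs scheduled to start during the hour of `now`."""
    jobs: list[tuple[str, str]] = []
    if now.weekday() == 6 and now.hour == 5:  # Sunday (Monday == 0), ~5am
        jobs += [("weekly-everything", host) for host in HOSTS]
    if now.hour == 6:  # every day, ~6am
        jobs += [("daily-important", host) for host in HOSTS]
    return jobs
```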

    Yeah, it's paranoid, it's redundant, but it's my data and it's important to me. If I lost my 2300 pictures I'd be lost.

  8. Re:Umm by Ford+Prefect · · Score: 4, Interesting

    Colour materials are another matter. Because they are based on chemical dyes instead of silver crystals, they are subject to chemical change (i.e. fading). Current films quote longevity of 50 to 100 years.

    A minor fade can still be pretty bad. I found an envelope of 1980s-era colour prints taken by my father, all seemingly of a number of people with cameras standing outside, near some flowerbeds and low fences.

    On closer inspection, I noticed the very faint, faded image of the Taj Mahal in the background, near-indistinguishable from the sky.

    So, the photos are now useless, unless I scan them in and do some pretty heavy enhancement - but then what am I supposed to do with the results? :-)

    --
    Tedious Bloggy Stuff - hooray?
  9. Re:Perpetual backups by Reziac · · Score: 2, Interesting

    True; in fact I was going to make that very point, but got distracted and forgot :)

    And it depends on the value of the data. Music archives aren't really "worth" much, being mostly replaceable (even if you have to pay for it, the data is still available). The only music that has "value" in this context is something you've written yourself and not yet transcribed to paper, or computer-generated music that might be difficult or impossible to transcribe. But in most cases, you know what you did and can probably rebuild it, if at considerable cost in time and effort.

    As a rule, the same might be said about financial records, source code, and anything else someone had to physically type in (what was typed once can be typed again). Still, for these cases, the relatively small size and high PITA of rebuilding such data makes storage on "old, small, but reliable" media a realistic option.

    Then there's photos. Photos capture an ephemeral moment. The data recorded by that digital camera CANNOT be replaced. Your child will never take his first steps again, and "staging" a repeat performance just isn't the same. -- Given this, one has to wonder why anyone relies on digital photos as a permanent record anyway -- since in most cases, most of the data is thrown away by lossy compression (itself a sort of bit rot).

    There's no really good across-the-board solution, and as your mention of RAID suggests, all we can really do is make many and redundant backups, and try to keep critical data available to our future selves. If that means you don't throw out that 5" floppy drive (because 5" 360k floppy media will still outlive almost anything else) ... oh well! what's one more pile of clutter? :)

    --
    ~REZ~ #43301. Who'd fake being me anyway?
  10. Re:Umm by Lumpy · · Score: 2, Interesting

    Real photos are more time-tolerant than the low-quality inks in printed digital photos.

    Hell, even high-end prints exposed the traditional way will not outlast the negatives.

    I have a very expensive 8X10 print of a digital photo a friend shot 4 years ago, when he had access to an insanely expensive digital camera for that time. (Your 3MP Canon point-and-shoot can do the same thing now.)

    It is not exposed to direct sunlight and sits behind UV-protective glass in a frame, yet the yellow and cyan are already fading. This was printed on an "archival" quality printer with "archival" quality inks.

    I'd say that printing them out will have a shorter lifespan than a CDR will.

    --
    Do not look at laser with remaining good eye.
  11. Redundant, Offsite Backups by raile · · Score: 2, Interesting

    I have three categories that I put data into, that I've decided to call Red, Yellow and Green.

    Red data is the most important and irreplaceable: things like financial data, things I've written, important emails, family videos, etc.

    Yellow data is not as important, but would be an inconvenience to replace: things like purchased software, esoteric software drivers, etc.

    Green data is data that I like to have on hand but that could be easily replaced or that is updated frequently: things like Linux distributions, freeware, etc.

    For the Red data, I create PAR2 parity files and burn 3 copies (with the PAR2 data). One is stored at home, one at work and one in a safety deposit box. Sensitive data (like financial data) is encrypted with a key located in the deposit box.

    For the Yellow data, I burn 2 copies. One is stored at home and the other at work.

    For the Green data, I burn 1 copy.

    I will still need to keep an eye on the Red data and check the copies once every year or two, perhaps reburning to the most current technology, but I feel fairly confident that this data is safe, having three copies in different geographic locations, each with redundant parity files (with 10-20% redundancy) that can be used to reconstruct damaged data.
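    PAR2 itself uses Reed-Solomon coding, which can rebuild several damaged blocks. The core idea is easier to see with the simplest possible parity scheme, a single XOR parity block (the same idea RAID 5 uses per stripe), sketched below in Python. It can rebuild exactly one missing block; the helper names are hypothetical, and this is not the PAR2 file format.

```python
from functools import reduce
from typing import Optional


def split(data: bytes, size: int) -> list[bytes]:
    """Cut `data` into equal-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks: list[bytes]) -> bytes:
    """The parity block is the byte-wise XOR of every data block."""
    return reduce(xor_blocks, blocks)


def recover(blocks: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild at most one missing block (given as None) from the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity block can repair only one loss")
    survivors = [b for b in blocks if b is not None]
    if not missing:
        return survivors
    # XOR-ing the parity with all surviving blocks cancels them out,
    # leaving exactly the missing block.
    rebuilt = reduce(xor_blocks, survivors, parity)
    return [rebuilt if b is None else b for b in blocks]
```

    After reassembly, the zero padding has to be stripped (or the original length stored alongside the blocks). Real PAR2 sets trade more recovery data for the ability to repair more than one block.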

    YMMV. Hopefully my scheme works; I haven't had any catastrophic events that affect my data yet.

  12. Color - B/W by dexter+riley · · Score: 4, Interesting

    Is there a service where you can copy your color negatives to three b/w negatives, one for each color layer, so they can be recombined later to make a full color image? This strikes me as the best long-term analog solution to losing precious color pictures.

    1. Re:Color - B/W by Anonymous Coward · · Score: 1, Interesting
      Further, copying a negative to three new negatives would be tedious and expensive, not to mention actually making a print from such a system.

      It wouldn't necessarily have to be that hard. The most difficult issue with this system would be making sure these three negatives are physically aligned. From there it isn't that tough to make a print: put the red negative on an enlarger and shine some red light through it. Then repeat with green light and the green negative. Then repeat again with blue. It will be a bit tricky to get the color balance right, but that's not hard if you always use a timer to control how long you expose the photo paper to each of the three wavelengths of light. (However, making a manually retouched print would be much harder, because you'd have to reproduce the motions you make with the thingies you use to block out light on the parts you want to expose less, or use some other technique.)
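      The digital analogue of three-separation archiving is easy to show: split each pixel into red, green, and blue records, keep them as grayscale, and stack them again later. A minimal Python sketch, assuming an image held as nested lists of (r, g, b) tuples rather than any real image library; the shape check is a stand-in for the physical registration problem described above.

```python
Pixel = tuple[int, int, int]
Channel = list[list[int]]


def separate(image: list[list[Pixel]]) -> tuple[Channel, Channel, Channel]:
    """Split an RGB image into three grayscale records, one per color."""
    red = [[p[0] for p in row] for row in image]
    green = [[p[1] for p in row] for row in image]
    blue = [[p[2] for p in row] for row in image]
    return red, green, blue


def _shape(channel: Channel) -> list[int]:
    return [len(row) for row in channel]


def recombine(red: Channel, green: Channel, blue: Channel) -> list[list[Pixel]]:
    """Stack the three records back into color, refusing misaligned inputs."""
    if not (_shape(red) == _shape(green) == _shape(blue)):
        raise ValueError("separations are out of registration (shapes differ)")
    return [
        [(r, g, b) for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]
```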

  13. Re:Perpetual backups by Anonymous Coward · · Score: 1, Interesting

    In the long run, format is as much of a problem as the durability of the storage medium. Who is to say that anyone will remember how a certain type of binary data file (oh, let's just say DB2) is read a couple of decades from now?

    I heard an anecdote once about the computer systems of the East German secret police, the Stasi, being rendered totally useless after the fall of the Berlin Wall, simply because the engineers who had designed and maintained the systems had left (fled, more like it) without leaving documentation of the system.

    What I would like to see is some sort of self-resolving or self-explanatory data format (I'm talking binary here): the sort of thing any professional programmer could figure out how to read simply by looking at it. It's probably impossible, though. Any thoughts?

    The storage medium should preferably have no moving parts and be made of something really impervious, like diamond: a holographic memory crystal. I imagine that such an object would contain some patterns visible to the naked eye, which function as a pointer to how the dense data contained in the object should be extracted.
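    One common way to approach the self-explanatory format idea is to put a plain-text description in front of the binary payload, so that anyone opening the file in a hex viewer sees readable instructions before the raw bytes. The Python sketch below is a hypothetical container (the `SELFDESC` marker and the header fields are invented for illustration) storing a list of doubles. It only helps as long as future readers can still read ASCII, JSON, and English.

```python
import json
import struct

MAGIC = b"SELFDESC"  # hypothetical marker, readable in any hex dump


def write_container(values: list[float], note: str) -> bytes:
    """Prefix a binary payload with a plain-text header describing its layout."""
    header = {
        "note": note,
        "layout": "MAGIC, 4-byte big-endian header length, this JSON header, payload",
        "payload": "count IEEE-754 doubles, little-endian",
        "struct_format": "<d",
        "count": len(values),
    }
    head = json.dumps(header, indent=1).encode("ascii")
    body = b"".join(struct.pack("<d", v) for v in values)
    return MAGIC + struct.pack(">I", len(head)) + head + body


def read_container(blob: bytes) -> tuple[dict, list[float]]:
    """Decode using only what the header itself says about the payload."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a SELFDESC container")
    (head_len,) = struct.unpack_from(">I", blob, len(MAGIC))
    start = len(MAGIC) + 4
    header = json.loads(blob[start:start + head_len].decode("ascii"))
    fmt = header["struct_format"]
    size = struct.calcsize(fmt)
    body = blob[start + head_len:]
    values = [struct.unpack_from(fmt, body, i * size)[0] for i in range(header["count"])]
    return header, values
```

    The reader never hard-codes the payload layout; it takes the struct format and record count from the header, which is the property the comment is asking for.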

  14. Re:Umm by Lechter · · Score: 2, Interesting

    It's not just a matter of the negatives (color or B&W) so much as a matter of how they were developed. Masters like Ansel Adams & co. not only used better film, but they were also much more exacting in how they processed that film. Improperly stopped or fixed negatives (even when carefully stored) can deteriorate remarkably quickly... just ask a careless Photo 1 student. (Not me, I was a careful Photo 1 student.)

    --
    credo quia absurdum
  15. Article reveals future get-rich-quick scheme! by gotgenes · · Score: 2, Interesting

    "As long as you keep your data files somewhat readable you'll be able to go to the equivalent of Kinko's where they'll have every ancient computer available," said Mr. Schwartz, whose company has worked with the Library of Congress on its preservation efforts.

    "It'll be like Ye Olde Antique Computer Shoppe," Mr. Schwartz said. "There's going to be a whole industry of people who will have shops of old machines, like the original Mac Plus."

    Oh boy, a field day for the /.ers that have been squirreling away all that "obsolete" hardware. Quit running Linux servers on those machines, boys! They'll be too valuable when Kinkos is ready to buy those "antiques"!

    I knew I should've held on to my Apple II...

    --
    It's such a fine line between stupid and clever.
  16. Re:What I used to think by Myself · · Score: 2, Interesting

    You should pick up a Catweasel. It's a universal floppy controller for old media which can read Commodore, Amiga, Mac 800k, and other formats directly with modern floppy drives.

    The new Catweasel apparently also includes joystick/paddle ports and HardSID functionality. Yesss! :)

    As far as beating bitrot by multiplying the data: You can also use software FEC encoding to add check blocks to the data, growing it by less than an integer multiple. Repairing the errored bits is automatic, whereas storing multiple copies of the file still gives you no easy way to tell which copy is correct.
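    A Hamming(7,4) code is a classic, if small, example of FEC: every 4 data bits are stored as 7 (a growth of 1.75x, less than an integer multiple), and any single flipped bit in a 7-bit codeword is located and repaired automatically. The Python sketch below also includes a scrub pass that decodes, corrects, and rewrites codewords, in the spirit of the periodic-rewriting idea. It is illustrative only: real archival tools and ECC memory use longer or stronger codes, two errors in one codeword will be miscorrected here, and codewords are kept one per int for readability rather than bit-packed.

```python
def encode_nibble(d: int) -> int:
    """Hamming(7,4): 4 data bits -> 7-bit codeword; bit k holds position k+1."""
    d1, d2, d3, d4 = (d >> 0) & 1, (d >> 1) & 1, (d >> 2) & 1, (d >> 3) & 1
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    bits = [p1, p2, d1, p4, d2, d3, d4]  # positions 1..7
    return sum(bit << k for k, bit in enumerate(bits))


def decode_codeword(c: int) -> tuple[int, bool]:
    """Return (data nibble, whether a flipped bit was corrected)."""
    bits = [(c >> k) & 1 for k in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos  # positions of set bits XOR to 0 in a valid codeword
    if syndrome:
        bits[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    d = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return d, syndrome != 0


def encode(data: bytes) -> list[int]:
    """Two codewords per byte: low nibble first, then high nibble."""
    out: list[int] = []
    for byte in data:
        out += [encode_nibble(byte & 0x0F), encode_nibble(byte >> 4)]
    return out


def scrub(codewords: list[int]) -> tuple[bytes, int]:
    """Decode, rewrite any corrected codeword in place, and count the repairs."""
    nibbles: list[int] = []
    repaired = 0
    for i, c in enumerate(codewords):
        d, corrected = decode_codeword(c)
        if corrected:
            codewords[i] = encode_nibble(d)
            repaired += 1
        nibbles.append(d)
    data = bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))
    return data, repaired
```

    Running `scrub` regularly keeps single-bit errors from piling up into uncorrectable double errors within one codeword, and a rising repair count is an early warning about the medium.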

    Periodically rewriting the data and correcting for small errors that occur will prevent the accumulation of errors too large to be corrected. In RAM this is known as memory scrubbing and is used on some high-end servers to counteract cosmic rays and bit-rot.

    It's also a good way to detect impending media failure. Your drives should have SMART enabled, so you know when they're covering up a growing problem, and can get your data out of harm's way. This only protects against gradual deterioration however, and is no substitute for a backup in case of catastrophic drive failure.

    These questions are dealt with all the time by serious archivists. Storing metadata to provide context is important too. Historians of the future will probably have a thousand copies of "Driller.d64" but will they know what the original floppy label looked like?

  17. I've got stuff almost 20 years old! by JBMcB · · Score: 2, Interesting

    On its original media, even! My second PC was, luckily, a Mac 512K. I've still got the system disks for it, with the original MacPaint and MacWrite disks. I've still got the first doodle I did in MacPaint on a 3.5" 400K diskette, and my PowerMac 6100/60 still reads it fine. When my all-singing, all-dancing Linux-based Windows/AppleTalk/NFS/Novell server is up and running, I'm going to back up everything onto RAID, then optical. As long as I keep cycling backup strategies and keep offsite backups in a safety deposit box, all my data should be secure for quite a long time...

    --
    My Other Computer Is A Data General Nova III.
  18. Re:My Tinfoil Hat Is On by MDMurphy · · Score: 2, Interesting

    Maybe not a direct comparison, but for quite a while Sony's digital music players supported ATRAC in lieu of MP3. They got tired of falling way behind the iPod. People were unwilling to have to convert their existing tunes to ATRAC and to not have their ATRAC files be swappable with others.

    It was a nice example of the market moving a big company away from a proprietary format. It probably won't happen with MS, as more and more things keep adding support for WMA, but it at least shows it can happen.

    Enough people shied away from DIVX, in favor of real DVDs, to kill that format as well.

    So for the moment, I'll have a little faith.

  19. Re:Umm by wing03 · · Score: 3, Interesting

    Hundreds of years? Have you seen the fade on photos 50 years ago, 100 years ago?

    Standard colour prints are made with organic dyes. Those fade in time.

    Black-and-white photos are made of silver.

    Those don't fade, but other factors can destroy them: the paper base turns yellow, or a film base made of cellulose nitrate (nitrocellulose, a close relative of guncotton) disintegrates.

    For the chemical holy grail (and this goes back to knowledge I gained in the '80s and early '90s), you're looking at Kodachrome 64 slide film and Cibachrome positive-to-positive paper.

    Both use unusually stable dyes, and both have withstood a battery of tests and time.

  20. Amstrad by Todesmetall · · Score: 2, Interesting
    Hey, don't knock the good old Amstrad home computers :-) It was a very nice computer for its time (at least the CPC line, maybe not so much the "Joyce" aka PCW).

    The major downside was that it used a very unusual disk form factor, 3-inch, while the rest of the world standardized on 3.5-inch.

  21. Re:Watch out for mistakes by pHDNgell · · Score: 2, Interesting

    Do you run rsync with --delete? If not, how do you deal with moved files? If so, how do you deal with accidental deletion?

    I'm not the person to whom this question was written, but I'll tell you my solution:

    My DB dumps are my biggest chunk of data. I dump each table (in each schema in each DB) to a separate stream and break the stream up into chunks of a specific size (configurable per stream). For each chunk, I maintain an md5. Every day, for every chunk, I compare the md5 against the md5 for the same chunk from the previous backup. If they are the same, I hard link the chunk from the previous location to the new location. If they're different, I gpg encrypt the chunk and store it in the new location.

    The net result is that I end up with about 4MB of storage required for my nightly "full" backups of about 3GB of Postgres (in the normal case). Each dump directory is a full dump, but uses incremental storage. I currently have 95 days' worth of backups online in ~3.5GB.
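    The chunk-and-compare scheme described above can be sketched as follows. This is a minimal Python illustration, not the commenter's script: it assumes chunks are compared by index against the previous backup directory, keeps each chunk's MD5 in a sidecar file, and leaves out the gpg encryption step entirely. The function name and the `chunkNNNNNN` file layout are hypothetical, and both backup directories must sit on one filesystem for the hard links to work.

```python
import hashlib
import os
from pathlib import Path
from typing import BinaryIO, Optional


def backup_stream(stream: BinaryIO, chunk_size: int, dest: Path,
                  prev: Optional[Path]) -> tuple[int, int]:
    """Store `stream` as numbered chunks in `dest`; return (linked, written).

    A chunk whose MD5 matches the same-numbered chunk in the previous backup
    directory is hard-linked rather than stored again, so every directory is
    a complete dump while only changed chunks consume new space.
    """
    dest.mkdir(parents=True, exist_ok=True)
    linked = written = 0
    index = 0
    while chunk := stream.read(chunk_size):
        digest = hashlib.md5(chunk).hexdigest()
        name = f"chunk{index:06d}"
        old_sum = prev / f"{name}.md5" if prev is not None else None
        if old_sum is not None and old_sum.exists() and old_sum.read_text() == digest:
            os.link(prev / name, dest / name)
            os.link(old_sum, dest / f"{name}.md5")
            linked += 1
        else:
            # The commenter gpg-encrypts changed chunks at this point; omitted here.
            (dest / name).write_bytes(chunk)
            (dest / f"{name}.md5").write_text(digest)
            written += 1
        index += 1
    return linked, written
```

    Restoring is just concatenating the chunks of one directory in order. Swapping MD5 for a stronger hash such as SHA-256 is a one-line change if accidental collisions are a concern.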

    After the dumps finish, they're immediately rsynced to two other systems on my LAN (each with RAID 5...one's going away hopefully). And then the destination system rsyncs the whole thing offsite.

    There is no automated deletion, but if I go in there and mess around and delete something I didn't mean to, I can get the deleted data from one of the local or remote replicates.

    Actually, I do the same thing with my mail (gpg'd tar streams per mailbox instead of dump streams per table), but there are deletions there since I only keep the last ten days or so of trees. The whole data size is closer to 400MB, but the deltas are closer to 15-20MB.

    My source code is handled via gnu arch or darcs and I just use their built-in mirroring functionality to make sure there's always at least two copies of my source trees online (usually many more).

    --
    -- The world is watching America, and America is watching TV.
  22. Re:Tell me about it by feidaykin · · Score: 2, Interesting
    DVDs have a lifetime of 30-50 years.

    "In the digital world, we don't need back-ups, because a digital copy never wears out. It is timeless." -Jack Valenti, former head of the MPAA, 2002 interview with Harvard Political Review's Derek Slater

    --

    "To confine our attention to terrestrial matters would be to limit the human spirit." -Stephen Hawking