Slashdot Mirror


On Data Obsolescence and Media Decay

mouthbeef asks: "What's the future of storage media? With CDs and tapes prone to relatively speedy decay, and hard-drives an entropic nightmare of moving parts, how will we keep our data safe over the long haul? I just got some e-mail from a writer pal who isn't really technologically sophisticated, alarmed because someone told him that his backup CDs would decay and rot in 20 years. He's an sf writer, and he was thinking "big picture": a coming infopocalypse in which sysadmins devote their every waking moment to re-archiving their old backup data." Is such a scenario likely? Why or why not?

"I wrote back that I didn't think that would happen, because:

  • Every time I buy a computer, it's got more storage on-board than all the computers I've owned until then, and I just migrate all the data files I've ever created or saved to the new box, like a hermit-crab changing shells
  • With broadband becoming more real and more cheap, it makes sense that in the long run we'll store most (if not all) of our data on remote servers -- encrypted, of course -- that are managed by trained pros with access to mirror drives, climate-controlled vaults, etc. etc.
  • Even if this doesn't happen, most of your data files will be in stupid, proprietary formats like Word 3.0 that won't be openable, anyway
(I've since changed my mind about the last one: thinking about it, I'm willing to believe that the high-speed, high-capacity distributed servers of tomorrow will have VMWare-style emulators for every chipset and every OS ever made on them as public utilities like grep or perl)

How reasonable does this seem to you folks? What do you do with data that you need to preserve for the ages? "

11 of 382 comments

  1. Re:Snake Oil by Anonymous Coward · · Score: 4

    I am an archivist. My job is to sift through data and decide what is worth saving. Generally about 5 percent of collections of modern records are saved. Popular culture is indeed documented to some degree in any historical library and there are several repositories which are dedicated specifically to the preservation of popular culture.

    The filter of decay has served mankind well? How illogical: when you have no idea of what has been destroyed, how do you know mankind has been served well? Was mankind well served by the destruction of the Library of Alexandria, the Aztec library destroyed by the Spanish, the historical libraries destroyed by the Serbs in the Balkans?

    Sure CDs may last 100 years (we really don't know) but it is unlikely they will be able to be read by anything. Paper is still the most stable format available (although it is impractical for many reasons to transfer digital data to paper as some of my colleagues are prone to doing) and there are many vast libraries of data open to the public. We had well over 40,000 researchers use our library last year and less than 1 percent were scholars.

    My profession is wrestling with two technology related questions.

    1. How to make paper collections accessible electronically. For example, the papers of ONE congressman (approx. 400k documents) took 5 years and nearly 3 million dollars to digitize. We have one collection which has 32M documents. Sure digital copies are cheap - IF the original was electronic and in a form easily translated.

    2. How to preserve much of the information which currently only exists in electronic form, be it governmental databases, personal computer files or web pages. We did an interesting experiment a couple of years ago when we captured about six dozen web sites which documented the devastating Red River flood in Minnesota, North Dakota and Canada. Most of these sites existed on the internet for only 2-3 months and were disappearing as we captured them. I think it will be possible to study how the internet was used as a tool in response to catastrophe from the governmental level to local churches and organizations. Of course current copyright law makes it illegal for us to post this database of websites on the internet but that's another issue.

    Aging Newbie is correct in the assertion that only a small percentage of data need be preserved, yet I feel that conscious, reasoned choices about what should be saved serve mankind far better than the filter of decay. I also believe the solution ultimately will involve a combination of strategies including electronic.

    Skavvy (whose firewall apparently won't allow him to register)

  2. THIS IS A PROBLEM NOW! by Anonymous Coward · · Score: 4

    WOW, I cannot believe that half of the /. readers are not working on data recovery as we speak. I spent a good couple months of my life running back and forth across hallways doing tape retrieval because the machines that were made in the late 70s, early 80s couldn't be replaced. This was made even worse by the fact that half the tapes were corrupted. Fact is, we have lost a lot of the data from the Voyager space probe missions. With data centers poorly funded, the race to copy all the data from older 7 track format tape to new media is slow and grueling. 7 track machines are NO LONGER MADE and the companies outfitting newer tape heads to read the old data are charging way more than the scientific centers can afford. Not only Voyager, but Magellan and so forth.. GONE... and going as we speak. As the few machines that can retrieve the data struggle to re-read the tapes literally hundreds of times trying to recover those last missing bits, tapes yet to be re-archived are falling apart. Once the data is stored, what does one DO with half-complete 1970s computer records? There has yet to be an "emulator" to read most of this stuff. Fact is, it is gone, and anyone who says this problem isn't going to pop up again has yet to store anything important on a floppy drive.

    bortbox

  3. Snake Oil by sql*kitten · · Score: 4
    In his book "Silicon Snake Oil", Clifford Stoll talks of a similar subject. His point was that all our media is essentially perishable and quickly becomes obsolete: for example, there are magnetic tape and punched card formats which can no longer be read, because there are no surviving readers (or if there are, there is nothing to connect them to). His point was that our society would leave little behind in terms of data to be discovered by future archaeologists, and even if we did, they couldn't access it.

    However, I think he was mistaken. Ancient societies left stone tablets, cave paintings and the like behind, and there's no-one who fully understands the languages or the contexts (when an archaeologist says an object is of "ritual significance" he actually means he doesn't know what it's for). We do have the technology now, as the poster says, to migrate our data ever forwards into new storage, assuming no cataclysm occurs. And even if it does, it is far more important, in terms of recovering data, that the language (source code) survives, rather than CD ROM drives, Minidisc players etc (the binaries), because then data recovery is an essentially straightforward task.

    I expect acid-free paper to survive long enough after an ecological catastrophe or, say, a meteor strike, to be useful to the survivors (better start moving the engineering textbooks down into the bunkers). And of course, Ship-It awards will outlast the end of time, not to mention non-biodegradable shopping bags.

    As a civilisation, if we wish to preserve a legacy, we currently possess the skills and technologies to do so - if we choose to.

    1. Re:Snake Oil by Aging_Newbie · · Score: 4

      We should look at the information we have to save before we decry the methods of saving it. Society's popular culture is preserved poorly if at all while "everything" worthwhile from all of civilization still fits in a few libraries. The filter of decay has served mankind well so far - sorting out that which somebody treasured enough to save from the vast ocean of lesser stuff. In this century the Dead Sea Scrolls were discovered nicely preserved for over a millennium because somebody thought them worthwhile.

      Stored properly, writable CD's last 100 years or more while each holds well in excess of an encyclopedia. The problem of preservation is considerably simplified as compared to paper. By 100 years paper documents are of limited utility and only scholars can access them. With digital media, copies are simple and cheap so anyone could have a copy if they wanted.

      I think the challenge of the future will be one of sorting the trash; i.e. selecting moon landing data from a mountain of memos, reports, and minutiae surrounding it. But, that would seem to have been the problem since history began.

      For all of our ego, I think we might have only a few times more real value to save for posterity than did our counterparts at the turn of the century or in the '50s. People seem comfortable with what we saved in the past - why not admit that we are really not that much more advanced and that the real value of our lives and era can be summarized on a few (or a few thousand) CD's a year. Not enough to cause an information apocalypse or anything but a shelf in a library...



  4. CD lifespan by David+A.+Madore · · Score: 4

    From what I've understood, the lifespan of a CD-R is around 20yr for those which are based on cyanine or AZO (and which appear blue or blue-green when you look at them) and around 100yr for those based on phthalocyanine (which appear golden to the eye).

    Of course, it depends very much on the way you treat those CDs. If you put one in a light-free, dust-free, safe deposit box, it can probably survive several kyr (uh, thousands of years) without damage.

    The unfortunate thing, however, is that because the error correcting codes work so well, it is not always easy to tell that a CD has begun noticeably deteriorating until the data is actually unreadable, and then it is too late. It would be nice if the drives could return some sort of ``CD quality'' status.

    I always write down (on paper) the md5 fingerprint of the raw ISO image when I burn a CD. In that way, I can be sure whether I have pristine data yet. (And if I make copies, I can be sure the copy is exactly identical to the original.)
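
    The fingerprint check described above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual procedure: the function names and the `backup.iso` path are made up for the example, and it assumes the image (or a read-back copy of the disc) is available as an ordinary file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks
    so that a large ISO image never has to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_hex):
    """Compare a fresh digest against the one written down at burn time.
    `recorded_hex` is whatever was copied onto paper (hypothetical input)."""
    return md5_of_file(path) == recorded_hex.strip().lower()


# Usage sketch (hypothetical file name):
#   fingerprint = md5_of_file("backup.iso")      # write this down at burn time
#   matches_recorded("backup.iso", fingerprint)  # re-check years later
```

    A mismatch only says that something changed, not what or where. Any other algorithm in `hashlib`, such as `hashlib.sha256`, could be swapped in the same way.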

    This information is provided in the hope that it will be useful but WITHOUT ANY WARRANTY. Without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

  5. Shelf life of recorded CD-R longer than 20 years by Jish · · Score: 4
    This site claims 70-200 years... Anybody else have other evidence?

    Yamaha CD-R site

    Josh

  6. The ultimate backup by dattaway · · Score: 5

    One could always do what Linus did for backing up his work -- sharing it with the world. I heard he didn't have a tape drive for many years until he was given an Alpha, but his work could always be found somewhere on the internet in good hands.

    The internet will always save your best work and discard the junk.

  7. Re:Data availability by Detritus · · Score: 5
    I worked on a data recovery project for NASA/GSFC. The spacecraft data was originally recorded on analog 7-track 1/2" instrumentation recorders at ground stations all over the world. There were 100,000 tapes stored in the Public Archives of Canada. The tapes were deteriorating and destined for a landfill. It costs a substantial amount of money, every year, to store that many tapes in a climate controlled facility. That was just the data from one family of spacecraft (Alouette and ISIS).

    Recovering the data from just a portion of the tapes requires substantial amounts of time and money due to the labor intensive nature of the task. Think of copying 20,000 LP records to CD-R disks.

    With limited budgets, NASA and other scientific research agencies are often in the unhappy position of having huge amounts of potentially valuable data on rapidly deteriorating media, of which only a fraction can be saved. Unless someone invents a time machine, the data is irreplaceable.

    For many years, magnetic tape has been the medium of choice for storing spacecraft data. Storing it on an on-line system, on disk, just wasn't practical or affordable. Huge amounts of data were archived on 7-track 1/2" digital computer tapes, the same kind of tapes that you see in cheesy science fiction movies from the 1960s. Try to find one of those tape drives today, or a computer that can talk to it.

    --
    Mea navis aericumbens anguillis abundat
  8. acid free paper by rillian · · Score: 5

    As someone who just loves books .. most are not printed on acid free paper anymore and a huge amount of them is going to be lost within the next 10 to 30 years.

    I'm sorry to hear that. I've been fascinated by this phenomenon in our university library. Up until the 1930's somewhere, journals are pretty well preserved. Then they suddenly get awful as paper mills switched to new methods. Pages are yellowed and brittle. In the 1950's the error was discovered and pages become white again with the switch back to acid-free paper.

    Let's hope we don't make the same mistake with digital media. And it could be worse: almost all the film from the first half of the century is lost to self-rot and environmental damage. For all its faults, DVD is probably the best thing that's ever happened to film from a historical perspective.

  9. Most of the data becomes useless by hernick · · Score: 5

    What I've noticed is that most of the data we're accumulating is quickly becoming useless. 10 year old schoolwork isn't something so worthy of archiving. The data you really want to keep shouldn't be very large anyway...

    Modern word processors still open really old file formats like Windows .WRI and Word 1.0, and I don't see that likely to change in the near future. The filters will probably stay, but be optional. If you want to future-proof your documents, run a mass conversion utility on them and convert them to a more "standard" format than Word or WordPerfect. Say, pure ASCII, HTML or RTF. Sure, you're going to lose formatting, but if those are documents you're not likely to use ever again, yet there may be a slight chance you will, then losing formatting isn't important. If you need the content again, you shouldn't mind too much having to redo the formatting correctly again...
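
    As a rough illustration of the "convert everything to pure ASCII" idea, here is a minimal Python sketch. It is not a real Word or WordPerfect import filter: it only pulls runs of printable ASCII out of each file, much like the Unix `strings` tool, so all formatting is dropped and some stray junk may come along. The function names, the four-character minimum and the directory names are assumptions made for the example.

```python
import re
from pathlib import Path

# Runs of at least four printable ASCII characters (tabs allowed).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def salvage_text(data):
    """Return the printable-ASCII runs found in `data`, one per line."""
    runs = PRINTABLE_RUN.findall(data)
    return "\n".join(run.decode("ascii") for run in runs)


def convert_tree(src_dir, dst_dir):
    """Write a .txt salvage of every file under `src_dir` into `dst_dir`,
    keeping the same relative layout. Returns the list of files written."""
    src, dst = Path(src_dir), Path(dst_dir)
    written = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        out = dst / path.relative_to(src)
        out = out.with_name(out.name + ".txt")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(salvage_text(path.read_bytes()), encoding="ascii")
        written.append(out)
    return written


# Usage sketch (hypothetical directories):
#   convert_tree("old_documents", "plain_text_copies")
```

    Keeping the originals next to the salvaged text is the safer choice, since a proper converter may turn up later and recover more than this does.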

    Floppy disks are degrading rapidly, but most people's floppy collection can fit on a single CD-R. Then again, most people just don't care about their floppy collection, and will just let it die. The data contained on it isn't useful anymore.

    Let's see about Audio CDs. They degrade over time (scratches) and possibly rot. I believe that what will happen is that we're going to convert them to some format like MP3. I'm fairly certain that MP3 capability will continue to be implemented in computers for a very long time.. And if it shows signs of getting phased out, then you might simply batch-convert everything to the new format. Or just rerip your Audio CDs that are sitting in storage, if you really care about the quality (since batch conversion will result in degradation, unless we find a way to actually enhance the audio quality... which might or might not happen...)

    Movies. VHS tapes degrade... Probably, we'll be converting what we really want onto some kind of optical disk in the future. And the rest will decay, and we won't care about it decaying. When the format (DVD-R perhaps ?) is being phased out, since it's in digital format, it should be possible quite easily to simply transfer our DVD-Rs to the higher capacity medium... Perhaps 10 discs on a single one... Saving a lot of space, and having the format live another 20 years. After all, how hard will it be to include MPEG-2 decompression in next generation video players ? The cost of an MPEG-2 decoding circuit probably won't be very high anymore.

    The other possibility I see is that bandwidth gets cheap enough so that we may consider remote storage vaults. That has a couple of privacy issues I'm certain you can see... But it's incredibly convenient and will probably be adopted by everyone if we just find a way to have a high speed switched pipe to everybody's home at a reasonable cost..

    If we do indeed have high bandwidth in every house, I see that the media companies might also get their acts together and start putting up their own gigantic media-archive. They could offer a monthly media-license that'd give you access to any music or movie you want. Or perhaps just make you pay for every access to the archive. Of course, such a thing.. I can think of so many ways it could go wrong. What if they decide to have only censored material on the archive ? What about independent artists ? Perhaps we'll just see a protocol to access and pay for access to media archives, and have a dozen appear. Let's say, DisnABCTimeAOL could have theirs, AndoTransmeVAMicrosoChryslerDaimler could have theirs...

    This could be so horrible if not properly done - a lot of "non approved" content could suddenly become unavailable if you killed the distribution channels except those media-archives... So. Is this just an incoherent rant ? Would you care to add any constructive comment to it ? Answers ? Questions ? Anything at all.

  10. An old idea... but still a threat. by jw3 · · Score: 5
    The books of my youth were books by Stanisław Lem, the Polish sf writer (he's also #1 in Germany, and quite well known in the States AFAIK). He described an informalypse for the first time in a book entitled "A diary found in a bath", written in the early sixties. This disaster doesn't play an important role in the whole story; it is only mentioned in the "introduction", written by an editor somewhere in the far future, a representative of another civilization, which arose on the Earth after the fall of our civilization, which was due to a virus eating... paper.

    In many later books Lem refers to an informatic catastrophe: sometimes it is caused by a necro-virus, a product of computer evolution (the arms race was banned from Earth and transported to the Moon, where sophisticated computer systems worked automatically on weapon development. Each nation was allowed to get the weapons back on Earth, but that meant others could equally prepare; somehow, the automata on the Moon got out of control and started evolving, finally leading to a nanobot-virus thriving on silicon chips - therefore the title, "Peace on Earth"), sometimes by basic physical properties (in a humorous story "Prof. A. Donda" the title hero discovers a basic equality between energy, mass *and* information, and one of the consequences is that if information achieves a certain density it changes into matter, that is, a new universe. God's word was counting from infinity to zero in an infinitely small time :-) ).

    I admit - I was shaped by Lem's writing. Many of his ideas from the sixties and seventies came to life in the nineties (e.g. virtual reality or sciences which deal only with information retrieval). I do believe that information storage is a problem - but not because the medium would not last forever, but because of the signal / noise ratio you have even in your personal files. As I look at the four Macs we work with in our lab, and the couple of Gigabytes of data, and then dozens of GB of backups, different versions, obsolete versions, alternate versions, gel pictures you have no idea where they came from and who needs them, and so on, and so on... Yes, there are better solutions than using a Macintosh in a multiuser environment, but that's not the point. I've been using Linux for years and have my personal data at home, and I seem to have a GB or so of data I'm too afraid to remove just in case. And there are so many alternatives of storage, backup, databases... and I'm just a simple biologist!

    Returning to Lem - yes, I do believe we are approaching a critical point, like a bifurcation in a chaotic equation, and the word "chaotic" fits in here especially well. What happens next? He who cometh and giveth us a system (not OS, but an information retrieval system), he hath the power and our souls. Well, mine at least. Hope he doesn't come from Redmond, though.

    Regards,

    January