Slashdot Mirror


Archiving Digital Data an Unsolved Problem

mattnyc99 writes, "It's a huge challenge: how to store digital files so future generations can access them, from engineering plans to family photos. The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics asks: Will an entire era of human history be lost?" From the article: "[US national archivist] Thibodeau hopes to develop a system that preserves any type of document — created on any application and any computing platform, and delivered on any digital media — for as long as the United States remains a republic. Complicating matters further, the archive needs to be searchable. When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"

23 of 405 comments

  1. Microsoft to help! by UbuntuDupe · · Score: 5, Funny

    I can't wait to hear Microsoft's explanation why the project should use one of their proprietary formats.

    1. Re:Microsoft to help! by 19thNervousBreakdown · · Score: 4, Funny

      That's not a word.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    2. Re:Microsoft to help! by RAMMS+EIN · · Score: 5, Funny

      Our formats are industry standards. They are backed by Microsoft, a robust company which has withstood vigorous competition, lawsuits, the .com burst, and the Bolshevik revolution brought about by Stallman et al. Where other companies have folded, Microsoft has flourished. With a known track record of backward-compatibility, your documents are safe with us. Trust us. We _invented_ trusted computing.

      And remember: nobody ever got fired for buying Microsoft.

      --
      Please correct me if I got my facts wrong.
  2. Not too long... by Electrode · · Score: 5, Funny
    "for as long as the United States remains a republic."

    So, they're shooting for about 10 years then?

    1. Re:Not too long... by eln · · Score: 5, Interesting

      Your timeline may be a little off (at least I hope so), but you're right that it's a silly goal. Whether the US has 10 or 1000 years left, history shows us it will most likely fall at some point, and that point will be fairly soon when compared to the entirety of human history.

      Making a format that will survive a thousand years so long as our advanced civilization is still around and still cares is pointless, because as long as there is a continuous line of people that care, they will be willing to transfer at least the more important stuff to new media. The trick is coming up with something that will still be readable when archaeologists dig it up 10, 50, or 100 thousand years from now.

    2. Re:Not too long... by thelost · · Score: 4, Insightful

      the trick is... hoping that in a hundred thousand years people still care at all about their past. Reading Isaac Asimov's Foundation saga, I was chilled by the slow realization that the people of the Galactic Empire had become so numb to their past that they had let it vanish entirely.

      --
      Promote Charity on Myspace, Show Your Colours!
    3. Re:Not too long... by eln · · Score: 5, Funny

      In 20 thousand years they'll have Princess Diana running around with a lightsaber killing communists or something.

      Are you trying to say she didn't do that?

      Crap, I am so getting an F on my history paper.

    4. Re:Not too long... by Pollardito · · Score: 4, Funny

      quick, let's update wikipedia to say she did, then you'll have a source for your paper

  3. How is this different by zappepcs · · Score: 5, Insightful

    than the previous ages where all information was kept on paper or in spoken words? The problem isn't so much how to invent something that will always be readable, but some way to always have the applications to read it. If it were not for the Rosetta Stone, much of what we know about the ancient world might still be a mystery.

    1. Re:How is this different by s20451 · · Score: 4, Interesting

      Say western civilization is disrupted for a period of time that is short by historical standards -- 40-50 years would be enough. Electrical power is only sporadically available, and as a result the Internet collapses and PCs become useless. With much more important issues to deal with, such as finding food, people ignore digital data storage.

      The era of restoration comes. However, when people blow the dust off those old DVDs and players, they discover that the DVDs have decayed to the point of unreadability. Massive quantities of archived data and knowledge are irretrievably lost.

      The main problem in our age is thermodynamics -- information is stored so densely that it tends to decay naturally, on its own. By contrast, ancient stone carvings (as well as their keys, such as the Rosetta Stone) are durable enough to last (basically) forever.

      --
      Toronto-area transit rider? Rate your ride.
    2. Re:How is this different by ThosLives · · Score: 4, Insightful

      It's not so much the Rosetta stone, but the fact that a "Rosetta stone" has a built-in context - it's obviously communication or artwork of some kind. If you have a big pile of digital data, what is it? An image? Compressed text? Audio? Just a sequence of numbers? The thing "printed" information gives you is that the presentation of the data gives you an idea of what it is - we don't yet have any digital data formats for which the presentation of the data gives an idea of the content; in fact, most digital storage mechanisms present all types of information in identical manner.

      That's the real challenge - devising a digital storage format in which presentation can be used to apply context to the data.
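      One way to read that challenge, sketched in Python 3.9+ under loudly stated assumptions: the record layout, the `ARCHIVE-RECORD v1` magic line and the `wrap`/`unwrap` helpers are all made up for illustration. The idea is only that the first bytes of the file are words rather than numbers, so anyone with a hex dump can see what the payload is before the opaque bytes begin.

```python
# Hypothetical record layout: an ASCII header that anyone with a hex dump can
# read, then a blank line, then the raw payload bytes.
MAGIC = b"ARCHIVE-RECORD v1\n"


def wrap(payload: bytes, content_type: str, description: str) -> bytes:
    """Prefix the payload with a plain-text header saying what it is."""
    for value in (content_type, description):
        if "\n" in value:
            raise ValueError("header values must be single lines")
    header = (
        f"content-type: {content_type}\n"
        f"description: {description}\n"
        f"length: {len(payload)}\n"
        "\n"  # a blank line ends the header, as in email and HTTP
    ).encode("ascii")  # non-ASCII header text raises UnicodeEncodeError
    return MAGIC + header + payload


def unwrap(record: bytes) -> tuple[dict[str, str], bytes]:
    """Split a record back into its header fields and its payload."""
    if not record.startswith(MAGIC):
        raise ValueError("not an ARCHIVE-RECORD v1 file")
    # The first blank line always belongs to the header, because header
    # values are single lines; the payload may contain anything after it.
    head, _, body = record[len(MAGIC):].partition(b"\n\n")
    fields = {}
    for line in head.decode("ascii").splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields, body[: int(fields["length"])]
```

      Usage: `unwrap(wrap(b"Hello", "text/plain", "greeting"))` gives back the three header fields and `b"Hello"`. Nothing here makes the payload itself readable; it only guarantees that a future reader knows what kind of thing they are looking at.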

      --
      "There are a dozen opinions on a matter until you know the truth. Then there is only one." - C.S. Lewis (paraphrase)
    3. Re:How is this different by Marxist+Hacker+42 · · Score: 4, Interesting

      Now that's the right problem. What is needed isn't some mysterious Universal Translator Format; it's storing the reader hardware, with programs in ROM that understand the format, alongside the electronic copy. Hell, store the whole thing in ROM chips with a well-documented interface printed on the outside of the chip. Libraries could be made up of whatever reading technology exists at the time the library is built, all sharing this common pin-level interface.

      --
      SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
    4. Re:How is this different by toddestan · · Score: 4, Insightful

      You're assuming far too much. Remember, there are entire written languages from 2000+ years ago that we barely know how to read, and for those we have context: what they were written on, the formatting, what the characters look like. Now, in 2000 years, if someone came upon your hard drive or flash memory card, assuming they could even read it, they aren't going to pop it into a computer, see C:\My Music\ and C:\Documents and Settings\, and be left with only the challenge of figuring out what the hell an OGG file is.

      They aren't going to see files. They are going to see 1s and 0s. Lots of them: billions on a memory card and trillions on a hard drive. They won't have a clue how to interpret the file system, even for something relatively simple like FAT16. They may not even know that a byte is 8 bits. They won't have context; they will be baffled by the fact that almost every OS writes files in fragments all over the drive. They likely won't even be able to tell areas that were marked as deleted but not wiped apart from the actual data, let alone figure out what the swap file is.

      I seriously doubt that someone in the future, given a working hard disk but nothing else to go on, would be able to pull anything meaningful from the drive. Heck, look at modern-day examples: how long did it take Linux to be able to read and write to NTFS, given the number of very smart people working on it who already had a pretty good idea how it functioned?
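      To make the FAT16 point concrete, here is a sketch (Python 3, standard library only; `parse_fat16_boot_sector` is a hypothetical helper, not a library call) of just the first step of reading such a volume, using the standard FAT12/FAT16 boot-sector offsets. Every constant in it (the 0x55AA signature, offset 11, little-endian byte order, 32-byte directory entries) is knowledge a future reader would have to rediscover from the raw bits.

```python
import struct


def parse_fat16_boot_sector(sector: bytes) -> dict:
    """Decode the BIOS Parameter Block of a FAT12/FAT16 boot sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA boot-sector signature")
    # Fields at offsets 11..23, all little-endian, with no padding between them.
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, num_fats,
     root_entries, total_sectors, media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", sector, 11)
    if total_sectors == 0:  # larger volumes keep a 32-bit count at offset 32
        total_sectors = struct.unpack_from("<I", sector, 32)[0]
    # Each root-directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    root_dir_start = reserved_sectors + num_fats * sectors_per_fat
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved_sectors,
        "num_fats": num_fats,
        "sectors_per_fat": sectors_per_fat,
        "total_sectors": total_sectors,
        "media_descriptor": media,
        "root_dir_start": root_dir_start,
        "first_data_sector": root_dir_start + root_dir_sectors,
        # Informational label only; the FAT spec decides FAT12/16/32
        # from the cluster count, not from this string.
        "fs_type_label": sector[54:62].decode("ascii", "replace").rstrip(),
    }
```

      And that only locates the root directory. Reassembling a fragmented file still means walking the cluster chains in the FAT itself, which this sketch does not attempt.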

  4. hieroglyphics by IWantMoreSpamPlease · · Score: 4, Funny

    Worked for the Egyptians didn't it?

    --
    So rise up, all ye lost ones, as one, we'll claw the clouds.
  5. I've heard this problem over and over by csoto · · Score: 5, Interesting

    Since I work at a university, this is not a subject I'm unfamiliar with. We've had lots of discussions about this. Everyone always talks about how many zillions of "pieces of information" are out there. The number of web pages in existence is always bandied about. My point in these discussions is that most of what's out there is crap. Humanity is not lessened by its loss. Good stuff gets reproduced, reviewed, studied, dissected, etc. and survives. It *is* stupid to try to solve this problem, because the problem doesn't need solving.

    --
    There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
    1. Re:I've heard this problem over and over by kfg · · Score: 4, Insightful

      Expanding copyright protection to a term equal to two lifetimes means that even some of the good stuff is now being lost, because preserving it is not allowed.

      If preservation is outlawed, only outlaws will be preservationists.

      I believe Ray Bradbury had something to say on this subject.

      KFG

  6. My solution for digital photos? by OfNoAccount · · Score: 4, Informative

    Since I shoot RAW, I also burn a copy of dcraw.c onto every disc, so even if the current platforms fall by the wayside, there will still be code to convert the files.

    Storage itself? Currently burning onto Delkin Archival Gold, storing cool and dark, and in two physically distant locations.

    They're also stored on my hard disk, and the best are backed up onto a USB drive.

    If it looks like the DVD-ROM drive is becoming obsolete, I'll burn them onto whatever comes along next.

    If you're truly paranoid you can always print them on archival quality paper using pigment based inks ;)
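    Before burning onto whatever comes along next, it helps to know that the old copy is still intact. A minimal sketch in Python 3.9+ (the helper names are hypothetical, and SHA-256 is our choice of checksum, not part of the workflow above): record one digest per file when the disc is burned, then compare after every migration.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large RAW files need not fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def manifest_text(manifest: dict[str, str]) -> str:
    """Render as '<digest>  <path>' lines, the layout GNU sha256sum prints."""
    return "".join(f"{digest}  {path}\n" for path, digest in sorted(manifest.items()))


def verify_copy(original: Path, copy: Path) -> list[str]:
    """Return the relative paths that are missing from, or differ in, the copy."""
    expected = build_manifest(original)
    actual = build_manifest(copy)
    return sorted(path for path, digest in expected.items() if actual.get(path) != digest)
```

    Because the text form uses the same layout as `sha256sum` output, it should also be checkable with `sha256sum -c` if Python is ever unavailable. An empty list from `verify_copy` means every file came across byte for byte.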

  7. Open, well-used, file formats. by Daniel_Staal · · Score: 4, Insightful

    There are only two ways of doing this: keeping a copy of every program used to create these files (and a system to run them on) or converting them to some open and well-supported format.

    For text documents, HTML is probably the best bet. It is so widely used and supported that readers are almost guaranteed to exist as long as computers do in their current form. (And if something ever truly supersedes it, a mass-conversion program will be written anyway.) HTML probably works for basic spreadsheets too. Support for GIF, JPEG, and PNG graphics is probably at that level as well, and MP3 for music.

    As a bonus, most of the native programs for the documents to be preserved have translators to these formats already.

    Beyond that I have no idea.
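    For the simplest case, plain-text notes, the mass-conversion program mentioned above might look like this sketch (Python 3 standard library; the function names, the blank-line-separates-paragraphs rule and UTF-8 input are assumptions). Escaping `&`, `<` and `>` is the one step that cannot be skipped, or the text itself would be parsed as markup.

```python
import html
from pathlib import Path


def text_to_html(text: str, title: str) -> str:
    """Wrap plain text in a minimal HTML document, one <p> per paragraph."""
    text = text.replace("\r\n", "\n")  # normalise DOS line endings first
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Escape &, < and > (and quotes) so the text is never parsed as markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{html.escape(title)}</title>\n</head>\n<body>\n"
        f"{body}\n</body>\n</html>\n"
    )


def convert_tree(src: Path, dst: Path) -> int:
    """Write an .html copy of every .txt file under src into dst; return the count."""
    count = 0
    for path in sorted(src.rglob("*.txt")):
        out = dst / path.relative_to(src).with_suffix(".html")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(text_to_html(path.read_text(encoding="utf-8"), path.stem),
                       encoding="utf-8")
        count += 1
    return count
```

    The originals are left untouched; the HTML copies are an extra, widely readable rendition rather than a replacement.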

    --
    'Sensible' is a curse word.
  8. Government Area of Expertise by ThatsNotFunny · · Score: 5, Funny
    When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"


    I'd trust that guy. If there's one thing our government knows, it's stupidity.
    --
    "Was it a millionaire who said 'Imagine no possessions'?" -- Elvis Costello
  9. It's whether it's WORTH it by pclminion · · Score: 4, Insightful

    It really isn't a question WHETHER we will be able to read old digital data in the future. After all, humans invented these formats, flawed as they may be, and humans can decipher them with enough effort. We can crack cryptography -- a deliberate attempt to make it as difficult as possible to decipher certain information. So it's hard to imagine any data format that could not be deciphered in the future with some honest effort.

    Instead it is a question of whether the data is WORTH the effort. From an anthropological standpoint, this is valuable historical data, and its value is not decreased by our inability to interpret it. The benefit of digital data is that it can be copied even if we don't know what it means. It will not erode or decay like other historical artifacts, if we put in the small effort required to preserve it. Assuming humanity doesn't self-destruct, there will be plenty of time in the future for historians to decipher and interpret the data when a need arises for it.

  10. Extra irony points. by Kadin2048 · · Score: 4, Insightful

    I believe Ray Bradbury had something to say on this subject.

    Perhaps more ironic -- it's a pretty good bet that whatever he wrote on the subject, it's not available online due to copyright restrictions imposed by his publisher or "estate."

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    1. Re:Extra irony points. by kfg · · Score: 4, Funny

      Go to the library while you still can and memorize it. Buy camping gear.

      KFG

  11. Re:Who cares? by focitrixilous+P · · Score: 4, Funny

    I doubt you'd sell many Nano-Pump (tm) enlargement kits. It's all in the name, even in the future.

    --
    SAILING MISHAP