Slashdot Mirror


Avoiding a Digital Dark Age

al0ha writes to recommend a worthwhile piece up at American Scientist on the problems of archiving and data preservation in an age where all data are stored digitally. "It seems unavoidable that most of the data in our future will be digital, so it behooves us to understand how to manage and preserve digital data so we can avoid what some have called the 'digital dark age.' This is the idea — or fear! — that if we cannot learn to explicitly save our digital data, we will lose that data and, with it, the record that future generations might use to remember and understand us. ... Unlike the many venerable institutions that have for centuries refined their techniques for preserving analog data on clay, stone, ceramic or paper, we have no corresponding reservoir of historical wisdom to teach us how to save our digital data. That does not mean there is nothing to learn from the past, only that we must work a little harder to find it."

27 of 287 comments

  1. Won't matter by countertrolling · · Score: 4, Insightful

    Our landfills will provide all the info they need.

    --
    For justice, we must go to Don Corleone
    1. Re:Won't matter by Third+Position · · Score: 4, Interesting

      Our landfills will provide all the info they need.

      Well, I'm not entirely sure of that. If you pick up a stone or a paper with characters on it, you at least have an idea what its purpose was. But 5000 years from now, how does someone interpret a shiny little disk? It might be a long, long time before someone is able to discern its purpose, let alone figure out how it's encoded and how to decode it. And that's even before getting a look at the language, and learning how to translate that.

      That's one advantage of paper, stone and parchment - they don't assume a technical infrastructure in order to use them.

      I have heard that some of the braided ropes left by Mayans might actually be a "written" language. But consider that it's taken us over 500 years to suspect these braids are a form of media, let alone learn to read them, and you can imagine what a future civilization might be confronting trying to figure out our digital media.

      --
      American Third Position
      Finally, a real choice!
    2. Re:Won't matter by JustOK · · Score: 4, Funny

      They wouldn't be able to use that stuff because of copyrights and DRM

      --
      rewriting history since 2109
    3. Re:Won't matter by Redlazer · · Score: 3, Insightful
      While I agree to some extent, an advanced culture is advanced not just in its technology, but also in the way it thinks.

      I would suspect that, in the future, our ability to understand and figure things out will be far higher than it is today. The question of what a DVD is for is clear, not just to us now; I would imagine that even someone with no preconceived notion would be able to piece together the clues to what it might have been used for.

      A perfectly round disc, reflective on one side? Looking at it under a microscope would no doubt show the presence of the "peaks and valleys" of digital data, and I think it would shortly fall into place.

      Of course, a thought experiment such as this is nearly impossible to do, as I don't know anyone intelligent enough who has also never experienced optical storage.

      It's just that, as time passes, and our perspective of the world zooms out (coinciding with our understanding of the world), it becomes much easier to see how things are connected together. In the above example, part of the trouble with the Mayan civ is that we know so little about them and their world. It is not that they were complicated, or smarter than us, or were able to figure out things better than us (Yes, I'll see you all in 2013); the real issue is that we do not know enough about their fundamental culture in order to deduce what they were using things for. Certainly, using rope as a form of writing is an incredibly unusual way to write.

      However, time marches on, and someone figured it out. Just like they will in the future.

      In a way, it would be interesting if, in the future, someone did stumble, confused, across a shattered DVD and, having analysed the data, found a young man's porn collection, relentlessly locked down with encryption. It takes an unusually long 15 seconds to decrypt, and his reward is just the disclaimer:

      "If you're reading this, I'm probably dead."

      The researcher can't help but be gripped by the strange coincidences that must have lined up to bring this to him here.

      Sorry, sometimes I like to write fiction.

      -Red

      PS. Certainly, the obvious counter to my thoughts is the human ability to look at ONLY either the forest or the trees - certainly, I've missed the "Tree" part of the "forest" when learning and figuring things out in the past, and will continue to in the future.

      --
      Guns don't kill people, "with glowing hearts" kills people.
    4. Re:Won't matter by ObsessiveMathsFreak · · Score: 3, Interesting

      A related but more pertinent point is that no-one right now is able to archive or in most cases obtain anything because of copyrights and DRM.

      I work in academia and I can tell you that future researchers are not going to be able to get their hands on 90%+ of the papers written today because the private companies that own them will lose the data when they inevitably go bust (Or just lose it). It will be one of the huge ironies of history.

      --
      May the Maths Be with you!
  2. The Middle Ages didn't have the DMCA by MagikSlinger · · Score: 4, Insightful

    The main way ancient writing reached us is because someone copied it. Lots of copies. Sometimes translated into another language and back, for example, a lot of Greek learning went into Arabic and came back out into Latin or Greek. With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?

    --
    The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
    1. Re:The Middle Ages didn't have the DMCA by Ltap · · Score: 3, Informative

      The problem is that very few identifiably Greek writings survive. In ancient times, copying was a bit like playing telephone: writing at the time was very politicized, so scribes would often alter works while copying them, mostly to give a local slant or simply to change the names. This makes it frustrating to trace things like legends (see: Noah's Ark/Epic of Gilgamesh and its infinite variations with every other culture that existed nearby). A lot of Greek and Roman writings are now quite simply lost for good, but almost certainly inspired works that aren't lost. For instance, the Odyssey and the Iliad were originally just two parts of the epic story of Troy (out of, AFAIK, four or five parts in total), and the set of works that we derive most of our knowledge of Rome from, Ab Urbe Condita, is only partially preserved. It was a set that chronicled the history of Rome from its founding to when the volumes stopped being produced, and there were hundreds, enough to fill entire libraries. It was only in the Renaissance that anyone tried to assemble a collection, and we've only been able to come up with about 30. If we had the full set, we would know a great deal more about Rome than we do now.

      --
      Yet Another Tech Blog
      (but so much more, including game and movie reviews)
      http://yanteb.peasantoid.org
  3. Quick... by eegad · · Score: 3, Funny

    Everybody print out all their emails!!!

    1. Re:Quick... by biryokumaru · · Score: 4, Funny

      Oh god, why doesn't Gmail have a print all function!?

      --
      When you're afraid to download music illegally in your own home, then the terrorists have won!
  4. perfect example: Geocities by Eravnrekaree · · Score: 4, Insightful

    It is indeed a big problem. The problem was illustrated recently when Yahoo suddenly pulled the plug on Geocities, wiping out a vast cultural archive that went back to the early days of the internet. A lot of valuable information was lost as a result. Yahoo's blatant arrogance caused me to refuse to ever use any of their products again. Geocities was actually a fairly nice service. People often criticised it because of the ads, but how else do you pay to keep offering a free service? The loss of Geocities was a perfect example of the need for a permanent store or online archive of information, personal websites and so on that can be maintained as a cultural legacy and informational resource.

    1. Re:perfect example: Geocities by jaavaaguru · · Score: 3, Interesting

      You mean like archive.org? I actually went there recently to look at old Geocities, and was shocked that they don't have it all backed up there. Archive.org has pretty much everything else I've looked for. Any idea why geocities is not there?

    2. Re:perfect example: Geocities by lennier · · Score: 4, Informative

      Perhaps because others were doing it. A number of independent projects tried to back up Geocities, and may have between them recovered most of the data.

      * http://geociti.es/
      * http://reocities.com/
      * http://www.archiveteam.org/

      --
      You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  5. ffs.. the "zomg how to preserve" story -again-!? by Animaether · · Score: 3, Insightful

    Seriously, Slashdot.. until there's a revolutionary insight into this matter.. quit posting these stories ad nauseam.

    For further commentary, see previous stories... here's one.. it's from September 2009 and -nothing has changed-.

    http://ask.slashdot.org/story/09/09/29/1646251/Archiving-Digital-Artwork-For-Museum-Purchase

  6. One Site to Archive Them All by enoz · · Score: 3, Funny

    http://archive.org/

    They've already got a copy of your Geocities sites from the first Digital Dark Age.

    1. Re: One Site to Archive Them All by indeciso · · Score: 3, Funny

      ...One Site to find them, One Site to bring them all and in the Darkness bind them...

  7. To forget is good by Anonymous Coward · · Score: 3, Insightful

    IMHO we'll find that our problem is that we drown in a sea of useless information because we can't find the islands of relevance. Trying to archive everything will only lead to failing to archive anything. On the other hand I doubt that we'll lose much important information despite failing at organized preservation attempts, because important information is copied all the time, which is the only way for information to survive quickly changing technologies and file formats anyway.

    In a more philosophical light, I think that forgetting is good for us. It frees us from the constraints of our past and makes way for new ideas. Archives are backwards-facing, but we all live in the future, all the time.

  8. Forecast: Cloudy forever by presidenteloco · · Score: 5, Insightful

    I think that many people are failing to appreciate the longevity of information preservation
    that cloud computing (more specifically, redundant, geographically distributed network storage) can bring.

    If we get the protocols right, and insist on open standards for data interchange, we can obtain
    properties such as:

    Data bundles that know how to move themselves to more recently commissioned, and/or more
    reliable hosts.

    Data bundles that know how to check in with copies of themselves, to make sure there are enough of
    them alive, and that they are adequately geographically distributed, at every given moment.
    If not, then more baby copies of the same data would be produced and stored elsewhere automatically.
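
    A rough sketch of the "check in with copies, then spawn more" idea above. Everything here is an assumption for illustration: the Host record, the plan_new_copies name, and the default targets of 3 copies across 2 regions are made up, and a real system would also have to verify that each existing copy is still intact before counting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A storage host; both fields are hypothetical labels."""
    name: str
    region: str


def plan_new_copies(current, candidates, min_copies=3, min_regions=2):
    """Pick extra hosts so the bundle meets its copy and region targets.

    current    -- hosts that already hold a live copy
    candidates -- hosts that could accept a new copy
    Returns the hosts to copy to, in the order they were chosen.
    """
    chosen = []
    held_names = {h.name for h in current}
    regions = {h.region for h in current}
    pool = [h for h in candidates if h.name not in held_names]

    # First pass: improve geographic spread by adding unseen regions.
    for host in pool:
        if len(regions) >= min_regions:
            break
        if host.region not in regions:
            chosen.append(host)
            regions.add(host.region)

    # Second pass: top up the raw number of copies.
    for host in pool:
        if len(current) + len(chosen) >= min_copies:
            break
        if host not in chosen:
            chosen.append(host)

    return chosen
```

    Run periodically, something like this would tell a bundle where to send its "baby copies". If the candidate pool is too small, it simply returns fewer hosts than the targets ask for.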

    There are other issues to longevity of course, like maintenance of software that understands different
    versions of data etc. Not trivial but very doable.

    How long an individual disk or SSD or stone tablet lasts is COMPLETELY IRRELEVANT to
    the prospects for information longevity, given the network, and new levels of automated distribution
    that will take place on it going forward.

    --

    Where are we going and why are we in a handbasket?
  9. Great! Now I'll have to buy the White Album again by BluBrick · · Score: 3, Insightful

    We will naturally make multiple copies of everything we consider important, continually transcribing important data onto the latest generation data storage media. (Consider what was the very first publication printed on Gutenberg's big invention.) Unfortunately, that's not necessarily what will be considered important many generations into the future.

    I have every confidence that, far into the future, we will have or be able to develop the capability to read any media we preserve today. The problem then becomes how to determine what data we should preserve now rather than how to preserve it. What do we know now that will be important and useful to someone 10^n years from today?

    --
    Ahh - My eye!
    The doctor said I'm not supposed to get Slashdot in it!
  10. 924 Years and nothing has changed by rudy_wayne · · Score: 5, Interesting

    The Domesday Book was commissioned in December 1085 by King William (aka William the Conqueror, who invaded England in 1066). The first draft was completed in August 1086 and contained records for 13,418 settlements in the English counties south of the rivers Ribble and Tees (the border with Scotland at the time). It is a detailed statement of lands held by the king and by his tenants and of the resources that went with those lands. It records which manors rightfully belonged to which estates, thus ending years of confusion resulting from the gradual and sometimes violent dispossession of the Anglo-Saxons by their Norman conquerors.

    In 1986, at a cost of £2.5 million, the UK compiled the contents of the Domesday Book into electronic form that was stored on laserdiscs. The information stored on the laserdiscs, which is the equivalent of several sets of encyclopedias, is now unreadable because the equipment needed to read the discs is no longer available. Meanwhile the original book is still readable after more than 900 years.

    1. Re:924 Years and nothing has changed by elronxenu · · Score: 3, Insightful

      Because they forgot key parts of the process:

      • Keep it simple
      • Make lots of copies which are readily available
      • Keep converting to new formats over the years

      The UK fouled up by inventing new proprietary storage formats which needed custom hardware and software to read and process the data. The laserdisc needed a special laserdisc player and a BBC Micro. The BBC who produced this were years ahead of their time and had to invent a lot of stuff. Unfortunately the rest of the world invented a lot of different stuff, which is what we use today.

      And how many of these systems were produced? I don't know, but they cost 4000 pounds each which is a significant investment for a school and certainly the high price reduced the number of items which were sent into the community.

      Even though we have extracted the data from the original formats (and also obtained improved images by re-mastering original video footage) it seems that one of the main impediments to putting this data online is copyright - the contents of the 1986 project won't be out of copyright until 2090!

      The above two points come together with "keep converting to new formats". If your stuff is all proprietary, it may be hard to convert to new formats. If your stuff is copyrighted, you may be able to convert it but you can't distribute it, and widespread distribution is one of the requirements of effective data preservation.

      The data which was produced in 1986 wasn't lost and won't be lost. People are working with it and upgrading it. However, you won't be able to see it, primarily due to the shortsightedness of the original project.

      So loss of digital data is not so much a technical problem, more a social problem, of shortsightedness in creation, distribution and copyright.

      Kinda like the BBC's lost videotapes of Monty Python (or was it Dr Who?) ... priceless recordings were allowed to degrade and become unusable, were thrown away, or were overwritten ("media re-used"). I don't mean to point the finger only at the BBC - NASA did it too. Lack of foresight, folks.

  11. Lots of other things to consider by syousef · · Score: 4, Interesting

    In my own quest to preserve my digital photos, I've created multiple backups on hard disk including a remote backup which gets updated every few months. I use different disks created by different manufacturers and buy new disks every couple of years (but do not throw away old copies).

    I've recently come across another aspect that isn't addressed by the article. Data that is in use in an online copy can be modified (including corrupted). There is no point in copying/propagating data if the data you are copying is damaged. Typically this has happened when I've tried DAM software like Lightroom, which will modify the original file despite claiming to be non-destructive. I have no proof that photos were re-encoded or quality was reduced, but I do know original files were altered, and I want an original unaltered file preserved.

    Most people when they backup files do very little verification to ensure the files they are copying today are the same files that were created 5 or 10 years ago. They rely too much on backup software to do this for them, with no attention paid to what's happened to the data between copies. To keep this under control I've started putting checksums on all my photo files, which I check when I create a fresh copy.
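
    For anyone wanting to try the same thing, here is a minimal sketch of a checksum manifest using only the Python standard library. SHA-256 and the function names (sha256_of, build_manifest, verify_copy) are assumptions for illustration; the post does not say which checksum or tool is actually used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Compare a fresh copy against the manifest and list any problems."""
    problems = []
    for rel, expected in manifest.items():
        candidate = copy_root / rel
        if not candidate.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(candidate) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

    Build the manifest once, right after the photos come off the card, and keep it with the backups. An empty list from verify_copy means every file in the new copy still matches the original bytes.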

    Of course, where my photos are captured in a proprietary format I copy them to an open or at least well-documented format (typically jpg, sometimes also tif). This is done as soon as I transfer the photos, which are not removed from the camera card until I have 2 additional copies. So I shouldn't have the same issues that the author had, assuming jpg can still be read throughout my lifetime.

    --
    Sammy

    --
    These posts express my own personal views, not those of my employer
  12. Self-correcting problem by drDugan · · Score: 3, Insightful

    We are generating data far, far faster than we can save it. We have for some time, and while trends for storage are catching up, we will always be able to generate more than we store, as a function of how computing and communications work.

    So what to save? The Director of the NLM had a unique insight on this exact question: [paraphrasing] "What is used, is saved." Basically, it's the utility of information: the information that people find useful and actually use is the best proxy for long-term value. The good thing is that all people are motivated to store and maintain the data they find useful, or that their constituents or customers desire. As long as people keep wanting data, it will be stored and available.

    This is a very different situation to real-world archeology. In the digital, connected world we can access data today once it's publicly available, evaluate it and use it if we want. There is no dust that covers old data, it does not get buried...

  13. Practical example : Classics emulation by DrYak · · Score: 3, Insightful

    The main way ancient writing reached us is because someone copied it. Lots of copies. {...} With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?

    (And as another example of copies being important for preserving : Fritz Lang's Metropolis got recently another 30 minutes of its missing part recovered from a copy located in Argentina)

    After a long enough time, virtually any DRM measure ends up being broken. All that matters is time, resources and some clever tricks (to avoid waiting until the heat death of the universe while brute-forcing a 4096-bit key).
    So DRM has only 2 direct effects:
    - it annoys legitimate users everywhere for no practical reason.
    - it forces the basement-dwelling teen with too much free time on their hands to wait until 2 weeks before the official launch date, instead of 3 weeks before, because it took the pirates 1 week to find a way to break the DRM.

    This implies 2 results:
    - 99.99% of pirate users will never interact with the DRM nor be affected by it in any way.
    - The important part: DRM-protected data will get copied, eventually and a lot. Lots of copies will exist, and virtually 99.99% of these copies will be the "pirated" copies, be they legal backups or unlicensed copies.

    So in the end, the DRM-protected data will survive, not in the DRM version itself but in the DRM-free version found on The Pirate Bay and similar sites. Case in point: classics emulation.
    Most of the companies which produced the games we played as children have now gone belly up. Of the few remaining, few have kept the assets of their old productions, and few are interested in doing anything with them. Those who are generally do modern re-imaginings and re-interpretations, rather than re-issuing the old.

    So in short, if you ever want to pull some of your childhood memories back out of the grave, don't count on the original companies.
    Sometimes you can still find working vintage equipment and media, but these will eventually break.
    Today, most of these oldies are available ... as images of pirated disks. It's practically certain that if, in 2010, you want to play the same game as in 1985, you'll see a cracktro at the beginning.

    All your favourite Commodore C64, Amiga, etc. games are currently best sourced from download sites which contain warez copies carried over from that era, while the companies went belly up and/or let their assets rot.

    So, in 25 years, when most of the current media companies have either disappeared or completely forgotten about today's media, your children's best way to relive fond memories will be finding a copy which is the digital descendant of what's on The Pirate Bay today.
    Yes, **AA, today's EVIL pirate, might be tomorrow's heroic archivist.

    In 25 years, when the current maker of

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  14. Gmail Paper by illiteratewithdrawal · · Score: 4, Funny

    Google does one better: Gmail Paper

  15. Re:Perhaps the way we think.. by CharlyFoxtrot · · Score: 4, Insightful

    Many of the laws that overly stymie information flow (DMCA etc.), I think, are just a knee jerk reaction in the way printing presses were suppressed, and controlled until everyone realised the benefits of having them opened up.

    Barbarians have always burned down libraries. No reason to think they'd stop doing that just because they wear ties these days.

    --
    If all else fails, immortality can always be assured by spectacular error.
  16. like an odd sock by timmarhy · · Score: 3, Insightful
    this is the same shit story that keeps popping up on /. every 6 months or so.

    typically kdawson posts it, what a tard.

    --
    If you mod me down, I will become more powerful than you can imagine....
  17. Re:The fight is lost by BrokenHalo · · Score: 3, Interesting

    I have code and documents dating back to 1976 on a HDD on this machine. Until 15 years ago I had it all stored on 800BPI mag tapes, but before I left my last serious "big-iron mainframe" site I transferred it across to floppies. I doubt if I'll ever need the files again, but since they don't make any significant dent in my storage, there's no reason to throw them away.

    I know many historians (in fact my wife is one), and one day someone might be more interested in a perspective on '70s and '80s programming than I am right now. If I throw it out, that information will be gone forever.