Slashdot Mirror


The Digital Dark Age

zygan wrote to mention a Fairfax Digital article about the possibility of a digital dark age, as a result of the increasingly short-term lifespan of digital storage. From the article: "It is 2045, he suggests, and his grandchildren are exploring the attic of his old house when they come across a CD-ROM and a letter, which explains that the disk contains a document that provides directions to obtaining the family fortune. The children are excited. 'But they've never seen a CD before - except in old movies - and, even if they found a suitable disk drive, how will they run the software necessary to interpret the information on the disk? How can they read my obsolete digital document?'"

26 of 413 comments

  1. this should be soluble. by yagu · · Score: 5, Interesting

    Scary article. But probably too true.

    In my opinion, data archival screams to be handled in as simple and lowest-common-denominator a way as possible. For me, that means plain text for documents, and picture formats that seem guaranteed to be around for a long time, if not forever. I'm guessing a good candidate for pictures would be something like jpg. I can't imagine jpg going away or ever becoming an indecipherable picture format. Video might be a tougher nut to crack, but I would guess some flavor of mpg.

    Note that none of these flavors (text, jpg, or mpg) includes or implies any reliance on vendor-proprietary formats (yes, I know there's a certain proprietary tinge to the picture and video forms, but they're pretty universal). So, storing and archiving for historical purposes rules out Microsoft and all of their formats. This especially makes sense considering there are already huge compatibility issues with Microsoft documents among the various versions of their products.

    Also, for retrieval assurance it no longer makes sense to me to use "dead" or "inert" methods for storage, e.g., tapes, CDs, DVDs, etc. Instead, at least for my purposes, I maintain multiple physical and current storage devices for all of my important data. This has been a recent (last three years) development for me, since I started reading about early failures of the supposedly rugged storage.

    That being the case, I needed to devise a strategy for forward migration of all of my data so nothing got left behind. Fortunately, this has been mostly easy, since right now the "active" storage du jour seems to be hard disk drives, and capacity has grown enough with each new generation of drives that I have been able to simply roll my old data forward onto the new drives alongside the new data, with plenty of room to spare.
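
    A minimal sketch of that kind of roll-forward, assuming the old and new drives are both mounted as ordinary directories. The function names (`sha256_of`, `migrate`) are made up for illustration, and the SHA-256 comparison is my own addition, not something the poster describes doing:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(old_root: Path, new_root: Path) -> list[Path]:
    """Copy every file under old_root to new_root.

    Returns the source files whose copies did not verify (hypothetical
    policy: the caller decides whether to retry or investigate).
    """
    failures = []
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        target = new_root / source.relative_to(old_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also carries timestamps over
        if sha256_of(source) != sha256_of(target):
            failures.append(source)
    return failures
```

    Something like `migrate(Path("/mnt/old"), Path("/mnt/new"))` (paths invented here) would return an empty list when every copy matches its source.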

    This shouldn't be an approach foreign to companies with reasonably competent data shops either, though it may require a philosophical change. All is not lost, and hopefully all will not be.

    Just my $.02. ~

    1. Re:this should be soluble. by jd · · Score: 5, Interesting
      I would personally opt for PNG for images, to avoid loss of data. Video almost has to be MPEG, as neither MNG nor APNG has really gone anywhere at this time and the BBC's high definition format isn't getting much adoption yet either. For audio, MP4 would seem the best choice - less loss of data, but more likely to be readable in the far future than Ogg Vorbis (which is a shame) or AIFF (yay! AIFF's gonna die!)


      No matter what form you store the data in, if you want it readable in the far future, you've got to remember two things - there's no guarantee ANY specific technology will exist, and there's no guarantee ANY specific timeframe for the reading to take place.


      What you want, then, is to do the reverse of the language decoding that has taken place over the years. Imagine yourself faced with a puzzle every bit as baffling as Egyptian hieroglyphics, only stored at a vastly greater information density and probably in an electronic format. What would you want/need, to be able to recover the data?


      Well, there would seem to be a few things that are essential. First, the explorer in the future will need to know the data is there and in what form. So, if you're using optical storage, make that clear (along with frequency). If you're using N-state logic, make it clear what N is. If there are M layers, tell them the value of M. You don't need to know all of the technical information, because all they need is where to start looking.


      Secondly, the information needs to be correctly indexed. Languages are broken because types of information can be grouped and identified. The same will be true here. So, produce a contents list with corresponding data formats and/or MIME types, along with the offsets within the medium.
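
      A rough sketch of such a contents list, assuming the archive is just files packed back to back. The record layout and the names `build_index` and `render_index` are hypothetical; the MIME guess comes from Python's standard `mimetypes` module:

```python
import mimetypes


def build_index(files: dict[str, bytes]) -> tuple[list[dict], bytes]:
    """Pack files back to back; return (contents list, packed payload)."""
    index, payload, offset = [], bytearray(), 0
    for name, data in files.items():
        mime, _ = mimetypes.guess_type(name)
        index.append({
            "name": name,
            "mime": mime or "application/octet-stream",  # fallback when unknown
            "offset": offset,       # byte offset from the start of the payload
            "length": len(data),
        })
        payload += data
        offset += len(data)
    return index, bytes(payload)


def render_index(index: list[dict]) -> str:
    """Render the contents list as plain ASCII, one record per line."""
    return "\n".join(
        f"{e['offset']:>10} {e['length']:>10} {e['mime']:<28} {e['name']}"
        for e in index
    )
```

      The rendered text is what you would print and store next to the medium, so a reader can find each item by offset without understanding the filesystem.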


      Thirdly, a key is a REALLY good idea - something analogous to the Rosetta Stone. Let's say you're using binary logic and a fairly rudimentary FS on the storage medium with text-based directories. The key would be a printout of the root directory in binary, again in ASCII and a third time as a set of records describing the logical layout. The printout would also need the offset of the directory. From this, it would be trivial for someone in the year 3000 to determine how offsets were calculated, how the data was laid on the disk and how the data is connected.
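
      To make the three-way printout concrete, here is a minimal sketch. It assumes a made-up directory format where each line is "name start length" in ASCII; `rosetta_key` is an illustrative name, not part of any real tool:

```python
def rosetta_key(directory_text: str, offset: int) -> str:
    """Render a text directory three ways: raw bits, ASCII, and described records.

    Assumes (hypothetically) one entry per line: "<name> <start> <length>".
    """
    raw = directory_text.encode("ascii")
    bits = " ".join(f"{byte:08b}" for byte in raw)
    records = [
        f"entry {i}: name={name!r} start={start} length={length}"
        for i, (name, start, length) in enumerate(
            line.split() for line in directory_text.splitlines() if line
        )
    ]
    return "\n".join([
        f"DIRECTORY OFFSET: {offset} bytes from start of medium",
        "AS BITS:", bits,
        "AS ASCII:", directory_text,
        "AS RECORDS:", *records,
    ])
```

      Printing `rosetta_key("A 0 5\n", 512)` lines up the bit pattern 01000001 with the letter A and with the record that names it, which is the correspondence a future reader would work from.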


      If physical storage is going to be used, ensure the various media used will last about the same length of time. So, if you're aiming for a hundred years, CDs may just about work. But you must NOT have the CD in contact with sulphides or anything else which will destroy the surface. The CD must be kept cold (but not so cold it is damaged) to slow decomposition. It should also be kept somewhere where accidental exposure to UV is impossible.


      If you're keeping paper notes with the data, as I've suggested, the paper must be acid-free and the inks must be long-lasting. Most modern paper is of very low grade, as are most modern inks.


      If you're looking more at a time capsule that is for the FAR future (we're talking something that happens AFTER Star Trek), then you've got to be extra careful, but it should still be possible. I see no reason why you couldn't have physical storage under ideal conditions which could be retrievable after a thousand years or so. You just have to be very careful about what you choose to use. Same with paper. If you're looking to produce the next Beowulf (no, not the clustering technology), then you're probably going to want to look at vellum or some other extremely high-quality medium. I'd also look up early inks on the Internet and modify a recipe that could be used as a refill for a printer ink cartridge. Many early inks are highly stable (iron oxide is one example) and fade more by damage to the medium than decay of the ink.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:this should be soluble. by Coryoth · · Score: 3, Interesting

      Thirdly, a key is a REALLY good idea - something analogous to the Rosetta Stone.

      Not exactly replying to your post so much as having my memory spurred with regard to something relevant: if you're really interested in storing information for future generations then The Rosetta Project is an interesting one. They seek to have as many distinct languages as possible printed on a small disk, beginning in large print but decreasing in size as it spirals inwards to the point where it is micro-etched. It's easy enough to figure out how to read it, and as long as you can build tools to magnify it you can read everything on it.

      Jedidiah.

    3. Re:this should be soluble. by Anonymous Coward · · Score: 1, Interesting

      Nickel. It was nickel. Pages of text were optically reduced to microscopic size and photographically etched onto discs of pure nickel. The goal was to come up with a way of storing data for 10,000 years. As long as the basic science of optics survives, and the discs themselves are extant, the information on those discs will continue to be readable forever.

    4. Re:this should be soluble. by Anonymous Coward · · Score: 1, Interesting

      No, XPM is better than BMP. Why? It is a plain text file. Once you can read ASCII, you can read XPM. It is basically the same as BMP -- uncompressed image data -- but is written in a format that can be parsed easily. Plus, since it is stored as a character array, it is easier to load into C/C++, which I imagine will be around in some form in 2045. Hell, Fortran is still here and useful!
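
      For anyone who hasn't seen one, a small sketch that writes an XPM3 file by hand. The helper name `to_xpm` and the tiny example image are invented; it assumes one character per pixel, which is the simplest XPM variant:

```python
def to_xpm(name: str, rows: list[str], colors: dict[str, str]) -> str:
    """Serialize a character grid to XPM3 text, one character per pixel."""
    width, height = len(rows[0]), len(rows)
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    # Header line: width, height, number of colors, characters per pixel.
    lines = [f'"{width} {height} {len(colors)} 1"']
    # Color table: "<char> c <color>".
    lines += [f'"{char} c {color}"' for char, color in colors.items()]
    # Pixel rows, one quoted string each.
    lines += [f'"{row}"' for row in rows]
    body = ",\n".join(lines)
    return f"/* XPM */\nstatic char * {name}[] = {{\n{body}\n}};\n"


if __name__ == "__main__":
    print(to_xpm("dots", ["#..#", ".##."], {".": "#FFFFFF", "#": "#000000"}))
```

      The output is both a valid C array declaration and something you can read off a printout with no tools at all.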

    5. Re:this should be soluble. by Anonymous Coward · · Score: 1, Interesting

      Have you noticed a common thing between text and JPEG images? You're proposing that they be the data-level formats (i.e. not the media) that will be used to store things for the Unforeseen Future.

      I noticed one thing in common, see. Both newline-delimited plain text (ASCII, basically, and the ISO LATIN 8-bit variants + a big handful of regional adaptations) and JPEG images will be "the state of the art of the past" in the future. As such, they will almost certainly be among the things that are taught in e.g. introductory software engineering classes. Plaintext certainly is today, yet consider that free-form text was considered reckless by sixties punched-card standards. Likewise education has progressed; linked lists and old-school static storage management (as in "this list grows this way and that one goes that way"; not malloc and free) today are mostly just discussed briefly or mentioned in a lecture handout; instead students are taught red-black trees and stuff like that which wasn't so long ago considered advanced topics.

      In some advanced vision of the future I think that JPEG will certainly be a part of some course assignment. Discrete cosine transforms aren't on their way out after all, regardless of whether Mr. Barnsley continues to seat his fat arse on those wavelet patents or not. This quality of the Really Good Algorithms and data structures will save our asses once the local hardware guy gets his anachronistic "Cee-Dee Raahm" drives working again...

    6. Re:this should be soluble. by Anonymous Coward · · Score: 1, Interesting

      So what? 1000 years ago not everyone could write, so today you get the "best" of what was told then. In 1000 years you'll see only the "best" of today, not everything little kids (like me) write in their blogs. It's all this Darwin stuff: only the best prepared will survive.

  2. This is a touchy subject. by empvirus · · Score: 2, Interesting

    Really, who knows what the future holds? And who says we won't be able to trace history back to these days and even further? Just because we don't use a medium anymore doesn't mean it is forgotten and no one will ever be able to read it again. I mean, if one did some digging, I bet he/she would find enough information to be able to read even punch cards. Just my 2 cents.

    --
    Sometimes I comment just to hear myself typing.
  3. a lesson on impermanence by puzzled · · Score: 2, Interesting


      Each moment arises out of the moment before - call it 'dependent arising'. No object exists in perpetuity - even black holes evaporate over long time spans.

      This being said, our digital storage systems, in a collective sense, are becoming more like a brain and less like an archive. 'Memories' of some importance are in multiple locations and accessible via different search methods. They're also being changed, just as memories of our pasts acquire a patina as we age. Someone took something I wrote in the early 90s on Usenet and added it to their humor site. My flickr content is spreading if the hits are any indication, as are my contributions to YouTube.

      Public records are an important thing, but understand the other, positive things that are happening in the background as the internet acts less like a database and more like a neural net with each passing day.

    --
    I am very easy to get along with, but I don't have time to waste being nice to people who are being stupid. -Theo
  4. SESSION #18 - SPEAK LIKE A CHILD by infonography · · Score: 4, Interesting

    Subject of a Cowboy Bebop episode. This is why I watch anime. They actually take some time to examine an idea like where to find a Betamax player 150 years from now. http://rfblues.aaanime.net/Sessions/session18.htm

    --
    Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
  5. Give it to me by fumanchu32 · · Score: 2, Interesting

    Just give me the document. I'll print off a hard copy today; that newfangled paper technology looks promising (assume acid-free paper, no sunlight, etc., for you picky individuals). Just leave them a CD with my contact info. I will give them the directions to the family fortune, I promise. You can trust me, I'm a [insert political party of choice here].

  6. Interesting - historians' concerns by BitterAndDrunk · · Score: 2, Interesting

    I read an article about 10 months ago about the "death of history" due to the electronic age.
    In a nutshell, as we've moved to more digital forms of communication (phone and email), one of the primary methods historians use to piece together older eras is going extinct - the written correspondence from one person to the next.
    It was an excellent article; my google-fu sucks apparently because I can't find hide nor hair of it. Curses. No +5 Informative for me.

    --
    You better watch out, there may be dogs about . . .
  7. An interesting drawback to digitalization by kerohazel · · Score: 2, Interesting

    Reminds me of a discussion I once got into about analog vs. digital storage. Some of the people on the analog side argued that the myth of digital media being everlasting is false -- which it is. Digital media, on their own, should be seen as temporary storage. The true virtue of digital media isn't even the media itself -- it's the content. Content is what can be copied over and over again with no degradation.

    Like oral traditions, the chain of copying needs to remain unbroken for any information to truly last forever, outliving "mere mortal" media. As long as P2P networks continue to exist, I can die happily knowing that the sum of mankind's knowledge will be floating around there somewhere... even if it is buried under millions of terabytes worth of lesbian porn. ;P

    --
    Skype is too convoluted... Now I'm reverse-engineering the Kyoto Protocol.
  8. Re:The tools are not the problem. by limabone · · Score: 2, Interesting

    Why do people keep saying CDs die in 2 years/5 years/x years? Has anyone actually had a CD die on them? I have CDs in front of me at this very moment that are over 10 years old and still work great (yes, I did in fact test them). Is there some conspiracy by the blank CD manufacturers to make you think all your CDs are going to die so you need to keep transferring the contents from one disk to another forever?

  9. Re:Paper by antifoidulus · · Score: 3, Interesting

    Heh, that's something I didn't understand in the scenario mentioned in the summary: why would someone write a paper letter explaining a document on a CD, but then not bother to print out the document itself? Seems a bit weird to be combining "formats" like that, if you will. More than likely what would happen is that the grandchildren would find a spindle of CDs and throw them out, not knowing what they contained (priceless family memories, or they could just be Leisure Suit Larry games).

  10. Re:Easy by michael_cain · · Score: 2, Interesting

    Seriously, if you want to think in terms of 100-150 years, this is a solved problem, and without the need for stone tablets. Pigment-based inks on acid-free paper. Silver-based black and white photo chemistry on acid-free paper. Stitched bindings, not glue. Store in a trunk where there's negligible light. Put the trunk in the attic of a house where it's reasonably safe from large amounts of water (rain or flood). Civil War documents using these techniques have survived nicely to the present day. The Bell Labs archives have Alexander Graham Bell's original laboratory notebooks, still easily legible. To date, there are no reliable archival media for this length of time for audio or moving pictures. Write it down. Sketch it (as silver-based photographic materials are getting harder and harder to find). And you can be the source material for the historians of 2155 :^)

  11. if you expect to have to reverse engineer it by toby · · Score: 2, Interesting
    bmp would be preferable to jpg

    Only if you expect to be in the situation of having no software to read JPG, and no specification. That's a slightly extreme scenario? Since your data has been, obviously, carried forward. You could always carry forward source code or specifications too, along with your JPG corpus. Or am I missing something?

    --
    you had me at #!
    1. Re:if you expect to have to reverse engineer it by drownie · · Score: 2, Interesting

      And don't even think about it if this is in a post-technological or post-apocalyptic scenario. That's when you want hardcopy! Old-fashioned printouts and photographs... with all their attendant preservation headaches. That should be in the bunker too. The basic data that defines our civilization, along with a lot of technical data, is in fact stored on microfilm, in metal containers in a salt mine. The German Bundesarchiv does this, the Swiss archive does it, and I think there are some other countries with this kind of national archive. I think the Library of Congress has some storage of this kind as well. It's absolutely safe; there has never before been a safer storage for information. The microfilms are produced to last some thousand years. A microfilm reader is essentially some kind of lens with a light, and the salt mine has been there for the last 50,000,000 years. It will survive a nuclear war, an asteroid impact, it will probably survive human civilisation. I wouldn't be too worried.

      --
      *an infinite number of monkeys wrote this sig
  12. Answer - document custodian daemons. by jlseagull · · Score: 3, Interesting

    Idea #1
    What about a semi-intelligent expert system daemon that, given two document formats, could figure out how to convert one to the other?

    Consider this: I would like to archive a set of CAD documents, but they're in archaic format X. Modern CAD formats are A, B, and C. CAD programs typically have ancestors that can convert from past versions for migration purposes.

    So consider an interlinked set of CAD converters:
    #1 can convert formats F, G, H to formats D and E.
    #2 can convert formats W, Y, X, and Z to formats I, J, K, L, and F.
    #3 can convert formats D and E to formats A, B, and C.

    Consider then a daemon that continuously monitors a filesystem looking for documents that aren't in a current format. It then fires up the converters and performs the conversion while archiving all past versions.

    So in the example, the daemon fires up converters 2, then 1, and finally 3.
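
    The chain-finding step can be sketched as a breadth-first search over the converter tables. This is only an illustration: the table is copied from the example above, the function name `conversion_chain` is made up, and a real daemon would also have to run the converters and archive each intermediate version:

```python
from collections import deque
from typing import Optional

# Converter tables from the example: id -> (input formats, output formats).
CONVERTERS = {
    1: ({"F", "G", "H"}, {"D", "E"}),
    2: ({"W", "Y", "X", "Z"}, {"I", "J", "K", "L", "F"}),
    3: ({"D", "E"}, {"A", "B", "C"}),
}


def conversion_chain(start: str, current: set[str]) -> Optional[list[int]]:
    """Breadth-first search for the shortest converter sequence that takes
    `start` to any format in `current`. Returns None if no chain exists."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        fmt, chain = queue.popleft()
        if fmt in current:
            return chain
        for conv_id, (inputs, outputs) in sorted(CONVERTERS.items()):
            if fmt not in inputs:
                continue
            for out in sorted(outputs):
                if out not in seen:
                    seen.add(out)
                    queue.append((out, chain + [conv_id]))
    return None


if __name__ == "__main__":
    print(conversion_chain("X", {"A", "B", "C"}))  # prints [2, 1, 3]
```

    Breadth-first search is chosen here because fewer conversion hops should mean fewer chances to lose information along the way.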

    It could also cryptographically sign the files to provide a chain-of-custody.
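
    One way to picture the chain-of-custody idea, using only Python's standard library. Note the substitution: this uses HMAC tags with a shared secret rather than true public-key signatures, and the record fields and function names are hypothetical:

```python
import hashlib
import hmac


def custody_record(key: bytes, previous_tag: str, step: str, content: bytes) -> dict:
    """Create one custody entry whose tag covers the previous entry's tag,
    the conversion step, and a hash of the resulting file contents."""
    content_hash = hashlib.sha256(content).hexdigest()
    message = f"{previous_tag}|{step}|{content_hash}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"previous": previous_tag, "step": step, "sha256": content_hash, "tag": tag}


def verify_chain(key: bytes, records: list[dict]) -> bool:
    """Recompute every tag and check that each record links to the one before."""
    previous = ""
    for record in records:
        message = f"{previous}|{record['step']}|{record['sha256']}".encode("utf-8")
        expected = hmac.new(key, message, hashlib.sha256).hexdigest()
        if record["previous"] != previous or not hmac.compare_digest(expected, record["tag"]):
            return False
        previous = record["tag"]
    return True
```

    Because each tag folds in the previous one, altering or dropping any intermediate version breaks verification for everything after it. A real deployment would presumably want asymmetric signatures so that verifiers don't need the secret key.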

    It also maintains a set of applications and an emulator for different operating systems. When one needs to open an archaic dataset, one can either look at the converted files or call the daemon directly to seamlessly pass an emulated application session to the user if you want to look at it in the original form.

    Idea #2
    Documents could contain their own viewers. Yes, I know that's a bad idea making document objects executables, but hear me out. The document custodian daemon could also maintain a sandbox for document viewers to run in - it could even be a standardized virtual machine written in something like Java. This is getting a little out of my area of expertise, but I'll ask my girlfriend about it. It would get interesting after several levels of emulated virtual machines.

    This year, hard drives became cheaper than tape for the first time in terms of $/GB. RAID with NFS should be way better than tape backup in terms of retention and nearline access, but I'm not really an IT guy.

    I'm sure there's a business model in there somewhere.

    --
    'Be always mindful, even when ditch-digging.' --D. T. Suzuki
  13. Easier by Sycraft-fu · · Score: 4, Interesting

    Copy it to a new format. That is the real beauty of digital. Since it can be perfectly duplicated easily and quickly, it's no problem to move it to a newer format. I have data on my drives now that was originally on 5.25" floppy. It has just been recopied many times. Some of it has been converted to new formats, some of it is unmodified. Either way, it's still here despite being decades old.

    I don't know where this silly idea comes from that somehow digital is really fragile and we'll just lose all of it later. Sure, we lose tons of it all the time, but it's worthless, by and large. The by product of the information age is that we produce so much of it, it is not only impossible to archive all of it, it's undesirable. To have more information than you could ever sift through would be almost as bad as having none at all.

    Also, what's this stupid notion that we'll forget how to read things? That's like saying that we'll forget how to build sailing ships, now that we have motors. Of course that's not the case; the knowledge is preserved, and in the case of sailboats, they are still made.

    This is even more clear for computers, since emulation is a major project for many people. We have emulators for all kinds of old systems. That means if you find data for one of them, you just load up said emulator and it'll get at it.

    Digital actually seems to be the ultimate prevention against a dark age. The ease of copying information and archiving it in multiple spots means that it's difficult for a single catastrophe to wipe out large amounts of data forever. There was a lot of work in the past, for example the Mayan Codexes, that was destroyed and is totally unrecoverable. It was fragile precisely because it was hard to copy and thus there wasn't much of it around. Now, of the original hundreds of thousands of Codexes, we have but 3.

    I think it's just a bunch of alarmism.

  14. Re:Similar issues with old movies by jimmydevice · · Score: 1, Interesting

    I have 10-year-old CD-Rs, gold backing with the dark purple / blue phenol? dye, burnt on a 1X SCSI Plextor in 1995. Those still read flawlessly. I also have some cheap Al / yellow dye ones. They lasted about 3-4 years before starting to generate checksum errors.
    It all depends on the media and storage conditions. Conditions here are very dry, 20% humidity most of the time, and the discs are stored at room temp.

  15. Re:I totally agree by Kadin2048 · · Score: 2, Interesting

    I was thinking of how you could store data that would really stand up to the test of time. History provides us some examples: things cut into stone seem to do pretty well. Paper isn't bad, providing you store it well. Animal skins, not so good. Celluloid isn't either (evidenced by the old movies and cartoons that are degrading).

    However, glass is really good, and while it might not have the proven track record that stone tablets do, it can also support a much higher data density. For example, Ansel Adams's original glass plate negatives are in some cases just as sharp as the day they were shot, and they should stay like that for the foreseeable future providing they're well taken care of. But even they are dependent on the chemicals used in processing -- whether the silver sticks to the glass over time, etc.

    So here's what I was thinking: what if you used some sort of photographic process to physically etch a pattern of bits into glass: use a fairly strong acid and get the etching pretty deep, or maybe etch the bits at the bottom of phonograph-like grooves so that light surface touching wouldn't destroy them. If you could make something like this that could be read with a regular CD-ROM drive, that would be even better.

    I think some sort of process like this is used on metal (or is it actually glass?) to make the dies for stamping CDs. Basically I'm suggesting just make and retain the masters, but don't degrade them by stamping anything.

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  16. For optical media, it's very easy... by tlambert · · Score: 2, Interesting

    For optical media, it's very easy... assuming the media actually survives, it's the same way this guy plays vinyl LPs using a flatbed scanner:

    http://wired-vig.wired.com/news/digiwood/0,1412,57769,00.html

    Obviously, in the future, ultra-high resolution optical input will put the current scanning/video technology to shame; they will just need to scan the thing in and run a program against the data to get the contents of the media back.

  17. It won't be that dark by kilodelta · · Score: 2, Interesting

    I work with a bunch of library science and archivist types who worry about this all the time.

    It's such a pain taking care of books that are a few hundred years old. But they miss the point when it comes to digital.

    For example, data I had on 5.25" floppies was moved to 3.5" floppies, then to a 20MB hd, then to a CD-ROM, then onto my current system.

    If it's that important you transition it to new media.

  18. Dupe from Scientific American 1995 by paulatz · · Score: 2, Interesting

    I'm sure I read the same article several years ago; I cannot remember where, maybe in Scientific American or such. After a search on sciam.com I found this, dated January 1995, more than ten years ago. Are we reading the oldest news ever posted on Slashdot?

    --
    this post contain no useful information, no need to mod it down
  19. What about all the horse manure in 2000? by Fredge · · Score: 2, Interesting

    This question is akin to somebody in 1900 asking what the world would be like in 2000 when the population kept growing and everybody had horses on the street - "think of all that manure accumulation - how will we walk without stepping in crap?"

    The point is - the question is irrelevant. In 100 years, assuming the continued growth of storage media, the average personal user will have access to terabytes, if not more, for personal use. I imagine that the most basic of ISPs (if such an entity continues to exist separately from other existing utilities) will provide users with gigabytes of personal space online to store and back up their data. The only reason to put things on physical media will be for short-term backups.

    I think a more pressing question is "will we be able to find the needle in the haystack?" Sure - Google does a decent job of indexing the internet now but even they are not 100%. Also the fact that while they may not be 'EVIL' today, it only takes 1 CEO change for them to become what most other companies are and then it's up to the next do-gooder to start an index from scratch. Then, assuming you can find stuff, you'll have to break the 200Mb encryption key. Luckily, the local Kinkos will have a quantum computer that you can use for $7.50/hour.