Slashdot Mirror


Researcher Warns of "Digital Dark Age"

alphadogg writes "An assistant professor from the University of Illinois at Urbana-Champaign is sounding a warning that companies, the government and researchers need to come up with a plan for preserving our increasingly digitized data in light of shifting document management and other software platforms (think WordPerfect and floppy disks). Jerome P. McDonough, who teaches at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, says there exist about 369 exabytes worth of data, and that includes some pretty hard-to-replace stuff, including tax files, email and photos. Open standards could play a key role in any preservation effort, he says. 'If we can't keep today's information alive for future generations, we will lose a lot of our culture,' McDonough said. 'Even over the course of 10 years, you can have a rapid enough evolution in the ways people store digital information and the programs they use to access it that file formats can fall out of date.'"

14 of 367 comments

  1. Re:Of course by Bragador · · Score: 4, Interesting

    If archeologists find knives and trash to be important in a search, I'd say the average pictures that we are taking today might actually be very interesting to future generations, for they represent normal life.

  2. Re:not to worry.... by pilgrim23 · · Score: 2, Interesting

    Recently at work we ran into a problem where a "knowledge management" package died. The company had gone belly up and there is no converter. We are printing and re-typing in thousands of pages because there is just no other way.
    I collect antiquarian books. Funny that a collection of plays printed up in Latin in 1542 only requires the learning of a language, yet a knowledge base less than 10 years old is unreadable...

    --
    - Minutus cantorum, minutus balorum, minutus carborata descendum pantorum.
  3. People are starting to take note by duffbeer703 · · Score: 2, Interesting

    Government agencies and archivists are starting to wake up to the fact that this is an issue -- I think the Office 2007 file format change was a big factor that is getting it on the radar.

    Minnesota, California, Massachusetts and New York definitely have people studying the issue. Unfortunately, there are no easy answers when it comes to these things.

    In my opinion -- which is not necessarily the opinion of my employer -- one of the major problems is that there are far too many records being preserved.

    If you looked at the archives of a government or corporate office 30 years ago, only official memorandums, some meeting minutes and policies were retained. Today, technology like email has improved communication somewhat, but has also encouraged sloppy office practices so that it is nearly impossible to figure out what is useful and what isn't.

    To compound matters, the courts are now mandating document retention and email archiving which encourages the retention of even the most banal communication.

    IMO, the period 1990-2020 will be a black hole in history.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  4. The article mixes up 2 problems... by BUL2294 · · Score: 5, Interesting

    The article talks about two very distinct and different problems--hardware and file formats. The author has a point about the hardware--if the media goes bad or if there is no way to read the data, then the data is lost. However, the author is completely off-base when it comes to file formats...

    The author specifically mentions WordPerfect files. Bad example! The default file format in WordPerfect X4 (released in April 2008) is the same as what was used in WordPerfect 6, which came out in 1993 (DOS and Windows). While I can't speak for OpenOffice or Google Docs, MS-Word can read those files (and WordPerfect 5.x files) with a simple File/Open. Excel opens Lotus 1-2-3 files as well. So, Word can open popular formats in use since 1988 (WP 5.0) and Excel can open some formats in use since 1983 (1-2-3 r1a). You can also buy programs like FileMerlin to convert old documents.

    Frankly, when it comes to file formats, conversion apps will exist for a LONG time. For DOS apps, you could even go so far as to create a VM or use DOSBox, load up your obsolete word processor (I miss "Leading Edge Word Processor"!) and copy/paste the text into Word or Notepad...

    Image files, sounds, & videos are no exception... GIF has been around since 1987, JPEG has been around since the early '90s (opening those on a 10 MHz 8088 was slow!), and MPEG/WMV/AVI/QuickTime videos are easily openable...
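
    To make the file-format point concrete: formats like these can usually be recognized from their first few bytes, which is how a converter decides what it is looking at. Below is a minimal Python sketch, assuming the commonly documented leading signatures for GIF, JPEG and WordPerfect 5.x-and-later documents. SIGNATURES, sniff_format and sniff_file are hypothetical names made up for illustration; they are not part of any program mentioned here.

```python
# Illustrative sketch only: the table and helper names are hypothetical.
# Each entry pairs a leading byte signature with a human-readable label.
SIGNATURES = [
    (b"GIF87a", "GIF (1987 spec)"),
    (b"GIF89a", "GIF (1989 spec)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\xffWPC", "WordPerfect 5.x or later"),  # assumed signature: FF 57 50 43
]


def sniff_format(data: bytes) -> str:
    """Return a best-guess format label from the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and guess its format."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(16))
```

    Identifying a file is the easy half; actually decoding an obsolete format still needs a converter or the original program.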

    Finally, the more people that are affected by obsolete files, the more interest there is in some way to convert the data... But don't forget that a LOT of the data is junk--do you really care about your 7th grade paper you wrote on Hong Kong in 1989?

    --
    Windows 3.1x calc: 3.11 - 3.10 = 0.00
    1. Re:The article mixes up 2 problems... by jejones · · Score: 4, Interesting

      About mine? No... but how about the next Einstein's 7th grade paper, or the next Picasso's?

  5. The problem is real to museum conservators. by bornwaysouth · · Score: 3, Interesting

    My father (now dead; he retired 20 years ago as a curator of a technology museum) was bothered, as were others in the field, back in the '80s. He had seen microfiche come and go, apart from the *new* digital stuff that was already being junked. He was relying on high quality long term photographs in nitrogen canisters. It only worked because he was storing a visual medium such as sheets of paper. Only the important ones, but about a million of them existed.

    As for WordPerfect and floppy disks: yep. That's a problem in our home. We are having to migrate WP files now and then. It is not sufficient to have old computers that run the programs. I had WP on my computer (but didn't use it.) A series of glitches when upgrading to SP3 had as a side effect the corruption of WP on my computer. Whatever the problem was, I could not even re-install it. We are now down to one computer that can read it.

    I, when I worked in IT, migrated library data. Getting it into any sort of readable text form was a trial. We have even been sent old Macintosh computers in the hope that we could get stuff off them. Usually we could, but it wasn't done economically, and I cursed the Education system that had highly paid administrators who did not even dimly consider that a data storage system had a finite lifetime. Not even 20 years after my father retired on under half their salary.

    The core solution is as the original article says - for all government software, mandate that data export to a widely used open standard be available within the package at no extra charge. I do not know of any impediment to this worth considering. Where there are privacy issues, it is simply exported encrypted and funds are established that allow a few facilities to decrypt and migrate the data. If you cannot sell to government, including any educators, then you are marginal. OK, so some games will be unavailable to future generations. That is inevitable. But then that will be a reason to collect and maintain the hardware if you are a hobbyist.

    As for large corporations, it may be sufficient that the auditors require that data be accessible for forensic and liquidation purposes. That is, not readily, but if need be in extreme circumstance.

    In short, the immediate solution is an administrative one. Software and hardware is the relatively easy bit.

    My own prize example of a dead data format - the Windows .mic image format. I have a few files still of those on my computer. You can see what the picture is if you thumbnail it. But when you try to get a full sized image, Windows says it cannot recognize the file format. It is now a .mock format. Is there a term for operating systems no longer being able to recognize their own past? 'Osheimers' for example.

  6. Re:Licensing Formats by RoboRay · · Score: 2, Interesting

    There is no license required to build and sell a CD player. There IS a license required if you want to CALL your optical disk reader a "CD player."

    And you can still do this. The LG BH100 combination BluRay and HD-DVD player (I had one) couldn't display the HD-DVD logo because it didn't meet all the requirements of the HD-DVD player licensing. But it could still play HD-DVD movies just fine.

  7. Re:Of course by mikael · · Score: 2, Interesting

    One of the UK's beer companies used to help sell their cans by having pictures of models on the side. At the time, it was just a beer can with a picture of a model, but now these pictures capture the fashions of the era (the 1980s), which would be hard for any designer to reproduce without having reference pictures.

    Now these beer cans are actually collectors items.

    --
    Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
  8. Re:Even better by Whiteox · · Score: 2, Interesting

    I hacked into Carmen SanDiego and changed the character names to those of the staff of a school I was working at.
    The 5-1/4" floppy, formatted for an Apple //, went into a time capsule around 1988. In 2013 it will be opened up.
    It'll be stuffed, just like the rest of the contents.

    --
    Don't be apathetic. Procrastinate!
  9. On a personal note... by actionbastard · · Score: 3, Interesting

    If you wish for your grandchildren and great-grandchildren to have some shred of knowledge of your existence, you should make certain that you have black-and-white photographs taken and printed on the highest quality, assured permanence, stock that you can find. Those prints should be stored in a fashion that protects them from decay so that your grandchildren and great-grandchildren at least may see what their ancestors looked like. From personal experience, I would not have known what my parents and grandparents looked like when they were children had it not been for the relative permanence of the black-and-white printing process.

    --
    Sig this!
  10. 20th Century culture lost by Simonetta · · Score: 5, Interesting

    I'm more concerned about losing the culture from the 20th century.

      Everyone born after 1975 hates the RIAA, doesn't pay any attention to whatever they say, and file-shares gigabytes without a thought to the music industry definition of 'piracy'. This is as it should be. It means that the music and movies of the (for now) young people is safe because it is widely circulated outside the control of those who have deluded themselves into believing that they own it.

      It's all the stuff from the first 2/3rds of the 20th century that will disappear. Because the people who like it are in their 50's, 60's, and 70's now and don't have the technical skills to copy and distribute it. Plus they actually trust the corporations will preserve it. I mean all the books, music recordings, television shows, movies, and plays from the first half of the 20th century. The stuff that is under 'infinite copyright' and will never be in public domain because the corporations will simply pay off the politicians to endlessly extend the copyright period, as they do now.

        As soon as all this stuff stops selling (and who nowadays is paying money for the book that was #3 on the New York Times BestSeller list of Oct 28, 1936?), and can't be legally copied because it can't enter public domain, then the corporations will just destroy it. Pulp the books; convert the film stock to ethanol to power their SUVs; dump the magazines in the oceans or in nuclear waste sites to absorb neutrons. When that happens, all this culture will be gone and historians 200 years from now will have little idea about how civilized people actually thought and acted in the critical early years of the modern technological age.

        You can talk to the old people about the need to preserve their culture by making 'illegal' copies of the books, magazines, and movies that were important to them, but they are just simply and completely clueless about the extent that their culture will die as they do.

  11. Re:They won't care either by prayag · · Score: 2, Interesting

    I think the value of a creation is subjective. What may be garbage to you might be priceless to me. Your story might be something not worth fretting over for you, but a publisher may find it worth its weight in solid gold. So, it's more a matter of who decides what should stay and what should go. Some data someone might need today, someone might need it tomorrow, someone might need it in a decade's time. Who decides what data to store and for how long? And more importantly, who deletes the apparently worthless data?

    Who is going to pull the trigger and on what ?

  12. Re:They won't care either by Anonymous Coward · · Score: 2, Interesting

    Well, ancient garbage is literally archeologists' treasure, although it was unimportant and discarded at the end of its use.

    Nowadays, most of the world's garbage is not trash, pottery, papyrus, parchments, but old data. There is another problem from the same category, not tackled in TFA: digital history is easily counterfeited, rewritten, made up.

    In the future, historians of the early Digital Age will face the situation we have in trying to learn about early Christianity: an abundance of contradictory sources.

    That is a problem we need to solve now, how to preserve massive amounts of "non important" data for future data miners, historians and researchers without endangering our freedom and how to reliably date (date-sign) the digital records so that they cannot be forged.

    Perhaps a system for generating signatures with dating public keys and (reliably destroyed) secret keys would go some distance, but over historical amounts of time every cipher can be cracked wide open. Besides, how do we ensure the publicity of public keys over large time intervals (how do we defend against a man-in-the-middle attack, against someone planting both a public date key and a forged record)?
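
    One related building block, swapped in here for the dating-key scheme because it is simpler to show, is a hash chain: each dated record's digest also covers the previous digest, so altering an old record breaks every later link. This is only a Python sketch using the standard hashlib module. GENESIS, link_digest, build_chain and verify_chain are hypothetical names, and the sketch does nothing about the objections above: it still depends on SHA-256 staying unbroken and on someone trustworthy publishing the latest digest.

```python
import hashlib

# Hypothetical starting value for the chain (64 hex zeros).
GENESIS = "0" * 64


def link_digest(prev_digest: str, date: str, record: bytes) -> str:
    """Digest covering the previous link, the claimed date, and the record."""
    h = hashlib.sha256()
    h.update(prev_digest.encode("ascii"))
    h.update(date.encode("utf-8"))
    h.update(b"\x00")  # separator between the date and the record bytes
    h.update(record)
    return h.hexdigest()


def build_chain(entries):
    """Turn (date, record) pairs into (date, record, digest) links."""
    chain = []
    prev = GENESIS
    for date, record in entries:
        prev = link_digest(prev, date, record)
        chain.append((date, record, prev))
    return chain


def verify_chain(chain) -> bool:
    """Recompute every link; any edited date or record makes this fail."""
    prev = GENESIS
    for date, record, digest in chain:
        if link_digest(prev, date, record) != digest:
            return False
        prev = digest
    return True
```

    A forger who can rewrite the whole chain and the published digest still wins, which is the man-in-the-middle worry raised above.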

    The answer is: we need physical time capsules, something material that can carry data, that changes with time, but doesn't alter the data as it matures, so that it could be reliably dated later.

    Most of it is discarded by our relatives after we die.

    Maybe we should have "data cemeteries", places to hold the digital history of our times and our lives: deceased relatives' important documents, certificates, medical records, genome, personal "non important" documents, email archives, journals, blogs, doodles, scribbles, photographs, home videos... perhaps some observations about them from people who knew them. These archives should be available for adding (but no reading back!) during a person's lifetime, so that each time a person makes a backup of their data, or deletes something, it also gets added to the archive. The last addition is made after the person dies, of course; then the archive is sealed and can't be read until several decades pass (to protect the privacy of others who may be related somehow), after which it is available for reading (but not for altering in any way).

  13. Re:not to worry.... by zymurgyboy · · Score: 2, Interesting

    Not to mention, Latin as you would learn it today is not really Latin as spoken by real people. There are very few surviving Latin texts written in Vulgar Latin -- complete with all the slang, misspellings indicating the direction the language was shifting or from whence it had come, etc. -- before it fully differentiated into the modern Romance languages we know today. Interestingly, one of the most important surviving texts is some chick named Egeria's vacation diary. That and some graffiti is about all we really have.

    What most people would deem useless is full of linguistic, maybe economic, and sociological references that make a rough sketch of someone from today a much more useful and rich cultural portrait when it is available (i.e. somehow preserved). Think on that next time you post a blog entry or snap a few vacation photos.

    --
    If you never make mistakes, it's probably because you're not doing anything.