Slashdot Mirror


Researcher Warns of "Digital Dark Age"

alphadogg writes "An assistant professor from the University of Illinois at Urbana-Champaign is sounding a warning that companies, the government and researchers need to come up with a plan for preserving our increasingly digitized data in light of shifting document management and other software platforms (think WordPerfect and floppy disks). Jerome P. McDonough, who teaches at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, says there exist about 369 exabytes' worth of data, and that includes some pretty hard-to-replace stuff, including tax files, email and photos. Open standards could play a key role in any preservation effort, he says. 'If we can't keep today's information alive for future generations, we will lose a lot of our culture,' McDonough said. 'Even over the course of 10 years, you can have a rapid enough evolution in the ways people store digital information and the programs they use to access it that file formats can fall out of date.'"

23 of 367 comments

  1. I say by speedingant · · Score: 5, Funny
    Let's go back to using basic text editors and floppy disks. Would we REALLY miss the new "XYZ 5000-tron GUI" Microsoft Word?

    And who needs to store pictures and movies on their computers anyway? In fact, I think the world would be a better place without them!

    Now if you'll excuse me, I'm going back to watching Iron Man on my wrist watch.

    1. Re:I say by Anonymous Coward · · Score: 4, Insightful
      It's funny how, when digital culture is under attack by the RIAA, people say that "software is art and deserves all the same legal protection", but when we talk about preserving 1980s and 1990s computer culture in the same way that we preserve books, there are comments of ridicule. People pick some shit software and tar all software with the same (shitty) brush.

      And I'm not immune of course, there's a lot of shitty software out there and it's easy to trivialise the value of Custer's Revenge or Giana Sisters, but remember that historically archivists want to know about tasteless/racist video games or tributes/Mario-ripoffs just like they want to know about 1980s comedy shows and magazines.

      This article is saying that libraries and archivists had a blind-spot when it came to software. It took them decades to realise that people expressed themselves artistically in this medium. Archivists didn't know that they should preserve it like we do other media.

      I know how easy it is to mock these efforts (e.g., the tag "!nothingofvaluewaslost") but please consider supporting and justifying this digital culture as part of a wider effort to justify software expression.

      It's easy to pick out dumb software but closing

    2. Re:I say by geekoid · · Score: 4, Funny

      um..do you have a link to that watch?
      And more importantly, a song:
      sung to the Spider-Man tune.

      "Iron Man, Iron Man
      Does whatever an iron can
      Presses pants really fine
      Keeps those pleats right in line
      Look out! Here comes the Iron Man" - Marvel

      --
      The Dunning-Kruger effect explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    3. Re:I say by Brian+Gordon · · Score: 4, Informative

      !nothingofvaluewaslost means that they disagree with the tag nothingofvaluewaslost. The '!' is a negation. gb2/digg

  2. Marketing and Management already know! by CorporateSuit · · Score: 5, Funny

    We can just store everything in the cloud! Problem solved!

    --
    I am the richest astronaut ever to win the superbowl.
    1. Re:Marketing and Management already know! by NoobixCube · · Score: 4, Funny

      In the cloud? Oh my god! What happens when it rains?! The farmers will have all our data! We'll have to sue the farmers for their harvest, since their crops will contain all the data and applications!

      --
      Admit it. You post strawman arguments as AC so you get modded Insightful for refuting them, rather than Troll
  3. Re:Anal by CRCulver · · Score: 4, Informative

    Even today with items like the Rosetta stone it's not worth much more than a Trivial Pursuit question - we'd not be any more educated or intelligent if stuff from 2000 years ago hadn't gone missing.

    There have been instances when the metallurgy of times past was remarkably superior in some respects to later arts. Think of Damascus steel or Chinese bell-casting. Though the general trend of technology is constant progress forward, in certain cases the ancients were able to teach us a thing or two.

  4. Information outlives technology by starfishsystems · · Score: 5, Insightful

    "I often ask, 'Everyone in the audience who thinks they're going to be using the same word processor in ten years, raise your hand.' No hands go up. 'Everyone who has data around that's going to have value in ten years?' After a minute's thought, every hand goes up. The lesson is clear: information outlives technology."
    - Tim Bray

    --
    Parity: What to do when the weekend comes.
    1. Re:Information outlives technology by Sebastopol · · Score: 4, Insightful

      Been using Excel, MS Word since 1990 and Quicken since 1992.

      I can still open all my work from my thesis, and can search credit card purchases from 20 years ago.

      No problem here.

      --
      https://www.accountkiller.com/removal-requested
  5. Migrate, migrate, migrate... by I.M.O.G. · · Score: 4, Insightful

    The only motivation for a company to invent new ways to preserve data long term is to provide it as a service so they can profit from it. Other than that, a company's main goal is deleting everything it legally can. Anything that no longer exists can't result in a lawsuit.

    Everything that is preserved is a potential liability. For items requiring indefinite retention because they are critical to the business... They will be stored, redundant, and backed up appropriately. As the systems that provide those qualities age, they will be replaced in regular maintenance and upgrade schedules as economics and timing come together in the right proportions. In that way, reliability and long-term survivability are maintained - nothing stays on ancient systems that are unmaintainable forever. When systems go out of support, everybody has already been looking to the next solution to migrate to.

    So what's wrong with this approach? It's essentially what all "big" companies are currently doing. I don't believe in this proprietary format FUD either - if the proprietary format is no longer supported, you migrate. The potential future cost of migrating is the only concern, not survivability.

    Migration is today's solution to long-term storage and I see no reason it should be ignored. Like security, data retention is an ongoing objective that requires maintenance - it's not some end-state. Dreaming of a solution that will just last forever seems archaic, no?

  6. Re:Archive... by Opportunist · · Score: 5, Insightful

    OPEN file formats and OPEN hardware, well documented.

    Even if no program exists anymore to read your data, as long as you have the specs you can rebuild it. And I mean hard- AND software. If you know how to build it, you can build it provided you have the means. And I'm pretty confident that our future cousins will be able to build a current computer with their future technology, as long as they know WHAT they should build.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  7. Re:Anal by rugatero · · Score: 5, Insightful

    I'm reminded of this story from a few years ago, where a 500 year old Leonardo drawing inspired improvements in mitral valve heart surgery.

    --
    This comment is for entertainment purposes only. Any similarity to real insight or information is purely coincidental.
  8. Re:Of course by Bragador · · Score: 4, Interesting

    If archeologists find knives and trash to be important in a search, I'd say the average pictures that we are taking today might actually be very interesting to future generations, for they represent normal life.

  9. They won't care either by rtfa-troll · · Score: 4, Insightful

    Most of the garbage that we have now just isn't worth keeping. The biggest problem is filtering out the junk we have so that we know what is really valuable. That would be things like great music; writing; the origins of software freedom; works of history and biography etc. Then we could store that, but the problem is we mostly store SOX-inspired lies for compliance audits. This garbage takes away from any effort to store serious stuff long term. Who could we trust to do the filtering? The govt? (no please don't answer that :-)

    --
    =~ s,(.*),<sarcasm>$1</sarcasm>,g if any_point_you_wish();
    1. Re:They won't care either by frieko · · Score: 4, Insightful

      I think we can trust culture itself to keep the valuable stuff. Culture is evolutionary. Good memes (Romeo and Juliet) are repeated, lame memes (Paris Hilton's The Hottie and the Nottie) are weeded out by forgetfulness.

      The problem lies in keeping the unimportant stuff. Nobody cares about your myspace, but if an archaeologist came across a 3000 year old obscenity on a bathroom wall, it would be the find of a lifetime.

    2. Re:They won't care either by GrpA · · Score: 5, Insightful

      Actually, I don't think garbage is the problem. I don't think there is a problem as it's being presented to us. Lots of printed media is destroyed also. Just the other day I found pieces of a five hundred page story I wrote a long time ago, then lost the disk. I'm not going to type it in again, so I just discarded it. It's not the first time in history and won't be the last. Very little of what is written is ever published. Most of it is discarded by our relatives after we die.

      I think the real issue is that some people feel a need to collect everything that's ever created, like digital hoarders. If a tax return is old enough to be on floppy, then you don't need it anymore and any critical information from it probably exists somewhere else.

      Content with real value self-perpetuates and remains, and while some value is lost through attrition, such as websites going down, the consequences are often minuscule in comparison to the concept of archiving everything permanently.

      Maybe we do lose those digital pictures on the floppy (and the box of floppies it was stored in) but if it was critical, we'd do something about it. We might print it out, but we lose albums too. They get wet, mouldy and burned, and we lose those memories too.

      Too often it's not that important to us to keep until we want it later and can't find it.

      Like most things hoarded, the value lies in taking good care of what is most important to us, and often we find that what we want to keep is just a reflection of what matters the most.

      To quote an interesting book entry I once read: Perspective. Use it or lose it.

      That goes for hoarding digital stuff too.

      GrpA.

      --
      Enjoy science fiction? "Turing Evolved" - AI, Mecha, Androids and rail-gun battles. What more could you want?
    3. Re:They won't care either by GrpA · · Score: 4, Insightful

      What you say is essentially correct, I'm just pointing out that this has always happened, regardless of the transition to digital.

      How many pages of Leonardo da Vinci were used over the centuries to start fires or even wipe asses? How many inventions, concepts and ideas were lost forever? How many musical pieces were lost to antiquity simply because they weren't as popular during the era and slowly became removed from history, piece by piece?

      What knowledge became undiscovered when the library of Alexandria was lost?

      Losses of information are perpetually occurring. Digital stuff is less likely to be lost because it's so easy to copy, so anything needed for long periods tends to be perpetuated by infinite copying.

      Archives are nice (thank you, Wayback Machine) when you want to find something now lost, but I don't think the media is to blame.

      Think, as you've put it, that it's gone because someone decided to get rid of it... Did they make the right choice? Maybe not, but it was theirs to make.

      I think a bigger issue is DRM... I went to watch some old movie clips I had on an archive the other day while browsing it... They all failed - I didn't have the correct codecs. So I tried to download/find them. Nope. They were gone.

      So the clip, which I wanted to view, was lost... All I have to tell me what it was is the name "funnyvideoclip.avi"

      But they were only of value to me so what's the big deal?

      Maybe if it was my wedding video, I'd be more annoyed, but then, how many wedding videos, pictures, photos and even paintings have been lost throughout history?

      Just because the loss affected me, it doesn't mean there's a dark age. I'm saying knowledge is always being lost, due to obscurity, damage, natural disasters, political viewpoints and many other factors.

      So let's say we lose all copies of programs for the Commodore 64... Is it a dark age? Or is the knowledge we've kept of the machine quite sufficient for contemporary times?

      If anything, I think even more digital material is retained than non-digital... Just try finding a service manual for a 40 year old obscure car. Not very likely, but if there is a copy anywhere, I'd almost put money on it being digital!

      GrpA.

      --
      Enjoy science fiction? "Turing Evolved" - AI, Mecha, Androids and rail-gun battles. What more could you want?
  10. Re:not to worry.... by CarpetShark · · Score: 4, Insightful

    Historically, things that were very uninteresting at the time have been hugely valuable to researchers later on. We may not care about the countless people talking "crap" on Bebo right now, but in a few hundred years it might be a different story. When people can easily analyse all those posts for meaningful psychological profiles that aren't currently understood, never mind modelled and easily detected, all of that could tell a lot about our society. Even rubbish tips from thousands of years ago are hugely valuable to archaeologists.

    This goes even more so for important government records, etc. Peter Quinn did a great job of explaining that with his Sovereignty talk.

  11. The article mixes up 2 problems... by BUL2294 · · Score: 5, Interesting

    The article talks about two very distinct and different problems--hardware and file formats. The author has a point about the hardware--if the media goes bad or if there is no way to read the data, then the data is lost. However, the author is completely off-base when it comes to file formats...

    The author specifically mentions WordPerfect files. Bad example! The default file format in WordPerfect X4 (released in April 2008) is the same as what was used in WordPerfect 6--which came out in 1993 (DOS and Windows). While I can't speak for OpenOffice or Google Docs, MS-Word can read those files (and WordPerfect 5.x files) with a simple File/Open. Excel opens Lotus 1-2-3 files as well. So, Word can open popular formats in use since 1988 (WP 5.0) and Excel can open some formats in use since 1983 (1-2-3 r1a). You can also buy programs like FileMerlin to convert old documents.

    Frankly, when it comes to file formats, conversion apps will exist for a LONG time. For DOS apps, you could even go so far as to create a VM or use DOSBox, load up your obsolete word processor (I miss "Leading Edge Word Processor"!) and copy/paste the text into Word or Notepad...

    Image files, sounds, & videos are no exception... GIF has been around since 1987, JPEG has been around since the early '90s (opening those on a 10 MHz 8088 was slow!), and MPEG/WMV/AVI/QuickTime videos are easily openable...

    Finally, the more people that are affected by obsolete files, the more interest there is in some way to convert the data... But don't forget that a LOT of the data is junk--do you really care about your 7th grade paper you wrote on Hong Kong in 1989?

    --
    Windows 3.1x calc: 3.11 - 3.10 = 0.00
    1. Re:The article mixes up 2 problems... by jejones · · Score: 4, Interesting

      About mine? No... but how about the next Einstein's 7th grade paper, or the next Picasso's?

  12. false analogies by bcrowell · · Score: 4, Insightful

    This is one of those fairly bogus, highly overblown stories that keep cropping up every so often. A similar one is the supposed shortage of scientists and engineers in the US, which has never existed, and is always supposed to be coming Real Soon Now; in fact, the data to support this claim are always either nonexistent or wrong. (E.g., they compare Indian college graduates with US college graduates, but the Indian degree they're comparing with a US bachelor's is more equivalent to an AA degree in the US.)

    First off, the concern about incompatibility of physical media was valid 30 years ago, but it's a false analogy to try to apply it to today's situation. Thirty years ago, I had data on a mixture of 8-inch floppies and 9-track tapes. I can't read an 8-inch floppy anymore, and although 9-track tapes still exist, most 9-tracks from that era are no longer readable due to physical deterioration of the media. But that was all in an era when hard disks were expensive, and the internet didn't exist. Today, I have all my data on hard disks of various computers, and I use file synchronization software to keep them all in sync. If one of my hard disks dies, I replace it, and I haven't lost any of my data. (I also have backups on optical media, but I basically never need those.)

    There's also the concern about formats. People tend to bring up, for example, the image of rooms full of physically deteriorating 9-track tapes with data from old NASA space probe missions. The formats are often not documented. The thing is, most of our data isn't at all analogous to the raw data from Mariner or Voyager or Viking. Those were unique historical events, and the only way to get more data like the data they collected is by sending another space probe. (People also tend to vastly overestimate the value of scientific raw data. It's extremely uncommon for raw data to be of interest decades later.)

    Most of the world's data isn't in some obscure NASA format, it's stored in formats that are used by tons of people, and are extremely well documented. Sorry, but I just don't believe that the knowledge of how to decode Adobe Acrobat format is going to be lost to future generations. Ditto for html, jpeg, and mp3.

    Another thing to keep in mind is that nowadays you can emulate old computers with excellent performance. For instance, my first home computer was a TRS-80. I can still run my old TRS-80 games on my Linux box, using an emulator. Sure, emulation isn't perfect, and some information may be lost. But the claimed threat of data loss is vastly overblown.

    The biggest threat to the preservation of information isn't technological change, it's copyright. The most likely reason that I wouldn't be able to get back an old piece of digital data is that the people who tried to preserve it and put it on the web got sued by the people who own the copyright -- the same people who let it go out of print. The economic incentives are to hold on to your copyrights (because that doesn't cost you any money) and send out DMCA notices to anyone who puts it on the net (because that doesn't cost you any money either), all in the hope that your content will be worth eleven cents fifty years from now. This is exactly what we see happening, for instance, with ROMs for old video games, which you can play in MAME, except that you have to find an illegal source for the data, because the owners of the copyrights aren't willing to sell you a copy.

  13. I'm just helping the RIAA by goombah99 · · Score: 4, Insightful

    Garbage isn't the problem.. the problem is that we have millions of copies of the same data. Think of the 50 GB of video games you may have installed.. 10 million people have the same games as you. Music? Unless you performed it yourself or it's sub-underground, chances are millions of people each have multiple copies of it. The anime you've torrented has 10,000 downloads.

    No, see.. actually I'm just keeping a backup for the RIAA in case they lose their copy. Plus I keep it all transcoded to the next-generation formats at no charge. And on top of that it's forward deployed for easy re-distribution without bottlenecking their servers. I even pay the electric bill on the disks and internet connection. So copies are a good thing.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  14. 20th Century culture lost by Simonetta · · Score: 5, Interesting

    I'm more concerned about losing the culture from the 20th century.

    Everyone born after 1975 hates the RIAA, doesn't pay any attention to whatever they say, and file-shares gigabytes without a thought to the music industry definition of 'piracy'. This is as it should be. It means that the music and movies of the (for now) young people is safe because it is widely circulated outside the control of those who have deluded themselves into believing that they own it.

    It's all the stuff from the first 2/3rds of the 20th century that will disappear. Because the people who like it are in their 50's, 60's, and 70's now and don't have the technical skills to copy and distribute it. Plus they actually trust the corporations will preserve it. I mean all the books, music recordings, television shows, movies, and plays from the first half of the 20th century. The stuff that is under 'infinite copyright' and will never be in public domain because the corporations will simply pay off the politicians to endlessly extend the copyright period, as they do now.

    As soon as all this stuff stops selling (and who nowadays is paying money for the book that was #3 on the New York Times BestSeller list of Oct 28, 1936?), and can't be legally copied because it can't enter public domain, then the corporations will just destroy it. Pulp the books; convert the film stock to ethanol to power their SUVs; dump the magazines in the oceans or in nuclear waste sites to absorb neutrons. When that happens, all this culture will be gone and historians 200 years from now will have little idea about how civilized people actually thought and acted in the critical early years of the modern technological age.

    You can talk to the old people about the need to preserve their culture by making 'illegal' copies of the books, magazines, and movies that were important to them, but they are simply and completely clueless about the extent to which their culture will die as they do.