Slashdot Mirror


Saving Digital History

Gavinsblog writes "The Washington Post is reporting that the Library of Congress in the U.S. plans to initiate the $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP). It is hoped that the project will lead to the preservation of data that is constantly changing on the Internet. But I wonder who will choose what is worth saving?" This may remind you of the LOC's effort to preserve and digitize the audio collection in the National Recording Registry.

16 of 133 comments

  1. one persons trash... by trefoil · · Score: 5, Insightful

    is another person's treasure. I'd say just save it all and let others sift through it and decide what is worthwhile and what is worthless, just like the library.

    1. Re:one persons trash... by NotAnotherReboot · · Score: 2, Insightful

      From the article:

      On top of the $5 million the library received for planning the initiative in 2000, the plan approved yesterday releases another $20 million of funding to develop a system for evaluating and storing digital information. Just as the library receives more than 20,000 printed pieces each day but keeps less than half, it now faces the herculean task of deciding what digital information should be saved for future generations.

      --
      The library doesn't keep all of the printed information it receives, and keeping all of the information online is an enormous, if not impossible, task. Archive.org has terabytes upon terabytes of data, and it doesn't even come close to having everything that was on the web at any one time. With the budget they're talking about, keeping all of this information would most definitely not be possible.

    2. Re:one persons trash... by whereiswaldo · · Score: 3, Insightful
      Repeat after me:

      Disk space is cheap.
      Disk space is cheap.
      Disk space is cheap. ....
      Save everything. ;)

      "The Navy has both a tradition and a future--and we look with pride and confidence in both directions."

      Admiral George Anderson, CNO, 1 August 1961.
  2. New Media Doesn't Last by spun · · Score: 4, Insightful
    It all degrades faster than plain old ink on paper. There are plenty of books that last hundreds of years if kept in appropriate conditions. Film decays pretty rapidly. Tapes don't last; even CDs and DVDs wear out pretty quickly. Gopher is all but gone. Web pages disappear daily.

    The irony is that, while digital files could be preserved indefinitely in absolute perfection, many are being completely lost in much less time than it would take a book to turn to dust.

    Kudos to the folks at the Library of Congress, and to other projects like the Wayback Machine, who are working to preserve surprisingly ephemeral media.

    --
    - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
    1. Re:New Media Doesn't Last by spun · · Score: 2, Insightful
      Yes, I understand the difference between digital and analogue. I didn't learn that in IT class, I learned it when I was 10, on my own, building a robot from scratch using a Z-80 microprocessor.

      That is why I said, "The irony is that, while digital files could be preserved indefinitely in absolute perfection, many are being completely lost in much less time than it would take a book to turn to dust."

      Did you even read my comment before firing off a snide reply?

      --
      - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
    2. Re:New Media Doesn't Last by Mostly+a+lurker · · Score: 5, Insightful

      Sorry, but this idea will not fly on a number of grounds. Consider how many punch cards would be needed to save even 4.7GB of data (the contents of one DVD). It would take over 50,000,000 cards (even if they did not contain sequence numbers). The creation and storage costs would be astronomical, and reading them back in to find any data you wanted would take weeks -- just for a single DVD's worth of data. Further, much of the most useful data (images and sound recordings) is more difficult to store on punch cards than on almost any other alternative medium.
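
The arithmetic behind that estimate can be sketched roughly as follows. This is a back-of-the-envelope check, not the poster's own calculation: it assumes a standard 80-column card holding one byte per column and a hypothetical reader speed of 1,000 cards per minute. The names `cards_needed` and `days_to_read` are made up for illustration.

```python
DVD_BYTES = 4_700_000_000   # 4.7 GB, the DVD capacity quoted in the comment
BYTES_PER_CARD = 80         # assumption: 80 columns, one byte per column
CARDS_PER_MINUTE = 1_000    # assumption: hypothetical card-reader speed


def cards_needed(total_bytes, bytes_per_card=BYTES_PER_CARD):
    """Number of cards required, rounding up to a whole card."""
    return -(-total_bytes // bytes_per_card)


def days_to_read(cards, cards_per_minute=CARDS_PER_MINUTE):
    """Days of continuous reading at the assumed reader speed."""
    return cards / cards_per_minute / 60 / 24


cards = cards_needed(DVD_BYTES)
print(f"{cards:,} cards, about {days_to_read(cards):.0f} days of nonstop reading")
```

Under those assumptions the count comes to 58,750,000 cards and roughly 41 days of continuous reading, which is in line with "over 50,000,000 cards" and "weeks".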

    3. Re:New Media Doesn't Last by cshirky · · Score: 2, Insightful

      An indefinite storage period is only part of the problem. Even if you keep the 1s and 0s by copying them every five years, file formats go out of scope, and even if you keep the software the file was saved in, the OS that ran it may well be dead (most are, after all) and even if you save a copy of the data _and_ the application that can read it _and_ the OS, what hardware are you going to run it on?

      So it's a nested set of problems, with no one solution -- copying, conversion and emulation will all be required.

      There are two major advantages of analog over digital: the first is that inaction over a period of years does not destroy analog material. If you put a stack of paper in a box in the early 90s, it's probably fine. That degree of inaction, however, can be the death knell for digital material. If you put a stack of CD-ROMs or disks away in the early 90s, chances are at least some of that material is gone.

      The second is that while analog degrades slowly, bit-sensitive digital data (encrypted, compressed or executable files) degrades extremely quickly. If you make a mistake handling a book, say, you may end up with one torn page, but if you lose even a small piece of a bit-sensitive file, the entire thing vanishes forever.
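
The bit-sensitivity point can be illustrated with a small experiment. This is only a sketch under stated assumptions: it uses zlib compression as a stand-in for "bit-sensitive" data, and the sample text, the bit positions, and the helper names `flip_bit` and `try_decompress` are invented for the example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def try_decompress(blob: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


original = b"Saving digital history is a nested set of problems. " * 40

# Plain (uncompressed) data: one flipped bit damages exactly one byte.
damaged_plain = flip_bit(original, 8 * 100)
plain_bytes_lost = sum(a != b for a, b in zip(original, damaged_plain))

# Compressed data: flip one bit in the middle of the stream and try to read it.
compressed = zlib.compress(original)
damaged_compressed = flip_bit(compressed, 8 * (len(compressed) // 2))
recovered = try_decompress(damaged_compressed)

print("bytes damaged in the plain copy:", plain_bytes_lost)
print("compressed copy recovered intact:", recovered == original)
```

The plain copy loses a single character, much like the torn page. The compressed copy typically fails its integrity check and decompresses to nothing usable.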

      -clay

    4. Re:New Media Doesn't Last by JustaGiga · · Score: 4, Insightful

      It's not only a concern that physical media may become obsolete, but also the algorithms with which data is encoded on the media. We have lots of old backup media (reel-to-reel tape, 8mm tapes) at work that are probably still readable, but no one knows how the data was encoded on that media (or, more importantly, what information is on which tape).

      Most commercial tape backup solutions use proprietary encodings, and who knows whether the company will still be in business, or the product still supported, in 50 years? In fact, for true(r) long-term storage, it's recommended to copy the data off the commercial backup format into plain old tar.
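
As a minimal sketch of that last step, assuming the data has already been restored from the proprietary backup into an ordinary directory, Python's standard `tarfile` module can write an uncompressed tar file and read the listing back. The directory name, file names, and the helper `archive_to_tar` are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_to_tar(source_dir, tar_path):
    """Pack source_dir into an uncompressed tar file and return its file listing."""
    source_dir = Path(source_dir)
    with tarfile.open(tar_path, "w") as tar:  # mode "w": plain tar, no compression
        tar.add(source_dir, arcname=source_dir.name)
    # Re-open the archive to confirm it can be read back and to record what is in it.
    with tarfile.open(tar_path, "r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "backup_set"  # hypothetical restored backup
    src.mkdir()
    (src / "notes.txt").write_text("what is on this tape\n")
    (src / "data.bin").write_bytes(b"\x00\x01\x02")
    listing = archive_to_tar(src, Path(tmp) / "backup_set.tar")
    print(listing)
```

Keeping the returned listing alongside the archive speaks to the other half of the problem: knowing which data is on which tape.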

      Keeping an archive on media that will be around in 50 years seems like a minor point compared to finding the exact tape with the right data you need in a format you can still decode.

      -JG

  3. Quality not quantity by wiggys · · Score: 3, Insightful
    We already suffer from information overload as it is. Why bother to save the hundred million Geoshities webpages anyway? What's the point of keeping all the data when it's boring and irrelevant?

    Plus not all the data can be saved anyway... sites such as the Internet Movie Database, Amazon.com, and even Multimap are database-driven. Even assuming you get access to the underlying database, you still need to preserve the code that generates the pages. And for what purpose?

    Add to that the problem of accessibility. If the data isn't laid out in an easy-to-browse fashion then it's as good as dead anyway. I prefer to browse a library by topic, not searching for keywords and hoping a nice book pops out.

    --

    Sorry, but my karma just ran over your dogma.

  4. Choosing what should stay.... by MosesJones · · Score: 2, Insightful

    The answer is simple... what represents the government mindset of the day will be chosen to represent that mindset in the future. Cynical? Of course not; why would they be even-handed? Will they store what Al Jazeera (sp?) says rather than what the Washington Post says? Why would the views of Palestine be represented over the views of Israel?

    Or of course they will steer clear of politics and pick only science and absolute news, thus making it pointless for future historians.

    Saving what is said OVER what is already saved is an interesting idea, but will this be targeted beyond those people who already retain everything (like CNN and the BBC), or will it include them? The BBC store everything, "just in case". Will this money record that information yet again, or will it concentrate on other fields after ensuring that the BBC information is already available?

    Historians of the future will have more information than historians of any other generation. Their problem will be that the myriad of views reflected via this information doesn't mean an increase in the spectrum of political opinion, but in the ability of everyone to be opinionated.

    Their worst problem is that the leaders of the day (Bush, Blair et al.) don't stand out like the leaders of previous years. Will anyone rate the speeches of Powell or Bush against those of Churchill or Kennedy? Nope. So how to judge the politics of today, how to judge what should be stored? We have no leaders of merit, we have only rhetoric. So choose what to store, and realise that history will judge as much what you choose to save as what you saved. This is a different problem to that which has faced historians up till now.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  5. Geocities by Detritus · · Score: 3, Insightful

    Geocities web pages may be exactly what a future historian is interested in. They tell you something about the common culture and people. Why do you think archaeologists are so fond of ancient trash dumps?

    --
    Mea navis aericumbens anguillis abundat
  6. Nineteen Eighty Four by Anonymous Coward · · Score: 3, Insightful



    In Nineteen Eighty Four, The Party embraced the digital revolution because they could easily control what the news said about them. (Who controls the past controls the future...)

    Anyway, the point is the government may not be the best to be in charge of this.

    </rant>

  7. Actually cheaper to save everything by Mostly+a+lurker · · Score: 3, Insightful
    I think the practical solution with online data will be to save everything and worry about indexing and selection decades hence when we have much better technologies to carry out these tasks.

    The actual cost of storage is not that high. The highest costs are involved when human intervention enters into the equation.

  8. Re:From the viewpoint of meme theory... by cshirky · · Score: 3, Insightful

    "The important information will save itself without outside help."

    That's whistling past a pretty big graveyard.

    The problem is that time changes the definition of interesting. Would you be interested in the ads from a copy of the NYTimes.com from 1998? Probably not, unless you wanted to chuckle at the 667MHz Pentia selling for $2500.

    Would you be interested in the ads from a copy of the New York Times in _1898?_ Those ads are a view into a world you never inhabited, and expose the preoccupations of the era in a way that the articles don't.

    We can look at the 1898 ads, not because the important information saved itself, but because archivists did. Someday the ads from 1998 will have the same interest for historians and anthropologists. Who will do the archiving there?

    If we leave it to the present to sort the good from the bad, the future will never know what we considered unimportant. If you'd asked anybody in 1960 what that era's biggest technological revolutions were, they'd have all said atomic energy and space travel. The real answers turned out to be the transistor and the birth control pill.

    We are just about the worst possible people to ask what's important now, because we're too close, and it would be hubris to pretend otherwise.

    -clay

  9. Re:National Security by tjic · · Score: 2, Insightful

    Do you think that the intelligence agencies are only now realizing that this is a useful idea? This article isn't about the black archives - you can assume that they've existed for years and have no such funding constraints.

  10. How to save digital information? by broothal · · Score: 2, Insightful

    It's always a good idea to save a piece of history. Traditionally, it's been done by writing a book. As we've seen, a book can be read thousands of years later. But what about digital information? Media types change rapidly, and today's storage is obsolete tomorrow. So, how will the historians read a "Seedee" 100 years from now? OK, assuming they actually manage to build a device that can read the data off a CD, the data will most likely be corrupted, since CDs have a limited lifespan.

    Now, the only way to accomplish this is to make the storage dynamic. That is, go with the flow, and when a new sooper dooper storage device is invented, copy the data to that, thus ensuring two things: 1) the data is "refreshed"; 2) the data can be read by the contemporary hardware.
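
One way to sketch that refresh step is to copy the file to the new medium and compare checksums, so a bad copy is caught at migration time. This is an illustration only: two temporary files stand in for the old and new storage devices, SHA-256 is an arbitrary choice of checksum, and `sha256_of` and `refresh` are hypothetical helper names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(old_path, new_path):
    """Copy a file to new storage and raise if the copy does not match the source."""
    before = sha256_of(old_path)
    shutil.copyfile(old_path, new_path)
    after = sha256_of(new_path)
    if before != after:
        raise IOError(f"copy of {old_path} does not match the original")
    return after


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_medium.dat"  # stand-in for the ageing medium
    new = Path(tmp) / "new_medium.dat"  # stand-in for the new device
    old.write_bytes(b"a piece of history")
    checksum = refresh(old, new)
    copied = new.read_bytes()
    print("verified copy, sha256 =", checksum)
```

Storing the checksum with the data lets the next refresh confirm that nothing decayed in between.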