Slashdot Mirror


Saving Digital History

Gavinsblog writes "The Washington Post is reporting that the Library of Congress in the U.S. plans to initiate the $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP). It is hoped that the project will lead to the preservation of data that is constantly changing on the Internet. But I wonder who will choose what is worth saving?" This may remind you of the LOC's effort to preserve and digitize the audio collection in the National Recording Registry.

24 of 133 comments

  1. one person's trash... by trefoil · · Score: 5, Insightful

    is another person's treasure.. I'd say just save it all and allow others to sift through and decide what is worthwhile and what is worthless.. just like the library..

    1. Re:one person's trash... by whereiswaldo · · Score: 3, Insightful
      Repeat after me:

      Disk space is cheap.
      Disk space is cheap.
      Disk space is cheap. ....
      Save everything. ;)

      "The Navy has both a tradition and a future--and we look with pride and confidence in both directions."

      Admiral George Anderson, CNO, 1 August 1961.
  2. skip slashdot. by Anonymous Coward · · Score: 5, Funny

    No need to add slashdot as one of the websites. They keep reposting stories here as an initiative to preserve their own history.

  3. New Media Doesn't Last by spun · · Score: 4, Insightful
    It all degrades faster than plain old ink on paper. There are plenty of books that last hundreds of years if kept in appropriate conditions. Film decays pretty rapidly. Tapes don't last, even CDs and DVDs wear out pretty quickly. Gopher is all but gone. Web pages disappear daily.

    The irony is that, while digital files could be preserved indefinitely in absolute perfection, many are being completely lost in much less time than it would take a book to turn to dust.

    Kudos to the folks at the Library of Congress, and other projects like the Wayback Machine, who are working to preserve a surprisingly ephemeral medium.

    --
    - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
    1. Re:New Media Doesn't Last by Xzzy · · Score: 4, Interesting

      > There are plenty of books that last hundreds of
      > years if kept in appropriate conditions.

      My suspicion is that punch cards will make a return at some point. ;) No, really. Not only does it resolve the longevity issue, but it could also solve the issue of obsolete reading hardware (seems to me it'd be easier for a distant generation to rig up a punch card reader than a CD-ROM drive). Punch cards are in a rather obvious format as well, if worst came to worst and humanity nuked itself back to the stone age.. in ten thousand years a disc that looks like a mirror is probably harder to translate than a piece of paper with regularly spaced holes.

      I think the only difference will end up being the material used; how many centuries could a stainless steel plate with pin sized holes last in a library's basement?

    2. Re:New Media Doesn't Last by Mostly+a+lurker · · Score: 5, Insightful

      Sorry, but this idea will not fly on a number of grounds. Consider how many punch cards would be needed to save even 4.7GB of data (contents of one DVD). It would take over 50,000,000 cards (even if they did not contain sequence numbers). The creation and storage costs would be astronomical and reading them back in to find any data you wanted would take weeks -- just for a single DVD's worth of data. Further, much of the most useful data (images and sound recordings) is more difficult to store on punch cards than on almost any other alternative medium.
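
      A rough check of that arithmetic, as a sketch only. It assumes a standard 80-column card holding one byte per column, and a hypothetical reader running nonstop at 1,000 cards per minute; neither figure comes from the comment above, and the function names are made up for illustration.

```python
DVD_BYTES = 4_700_000_000   # 4.7 GB, the single DVD figure quoted above
BYTES_PER_CARD = 80         # assumption: 80-column card, one byte per column
CARDS_PER_MINUTE = 1_000    # assumption: hypothetical reader speed


def cards_needed(total_bytes, bytes_per_card=BYTES_PER_CARD):
    """Number of cards required, rounding up to a whole card."""
    return (total_bytes + bytes_per_card - 1) // bytes_per_card


def read_time_days(cards, cards_per_minute=CARDS_PER_MINUTE):
    """Days of uninterrupted reading at the assumed reader speed."""
    return cards / cards_per_minute / 60 / 24


cards = cards_needed(DVD_BYTES)
days = read_time_days(cards)
print(f"{cards:,} cards, about {days:.0f} days of nonstop reading")
```

      Under those assumptions the count comes to 58,750,000 cards and roughly 41 days of reading, which is consistent with "over 50,000,000 cards" and "weeks".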

    3. Re:New Media Doesn't Last by JustaGiga · · Score: 4, Insightful

      It's not only a concern that physical media may become obsolete, but also the algorithms with which data is encoded on the media. We have lots of old backup media (reel to reel tape, 8mm tapes) at work that are probably still readable, but no one knows how the data was encoded on that media or, more importantly, what information is on which tape.

      Most commercial tape backup solutions have proprietary encoding solutions, and who knows if that company is going to be in business/supported in 50 years. In fact, for true(r) long-term storage, it's recommended to copy the data from the commercial tape backup solution copy to plain old tar.

      Keeping an archive on media that will be around in 50 years seems like a minor point compared to finding the exact tape with the right data you need in a format you can still decode.

      -JG
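
      The "copy it to plain old tar" advice above might look something like the sketch below. It uses Python's standard tarfile module and assumes the data has already been restored from the proprietary backup into an ordinary directory; the function name and paths are hypothetical. The returned member list doubles as a simple record of what is in which archive.

```python
import tarfile
from pathlib import Path


def archive_to_tar(source_dir, tar_path):
    """Write source_dir to an uncompressed tar file and return its member names."""
    source = Path(source_dir)
    # Mode "w" writes a plain tar with no compression layer on top.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(str(source), arcname=source.name)
    # Re-open the archive to confirm it is readable and to list what it holds.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

      Leaving the archive uncompressed is a deliberate choice here: it is one less format a future reader has to decode.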

  4. WaybackMachine by ChunKing · · Score: 5, Informative

    Isn't this already being done by the WaybackMachine (http://www.waybackmachine.org)?

    --
    cogito ergo sig...
  5. That's good. by Black+Parrot · · Score: 4, Funny


    I deleted all my porn, and I was afraid I wouldn't be able to get it again when I need it.

    --
    Sheesh, evil *and* a jerk. -- Jade
  6. Something Old, Something New by SparklesMalone · · Score: 5, Funny

    How much energy should humanity spend remembering its past? I love history, but frankly I'd rather they fund more discoveries (i.e. NASA) than archive drivel like my slashdot musings.

  7. What about the DMCA? by kfg · · Score: 3, Interesting

    Good question. Why not sue them for infringement for reproducing your post and find out?

    KFG

  8. Quality not quantity by wiggys · · Score: 3, Insightful
    We already suffer from information overload as it is. Why bother to save the hundred million Geoshities webpages anyway? What's the point of keeping all the data when it's boring and irrelevant?

    Plus not all the data can be saved anyway... sites such as the Internet Movie Database, Amazon.com, and even Multimap are database-driven. Even assuming you get access to the underlying database you still need to preserve the code which gets used to generate the pages. And for what purpose?

    Add to that the problem of accessibility. If the data isn't laid out in an easy-to-browse fashion then it's as good as dead anyway. I prefer to browse a library by topic, not searching for keywords and hoping a nice book pops out.

    --

    Sorry, but my karma just ran over your dogma.

  9. sloshdat and Mod Point for history by QEDog · · Score: 3, Funny
    "But I wonder who will choose what is worth saving?"

    Well, maybe they can come up with a system where people post what they think is important in history and then some of the same people moderate that up or down, using a unit called Mod Points, to decide whether or not it is worth saving... maybe call it sloshdat.

    A mechanism would be devised to protect the figures that make history against the people reading the history, an effect that could be called Sloshdatted.

    I'm sure that with a system like this, historic figures such as many of the presidents would be Modded Down, while anyone who trashes an established monopolistic corporation would appear in the history books.

    A system like this would, without any doubt, save and Mod Up a comment like the present one for future generations.

    --
    "There is no teacher but the enemy."-Mazer Rackham
  10. Open plea by grub · · Score: 3, Funny


    Dear U.S. Library of Congress,

    Although not a U.S. citizen, I implore you to retain redundant backups of the website goatse.cx. Losing this website to a disaster would be tantamount to losing the collective works of Shakespeare, DaVinci and Picasso. The goatse.cx guy is an artist in the truest sense of the word.

    Yours very truly,

    grubby

    --
    Trolling is a art,
  11. No material can be ignored. by Ignorant+Aardvark · · Score: 3, Funny

    We need to take extra precautions to preserve some "movies", because, ahhh, they contain certain "positions" unlikely to be witnessed before or since outside of their "industry." I will therefore generously donate 500 burnt CDs of such movies to the people compiling this digital library.

  12. Actually.. by NotAnotherReboot · · Score: 3, Informative

    From the article:

    On top of the $5 million the library received for planning the initiative in 2000, the plan approved yesterday releases another $20 million of funding to develop a system for evaluating and storing digital information. Just as the library receives more than 20,000 printed pieces each day but keeps less than half, it now faces the herculean task of deciding what digital information should be saved for future generations.

    --
    The library doesn't keep all of the printed information it receives, and keeping all of the information online is an enormous, if not impossible, task. Archive.org has terabytes upon terabytes of data, and they don't even come close to having everything that was on the web at any one time. With the budget they're talking about, keeping all of this information would most definitely not be possible.

  13. Wonder what Disney will think by UTPinky · · Score: 3, Funny

    So what I want to know is, if one of Disney's movies gets archived, will they sue the Library of Congress?

    --
    I'm only paranoid because everyone is against me...
  14. Geocities by Detritus · · Score: 3, Insightful

    Geocities web pages may be exactly what a future historian is interested in. They tell you something about the common culture and people. Why do you think archaeologists are so fond of ancient trash dumps?

    --
    Mea navis aericumbens anguillis abundat
  15. From the viewpoint of meme theory... by asparagus · · Score: 3, Interesting

    The important information will save itself without outside help.

    For example, if talkorigins.org was wiped out of existence tomorrow, the theories it has created will live on in the minds of those who have read them. These essays can be easily recreated by re-reading the various creationist works. On the other hand, if the various creationist works were destroyed, they would probably not be recreated because they have already been refuted.

    The history of information is the history of massive portions of it being eliminated, but then either re-printed, re-discovered, or re-invented centuries later.

    The Catholic church 'knew' the earth was the center of the universe.

    Along came Galileo defending the heliocentric theory, and the church confined him to his house for the rest of his life.

    Now, if the modern versions of these men were to make the same claim, they would be soundly laughed at.

    So, while this is a noble effort, it is merely a collection of data. Time itself is the Bayesian filter that will determine which parts of the internet are important.

    -Brett

    1. Re:From the viewpoint of meme theory... by cshirky · · Score: 3, Insightful

      "The important information will save itself without outside help."

      That's whistling past a pretty big graveyard.

      The problem is that time changes the definition of interesting. Would you be interested in the ads from a copy of the NYTimes.com from 1998? Probably not, unless you wanted to chuckle at the 667MHz Pentia selling for $2500.

      Would you be interested in the ads from a copy of the New York Times in _1898?_ Those ads are a view into a world you never inhabited, and expose the preoccupations of the era in a way that the articles don't.

      We can look at the 1898 ads, not because the important information saved itself, but because archivists did. Someday the ads from 1998 will have the same interest for historians and anthropologists. Who will do the archiving there?

      If we leave it to the present to sort the good from the bad, the future will never know what we considered unimportant. If you'd asked anybody in 1960 what the biggest technological revolutions of the era were, they'd have all said atomic energy and space travel. The real answers turned out to be the transistor and the birth control pill.

      We are just about the worst possible people to ask what's important now, because we're too close, and it would be hubris to pretend otherwise.

      -clay

  16. Nineteen Eighty Four by Anonymous Coward · · Score: 3, Insightful



    In Nineteen Eighty Four, The Party embraced the digital revolution because they could easily control what the news said about them. (Who controls the past controls the future...)

    Anyway, the point is the government may not be the best to be in charge of this.

    </rant>

  17. Preservation vs DRM by dpilot · · Score: 4, Interesting

    Since the public domain died back in the 1920s, and since this is about digital content, it stands to reason that pretty much all of the content that LOC is talking of preserving will be covered by some sort of copyright, and an increasing portion will be protected by some sort of DRM. What will the LOC's stand be on this?

    Since the LOC seems to hold some of the strings over implementation of the DMCA, they can obviously craft a loophole for themselves. But it will be interesting to see what that loophole is, and how it will work. Will they simply leave the stuff under DRM, and have their own copy of keys, or will they manage to have an unprotected copy?

    Enquiring minds want to know.

    --
    The living have better things to do than to continue hating the dead.
  18. Actually cheaper to save everything by Mostly+a+lurker · · Score: 3, Insightful
    I think the practical solution with online data will be to save everything and worry about indexing and selection decades hence when we have much better technologies to carry out these tasks.

    The actual cost of storage is not that high. The highest costs are involved when human intervention enters into the equation.

  19. Archive.org, and its limitations by Animats · · Score: 3, Interesting
    There is, of course, archive.org. That's a surprisingly small operation for what it does. A few volunteers work on the server farm (fewer than a thousand commodity PCs), and there's a little office at the Presidio of San Francisco. The web crawl is done at Alexa, and the Archive is filled from Alexa's backup tapes, which is why it runs so far behind.

    There's a live backup of the Internet Archive at the Library of Alexandria in Egypt. Thus, no single government can censor the archive. More duplicates may be established in other countries.

    Perhaps unfortunately, it's easy to remove material from the archive. Just put a "robots.txt" file on your site, and not only will it not be captured again, the archive will immediately refuse to display copies of the blocked site. This seems to be enough to keep the militant copyright holders happy.
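
    How such a "robots.txt" exclusion is evaluated can be sketched with Python's standard urllib.robotparser. This is only an illustration: the user-agent string "ia_archiver", the example URL, and the helper name are assumptions here, and it does not show how the archive itself handles blocked sites.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks one specific crawler to stay away from the whole site.
# The agent name is an assumption made for this example.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

    With this file, is_blocked(ROBOTS_TXT, "ia_archiver", "http://example.com/page.html") reports the page as blocked, while a crawler with a different agent name is unaffected.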

    Most text is saved, but not all pictures, and very little video. This is good enough for most historical purposes.