Saving Digital History
Gavinsblog writes "The Washington Post is reporting that the Library of Congress in the U.S. plans to initiate the $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP). It is hoped that the project will lead to the preservation of data that is constantly changing on the Internet. But I wonder who will choose what is worth saving?" This may remind you of the LOC's effort to preserve and digitize the audio collection in the National Recording Registry.
Good question. Why not sue them for infringement for reproducing your post and find out?
KFG
This may sound like a joke, but I really hope they save the big red dot. I don't know if the website is still in existence, but a while back there was a website that had a big red button. When you clicked it, it said you had clicked the big red dot. The counter had some ridiculous number. This was back when it was en vogue to show off your hit count.
> There are plenty of books that last hundreds of
> years if kept in appropriate conditions.
My suspicion is that punch cards will make a return at some point. ;) No, really. Not only does it resolve the longevity issue, but it could also solve the issue of obsolete reading hardware (seems to me it'd be easier for a distant generation to rig up a punch card reader than a cd-rom drive). Punch cards are in a rather obvious format as well, if worst came to worst and humanity nuked itself back to the stone age.. in ten thousand years a disc that looks like a mirror is probably harder to translate than a piece of paper with regularly spaced holes.
I think the only difference will end up being the material used; how many centuries could a stainless steel plate with pin-sized holes last in a library's basement?
The difference being that this archiving is _digital_, though...
Didn't you pay attention in that IT class when they were explaining the difference between Digital and Analogue? Digital's main advantage is its reproducibility. So if, say, the CIC Lib^H^H^H^H^H^H^HLibrary of Congress were to refresh the information once every five years or something like that, then you've got an indefinite storage period. The problem with it is that it needs constant maintenance. The reason this is better than analogue archives is pretty simple... when analogue decays, it's pretty much never going to achieve its original quality. You can do things to try and make it similar, but you're never going to get it as pure as the original.
With digital archives, you can avoid the decay simply by transferring. This isn't an option really with analogue because once you transfer, you tend to lose quality. But bits are simply 1s or 0s, and digital transfer can be perfect. Throw some md5 checksums in there to make sure that you don't corrupt the data, and boom... you've got perfect digital copy.
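A minimal sketch of that refresh-and-verify idea in Python, for the curious. The helper names `md5_of` and `refresh_copy` are made up for illustration, and MD5 is used only because it is the checksum named above; this is a sketch of the principle, not a description of how any real archive does it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy source to destination and fail loudly if the copy differs."""
    before = md5_of(source)
    shutil.copyfile(source, destination)
    after = md5_of(destination)
    if before != after:
        raise IOError(f"checksum mismatch: {before} != {after}")
    return after


if __name__ == "__main__":
    # Hypothetical demo: "migrate" a small file to fresh media.
    with tempfile.TemporaryDirectory() as workdir:
        old_media = Path(workdir) / "old_media.bin"
        new_media = Path(workdir) / "new_media.bin"
        old_media.write_bytes(b"bits are simply 1s or 0s")
        print("verified copy, md5 =", refresh_copy(old_media, new_media))
```

A matching checksum only tells you the copy equals the source at transfer time; it says nothing about whether the source had already decayed, which is why the refresh has to happen on a schedule.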
Karma: Non-Heinous
We have an incredible fascination with spending today looking at where we were yesterday instead of where we are or where we're going.
I'm not talking about history. I love history. My shelves are well stocked with various dead trees delineating history.
I'm talking about our own lives. When we go on vacation we tend to spend most of our time *documenting* our trip rather than living it. Then we live it "in absentia" as a kind of recreational post mortem.
It's a fascinating thing to observe, but I admit it puzzles the hell out of me.
This point was driven home to me a while ago when someone pointed out how odd it was that I only have one photograph of my SO of 10 years. I only have it because my mother took it. In my mind why would I want a photograph when I could just look at *her*?
KFG
It is hoped that the project will lead to the preservation of data that is constantly changing on the Internet.
One possible reason: because the OIA and Company might need the data to track down terrorists, etc. (Much the same way that the FBI keeps a collection of outdated phone books.)
After all, when the events of Iran-Contra blew over, Congress quietly passed a bill authorizing the CIA to use any Federal agency for cover. Why not the Library of Congress? Indeed, where else? Makes perfect sense.
-kgj
I noticed in the article that one of the topics being preserved was 9/11, and that got me thinking.
On a broader scale, news media love the internet because they can make outlandish claims when a story first breaks and then modify it as the facts become available. How do we know what's being preserved is accurate?
Secondly, do we trust the people controlling all this nice, easily modified information not to change it to suit some political whim?
They say the victor writes the history book. Digital storage will allow the victors to run a few drafts by their spin doctors first.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth.
What truth?
There is no dupe
The important information will save itself without outside help.
For example, if talkorigins.org was wiped out of existence tomorrow, the theories it has created will live on in the minds of those who have read them. These essays can be easily recreated by re-reading the various creationist works. On the other hand, if the various creationist works were destroyed, they would probably not be recreated because they have already been refuted.
The history of information is the history of massive portions of it being eliminated, but then either re-printed, re-discovered, or re-invented centuries later.
The Catholic church 'knew' the earth was the center of the universe.
Along came Copernicus with his heliocentric theory, and the Church later confined Galileo to his house for the rest of his life for defending it.
Now, if the modern versions of those churchmen were to make the same claim, they would be soundly laughed at.
So, while this is a noble effort, it is merely a collection of data. Time itself is the Bayesian filter that will determine which parts of the internet are important.
-Brett
Sounds great, but why is it going to cost 100 million dollars? Can we say pork?
Since the public domain died back in the 1920s, and since this is about digital content, it stands to reason that pretty much all of the content that the LOC is talking of preserving will be covered by some sort of copyright, and an increasing portion will be protected by some sort of DRM. What will the LOC's stance be on this?
Since the LOC seems to hold some of the strings over implementation of the DMCA, they can obviously craft a loophole for themselves. But it will be interesting to see what that loophole is, and how it will work. Will they simply leave the stuff under DRM, and have their own copy of keys, or will they manage to have an unprotected copy?
Enquiring minds want to know.
The living have better things to do than to continue hating the dead.
I didn't say it was practical, or suggest that the density would be anything to write home about.. but the fact of it is, we haven't yet been able to develop a digital storage format that is longer lived than punch cards. ;)
I'm sure something better that's got a life of a thousand years or more will come along eventually, but speaking in the here and now the only way to get that is with holes in a piece of paper.
There's a live backup of the Internet Archive at the Library of Alexandria in Egypt. Thus, no single government can censor the archive. More duplicates may be established in other countries.
Perhaps unfortunately, it's easy to remove material from the archive. Just put a "robots.txt" file on your site, and not only will it not be captured again, the archive will immediately refuse to display copies of the blocked site. This seems to be enough to keep the militant copyright holders happy.
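For anyone wondering what such a block looks like, here is a rough sketch using Python's standard `urllib.robotparser`. The user-agent string `ia_archiver` is an assumption about the name the archive's crawler answers to, `example.com` is a placeholder, and `archive_may_fetch` is a made-up helper; check the archive's own documentation before relying on any of it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out only the archive crawler.
# "ia_archiver" is an assumed user-agent name, not taken from the post above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def archive_may_fetch(robots_txt, url, user_agent="ia_archiver"):
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    print("archive crawler allowed:", archive_may_fetch(ROBOTS_TXT, page))
    print("other crawler allowed:",
          archive_may_fetch(ROBOTS_TXT, page, user_agent="SomeOtherBot"))
```

Note that robots.txt is only a request that well-behaved crawlers choose to honor; nothing in the file itself enforces it.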
Most text is saved, but not all pictures, and very little video. This is good enough for most historical purposes.