National Archives Cuts Back On Web Site Archiving
hhavensteincw writes "The National Archives and Records Administration (NARA) is coming under fire for a new policy to stop the "harvesting" of a digital snapshot of all federal agency and Congressional Web sites after every Presidential and Congressional term. NARA, which archived more than 75 million Web sites in 2004 after George Bush's first term ended, will not harvest agency and Congressional Web sites when his current term is over because it says agencies are supposed to be archiving Web content on their own. But NARA has been criticized by some for opting out of preserving these important historical archives on the Web."
If you don't document it.
---- Booth was a patriot ----
... the price of storage dropping as it has.
So what is the real reason for this? It's certainly not cost.
Is it possible that nobody is interested in the data?
The NARA should not be considering quitting right when the Bush regime is caught red-handed deleting vast amounts of incriminating digital content that it was legally required to archive.
If anything, NARA should be required to archive even more now, to guard against losing the unique copies at the other ends of official communications and publications. It should upgrade to a policy of redundant archivers keeping separate copies under separate policies, so that a rogue Executive can't flip one switch and toss all the evidence of their actions into the fire.
--
make install -not war
I for one welcome our new Googlovernment.
It really should not come as a surprise that yet another federal agency has decided not to do its job, but only what it wants to do... The reality of the situation is simple: the web is becoming a major communications method for the government, and the content will be a lens into the history of the government's interaction with the people. I am actually afraid that this "ignoring the present" is not some form of conspiracy to prevent the recording of history, but more a case of senior government officials not understanding the world as it is. Not recording the communications of the government to the people, in the form and context in which they were presented, is a complete abdication of the responsibilities assigned to NARA, and I hope that this story gets the US Congress to intervene and tell the agency to do its job. Of course, I also hoped that Santa would bring me a new car, and the Easter bunny would bring golden eggs. So, I am ready for another disappointment.
Hope is the worst of evils, for it prolongs the torment of man. -- Friedrich Nietzsche
Any archives done by the government are useless because those who control the government can modify them if they so desire. This data needs to be archived by multiple independent private parties.
Their job is to archive public records. Every document produced by the US government is public record unless classified.
They're using their grammar skills there.
Doesn't Google do this already on their own servers?
Take whatever budget they have for the web archive and give it to archive.org, let them do the work. Include some long term DVD tech to stash at the Library of Congress. If the gov't can't do its job, pay someone else to do it.
You have to get launch clearance from the government to do that.
Why should the National Archives repeat all the captured page loads that the FBI and NSA are getting from the big telecom providers? They don't just spy on your e-mail, you know.
SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
independance
i do not wish to be exposed to your dance in Depends{tm}
Sacred cows make the best burgers.
for once, this (parent post) dumbass is almost relevant.
How amazed would you be to suddenly find that you just forgot what I wrote and you needed to reread my post.... again.
Because we saw how well that plan worked for the White House emails...
Back when archive.org was archiving whitehouse.gov, we saw changes in speeches to match the current rationales etc. Is this why they don't want to archive?
--Sam
I think it is a big mistake for NARA to stop what they are doing. A centralized authority bearing the imprimatur of NARA for creating, implementing, executing and enforcing a standard of archiving is desperately needed. This standard is critical for future historians to be able to make sense of our collective legacy.
Halting now and distributing responsibility amongst the various federal agencies will foster a haphazard distorted view of the past.
Prior to 9/11, the presidential records of the first Bush presidency had been scheduled to be turned over to the National Archives, but the second Bush delayed their release.
Right after the 9/11 incident, these records were reclassified. Around the same time, there was a wholesale reclassification of documents in the National Archives going back to WWII, making them unavailable to the public.
Private archiving coverage (e.g. archive.org) is not what it once was either, though maybe for different reasons.
More and more operators are choosing to protect their "intellectual property" using robots exclude, noarchive, or similar policies.
More and more websites use dynamic methods to present data, or use more complex interfaces involving JavaScript, Flash, Java, etc. that make them technically hard to capture.
Conversations that formerly occurred on Usenet now happen on proprietary bulletin board systems that are technically difficult to crawl. Furthermore, most BBS TOS forbid automated crawling.
It is interesting that as more and more content is backed by databases, it is getting harder and harder to access and search for the desired content.
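For anyone wondering what the "robots exclude" part looks like from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the example.gov URLs, and the may_crawl helper are all made up for illustration; "ia_archiver" is used only as an example agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ia_archiver", "https://example.gov/speeches/"))  # False
print(may_crawl(ROBOTS_TXT, "otherbot", "https://example.gov/speeches/"))     # True
print(may_crawl(ROBOTS_TXT, "otherbot", "https://example.gov/private/x"))     # False
```

A well-behaved archiver that honors these rules simply never stores the excluded pages, which is how coverage shrinks without anyone deleting anything.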
"NARA, which archived more than 75 million Web sites in 2004 after George Bush's first term ended, will not harvest agency and Congressional Web sites when his current term is over because it says agencies are supposed to be archiving Web content on their own."
Um, are these agencies the same ones that were supposed to be archiving all their e-mail as well? You know, the e-mail that was all conveniently deleted according to "procedure" just before it was needed in a major congressional investigation?
I have studied a bit of history at the University level and I am not sure whether the digital age will make that job easier or harder in the future. With the overwhelming amount of online content in blogs and such it will be easier to find accounts of events but harder to separate opinion from fact. Being electronic, it will be easier to search through, but harder to sort through due to the overwhelming quantity of information on the current internet. It is also much easier to alter unless things like electronic hashes are stored along with the content. And that is with HTML, which is easily readable and not proprietary - I wonder how formats like MS Word docs are going to fare with the test of time.
Are there even organizations out there archiving the wider internet for posterity? Published books tend to be edited, distributed to libraries, and preserved in a physical form where you can find them on the shelves 50 years from now. I don't know of any libraries storing/preserving electronic materials in the same way...
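To make the hash idea concrete, here is a rough sketch in Python using the standard hashlib module. The record layout and the make_record / is_unaltered names are invented for this example and are not anything NARA or archive.org is known to use.

```python
import hashlib


def make_record(content: bytes) -> dict:
    """Pair archived content with its SHA-256 digest (hypothetical layout)."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "content": content.decode("utf-8"),
    }


def is_unaltered(record: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    actual = hashlib.sha256(record["content"].encode("utf-8")).hexdigest()
    return actual == record["sha256"]


snapshot = make_record(b"<html><body>Press release text</body></html>")
print(is_unaltered(snapshot))  # True

# Simulate a later edit to the archived page.
snapshot["content"] = snapshot["content"].replace("Press", "Revised press")
print(is_unaltered(snapshot))  # False
```

This only catches tampering if the digest is kept somewhere the person editing the content cannot also change, for example with a separate archive.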
We read with interest your postings on this topic. The National Archives and Records Administration (NARA) has posted background information regarding our web harvest decision at http://www.archives.gov/records-mgmt/memos/nwm13-2008-brief.html. This background document includes links to our guidance products related to web records and the decision-making process we went through to arrive at our decision.
Paul M. Wester, Jr.
Director, Modern Records Programs
National Archives and Records Administration