Slashdot Mirror


National Archives Cuts Back On Web Site Archiving

hhavensteincw writes "The National Archives and Records Administration (NARA) is coming under fire for a new policy to stop the "harvesting" of a digital snapshot of all federal agency and Congressional Web sites after every Presidential and Congressional term. NARA, which archived more than 75 million Web sites in 2004 after George Bush's first term ended, will not harvest agency and Congressional Web sites when his current term is over because it says agencies are supposed to be archiving Web content on their own. But NARA has been criticized by some for opting out of preserving these important historical archives on the Web."

45 comments

  1. Its not History by nurb432 · · Score: 3, Insightful

    If you don't document it.

    --
    ---- Booth was a patriot ----
  2. interesting in consideration..... by 3seas · · Score: 3, Interesting

    ... the price of storage dropping as it has.

    So what is the real reason for this? It's certainly not cost.

    Is it possible that nobody is interested in the data?

    1. Re:interesting in consideration..... by bumburumbi · · Score: 5, Insightful

      Is it possible that nobody is interested in the data?
      People may not be interested in the data now, but as time passes it will become more and more important. I am a bit surprised that the National Archives and the Library of Congress collect so little of the American cultural heritage. In Iceland, where I live, the National Library collects everything on the national TLD (.is) three times a year; important sites are crawled more frequently. I know that the US web is several orders of magnitude larger than the Icelandic web. One would however assume that the resources available to the NARA and LC are significantly larger than what the Icelandic National Library has to spend on collecting websites. Collecting a subset of the US web every four years should be well within the means of the US government.
    2. Re:interesting in consideration..... by Kwirl · · Score: 4, Insightful

      I think we all know that the less history remembers of George W Bush's term as president of the free world, the better off we will look in our children's eyes. If he gets lucky he might get off easy with a 'worst president to ever hold the office' footnote.

    3. Re:interesting in consideration..... by Foobar+of+Borg · · Score: 3, Insightful

      If he gets lucky he might get off easy with a 'worst president to ever hold the office' footnote.
      Of course, like with Nixon, you will still have slavering beasties defending him for the next few decades and blaming everything on liberals and campus radicals.
    4. Re:interesting in consideration..... by UncleTogie · · Score: 1

      I think we all know that the less history remembers of George W Bush's term as president of the free world, the better off we will look in our children's eyes.

      Far better our children be aware of history, so they might be less inclined to repeat it.

      Looking good in the eyes of another is not nearly as desirable as acting good and eliminating that worry.

      --
      Don't tell me to get a life. I'm a gamer; I have LOTS of lives!
    5. Re:interesting in consideration..... by Anonymous Coward · · Score: 1, Interesting

      Well, if you can tell me what part of a) opening up dialogue with pinko, commie China and b) getting OUT of Vietnam, which Kennedy and Johnson (both "liberals," last time I checked) got us into, makes Nixon a "conservative," then please let me know.

      Frankly, I'd have thought the bastard would be more palatable to lefties than to the Buckley crowd -- then again, it may just be that the neocons have confused the definitions so much as to make them meaningless.

    6. Re:interesting in consideration..... by Foobar+of+Borg · · Score: 1

      Well, if you can tell me what part of a) opening up dialogue with pinko, commie China and b) getting OUT of Vietnam, which Kennedy and Johnson (both "liberals," last time I checked) got us into, makes Nixon a "conservative," then please let me know.
      Nixon was not a leftie by any stretch of the imagination. Opening up a dialogue with Communist China was about realpolitik, not ideology. Plus, Communist China was hardly a bastion of liberalism. It was/is an autocratic regime. You seem to be confusing Communism and Socialism, which are two incredibly different ideals (Sweden and Canada, for example, are not going to be opening up any re-education camps anytime soon). As far as getting out of the Vietnam War, that was something Kennedy likely wanted to do, but Johnson didn't. While Johnson supported a lot of liberal causes (civil rights and so on), the Vietnam War was something *liberals* were protesting him about. It was the *conservatives* in the US who were all for the war and had the whole "America, love it or leave it" mentality, without pausing to ask why we were even in Vietnam (sound familiar?).

      Frankly, I'd have thought the bastard would be more palatable to lefties than to the Buckley crowd -- then again, it may just be that the neocons have confused the definitions so much as to make them meaningless.
      Again, you are cherry picking. And Buckley liked Nixon. Sure, Nixon looks like a liberal now compared with the neo-Nazis we have running the country, but at the time he would be considered very conservative. In the US, we don't have a liberal party anymore. We have the right-wing party (Democrats, with a few being so "liberal" as to be centrists) and the neo-Nazi party (Republicans).
    7. Re:interesting in consideration..... by Anonymous Coward · · Score: 0

      Canada and Sweden do have "reeducation camps" -- when they lock people up for "hate speech" and the like, for the crime of suggesting that maybe, just maybe a Nigerian isn't a Swede no matter what it says on their passport.

      Why? Because "racism" is the only thing that stands in the way of the international workers' brotherhood crap. Capitalism and Communism are two sides of the same coin. Is socialism bad? No. Is social Marxism a problem? Yes. It defies reality.

    8. Re:interesting in consideration..... by FishWithAHammer · · Score: 1

      One would however assume that the resources available to the NARA and LC are significantly larger than what the Icelandic National Library has to spend on collecting websites.
      Oh god, I laughed so hard.
      --
      "You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
    9. Re:interesting in consideration..... by Anonymous Coward · · Score: 0

      I once worked on a proposal for the Electronic Records Archive for NARA. At the time, the agency's budget was on the order of $30M/year. They anticipated 17 exabytes of data by 2017. Needless to say, they couldn't afford anything that would have done the job. Don't discount funding -- or a ploy to have it increased, anyway. They're underfunded as it is.

  3. Wrong Time to Quit by Doc+Ruby · · Score: 5, Insightful

    The NARA should not be considering quitting right when the Bush regime is caught red-handed deleting vast amounts of incriminating digital content that it was legally required to archive.

    If anything, NARA should be required to archive even more now, to guard against losing the unique copies at the other ends of official communications and publications. It should upgrade to a policy of redundant archivers keeping separate copies under separate policies, so that a rogue Executive can't flip one switch and toss all the evidence of their actions into the fire.

    --
    make install -not war

    1. Re:Wrong Time to Quit by Anonymous Coward · · Score: 2, Insightful

      I'm not certain if you read TFA (or TFS, for that matter), but these are public websites that the NARA was archiving. They were doing it ONCE every term. If you want to see just what the NARA was doing, click on "Cached" on Google's search page...same idea.

      Honestly, I'm not pro-Bush by any stretch of the imagination, but the NARA's decision is NOT going to help the Bush "regime" hide anything that wasn't already readily accessible to the public.

    2. Re:Wrong Time to Quit by Doc+Ruby · · Score: 1

      I'm familiar with NARA's archiving, and with how they're thinking of turning it over to the Internet Archive (archive.org). But there's a difference between publishing and archiving. Yes, those websites are available to the public, but if there isn't a mandated archiving system, then lots will slip away. The sheer volume of published materials, so often revised to cover up abuses after they've slipped out, means that relying on the public to archive them piecemeal risks losing lots of important evidence. That's the entire point of the government archiving even public materials.

      Probably the best system would be for libraries, public and private (e.g., universities and research institutes), to each independently archive the public sites, the way they do now with newspapers (microfilm, and lately CD/DVD-ROM). If we'd just relied on newspapers to keep their own archives, lots of newspapers could have covered things up by revising history in their own master archives by now (and surely some have tried). The government material is even more essential, and subject to coverup by an organized criminal regime like the one we're living through right now.

      --
      make install -not war

    3. Re:Wrong Time to Quit by Anonymous Coward · · Score: 0

      And I understand your points as well. However, I honestly have to wonder, with NARA's practices even BEFORE this recent change, what are the odds of picking up something "incriminating" during that once-every-term archive? It's not like this is an hourly, daily, or even weekly backup... we're talking once every four years.

      What you're talking about is something I DO agree with: the NARA should be taking much more regular archives of internet materials. However, with the current scope of the project, it doesn't make any sense to even bother, IMO.

    4. Re:Wrong Time to Quit by the+pickle · · Score: 4, Insightful

      The NARA should not be considering quitting right when the Bush regime is caught red-handed deleting vast amounts of incriminating digital content that it was legally required to archive.

      Am I the only one who read this story and thought that maybe the NARA isn't choosing to do this? I think it's a mighty strange coincidence that they'd be doing this on their own in the last year of a presidency that, for the past seven years, has shown a willful disregard for the law, especially when it comes to the administration's own recordkeeping. Dubya's White House has made the missing files associated with the Clintons look like a single lost receipt by comparison.

      p

    5. Re:Wrong Time to Quit by Doc+Ruby · · Score: 2, Insightful

      I think something is better than nothing. That volume of evidence and "virality" of distribution means that even a snapshot will preserve traces that are hard to totally expunge from the entire Federal government's public records. But if that snapshot isn't even taken, that's much harder.

      The drop from inadequate archiving to none has crossed a threshold where people are now paying attention and demanding adequacy. The inadequacy of the prior policy means that both those in power in the Bush regime and many outside it agree on changing the program, which is a start for political compromise. Switching to the Internet Archive is a mediocre interim measure, but one which Republicans probably don't like, because even though it's their trademark privatization, it's still publicly funded, and not handed to a crony who just skims a contract while expensively failing to fulfill it. All of which creates political conditions and momentum towards a more distributed archival process, which could fund archives including libraries as I described.

      So instead of giving up, now is a good time to demand more and better. Because it's the right thing to do, and because the way it's happening shows a path to actually getting it.

      --
      make install -not war

  4. Just outsource it to Google by Anonymous Coward · · Score: 0

    I for one welcome our new Googlovernment.

    1. Re:Just outsource it to Google by Anonymous Coward · · Score: 0

      Well, whether or not it is outsourced to Google, it would certainly be very "Not Evil" of Google to make (a subset of) their cached pages available to projects like archive.org or the National Archives. Say, one of their robot hits per month per site, or quarterly, or even more. Whatever.

      Either way, they have the data, and the robots, and I am all about independently derived data, and multiple parallel datasets. It provides another layer of transparency, y'know?

  5. Should we be surprised . . . by TXISDude · · Score: 2, Interesting

    It really should not come as a surprise that yet another federal agency has decided not to do its job, but only what it wants to do... The reality of the situation is simple: the web is becoming a major communications method for the government, and the content will be a lens into the history of the government's interaction with the people. I am actually afraid that this "ignoring the present" is not some form of conspiracy to prevent the recording of history, but more a case of senior government officials not understanding the world as it is. Not recording the communications of the government to the people, in the form and context in which they were presented, is a complete abdication of the responsibilities assigned to NARA, and I hope that this story gets the US Congress to intervene and tell the agency to do its job. Of course, I also hoped that Santa would bring me a new car, and the Easter bunny would bring golden eggs. So, I am ready for another disappointment.

    --
    Hope is the worst of evils, for it prolongs the torment of man. -- Friedrich Nietzsche
  6. These archives are useless.... by Anonymous Coward · · Score: 4, Insightful

    Any archives done by the government are useless because those who control the government can modify them if they so desire. This data needs to be archived by multiple independent private parties.

    1. Re:These archives are useless.... by Anonymous Coward · · Score: 0

      Most people who work at the National Archives are not political appointees that have reason to change archives; they are career civil servants. It is also easier to control the quality of archival preservation and conservation through a single government agency, which is extremely important if we want these documents available in fifty years. While there are problems in the government, there are also massive problems with quality control in the private sector.

    2. Re:These archives are useless.... by timeOday · · Score: 1

      White House email, for instance. (And yes, the link references "lost" email by both Bush and Clinton.)

  7. The national archives exists for exactly this. by DragonTHC · · Score: 4, Informative

    Their job is to archive public records. Every document produced by the US government is public record unless classified.

    --
    They're using their grammar skills there.
    1. Re:The national archives exists for exactly this. by Anonymous Coward · · Score: 0

      And they are doing that. I think NARA is right about this. I work as an engineer for the federal government (NRC). I have a couple of thoughts about this.

      1) NARA is like the storehouse for the government. It is each individual agency's job to develop and maintain a Records Disposition Schedule that identifies which documents are considered Official Agency Records (OARs) and how long they need to be archived. The schedules usually dictate that important records will be transferred to NARA after a certain amount of time. While I can't speak for every agency, our agency's disposition schedule already contains a category for information posted to our website, so why should NARA waste even one iota of effort re-harvesting information that is already being archived anyway?

      2) Most websites nowadays are just views into a Content Management System with a database back end. What good does it do to archive a snapshot of all the web pages within a website on a particular day? The next day, the links get updated to point to new content and the old ones disappear. Even if I concede that it is worthwhile to grab a snapshot in time of a website's static HTML pages, what you probably end up with in reality is a set of HTML pages in which 75% of the links don't work. Yeah, that's useful.

    2. Re:The national archives exists for exactly this. by Anonymous Coward · · Score: 0

      Exactly. And let's not forget that it's not up to any one person to decide whether any given record is important. Even something of seemingly little relevance now could be shown to be very important in retrospect.

      In short, it's not up to us to decide which documents are important and which aren't. It's up to history.

      So as far as I'm concerned, any reasons they have for not keeping records of documents just don't hold water.

  8. Repetative? by tastypotato · · Score: 1

    Doesn't Google do this already on their own servers?

    1. Re:Repetative? by Anonymous Coward · · Score: 0

      no

    2. Re:Repetative? by Anonymous Coward · · Score: 0

      AFAIK, Google only keeps the most recent snapshot from its crawlers. That's not too useful after the sites are taken down.

  9. Easy answer by witherstaff · · Score: 1

    Take whatever budget they have for the web archive and give it to archive.org; let them do the work. Include some long-term DVD tech to stash at the Library of Congress. If the gov't can't do its job, pay someone else to do it.

  10. Re:Getting sick of this by Anonymous Coward · · Score: 0

    You have to get launch clearance from the government to do that.

  11. National Archives have become redundant by museumpeace · · Score: 1

    Why should the National Archives repeat all the captured page loads that the FBI and NSA are getting from the big telecom providers? ...They don't just spy on your e-mail, you know.

    --
    SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
  12. Re:Getting sick of this by quonsar · · Score: 1

    independance

    i do not wish to be exposed to your dance in Depends{tm}

  13. Re:corepirate nazis reducing chance of history.... by Missing_dc · · Score: 1

    For once, this (parent post) dumbass is almost relevant.

    --
    How amazed would you be to suddenly find that you just forgot what I wrote and you needed to reread my post.... again.
  14. Agencies are supposed to be documenting their own? by Random+Q.+Hacker · · Score: 1

    Because we saw how well that plan worked for the White House emails...

  15. doublespeak by osssmkatz · · Score: 4, Interesting

    Back when archive.org was archiving whitehouse.gov, we saw changes in speeches to match the current rationales, etc. Is this why they don't want to archive?

    --Sam

    1. Re:doublespeak by QuoteMstr · · Score: 1

      Was archiving whitehouse.gov? AFAICS, archive.org still is.

  16. NARA should continue archiving by siculars · · Score: 1

    I think it is a big mistake for NARA to stop what they are doing. A centralized authority bearing the imprimatur of NARA for creating, implementing, executing and enforcing a standard of archiving is desperately needed. This standard is critical for future historians to be able to make sense of our collective legacy.

    Halting now and distributing responsibility amongst the various federal agencies will foster a haphazard, distorted view of the past.

  17. Other Significant National Archive Redactions by Anonymous Coward · · Score: 1, Informative

    Prior to 9/11, the presidential records of the first Bush presidency had been scheduled to be turned over to the National Archives, but the second Bush delayed their release.

    Right after the 9/11 incident, these records were reclassified. Around the same time, there was a wholesale reclassification of documents in the National Archives going back to WWII, making them unavailable to the public.

  18. Problem is bigger than Natl. Archives. by joebob2000 · · Score: 2, Informative

    Private archiving coverage (e.g., archive.org) is not what it once was either, though maybe for different reasons.

    More and more operators are choosing to protect their "intellectual property" using robots.txt exclusions, noarchive directives, or similar policies.

    More and more websites use dynamic methods to present data, or use more complex interfaces involving JavaScript, Flash, Java, etc., that make them technically hard to capture.

    Conversations that formerly occurred on Usenet now happen on proprietary bulletin board systems that are technically difficult to crawl. Furthermore, most BBS terms of service forbid automated crawling.

    It is interesting that as more and more content is backed by databases, it is getting harder and harder to access and search for the desired content.
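The robots.txt exclusion mentioned above can be sketched with Python's standard-library urllib.robotparser module. This is a minimal illustration, not how archive.org or NARA actually crawl: the robots.txt contents, the example.gov URLs, and the agent name ArchiveBot are hypothetical. It shows how a well-behaved crawler checks robots.txt before fetching, so a site that disallows an archiver's user agent never gets captured at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one archiving agent is shut out entirely,
# and every agent is kept out of the forum section.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /forums/
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # A live crawler would call set_url(".../robots.txt") and read();
    # here the rules are parsed from the in-memory string instead.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "ArchiveBot"):
        for url in ("https://example.gov/", "https://example.gov/forums/thread/1"):
            print(agent, url, allowed(agent, url))
```

Honoring robots.txt is voluntary on the crawler's side: the file tells a polite crawler what to skip, but it does not technically block access.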

  19. Yes, but... by webdog314 · · Score: 1

    "NARA, which archived more than 75 million Web sites in 2004 after George Bush's first term ended, will not harvest agency and Congressional Web sites when his current term is over because it says agencies are supposed to be archiving Web content on their own."

    Um, are these agencies the same ones that were supposed to be archiving all their e-mail as well? You know, the e-mail that was all conveniently deleted according to "procedure" just before it was needed in a major congressional investigation?

  20. To be a historian in 100 years... by Deviant · · Score: 1

    I have studied a bit of history at the university level, and I am not sure whether the digital age will make that job easier or harder in the future. With the overwhelming amount of online content in blogs and such, it will be easier to find accounts of events but harder to separate opinion from fact. It will be easier to search through, being electronic, but harder to sort through due to the overwhelming quantity of information on the current internet. It is also much easier to alter unless things like cryptographic hashes are stored along with the content. And that is with HTML, which is easily readable and not proprietary. I wonder how formats like MS Word docs are going to fare with the test of time.

    Are there even organizations out there archiving the wider internet for posterity? Published books tend to be edited, distributed to libraries, and preserved in a physical form where you can find them on the shelves 50 years from now. I don't know of any libraries storing or preserving electronic materials in the same way...
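The idea of storing hashes alongside archived content can be made concrete with a short Python sketch using the standard hashlib module. It is an illustration under assumptions, not any archive's actual practice: the URLs and page bodies are hypothetical, and SHA-256 is one reasonable choice of cryptographic hash. A digest of each page is recorded at capture time; any later edit to the archived copy no longer matches its recorded digest.

```python
import hashlib
import json


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of a captured page."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(captures: dict[str, bytes]) -> dict[str, str]:
    """Map each captured URL to the digest of its content at capture time."""
    return {url: fingerprint(body) for url, body in captures.items()}


def find_altered(captures: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """List (sorted) the URLs whose current content no longer matches the manifest."""
    return sorted(
        url for url, body in captures.items()
        if manifest.get(url) != fingerprint(body)
    )


if __name__ == "__main__":
    # Hypothetical snapshot of two pages.
    snapshot = {
        "https://example.gov/speech.html": b"<p>Original remarks.</p>",
        "https://example.gov/index.html": b"<p>Welcome.</p>",
    }
    manifest = build_manifest(snapshot)
    print(json.dumps(manifest, indent=2))

    # Later, someone edits the archived copy of the speech.
    snapshot["https://example.gov/speech.html"] = b"<p>Revised remarks.</p>"
    print(find_altered(snapshot, manifest))
```

This only detects tampering if the manifest is kept somewhere the party able to edit the archive cannot also rewrite it, for example published or held by independent parties.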

  21. NARA Response by paulwester · · Score: 1

    We read with interest your postings on this topic. The National Archives and Records Administration (NARA) has posted background information regarding our web harvest decision at http://www.archives.gov/records-mgmt/memos/nwm13-2008-brief.html. This background document includes links to our guidance products related to web records and to the decision-making process we went through to arrive at our decision.

    Paul M. Wester, Jr.
    Director, Modern Records Programs
    National Archives and Records Administration