Bush Administration's E-Mail Deluge May Overload Archive System
Lucas123 writes "The Clinton administration generated 32 million e-mails. Bush's administration has generated 50 times as much data (140TB, 20TB of which is e-mail), which soon will have to be archived through a new government-built records management system. The new system may not be up to the task: the technology behind it may not be able to handle the sheer volume of data, and the Bush administration has been slow in providing the National Archives and Records Administration (NARA) with needed information about the records, according to a Computerworld story. Questions have also been raised about millions of missing e-mails from between March 2003 and October 2006. 'It wasn't until this summer that an intensive effort began to share information,' said Ken Thibodeau, director of NARA's Electronic Records Archives."
It hasn't helped that the Bush administration has been slow in providing NARA with needed information about the types and volume of data that will need to be archived. It wasn't until this summer that an intensive effort began to share information, Thibodeau says.
I can understand the reasoning that for national security, some information needs to be kept secret. The thing is, the more I hear of this administration's obfuscation of its communications and dealings, the more I wonder what in the world they are hiding.
Whenever I hear that information from the Bush administration that we're supposed to have access to has gone missing, it makes me queasy. There's so much secrecy surrounding random little things that it's started to make me paranoid. Maybe it's just me wanting to blame the last eight years on a scapegoat, but I feel like someone at the top is trying to hide something really big and succeeding.
Well, most of that 400GB from Clinton's administration was dirty pictures of interns. In all seriousness, though, I don't think the problem will be finding a way to store all that data. The real kicker will be finding information you need in it. Seems to me like the best way to hide relevant and/or damaging e-mails would be to have them stored right alongside truckloads of chain letters.
How much of that is spam? I can imagine they are not allowed to delete spam. Spam has increased, so this would mean that all of it is still there.
The rest can mean a lot of different things. I am forced to work (otherwise no food) with 150MB Excel files that I would love to put in a database, where they would take up at most a tenth of the space. And I am not even talking about the speed increase and ease of use: whenever somebody else has the file open, I cannot change the content.
Or perhaps Clinton did not keep everything. Or ...
Don't fight for your country, if your country does not fight for you.
The problem isn't storing it and it isn't finding it; the problem is preserving it long enough to look through and index it. I'm sure that Google and companies that do similar work have the technology to do it. I'm also quite sure that for the right price the Federal government could obtain software to do most of the heavy lifting.
The problem is that the Bush administration deliberately migrated only partially to a new system, leaving it at constant risk of bit rot and corruption. It's hard to say how much of it has already been lost due to incompetence.
And remember, this is taxpayer dollars and a Republican President. I'm sure he's OK with us writing a check for millions upon millions of dollars to correct his inept decision.
There's an inherent architectural difference between a mail store built on Microsoft's JET database technology and one that keeps its data in something that is (although distinctly odd) very much like an XML data store. The Domino architecture makes segmenting the archive into manageable parts by date, by person, or by any combination thereof much simpler.
Essentially, the Domino architecture results in exactly what you describe -- throw more storage space at it and you can keep storing more data. The Microsoft architecture does not.
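For anyone wondering what "segmenting by date, by person, or by any combination thereof" looks like in practice, here is a rough sketch in Python. To be clear, this is not Domino's API or anything from the real archive system. The message fields (`sender`, `sent_on`, `subject`), the function names, and the sample data are all made up for illustration. It only shows the general idea: each message gets a segment key, and each segment can then be stored, indexed, or moved on its own.

```python
from collections import defaultdict
from datetime import date


def segment_key(message):
    """Return a hypothetical (person, year, month) key for one message."""
    sent = message["sent_on"]
    return (message["sender"].lower(), sent.year, sent.month)


def segment_archive(messages):
    """Group messages into independent segments by person and month."""
    segments = defaultdict(list)
    for message in messages:
        segments[segment_key(message)].append(message)
    return dict(segments)


# Made-up sample data, purely for illustration.
sample = [
    {"sender": "Alice@example.gov", "sent_on": date(2003, 3, 14), "subject": "Budget"},
    {"sender": "alice@example.gov", "sent_on": date(2003, 3, 20), "subject": "Re: Budget"},
    {"sender": "bob@example.gov", "sent_on": date(2006, 10, 2), "subject": "Schedule"},
]

segments = segment_archive(sample)
for key in sorted(segments):
    print(key, len(segments[key]))
```

The point of keying on person and month is that no single segment has to grow without bound: adding another year of mail adds new segments instead of inflating an existing one. A real system would likely pick its key based on how the records actually get queried.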
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln