Bush Administration's E-Mail Deluge May Overload Archive System
Lucas123 writes "The Clinton administration generated 32 million e-mails. Bush's administration has generated 50 times as much data — 140TB, 20TB of which is email — which soon will have to be archived through a new government-built records management system. The new system may not be up to the task because the technology behind it may not be able to handle the sheer volume of data along with the fact that the Bush administration has been slow in providing the National Archives and Records Administration (NARA) with needed information about the records, according to a Computerworld story. Questions have also been raised about millions of missing e-mails from between March 2003 and October 2006. 'It wasn't until this summer that an intensive effort began to share information,' said Ken Thibodeau, director of NARA's Electronic Records Archives."
The other 120 TB was probably just Clinton's porn stash that the Bush administration found while purging off records.
"The Clinton administration generated 32 million e-mails. Bush's administration has generated 50 times as much data -- 140TB, 20TB of which is email -- which soon will have to be archived through a new government-built records management system.
Well, to be fair, email wasn't quite as popular during Clinton's administration as it is now. Then again, the 400GB of e-mails that the Clinton administration must have generated (if it is 50 times less than 20TB) must have been rather hard to store when he left office.
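The arithmetic behind that 400GB figure can be checked with a rough sketch. It assumes decimal units (1 TB = 10^12 bytes) and, like the comment above, applies the 50x ratio to e-mail alone, even though the summary states the ratio for all data. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the summary.
# Assumption: decimal units (1 TB = 10**12 bytes); the thread does not say.
TB = 10**12
GB = 10**9

BUSH_TOTAL_BYTES = 140 * TB      # all Bush-era data, per the summary
BUSH_EMAIL_BYTES = 20 * TB       # e-mail portion, per the summary
RATIO = 50                       # "50 times as much data"
CLINTON_EMAIL_COUNT = 32_000_000 # "32 million e-mails"


def scaled_down(size_bytes: int, ratio: int = RATIO) -> int:
    """Return the size you get by dividing by the quoted ratio."""
    return size_bytes // ratio


# Reading 1 (the comment's assumption): the ratio applies to e-mail alone.
clinton_email_bytes = scaled_down(BUSH_EMAIL_BYTES)
avg_bytes_per_email = clinton_email_bytes / CLINTON_EMAIL_COUNT

# Reading 2 (the summary's wording): the ratio applies to all data.
clinton_total_bytes = scaled_down(BUSH_TOTAL_BYTES)

if __name__ == "__main__":
    print(f"Clinton e-mail estimate: {clinton_email_bytes / GB:.0f} GB")
    print(f"Average per message:     {avg_bytes_per_email:.0f} bytes")
    print(f"Clinton total estimate:  {clinton_total_bytes / TB:.1f} TB")
```

Under the first reading that works out to 400 GB, or about 12.5 KB per message; under the second, roughly 2.8 TB of data overall.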
...Now too many emails.
Whining is Washington's most favorite thing to do.
No more fancy signatures and html crap will cause a 60-80% drop in volume if not more.
Mandate the Usenet style, with replies after the quoted original; it will teach people to cut irrelevant repeats.
Stop the addition of stupid and ineffective disclaimers.
Teach the use of (ftp) servers for sharing large documents, no more Microsoft sized attachments, send a link.
"The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
It hasn't helped that the Bush administration has been slow in providing NARA with needed information about the types and volume of data that will need to be archived. It wasn't until this summer that an intensive effort began to share information, Thibodeau says.
I can understand the reasoning that, for national security, some information needs to be kept secret. The thing is, the more I hear of this administration's obfuscation of its communications and dealings, the more I can't help but wonder what in the world they are hiding.
Whenever I receive news that information that we're supposed to have access to from the Bush administration has gone missing, it makes me queasy. There's so much secrecy surrounding random little things that it's started to make me paranoid. Maybe it's just me wanting to blame the last eight years on a scapegoat, but I feel like someone at the top is trying to hide something really big and succeeding.
Besides, only 140TB (or 20 TB)? That's child's play for any competent DB admin, never mind only about $2k worth of hardware to hold it.
Assuming that none of it's been put into the archival system yet, that means they're dumping 140TB on it in one go.
You index 140TB on $2k worth of hardware and come back to me when you're done. Hopefully I won't have died by then.
How much of that is spam? I can imagine they are not allowed to delete spam. Spam has increased, so this would mean that all of it is still there.
The rest can mean a lot of different things. I am forced to work (otherwise no food) with 150MB Excel files that I would love to put in a database, where they would take up at least 10 times less space. And I am not even talking about the speed increase and ease of use: whenever somebody else has the file open, I cannot change the content.
Or perhaps Clinton did not keep everything. Or ...
Don't fight for your country, if your country does not fight for you.
The Bush administration moved the White House from a Notes/Domino based system to a Microsoft Exchange based system.
Before moving, they'd had no downtime -- even when Congress was taken out for 2 days by the Code Red worm (they were on Exchange).
In moving, they mysteriously 'lost' all their backups for a period of time that was suspicious as hell, and now they can't scale to handle the capacity issues they face.
In a Notes/Domino world, this kind of archiving problem wouldn't be all that hard to deal with. You'd just need enough storage for it, and create archives per week/month/year (or an archive per individual's mailbox, or whatever) to put on as much hardware as was required. A single checkbox would be all that was needed to have it encrypted as well.
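To make the "archive per week/month/year or per mailbox" idea concrete, here is a minimal sketch of the bucketing step only. It is not Domino's API or storage format; the `archive_key` and `partition` helpers and the `(mailbox, sent_date, subject)` tuple layout are hypothetical, chosen just to show how messages could be split into independent archives.

```python
from collections import defaultdict
from datetime import date


def archive_key(mailbox: str, sent: date, scheme: str = "month") -> str:
    """Name the archive a message belongs to (hypothetical naming scheme)."""
    if scheme == "year":
        return f"{sent.year:04d}"
    if scheme == "month":
        return f"{sent.year:04d}-{sent.month:02d}"
    if scheme == "week":
        iso = sent.isocalendar()  # (ISO year, ISO week, weekday)
        return f"{iso[0]:04d}-W{iso[1]:02d}"
    if scheme == "mailbox":
        return mailbox
    raise ValueError(f"unknown scheme: {scheme}")


def partition(messages, scheme: str = "month") -> dict:
    """Group (mailbox, sent_date, subject) tuples into per-archive lists."""
    buckets = defaultdict(list)
    for mailbox, sent, subject in messages:
        buckets[archive_key(mailbox, sent, scheme)].append(subject)
    return dict(buckets)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        ("alice", date(2003, 3, 14), "budget"),
        ("bob", date(2003, 3, 20), "re: call"),
        ("alice", date(2006, 10, 1), "schedule"),
    ]
    print(partition(sample, "month"))
    print(partition(sample, "mailbox"))
```

Each bucket can then live on whatever hardware has room for it, which is the scaling argument the comment is making.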
Oh well. I guess if conveniently "losing" mail when you don't want it found is one of your design goals, then you probably want to migrate to something less reliable.
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
It has come to my attention that as I prepare to leave office my previous instructions to make all email and other documentation available to the shredder was incorrect. The correct policy is to make everything available to the archiver. If you have any concerns please feel free to pick up a copy of the standard presidential pardon boilerplate from my secretary's desk. Thank you, W
If the gov't kept the data on you that Google does, you'd better believe you'd be calling it "doing evil".
Maybe not $2k worth of hardware, but $200k will do, which is still peanuts in government terms. They probably spend that amount on paperclips and toilet paper in the Pentagon alone.
Honestly, storing and indexing 140TB of e-mail is a trivial task when you can apply a six digit budget to it.
If their "archival system" blinks at the sight of 140TB of mostly text then it doesn't even deserve the name.
Perhaps you're too young to remember, but Clinton's administration had a problem with missing emails during investigations too (Lewinsky, why hundreds of FBI records on their political enemies ended up in the White House, illegal campaign donations from China, etc).
Yes, but there is an order of magnitude of difference in importance between lost emails about blow jobs and a little dirty money, and emails about the loss of privacy and civil liberties of US citizens, torture of POWs, and the various other nastiness that GWB et al. are suspected of. Much different.
If you want news from today, you have to come back tomorrow.
As with almost all problems where electronic/internet technologies bump into real-life issues (e.g. privacy, non-repudiation and simple confidence), this is because the law has not kept up with technology, and in the USA that is the responsibility of Congress. Writing was thousands of years old, and the printing press more than 300 years old, when the Constitution was adopted on September 17, 1787. The drafters understood the technology.
Today we are blessed with ignorant, self-serving legislators who do not, and who are far too happy to follow hard-cases-make-bad-law herd thinking, e.g. on children, porn, paedophilia, drugs and terrorism. The courts have long held that you can read postcards, but that if your letter in an envelope is opened then a felony is committed or the information is normally inadmissible.
For this to work people have to start encrypting and signing their e-mails and the Congress and the SCOTUS must enforce identical rules for electronic and hand-written communication.
Specifically, you cannot go out and discover the entire contents of someone's library and papers in a lawsuit and expect to go on a search-engine-enabled fishing expedition.
Riiight... Blame it on Exchange.
Seriously, if "conveniently [losing] mail" was the goal of the transition, they could have moved from Exchange to Domino and gotten the same effect.
Forget not, throwing storage (read: money) at any system tends to fix the problem given a competent staff. You don't make a very compelling argument.
Yeah, and there's an aesthetic feel to it too. If I'm in a 20-reply discussion, I like to edit out anything more than 2 exchanges old, and I change the subject title every two mails.
Nothing annoys me more than 20 mails titled "re: call"
At the current presidential email growth rate, NTFS isn't gonna cut it for Obama.
There's an inherent architectural difference between storing mail in a database built on Microsoft's JET technology and one that stores its data in something that is (although distinctly odd) very much like an XML data store. The Domino architecture makes segmenting the archive into manageable parts by date, by person, or by any combination thereof much simpler.
Essentially, the Domino architecture results in exactly what you describe -- throw more storage space at it and you can keep storing more data. The Microsoft architecture does not.
It's been eight years since the Clinton administration. This is 4x the doubling period based on Moore's Law. While Moore's Law relates to transistor density, Wikipedia says that it's roughly similar to gains in disk storage. So in the last eight years, we could estimate disk storage gains of 2^4 = 16x. This doesn't get you all the way to 50x, but it cuts out a big chunk of the gains.
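That estimate can be written out as a small sketch. It takes the comment's own assumptions at face value (a two-year doubling period, and disk storage tracking Moore's Law); the `growth_factor` helper is a name made up for illustration.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Capacity multiplier after `years`, doubling every `doubling_period_years`.

    The two-year doubling period is the comment's assumption, not a measurement.
    """
    return 2 ** (years / doubling_period_years)


YEARS_BETWEEN_ADMINISTRATIONS = 8
OBSERVED_DATA_RATIO = 50  # "50 times as much data", per the summary

storage_gain = growth_factor(YEARS_BETWEEN_ADMINISTRATIONS)
# Growth that cheaper storage alone does not account for.
unexplained_ratio = OBSERVED_DATA_RATIO / storage_gain

if __name__ == "__main__":
    print(f"Estimated storage gain: {storage_gain:.0f}x")
    print(f"Left unexplained:       {unexplained_ratio:.3f}x")
```

With those inputs the storage gain is 16x, which leaves a factor of about 3.1x in data growth that the hardware trend alone does not cover.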
Of course, that happens when you embed the 1600x1200 raw image of Dick Cheney giving everyone the finger with each email.
Non impediti ratione cogitationus.