Bush Administration's E-Mail Deluge May Overload Archive System
Lucas123 writes "The Clinton administration generated 32 million e-mails. Bush's administration has generated 50 times as much data — 140TB, 20TB of which is email — which soon will have to be archived through a new government-built records management system. The new system may not be up to the task because the technology behind it may not be able to handle the sheer volume of data along with the fact that the Bush administration has been slow in providing the National Archives and Records Administration (NARA) with needed information about the records, according to a Computerworld story. Questions have also been raised about millions of missing e-mails from between March 2003 and October 2006. 'It wasn't until this summer that an intensive effort began to share information,' said Ken Thibodeau, director of NARA's Electronic Records Archives."
The other 120 TB was probably just Clinton's porn stash that the Bush administration found while purging off records.
"The Clinton administration generated 32 million e-mails. Bush's administration has generated 50 times as much data -- 140TB, 20TB of which is email -- which soon will have to be archived through a new government-built records management system.
Well, to be fair, email wasn't quite as popular during Clinton's administration as it is now. Then again, the 400GB of e-mails that the Clinton administration must have generated (if it is 50 times less than 20TB) must have been rather hard to store when he left office.
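The back-of-the-envelope figure above is easy to check. This is only a sketch that takes the summary's numbers at face value and assumes decimal units (1 TB = 1000 GB). The summary states the 50x ratio for total data, so applying it to email alone is this comment's assumption.

```python
# Figures quoted in the story summary; decimal units assumed (1 TB = 1000 GB).
BUSH_EMAIL_TB = 20
RATIO = 50                       # assumption: the 50x ratio also holds for email alone
CLINTON_EMAIL_COUNT = 32_000_000

# Implied Clinton-era email volume in GB.
clinton_email_gb = BUSH_EMAIL_TB * 1000 / RATIO

# Implied average message size in KB (1 GB = 1,000,000 KB).
avg_kb_per_email = clinton_email_gb * 1_000_000 / CLINTON_EMAIL_COUNT

print(f"Implied Clinton-era email volume: {clinton_email_gb:.0f} GB")
print(f"Implied average message size: {avg_kb_per_email:.1f} KB")
```

Under those assumptions the average message comes out at roughly a dozen kilobytes, which is plausible for mostly plain-text mail.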
...Now too many emails.
Whining is Washington's most favorite thing to do.
There are always going to be people complaining about something, but usually that day-to-day stuff can be handled quite easily enough. When your organization is making the right decisions however, typically everyone remains quiet and they are quite happy.
The dangers of knowledge trigger emotional distress in human beings.
No more fancy signatures and html crap will cause a 60-80% drop in volume if not more.
Mandate the Usenet way with replies after the original, (it will) teach people to cut irrelevant repeats.
Stop the addition of stupid and ineffective disclaimers.
Teach the use of (ftp) servers for sharing large documents, no more Microsoft sized attachments, send a link.
"The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
I'm so confuzzled. 50 x 32 million emails is 140TB, except 20TB which is email. When you make lots of email messages does it eventually start spawning and budding off things that aren't email?
Besides, only 140TB (or 20 TB)? That's child's play for any competent DB admin, never mind only about $2k worth of hardware to hold it.
It hasn't helped that the Bush administration has been slow in providing NARA with needed information about the types and volume of data that will need to be archived. It wasn't until this summer that an intensive effort began to share information, Thibodeau says.
I can understand the reasoning that for national security, some information needs to be kept secret. The thing is, the more I hear of this administration's obfuscation of their communications and dealings, I can't help but wonder what in the World they are hiding.
Whenever I receive news that information that we're supposed to have access to from the Bush administration has gone missing, it makes me queasy. There's so much secrecy surrounding random little things that it's started to make me paranoid. Maybe it's just me wanting to blame the last eight years on a scapegoat, but I feel like someone at the top is trying to hide something really big and succeeding.
For hiding all your nefarious emails in the noise.
Half those old geezers will be dead before anyone gets around to reading them.
Great, now they've got to deal with the same sort of things we do. Archiving every bit of email that comes into the system, and making sure it's available online for searching and retrieval.
I'm interested in how they're going to be doing it. I've been looking at Global Relay for my own mail archiving. I wonder what they'll end up going with. I asked this a while ago on my blog, too.
Check out my sysadmin blog!
32,000,000 * 50 = 1,600,000,000 = 0x5F5E1000
0xFFFFFFFF - 0x5F5E1000 = 0xA0A1EFFF = 2,694,967,295
If they are using unsigned 32 bit quantities, they still have room to index nearly 2.7 billion more emails.
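For anyone who wants to double-check the headroom arithmetic, here is a minimal sketch. It assumes, as the comment does, that the 50x ratio applies to the message count and that the index is an unsigned 32-bit integer. Neither is confirmed by the article.

```python
UINT32_MAX = 0xFFFFFFFF  # largest value an unsigned 32-bit integer can hold

clinton_emails = 32_000_000
bush_emails = clinton_emails * 50  # assumption: 50x applies to message count

# Headroom left in a 32-bit index after the Bush-era messages.
remaining = UINT32_MAX - bush_emails

print(f"Bush-era messages: {bush_emails:,} = 0x{bush_emails:X}")
print(f"Remaining index space: {remaining:,} = 0x{remaining:X}")
```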
The bulk of the data is probably screenshots of popup messages on the Presidential PC, sent to White House tech support.
The future ain't what it used to be.
I think this country would rather just forget the last 8 years.
All the truly interesting stuff was sent through outside mailservers operated by the Republican party, anyway.
I'd bet a large part of that is uncompressed attachments and probably Word and/or Excel. Also, from Windows users I tend to get bitmaps as screenshots.
Although this is about White House e-mails, this sort of stuff shows how ridiculous it is trying to ask ISPs to record all traffic. At least here taxpayer money is being used, but an ISP simply does not have that sort of budget. I feel all too often the layman confuses IT with magic and the people in the field with magicians. We are lucky enough if we manage to become a level one mage :)
Jumpstart the tartan drive.
Just tell them that NARA needs liberating, and that a precise attack using the Bush administration's archives will save them.
Tell them that 200-300TB of data will be necessary. They'll go in with 140TB and no exit strategy, and their e-mails will be in the archives for decades to come.
- RG>
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
Why not make every aspect of government open, transparent, publicly archived, and participatory?
How much of that is spam? I can imagine they are not allowed to delete spam. Spam has increased, so this would mean that all of it is still there.
The rest can mean a lot of different things. I am forced to work (otherwise no food) with 150MB excel files that I would love to put in a database and would take up at least 10 times less space. And I am not even talking about speed increase and ease of use, because somebody else has the file open, so I can not change the content.
Or perhaps Clinton did not keep everything. Or ...
Don't fight for your country, if your country does not fight for you.
The Bush administration moved the White House from a Notes/Domino based system to a Microsoft Exchange based system.
Before moving, they'd had no downtime -- even when Congress was taken out for 2 days by the Code Red worm (they were on Exchange).
In moving, they mysteriously 'lost' all their backups for a period of time that was suspicious as hell, and now they can't scale to handle the capacity issues they face.
In a Notes/Domino world, this kind of archiving problem wouldn't be all that hard to deal with. You'd just need enough storage for it, and create archives per week/month/year (or an archive per individual's mailbox, or whatever) to put on as much hardware as was required. A single checkbox would be all that was needed to have it encrypted as well.
Oh well. I guess if conveniently "losing" mail when you don't want it found is one of your design goals, then you probably want to migrate to something less reliable.
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
n/t
you had me at #!
Something tells me that most of the data stored was rated as 'spam'. The messages were written in arabic and had some bomb images and something like 'we'll destroy you in the name of allah'
It has come to my attention that as I prepare to leave office my previous instructions to make all email and other documentation available to the shredder was incorrect. The correct policy is to make everything available to the archiver. If you have any concerns please feel free to pick up a copy of the standard presidential pardon boilerplate from my secretary's desk. Thank you, W
If the gov't kept the data on you that Google does, you'd better believe you'd be calling it "doing evil"
So they have to archive 140TB of data? A corner computer shop has enough hard drives in stock. How is that hard?
Self-proclaimed "Most Advanced nation on earth" that doesn't have enough hard drives ... my ass.
See what happens when you keep sending that same Excel spreadsheet back and forth to the whole distribution list?
The only thing worse than a Democrat is a Republican.
Actually, 139TB is redundant data from endless logs of previous emails being top-posted over. Come on... you didn't expect Bush's administration to actually be able to quote properly and tell YouTube spam from important government work, did you?
I blame bulky MS-Word documents. If everyone used gzipped, utf-8 text files, you could save fifty terabytes right there.
CfkRAp1041vYQVbFY1aIwA== RV/hBCLKKcSTP5UFK3kqsg==
Part of the problem is the technical knowledge of some of the older members of his staff. The problem came to light when VP Dick Cheney demanded a shredder for his e-mails.
Perhaps you're too young to remember, but Clinton's administration had a problem with missing emails during investigations too (Lewinsky, why hundreds of FBI records on their political enemies ended up in the White House, illegal campaign donations from China, etc).
Yes, but there is a magnitude of difference in importance between lost emails about blow jobs and a little dirty money, and emails about the loss of privacy and civil liberties of US citizens, torture of POWs, and the various other nastiness that GWB et al are suspected of. Much different.
If you want news from today, you have to come back tomorrow.
I guess they had too many secretaries forwarding emails of pictures/jokes/power-point-shits ranging from 5 to 35 megs, to every one of their co-workers. lmao
Isn't this why we have Google? Come on government, don't reinvent the wheel. Support our industries and contract Google to do this archival and indexing for you.
As with almost all problems where electronic/internet technologies bump into real-life issues, e.g. privacy, non-repudiability and simple confidence, it is because the law has not kept up with technology, and that in the USA is the responsibility of Congress. Writing was thousands of years old, and the printing press more than 300 years old, when the Constitution was adopted on September 17, 1787. The drafters understood the technology.
Today we are blessed with ignorant, self-serving legislators who do not, and who are far too happy to follow hard-cases-make-bad-law herd thought, e.g. children, porn, paedophilia, drugs and terrorism. The courts have long held that you can read postcards, but that if your letter in an envelope is opened then a felony is committed or the information is normally inadmissible.
For this to work people have to start encrypting and signing their e-mails and the Congress and the SCOTUS must enforce identical rules for electronic and hand-written communication.
Specifically, you cannot go out and discover the entire contents of someone's library and papers in a lawsuit, and expect to go on a search-engine enabled fishing expedition.
Riiight... Blame it on Exchange.
Seriously, if "conveniently [losing] mail" was the goal of the transition, they could have moved from Exchange to Domino and gotten the same effect.
Forget not, throwing storage (read: money) at any system tends to fix the problem given a competent staff. You don't make a very compelling argument.
Boot Windows, Linux, and ESX over the network for free.
They are not saying that misallocating a resource which costs less than a few missiles fired will hinder the reasonable transition of the American government.
> check out the blue dress! forward to everybody you know.
we need to save 140 TB of THAT ?!?
if this is supposed to be a new economy, how come they still want my old fashioned money?
Yea, and there's an aesthetic feel to it too. If I'm in a 20 reply discussion, I like to edit out anything more than 2 exchanges old, and I change the subject title every two mails.
Nothing annoys me more than 20 mails titled "re: call"
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
A single Citadel server can replace dozens of MS Exchange servers. The BerkeleyDB used by Citadel can store up to 256 TB.
However, I guess that using a single server for the President's email would be too cheap and would be rejected in favour of a multi-billion dollar email system the size of a Google farm.
Ask Google to archive it.
The record here is zero downtime, then a system change, then lots of downtime and losses.
It could be done the other way around, but then they had already started on Domino and so moving to Domino was impossible as an excuse.
And saying that they could have gone the other way to lose email doesn't prove that they didn't lose email deliberately this time either, so why you said it is a mystery to me.
At the current presidential email growth rate, NTFS isn't gonna cut it for Obama.
There's an inherent architectural difference between storing mail in a database built on Microsoft's JET technology, and one which stores its data in something that is (although distinctly odd) very much like an xml data store. The Domino architecture makes segmenting the archive into manageable parts by date, by person, or by any combination thereof much simpler.
Essentially, the Domino architecture results in exactly what you describe -- throw more storage space at it and you can keep storing more data. The Microsoft architecture does not.
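To make the "segment by date, by person, or any combination" idea concrete, here is a generic sketch. It is not Domino or Exchange code and uses no API from either. The message records, the `archive_key` helper and the `owner/YYYY-MM` naming are all hypothetical, chosen only to illustrate partitioning an archive into independently storable pieces.

```python
from collections import defaultdict
from datetime import date

# Hypothetical message records: (mailbox owner, date sent, size in bytes).
messages = [
    ("alice", date(2003, 3, 14), 12_000),
    ("alice", date(2003, 4, 2), 8_500),
    ("bob", date(2003, 3, 20), 40_000),
]

def archive_key(owner, sent, by_person=True, by_month=True):
    """Build a segment name such as 'alice/2003-03' from the chosen dimensions."""
    parts = []
    if by_person:
        parts.append(owner)
    if by_month:
        parts.append(f"{sent.year}-{sent.month:02d}")
    return "/".join(parts) or "all"

def segment(messages, **dims):
    """Return total bytes per archive segment for the chosen dimensions."""
    totals = defaultdict(int)
    for owner, sent, size in messages:
        totals[archive_key(owner, sent, **dims)] += size
    return dict(totals)

print(segment(messages))                   # per person, per month
print(segment(messages, by_person=False))  # per month only
```

Each segment can then live on whatever hardware has room, which is the "throw more storage at it" property described above.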
netapp just posted a surprisingly upbeat 2009 earnings forecast.
Good people go to bed earlier.
It's simple. Here's how to play.
For all of the top people in an administration, do:
Find out who they worked for. Then find out who owns or runs that organisation. Draw lines between the names to represent associations. Then simply count the number of associations each of the names gets.
For example. The shiny new Timothy Geithner worked for:
Kissinger Associates -> Which is a member of Council of the Americas -> Which was set up by David Rockefeller.
or ...
he's a member of the:
Council on Foreign Relations -> which David Rockefeller was a director of.
After you do that a few times with different people on both the democrats and republican sides, you find a small set of names start racking up larger numbers of associations with people in the administrations. The more "hits" they have, the more influence they are likely to have with that government.
You'll start to see the nature of the real politics going on. The political parties are just a sideshow.
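The counting game described above is just a tally over chains of associations. A minimal sketch follows. The `chains` data reproduces only the example chains given in this comment, unverified, and `count_associations` is a hypothetical helper name.

```python
from collections import Counter

# Chains exactly as stated in the comment above: person -> organisation -> ... -> name.
# They are the commenter's claims, reproduced here unverified for illustration.
chains = {
    "Timothy Geithner": [
        ["Kissinger Associates", "Council of the Americas", "David Rockefeller"],
        ["Council on Foreign Relations", "David Rockefeller"],
    ],
}

def count_associations(chains):
    """Count how many association chains end at each name."""
    hits = Counter()
    for paths in chains.values():
        for path in paths:
            hits[path[-1]] += 1
    return hits

hits = count_associations(chains)
for name, count in hits.most_common():
    print(f"{name}: {count}")
```

A tally like this only counts how often a name appears in the data you feed it. It says nothing about whether the links are accurate or meaningful.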
Deleted
Just ship copies of the raw files to the Internet Archive. 140TB isn't that much; they put a petabyte in a rack.
Once the file formats have been translated, just point Google at the starting URL and wait a day while it indexes everything.
It's been eight years since the Clinton administration. This is 4x the doubling period based on Moore's Law. While Moore's Law relates to transistor density, Wikipedia says that it's roughly similar to gains in disk storage. So in the last eight years, we could estimate disk storage gains of 2^4 = 16x. This doesn't get you all the way to 50x, but it cuts out a big chunk of the gains.
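The estimate above works out as follows. This sketch uses the comment's own assumptions: a two-year doubling period and the 50x figure from the summary. Neither is a measured value.

```python
import math

years = 8
doubling_period_years = 2   # the comment's assumed doubling period
observed_growth = 50        # the 50x figure from the story summary

# Growth expected from storage gains alone: 2^(8/2) = 16x.
expected_growth = 2 ** (years / doubling_period_years)

# Factor left unexplained after accounting for storage gains.
unexplained = observed_growth / expected_growth

# Number of doublings that a full 50x would require.
doublings_needed = math.log2(observed_growth)

print(f"Expected from storage gains: {expected_growth:.0f}x")
print(f"Left unexplained: {unexplained:.3f}x")
print(f"Doublings needed for 50x: {doublings_needed:.2f}")
```

So storage gains alone would account for 16x, leaving a factor of a little over 3 to be explained by something else, such as heavier use of email or larger attachments.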
Of course that happens when you embed the 1600x1200 raw image of Dick Cheney giving everyone the finger with each email
Non impediti ratione cogitationus.
Funny, maybe.
First, they broke the freaking law. Now, they've put the people who need to handle their data in a difficult situation by not being cooperative.
They're not mutually exclusive, IMO
Please stop stalking me, bro.
Personally I think it will be petty little corruption
But it's OK. The Clinton administration email was full of avis (movies), so it didn't compress very well. The Bush administration's email is much more along the lines of mpgs (phone conversations), so it will get great compression!
Or less using bigger disks. Claims that this is impossible are ridiculous when an average home user can have several terabytes of storage space in a single PC.
I can see from your e-mail address, @yahoo.com, the reason why you have a problem with 20 mails titled "re: call"
President Bush has a solution to the problem. He and Vice President Cheney help reduce the problem by using other email systems. It is OK because Cheney says he isn't in the executive branch. He is doing everything he can to use alternative channels to communicate because he has the highest regard for the National Archives and he doesn't want to make more work for them. I think he is very considerate because he has done quite a lot to reduce their workload. We will look back on President Bush and say he astounded us and that we had no idea how inventive he was and the lengths he would go to so that the people wouldn't be burdened knowing every petty little detail of his work. After all, he is the President.
Well, that too, but then I don't get into chats with Spammers.
It's actually a work thing except the title would be "re: invoice" (and always that vague) before totally drifting into totally different topics.
I prefer e-mail threads to maintain a consistent subject, especially if people are editing out old entries on the e-mail thread.
With the sheer quantity of e-mails we receive, the number of mailing lists, and the ridiculous number of rules I require to keep my e-mail sorted, I don't need people like you making it harder. If you have something important to say, make it easy to find.
Modding me -1 troll doesn't make me wrong.
Okay, for my reply I'll keep the same header.
Such are preferences. What's easier for one is harder for another. My intent was "to make the new point easy to find" for me.