White House CIO Describes His 'Worst Day' Ever
dcblogs writes "In the first 40 days of President Barack Obama's administration, the White House email system was down 23% of the time, according to White House CIO Brook Colangelo, the person who also delivered the 'first presidential Blackberry.' The White House IT systems inherited by the new administration were in bad shape. Over 82% of the White House's technology had reached its end of life. Desktops, for instance, still had floppy disk drives, including the one Colangelo delivered to Rahm Emanuel, Obama's then chief of staff and now Mayor of Chicago. There were no redundant email servers."
The problem is the procurement process. It takes a hell of a long time to get IT resources ordered, and often by the time they are actually put into service half of their warranty life-time has expired. It has nothing to do with a lack of knowledge on the OMB IT front, it's got everything to do with the red tape they have to cut through to make anything happen.
I recently took over for a staff which had been interned in their positions for the better part of a decade. Out with the old in-house staff, in with the new outsourced IT 'team'.
I can easily see how this happens, outside procurement and ineptitude problems on the part of the previous WH IT staff. When you've got what amounts to 'institutional knowledge death', with the institution carrying on, you've got to over-staff for some time or things fall apart completely while you play catch up. With a situation where you don't understand it all, are under staffed or under skilled, you're faced with only a couple options when you come in behind the curve, with aging equipment and software: you either start replacing everything you can, as you are able, as quick as you can, or you start suffering outages. It's even worse if things are mismanaged and things are failing all around you.
As for the claims of the article? Meh. I'm actually not that impressed by his claims to the point where I think 'this is bad':
In 2008, "floppy drives" weren't all that uncommon. I remember servicing Core machines which had floppy drives, still. We're not talking beige boxes with ISA slots here, necessarily - with a 4 year replacement schedule for desktops, floppy drives don't speak of ineptitude.
The 80-hour-week thing means nothing. It might mean he was understaffed, or that he's a workaholic. To me, it sounds like the meaningless words of a political appointee.
"Over 82% of the White House technology had reached end of life" means nothing. If they were on a 3-year replacement schedule for desktops and they had 10/100 switching, I can easily see where you'd come to that number.
He had one "data center", with no redundancy. A bit of a contradiction, yeah? This is made somewhat less impressive by the fact that this administration, in particular, was a bunch of Nancys when they came in with "oh woes, look at this mess", quite obviously overstating things for dramatic media effect.
"Our email servers went down for 21 hours" isn't a statement of disaster, it's a statement of ineptitude. If they got the mail servers back up, with the data intact, the problem wasn't with the environment but the people involved (or the lack of staffing). His BB starting to have mail incoming suggests a reinstall wasn't required, so safe to say BES was OK, so who knows what the real 'problem' was which caused a day of outage...
Sorry, I've got a very thin skin when it comes to management making any sort of technical claim. They're usually about 50% lie, and of the remaining 50% truth, only about 1/5th of that is factual with the rest being augmented by misunderstanding, delusions of grandeur, and over-simplification to pull up the full 100%. Realize that a) this is a political appointee talking, b) it's a seemingly non-technical manager (he's up in his datacenter, lookin' for redundancy!), and c) this is the government we're talking about, after all. Anyone who's had any dealings with them on a technical level realizes that 'setbacks' and 'shortcomings' or 'difficult problems' or the like are (probably!) due to ineptitude. Yes, sadly, even amongst the elite (though not necessarily of their own doing - thank you bureaucratic bullshit).
Granted, this may not have been the case when BO came to the WH and took over. They may have had previous IT staffers who stayed through the transition, but I'm guessing they did not (due to political mistrust issues). It could've been a genuine clusterfuck. Sometimes it's nothing and people cry about the sky falling as they pull down the curtain; sometimes, it really is bad. (If you understand weather patterns, you may recognize a summer storm to not be the disaster that chicken little claims...)
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
This is completely out of the question. Unless the email server also includes file sharing, calendaring, a contact database, all supporting multiple group and individual access rights, it simply can't be used for email.
And the product name must include "Windows" or "Live" in the title, preferably both. And if it can be configured to only support Windows machines, we'll pay double.
Sleep your way to a whiter smile...date a dentist!
Of course it all sucked, it was designed to.
It wasn't originally designed to suck, but when you refuse to spend money on infrastructure improvements,
you end up spending your time putting out fires instead of making improvements.
This applies equally to computer hardware/networks as it does to our highway/bridge, electrical, and water infrastructures.
FFS, there are critical metal pipes in DC's water distribution network that date back 150 years to Lincoln.
[Fuck Beta]
o0t!
Well and as I have learned the hard way lately, if it's going to cost 500k per year to run IT for a couple-hundred-employee outfit when it's government money, someone will complain. When I did private sector stuff the biggest issue was downtime; a million dollars was no problem if that meant good uptime. I used to go into insurance companies and banks at 4pm; the regular staff left at 5-5:30, and if it wasn't ready to go the next day by 8 or 9am you were in serious trouble. In government it's all about how much money they have to explain to some jackass who wants to make political hay out of it.
The way I count it from http://www.washingtonpost.com/wp-srv/opinions/graphics/2006stafflistsalary.html the White House has about 400 employees. Figure 350k a year in desktop computers alone, for IT staff, another couple of hundred K in 'mobile' and accessory devices, ancillary office equipment, and you could easily be looking at 1.5 million or so for just the non-classified IT stuff. That isn't, in the grand scheme of things, a lot of money, but you have to know that whoever isn't in charge is going to want to curtail that spending, because it's 'wasteful'.
(how you count IT spending can vary wildly. When you're up into that many people you have a lot of dedicated IT staff in various sub groups who may or may not count towards the total and so on). On top of the mess that would be trying to deal with 400 spoiled brats who want everything their way (I'm sorry, executives who want to maximize their productivity), you have to try and plug into everything else in government and have the secured computers/networks as well. That isn't cheap.
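To put the two budget figures in this comment on the same footing, here is a rough per-head calculation. It is only a sketch: reading "a couple of hundred" as 200 employees is an assumption, and the 400 headcount and 1.5 million total are the commenter's own estimates, not audited numbers.

```python
# Rough per-head IT cost from the figures in the comment above.
# ASSUMPTIONS: "a couple of hundred" employees is read as 200; the
# White House headcount (400) and total (1.5 million) are the
# commenter's own estimates.


def cost_per_employee(annual_budget: float, employees: int) -> float:
    """Annual IT spend divided evenly across staff."""
    return annual_budget / employees


small_outfit = cost_per_employee(500_000, 200)
white_house = cost_per_employee(1_500_000, 400)

print(f"500k across 200 staff: ${small_outfit:,.0f} per employee per year")
print(f"1.5M across 400 staff: ${white_house:,.0f} per employee per year")
```

Under those assumptions the two cases come out at $2,500 and $3,750 per employee per year, which is the scale of spending the comment says someone will still call wasteful.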
Microsoft: where "five nines" means 9.9999%.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
23% down sounds about average for MSExchange servers.
Only on slashdot could such ignorance get modded up.
On a bad bad day as a consultant, I have to fix scenarios with Exchange where everything blew up and they're down for a single day-- MAYBE 2-- out of several years' uptime.
That's with the clients who have no full time IT staff whatsoever and a shoestring budget.
Possibly if you have no idea what you're doing, or don't know anything about Exchange, then yeah, 23% might be an OK guess.
Just a tip, if you ever want people (outside of a small echo chamber) to take you seriously, you may want to grow up and stop referring to GW Bush as "Dubyah"-- it's about as mature as calling Microsoft M$, or someone you don't like a doo-doo head.
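For scale, the downtime figures quoted in this thread can be put side by side. This is a back-of-the-envelope sketch, not a measurement: "several years" is taken as 3 years purely as an assumption, since the comment gives no exact number.

```python
# Back-of-the-envelope comparison of downtime figures quoted in this thread.
# ASSUMPTION: "several years" is taken as 3 years; no exact number is given.

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def downtime_fraction(down_hours: float, total_hours: float) -> float:
    """Fraction of the period during which the service was unavailable."""
    return down_hours / total_hours


def downtime_hours(fraction: float, total_hours: float) -> float:
    """Hours of unavailability implied by a downtime fraction."""
    return fraction * total_hours


# Summary figure: email down 23% of the first 40 days.
white_house_hours = downtime_hours(0.23, 40 * HOURS_PER_DAY)

# Consultant's worst case: 2 full days down over an assumed 3 years.
consultant_fraction = downtime_fraction(
    2 * HOURS_PER_DAY, 3 * DAYS_PER_YEAR * HOURS_PER_DAY
)

# "Five nines" (99.999% availability) as minutes of downtime per year.
five_nines_minutes = downtime_hours(1 - 0.99999, DAYS_PER_YEAR * HOURS_PER_DAY) * 60

print(f"23% of 40 days      : {white_house_hours:.1f} hours ({white_house_hours / 24:.1f} days)")
print(f"2 days in 3 years   : {consultant_fraction:.2%} downtime")
print(f"five nines, per year: {five_nines_minutes:.2f} minutes")
```

With those inputs, 23% of 40 days is about 221 hours (roughly 9.2 days), two days in three years is about 0.18% downtime, and five nines allows about 5.26 minutes per year.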
Oh SHUT up.
The reason government can't get anything done, generally, is there's always some jackass out there questioning whether a thing is needed because it happens not to be exactly what they want, or why workers cost anything at all since their life is in the shitter so why should a government employee make money either?
There is a significant interest in this country in starving government, and then mocking it for under-performing. That's a combination of arguments only an imbecile would make.
If only... None of the HP machines we've bought at work in the past couple of years have had them and we buy both the slimline desktop variety and mini-tower PCs. The few Dells I've seen likewise don't have any floppy ports on the motherboard.
As for build-your-own PCs, or ones from companies that assemble generic parts into PCs, very few come with floppy ports on the motherboard. Indeed, the only non-industrial Intel motherboards I know of that have a floppy port are the ASRock Extreme boards - and that's powered by a SuperIO chip on the motherboard, as chipset support for floppies was dropped by Intel years ago.
Note: the reason I mention all this is because I'm looking at getting a Z77 motherboard in the next few months with a floppy connector, so that I can hook up a 5.25" floppy drive I've acquired (purely for the heck of it, before anyone asks - I've a big box of old disks from the early 90s that I wouldn't mind rummaging through, the PC I used for those having been chucked out years back). ASRock are pretty much the only option nowadays and I have no doubts that when Haswell comes out next year the old 34-pin floppy connector will be well and truly extinct.
It's a strong possibility. I've seen similar bizarre things based on what's written in a contract, and in private industry, not just government. My first encounter with contract craziness was at a large telco in the mid 90's where I had authorised putting 250MB disks into ~100 laptops with dead drives. However this upset a PHB somewhere in the money spending chain of command because the original maintenance contract stated 120MB disks (which were by this time out of production and as rare as hen's teeth). I tried explaining the supply problem and that the contractor was actually giving us twice the storage for no extra cost. In the end it was simpler to explain the situation to the contractor and (sheepishly) ask them to reformat the 250MB disks down to 120MB than it was to continue butting heads with a dick-swinging autocrat from the finance dept.
Office politics is really no different to real politics. The vast number of people who work for large organisations, be they private, public or charitable, are for the most part relatively efficient at whatever it is they do, but one or two clowns in the wrong position can turn the whole thing into a circus. In an evolutionary sense large organisations exist because they can do what no man can do alone. However our tribal instincts are still evolving such that we can live with and within the groups of more than ~150 that are required to produce what a single mind can imagine. Large groups (civilization, cities) simply did not exist until we invented agriculture, and yet our current civilization(s) cannot function without them.
For example the multi-national I work for has about 175k people. A death in that "tribe" would happen quite frequently (say one a week), but it's only the handful of people I personally work with that I care (or even know) about. I think the fact that telecommunications have gone from simple morse code to their current star trek capabilities is part of that evolution; we are tool-makers, and it's in our nature to invent tools to overcome the problems caused by inventing tools. So in a way that will probably upset biologists it can be said that our tools and our instincts are co-evolving to accomplish greater feats, but our tools are evolving at a geometrical rate whereas our base instincts evolve at a glacial rate. So I'm betting our tools will evolve to the point where the size of the organisation is (almost) irrelevant to the effectiveness of its internal organisation long before our puny minds can name, let alone care about, 175k individuals.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
You miss the fact that he INHERITED that system. That's how politics works. The lame duck guy in charge of the White House (which would be the former President DIRECTLY) let the thing rot...
Part can be attributed to the old staff being done with the position and the new guy will just buy new stuff anyway... Almost fair?
Part of all the downtime was a FEATURE that the previous administration used to their full advantage... They were über control freaks... Controlling information of their own "trusted" employees was part of D.C.'s daily routine. The rot was deliberate to stop communications from being added to the archives.
Group scheduling and email are different applications. Combining them in one backend is shortsighted.
True, but Exchange isn't an "email" application--it is a group productivity application that includes email, group calendaring and scheduling, tasks, and collaboration.
I understand there are MS haters who will bash Exchange relentlessly, with any label on it, but let's try to be even a tiny bit accurate. Exchange isn't an "email" service and hasn't been exclusively that for nearly 15 years: Time to come up with some new criticisms, the old ones don't apply.
Who did what now?
This article is partially correct but leaves out the actual technical issues involved.
Someone *from* that Datacenter here at that time. Here's what really happened.
The old administration did not care about the existing IT infrastructure because they were on their way out. They wanted no changes made- just that things be left up. Yes the email system was old and past EOL, but the outages were really the perfect storm of everything that could hit the fan actually hitting the fan at the same time.
The facility was doing work on the power system- the UPS to be specific. Somewhere along the line they messed up, and cut the power. *All* of the power. Datacenter goes dark. They brought the power back up, but then tripped it again before bringing it up for good. This detail is what caused the weekend of hell.
The SAN that the clustered email servers (yes, clustered, they *were* redundant) had the stores on was an EMC Symmetrix. It has a built-in battery backup system so that if the SAN loses power it has enough stored to flush the cache to disk. The power going off started this process. The power going back on triggered the response to stop flushing the cache and start checking and rebuilding. Then the power went off again. This is the part where the specific details get hazy but in effect the SAN did not like this. I don't believe it had enough power to totally flush the cache and/or it did not have the logic built in to handle an outage while in recovery mode. The result was a downed SAN that *would not come back up*. Now all of the data was down and nothing could be done but wait for the vendor to show up and try to fix it.
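The sequence described above (power off, power on, power off again during recovery) can be illustrated with a toy state machine. This is a hypothetical model, not EMC's actual firmware logic: the state names, event names, and the missing-transition rule are all assumptions made only to show why a second outage during recovery could be worse than the first.

```python
# Toy model of the failure sequence described above. This is NOT EMC's
# actual logic; the states, events, and transition rules are hypothetical,
# chosen only to illustrate "power lost again while still recovering".
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    FLUSHING = auto()    # on battery, writing cache to disk
    RECOVERING = auto()  # power restored, checking and rebuilding
    FAILED = auto()      # will not come back without vendor help


# (state, event) -> next state. Any pair not listed leaves the state unchanged.
TRANSITIONS = {
    (State.RUNNING, "power_off"): State.FLUSHING,
    (State.FLUSHING, "power_on"): State.RECOVERING,
    (State.RECOVERING, "recovery_done"): State.RUNNING,
    # The assumed gap: no safe handling for losing power mid-recovery.
    (State.RECOVERING, "power_off"): State.FAILED,
}


def run(events, state=State.RUNNING):
    """Apply events in order and return the final state."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


clean_outage = run(["power_off", "power_on", "recovery_done"])
double_trip = run(["power_off", "power_on", "power_off", "power_on"])

print("single outage ends in:", clean_outage.name)
print("double trip ends in  :", double_trip.name)
```

In this model a single outage returns to RUNNING, while the double trip ends in FAILED and stays there even after power returns, which matches the shape of the story even if the real cause was different.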
At the same time we were dealing with *every* server being off and having to come back up. There were hundreds. Luckily most did. Some did not. Some were important, such as in the case of *both* the servers in a clustered system that would not boot, which just so happened to be the system that some of the, say, "more important" VIPs were on. These were old systems running Exchange 2000 on Windows 2000. Long past due, but kept up by the staff since the EOP would not approve a new email infrastructure.
Eventually the systems would be restored and everything would be back on-line. In the meantime though Brook thought it would be a good idea to spend untold amounts of money to bring in MS Engineers to look at things. They cost a lot of money and made a bunch of reports but they didn't fix a damn thing. The staff that was already there found the issues with the servers and fixed them.
There were later headaches, such as when, as mentioned, the SONET was cut (thanks Verizon!) and further SAN maintenance, but that was the weekend from hell.
Things to note:
More to the point, businesses haven't wanted an email service for nearly 15 years.
They want the group productivity application. But they don't call it that because the most visible part - the part they really see - is the email.