Power Problems Force Seattle To Throttle City Data Center For Days
Nerval's Lobster writes with an excerpt from sister site SlashDataCenter: "On Aug. 23, Mayor Mike McGinn of Seattle informed residents that the city would partially shut down its municipal data center for five days including the Labor Day weekend. As a result, city residents will be unable to pay bills, apply for business licenses, or take advantage of other online services. In a Webcast press conference, McGinn isolated the issue as a failure in one of the electrical 'buses' that supplies power to the data center. Because that piece of equipment began overheating, the city had to begin taking servers and applications offline to prevent overloading the system. The maintenance will cost the city $2.1 million of its maintenance budget. A second power bus will remain operational, supplying enough electricity to power redundant systems for critical life and fire safety systems, including 911 services and fire dispatch. The city's Web sites should also be up and running in some capacity."
That should help the situation.
Interesting that this is not on the front page of the Seattle Times. In fact, I can't find it at Washington's biggest paper at all.
If you want news from today, you have to come back tomorrow.
If you lived in podunk nowhere, then no, probably not; as long as emergency services continue to operate, it wouldn't be a big issue. But for such a large municipality to go dark for five days would definitely have an impact locally, and possibly regionally or nationally to a smaller degree. Emergency services are very important, but the business of government (no matter how I feel about it from time to time) needs to continue and serve its people. I am sure (or at least I hope) that they looked into portable power generation, but this seems like a poor solution. Just my two pennies.
Chief Thinker www.devotedskeptic.com
If power problems are taking down the city's datacenter for a holiday weekend, couldn't they just rent a few $100/mo servers, run the city apps on them during the downtime, and make the problem invisible to end users? No single-site setup is ever safe for important apps; we call that a Single Point of Failure around here.
I'm LostCluster but I lost my password to that user. Hey Slashdot, how about helping me get it back!
Seattle? The home of Amazon? Why on earth don't they just move their datacenter to Amazon Web Services? They could probably do it for less than the $2.1 million they're spending on this single part!
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
I live south of Seattle, and work in the city.
Any political gridlock is largely because current Mayor McGinn is a joke. Seattle is a fairly liberal city, but McGinn was widely seen as too far left to be electable even there, so he remade himself as a pragmatist, a change that lasted until he was sworn in. Before the election, McGinn made specific promises that he wouldn't let his personal ideology affect policies where the citizenry clearly differed from him... then he turned around and spent most of his time fighting ideological battles, ignoring real problems while tilting at his personal windmills.
#DeleteChrome
What I'm trying to figure out is why 911 and emergency services didn't have a separate offsite backup. I mean, how much more mission-critical can you get than that? Every time I see one of these articles, I think to myself: why are they mentioning this if there wasn't some risk of failure? And the answer is... because quite obviously, there was some risk.
I don't want my cause of death to be "Your call could not be completed as dialed. Please check the number and try your call again later..."
#fuckbeta #iamslashdot #dicemustdie
I've had a similar issue with a private data center. There wasn't a UPS bypass switch because the UPS had an internal bypass switch (installed with the datacenter years before). But the UPS was old, and a new UPS was cheaper than replacing all the batteries (and more powerful, with better features). So my coworker planned out the swap as a two-day outage over a weekend. Of course, since I had taken most of the coursework for an EE degree, I redrew the plans and got the project done with half the labor time and two 30-second outages (really, the power was only off for about a second each time, but that's longer than a server can survive without power, so it was safer to shut everything down as if it were a longer outage). The problem was caused by a stupid "cost-saving" choice at installation.
Sounds like something similar here: part of the redundancy has an issue, and the system isn't actually capable of running fully redundantly. Otherwise, they'd just cut everything over, then fix it. Or just turn it off and fix it (and the power will flow). I've seen this more than once in the corporate world, so it's not an example of a governmental oops, just an IT oops.
Learn to love Alaska
The cloud doesn't need power?
Why don't they just fail over the critical life and fire safety systems to the backup datacenter, and keep normal services up at the primary datacenter while they do the work? They do have a second site, right? Surely no one would host a system deemed "critical" and "life safety" at a single site?
Overheating power buses and wires are a fire risk, and the overheating comes from their being undersized for the load.
See The Towering Inferno for where that can get you.
Seattle has great parking. You can park your car on I-5 for several hours each day without concern that traffic might move forward while you're shopping.
Help stamp out iliturcy.
I had an almost identical situation happen to me this past spring, too. I was the sysadmin at one of the facilities. It happened right after I gave my two weeks' notice, and damn, was I busy. :P At one point I ended up having to take all my UPSes off the mains and run them over two-phase to shift them onto a secondary genset for additional power, because the amperage draw was simply too high (oops, poor planning: someone forgot to account for amperage requirements at high load).
Unlike this situation, mine had only a single power run because of where we were located: on top of a hill/small mountain, on the edge of a park. There were five fairly sizable facilities on the hill, some of which have significant power requirements because of the type of work they perform (lots of sciencey stuff).
Fortunately, all of the buildings had gensets (100 kW+). Unfortunately, only one of the five ran on natural gas (NG); the others were diesel. That gets really costly, really quickly: it's California, diesel's at something like $4.50/gallon, and these things will burn through a full 500-gallon tank in a day at around 60% utilization. So we're talking ~$10k a day just to keep them fueled (including an extra genset pulled up to handle additional crunch demand).
Plant facilities staff (probably a good 30-60 people in all) were in the conduit going up the hill for a day trying to figure out where the fault was, and then spent another three days running new cable and replacing the relay substation. (God, I hate how slowly many union workers work.) Turns out the relay had fused pretty solidly, welding itself nicely into the culvert.
I seem to recall talk back and forth that the total damage was going to be over $500,000, so it really doesn't surprise me that a large city's power infrastructure would cost a multiple of that. If cities are like some of the hospitals I've seen, they've got lecherous IT salespeople at their door almost daily. They also buy a lot of the crap the salespeople are peddling, much of which seems to (still) require its own proprietary platform and/or a dedicated piece of hardware. And then the old systems don't really go away until they die, and there's a cost incurred to recover the lost data; because they're non-profit, they don't really seem to understand the cost of maintenance, depreciation, or anything like that. So I can certainly see the power requirements adding up: some poorly designed cluster for public-facing things, a handful or three of interface systems to tie in with the governmenty systems, and so on.
In my mind, it makes sense that they just shut those services down temporarily. "Forced vacation use" for city workers, maybe? They'll save a lot more than the $2.1 million that way, if they can do it, I'm sure (funny how government is able to cut costs when there's no alternative :P). I imagine it's too costly and/or risky to try to move essential services (fire/PD/911) to the hot site, and there's really no reason to do so, especially when they haven't yet tested their DR plan.
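The ~$10k/day figure roughly checks out. Here is a back-of-the-envelope sketch in Python, assuming (as the comment implies) that each diesel unit burns a full 500-gallon tank per day at ~$4.50/gallon, with four diesel units (five buildings, one on natural gas) plus the one extra unit:

```python
# Rough daily fuel cost for the diesel gensets described in the comment.
# Assumption: every diesel unit burns one full 500-gallon tank per day.

PRICE_PER_GALLON = 4.50   # dollars, per the comment
GALLONS_PER_DAY = 500     # one full tank per unit per day

def daily_fuel_cost(num_gensets: int,
                    price: float = PRICE_PER_GALLON,
                    gallons: float = GALLONS_PER_DAY) -> float:
    """Return dollars per day to keep num_gensets fueled."""
    return num_gensets * gallons * price

diesel_units = 5 - 1            # five buildings, one natural-gas genset
with_extra = diesel_units + 1   # plus the extra unit for crunch demand

print(f"{diesel_units} units: ${daily_fuel_cost(diesel_units):,.0f}/day")
print(f"{with_extra} units: ${daily_fuel_cost(with_extra):,.0f}/day")
```

That brackets the estimate: about $9,000/day for the four diesel units alone, and about $11,250/day with the extra unit.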
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
The datacenter is on the 26th floor of the municipal tower, and the overheating bus runs up to that floor. The power company in question is municipally owned, so either way it would be the city's problem.
McGinn had quite a few facts wrong in the press conference. The equipment is working fine now, and the overheating only caused a minor amount of downtime. The major issue, though, was that the backup generator never kicked in because, as it turns out, the electric starter for the diesel generator is connected to the same bus. Labor Day weekend was then chosen to fix this glaringly obvious design flaw.
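What's described here is a common-mode dependency: the backup path needs the very bus it is supposed to replace. Below is a minimal, hypothetical Python sketch of catching that by walking a dependency graph. The component names and graph are illustrative assumptions, not the city's actual design:

```python
# Hypothetical sketch: flag a backup that depends on the same bus it backs up.
# Component names are illustrative only.
from collections import deque

# Each component maps to the components it needs in order to work.
DEPENDS_ON: dict[str, list[str]] = {
    "data_center_load": ["bus_a", "backup_generator"],
    "backup_generator": ["generator_starter", "diesel_tank"],
    "generator_starter": ["bus_a"],   # the design flaw described above
    "bus_a": ["utility_feed"],
    "diesel_tank": [],
    "utility_feed": [],
}

def transitive_deps(component: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every component that `component` depends on, directly or not."""
    seen: set[str] = set()
    queue = deque(graph.get(component, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen

def backup_shares_failure(primary: str, backup: str,
                          graph: dict[str, list[str]]) -> bool:
    """True if the backup needs the primary or anything the primary needs."""
    primary_chain = {primary} | transitive_deps(primary, graph)
    return bool(primary_chain & transitive_deps(backup, graph))

print(backup_shares_failure("bus_a", "backup_generator", DEPENDS_ON))  # True
```

In this toy model, giving the starter an independent supply (say, a hypothetical `starter_battery` with no dependencies) makes the check return False.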