What To Do During A Power Outage?
Chanc_Gorkon asks: "What do you do before and after a power outage? Do your security people call the IT people? Do your systems come back automagically? I mean yeah a PC server may not have a problem, but what if the server comes up before the backbone is available? What if there are certain things that must be done manually due to security policies or because that's the only way it can be done (our UPS has to be manually brought up after an outage as well as our S/390). Do you review how you did after the disaster or do you just say thank God something serious didn't get broken and go on with things? What do you do?" I think many places actually have backup generators for situations like this. Many other shops live and breathe by the UPS. What are your procedures for handling heavy electrical storms, and what are the best ways to handle the occasional disaster?
I've experienced a couple of (two) power outages at work, and here is what happened:
Background: ~250 users, 1 server room, everything in the server room had about 15-30 minutes of UPS time
#1: Everyone ran to the server room. (Time=0)
#2: Everyone just stood around for a bit. (Time=+1min)
#3: We started shutting down Unix machines. (Time=+1.5min)
#4: Power came back online. (Time=+2min)
#5: We discussed it after the fact, and decided that it was appropriate to wait several minutes before shutting down the servers. (Time=+1day)
#6: Power went out again. (Time=+1.5weeks)
#7: See steps 1-5
Basically what happened is that one of the managers yelled down something like 'Should we shut the servers down?' and we understood 'Shut the servers down'.
On a lighter note, one of my fellow sysadmins thought it necessary to have the printers on UPS so that, if the power went out, everyone could finish printing.
Question: What are the options for a small-to-medium server room (about 15-30 middling-sized computers)? Right now we have a single largish UPS, which says it's running at 40-60% of capacity (depending on whether I have the staging DB powered up). As we gain 2 more of those systems, and several other power-hungry systems, I know that our current UPS will not provide more than a minute or two of power for everything. We are not even monitoring it at the moment (we just hope that the power doesn't go out).
What does everyone else use?
How expensive is it?
Can we upgrade or do we have to purchase new?
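For anyone sizing up the same question, here is a rough back-of-the-envelope sketch in Python. The 3000 W rating and the 450 W per-system draw are made-up placeholders, not figures from my server room; plug in the numbers from your own UPS panel and equipment nameplates.

```python
def projected_load_fraction(current_fraction, rated_watts, added_watts):
    """Estimate UPS load (as a fraction of rated capacity) after adding gear.

    current_fraction: load the UPS reports today, e.g. 0.60 for 60%
    rated_watts:      UPS rated capacity in watts (placeholder value below)
    added_watts:      list of estimated draws for each new system, in watts
    """
    current_watts = current_fraction * rated_watts
    return (current_watts + sum(added_watts)) / rated_watts


# Hypothetical example: a 3000 W UPS at 60%, plus two new 450 W systems.
fraction = projected_load_fraction(0.60, 3000, [450, 450])
print(f"Projected load: {fraction:.0%}")
```

A result near or above 100% means the UPS cannot carry the load at all. Battery runtime generally falls off faster than the load rises, so check the manufacturer's runtime chart rather than extrapolating from today's runtime.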
Nathan Brazil?
Our policy was that all local vital functions were on UPS. This included the switcher, the master control tape machines, the NT boxen that handled the automation, the Unix boxen that served the advertisements, the microwave antenna that sent the signal over to the transmitter, one of the small satellite dishes, and probably a couple of things I'm not thinking of. Plus the building had backup power for emergency lights for about 6 hours (after that the flashlights came out...that only happened once during the 2 years before the big generator was installed).
When I started working there, the station itself had a small generator for vital functions that would come on after ~15 minutes of failed power. A couple years in, we installed a bigger generator that could handle most of the normal functions also. It was, I believe, an 800HP diesel generator with a fuel tank big enough for about 30 hours of operation with no power.
The transmitter (being a UHF station, which requires TONS of power) couldn't run off of a UPS because I don't think they make them big enough, but it had a generator that would automatically start every time there was a brownout and stay on until it was needed, or until the power returned to normal and stayed that way for about half an hour. However, whenever a big storm was coming the generator at the transmitter would be turned on by the engineer on duty and someone would drive out to be prepared in case the automatic switchover to generator power didn't go smoothly. (Those big transmitter tubes are pretty quirky...it doesn't take much to throw them offline. A big voltage spike can cause them to overheat and shut down. And sometimes they drop offline just for the fun of it.)
The generator at the transmitter was a 1600HP diesel with enough fuel for 4 days (!) of uninterrupted operation with two tubes online (normally, we ran two video tubes and one aural tube at full power...when at half power, one video tube was dropped offline and the aural tube was run at half power). This fuel tank exuberance was, in the words of the Chief Engineer, "In case someone can't get out to the transmitter for a while...nuclear war and plagues of frogs do happen. We'd like to stay on the air."
Oh, yeah, the transmitter was also connected to two power grids via direct lines (not shared with any other buildings and contracted by the two TV stations that were on the tower). We had a direct phone number for a 24 hour on call power company technician, that could be called when power failed. The power company tends to pay attention when a company with multimillion dollar a year power bills calls.
This probably isn't representative of all TV stations; the station I worked at was in Houston...the 5th largest TV market in the US. But I'm sure most have similar plans and equipment, if not the same excess.
Just thought it might be of interest. I don't really know how network folks handle such big jobs. We've got UPS power for our vital machines and quality surge suppressors on the rest, and that answers our inhouse power needs OK. ;-)
A funny, and interesting tidbit for those who stuck it out through this whole post: When a TV station must drop off the air for a few minutes at an unscheduled time (like maybe a tube is failing and needs to be switched out) it will be timed so that no commercials are missed. Now you know who the TV stations are looking out for. ;-)
My general plan is to ignore that there are humans anywhere around. They usually do the wrong thing and so they've been taught to keep their hands off the entire thing.
I never use more than 50% of a UPS's rated capacity. That gets better run times out of them and stresses them less. Often the UPSes are the only remaining power at a place and the temperatures in these places are often into the 90s while the UPSes are still doing their thing. Less load means they can effectively withstand more heat before they reach the end of the batteries. Further, things that matter more are more apt to be on their own UPSes, so they last the longest (given similar sized UPSes).
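The 50% rule turns into a one-line sizing calculation. This is only a sketch: the function name is my own invention, and it works in watts, whereas many UPSes are also rated in VA, so check which figure your spec sheet quotes.

```python
def min_ups_rating_watts(load_watts, max_fraction=0.5):
    """Smallest UPS rating (in watts) that keeps the load at or under max_fraction.

    max_fraction=0.5 is the "never more than 50%" rule of thumb from above.
    """
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    return load_watts / max_fraction


# Hypothetical 900 W load: how big a UPS does the 50% rule call for?
print(min_ups_rating_watts(900))
```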
I avoid APC UPSes, since I've had several fail in odd and unexpected ways (in particular, boiling batteries so they reach end-of-life way too early). APC is hitting a price point, not a quality point, and has been unfriendly to the OSS movement. I've had very good luck with Best and SOLA.
If you run Unix (or FreeBSD or Linux or...) I suggest UPSD to babysit your systems. Most of the sites I take care of are effectively 'lights out' (i.e. nobody that is a systems person is there regularly) and UPSD has served me well. Power fails at site X and UPSD emails me. I can then call (being up to a continent away) and manage the problem as need be. Don't forget to UPS the phone system.
I've found that most of the time the outages come in two possible groups. 0-30 minutes and 2.5hrs and more. I make sure that all of the 0-30 kind of events are fully protected. Without generators the 2.5 hr kinds are not practically covered by normal UPSes. If you're prone to lots of power failures, then go look at Home Power Magazine and in particular Trace Engineering systems. The theme is generate your own power 24/7 and be much more reliable as long as one does some minimal maintenance.
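If you keep a log of past outages, it is easy to check whether your own site splits the same way. A minimal Python sketch, assuming you have the durations in minutes; the sample log is invented for illustration and the "in-between" bucket is my own label for whatever falls between the two groups.

```python
from collections import Counter

SHORT_MAX_MIN = 30    # upper edge of the "0-30 minutes" group
LONG_MIN_MIN = 150    # lower edge of the "2.5 hrs and more" group


def classify_outage(minutes):
    """Put one outage duration (in minutes) into a bucket."""
    if minutes <= SHORT_MAX_MIN:
        return "short"
    if minutes >= LONG_MIN_MIN:
        return "long"
    return "in-between"


def tally_outages(durations_min):
    """Count how many logged outages fall into each bucket."""
    return Counter(classify_outage(m) for m in durations_min)


# Hypothetical log of outage durations in minutes (not real data).
log = [0.1, 2, 12, 25, 180, 400]
print(tally_outages(log))
```

If a lot of your outages land in the in-between bucket, a plain UPS sized for 30 minutes will not cover them.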
Finally, all the UPSes are tested annually for run times and the batteries are replaced at slightly longer intervals than the manuals suggest. Also be sure that the users have flashlights or other emergency lights that work. I test those lights too, since no one else gives a damn about emergency lighting.
You missed the first step - figuring out what your power outages look like.
In my area (Boulder, Colo) I've noticed that nearly every outage has fallen into one of two categories:
- a momentary glitch which blinks the lights... and takes down any non-UPS'd computer, and
- major outages (due to snow-laden trees?) that frequently last 8 hours and up.
We're also starting to see a third category, rolling brownouts due to gross undercapacity in the local power grid (gee, didn't anyone at PSC notice that Colorado led the national growth rate for several years running?), but those are still easily predicted because they're tied to unusually hot summer days.
Given this, if the lights go out you count to ten and then start shutting down computers. Only a generator will keep systems up for many hours. Deciding what to do would be far harder if you're in an area where 10-60 minute outages are common. That's long enough that a decent UPS may, or may not, suffice.
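The "count to ten" rule is simple enough to write down. Here is a sketch in Python, with a made-up function name and the grace period as a parameter, since a site with a different outage pattern would want a different number.

```python
def should_shut_down(seconds_on_battery, grace_seconds=10):
    """Return True once an outage has outlasted the grace period.

    grace_seconds=10 mirrors "count to ten". A site that sees many
    10-60 minute outages would need a different rule, for example one
    based on the remaining battery runtime.
    """
    return seconds_on_battery >= grace_seconds


# A momentary glitch is ridden out; anything longer triggers shutdown.
for elapsed in (2, 9, 10, 60):
    action = "shut down" if should_shut_down(elapsed) else "wait"
    print(f"{elapsed:>3}s on battery: {action}")
```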
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken