What To Do During A Power Outage?
Chanc_Gorkon asks: "What do you do before and after a power outage? Do your security people call the IT people? Do your systems come back automagically? I mean, yeah, a PC server may not have a problem, but what if the server comes up before the backbone is available? What if there are certain things that must be done manually due to security policies or because that's the only way it can be done (our UPS has to be manually brought up after an outage, as does our S/390)? Do you review how you did after the disaster, or do you just say thank God nothing serious got broken and go on with things? What do you do?" I think many places actually have backup generators for situations like this. Many other shops live and breathe by the UPS. What are your procedures for handling heavy electrical storms, and what are the best ways to handle the occasional disaster?
Actually, I've heard of or worked at places which switch to the generators when there's any chance of a power outage or which have a second pair of power lines running to a different power grid.
And, of course, any good Disaster Preparedness Plan will cover power outages.
No, they're not PCI cards.
No, they're not ISA cards.
No, they're not PC Cards.
Oh, never mind...
I've experienced a couple of (two) power outages at work, and here is what happened:
Background: ~250 users, 1 server room, everything in the server room had about 15-30 minutes of UPS time
#1: Everyone ran to the server room. (Time=0)
#2: Everyone just stood around for a bit. (Time=+1min)
#3: We started shutting down Unix machines. (Time=+1.5min)
#4: Power came back online. (Time=+2min)
#5: We discussed it after the fact, and decided that it was appropriate to wait several minutes before shutting down the servers. (Time=+1day)
#6: Power went out again. (Time=+1.5weeks)
#7: See steps 1-5
Basically what happened is that one of the managers yelled down something like 'Should we shut the servers down?' and we understood 'Shut the servers down'.
On a lighter note, one of my fellow sysadmins thought it necessary to have the printers on UPS so that, if the power went out, everyone could finish printing.
Question: What are the options for a small-to-medium server room? (about 15-30 middling sized computers) Right now we have a single largish UPS, which says it's running at 40-60% of capacity. (depending on if I have the staging DB powered up) As we gain 2 more of those systems, and several other power-hungry systems, I know that our current UPS will not provide more than a minute or two of power for everything. We are not even monitoring it at the moment (we just hope that the power doesn't go out).
What does everyone else use?
How expensive is it?
Can we upgrade or do we have to purchase new?
Nathan Brazil?
I can associate with your printer problem...
One of our customers had their carpets cleaned over the weekend and had to unhook their computers and move them around to get them out of the way of the carpet people. They decided not to tell us this until Monday morning, so first thing I rush out there and hook everything back up, everything looks good, and I leave. Later that day, my boss calls me on the phone and asks me if I had seen anything unusual with the server, and of course everything had looked good to me. Apparently the server had been shutting down every time they printed something to the laser printer, and they were wondering if this was some sort of new "feature"... =P I had plugged the printer into the UPS by accident, and every time that sucker warmed up to start printing it would overload the UPS and shut it down, along with everything hooked up to it. Not one of my finer moments. (But hey, I like to keep everyone on their toes.)
BTW, fax machines and kitchen appliances are also not helpful.
We've always approached it from the point of view of protecting data. You /need/ enough UPS to get everything shut down clean (preferably your UPS has a cable that hooks into the server and matching software that shuts down the server automatically when the power has been off a certain amount of time).
Even having the whole datacenter protected by one solution might not be enough. I saw this at one of our huge customers, whose whole raised floor (2 mainframes, AS/400s, ~120 Intel servers) was UPS'd by some huge horking industrial-strength thingie. When the power switched over to backup, sometimes there was enough of a lag to knock out half of the Intel boxes, and at least twice when the UPS switched on, the breakers blew on the raised floor. So we wound up putting small UPSes on all the critical boxes, just enough to keep 'em going while big bertha came online.
The main thing you need to do is figure out what machines have to stay up, and what machines just need enough juice to shut down clean. All the major UPS suppliers have formulas that show you how to calculate the size of the UPS you need to keep things going. Make sure the UPSes you get have at least basic connectivity so that either when the power is out for a certain time, or (even better, but only from more advanced units) when there is only a certain amount of juice left, a signal is sent and your server can power down clean (and even tell other servers on the same UPS to power down, maybe even page you so you know to come in early and bring everything back up).
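The two triggers above (power out for a set time, or only so much juice left) boil down to a small decision function. Here's a rough Python sketch of it. The function name, the argument names, and the default thresholds are all made up for illustration; they don't come from any UPS vendor's software, and a real setup would read these values from whatever monitoring the UPS provides.

```python
def should_shut_down(seconds_on_battery, battery_percent=None,
                     max_outage_seconds=300, min_battery_percent=30):
    """Decide whether a clean shutdown should start now.

    seconds_on_battery: how long mains power has been out (0 = on mains).
    battery_percent: remaining charge, or None if the UPS cannot report it
                     (basic units only signal "on battery").
    The two thresholds are hypothetical defaults, not vendor values.
    """
    if seconds_on_battery <= 0:
        return False  # still on mains power, nothing to do
    if seconds_on_battery >= max_outage_seconds:
        return True   # trigger 1: power has been out too long
    if battery_percent is not None and battery_percent <= min_battery_percent:
        return True   # trigger 2: not much juice left
    return False
```

The second trigger only fires when the UPS can actually report remaining charge, which matches the point that only the more advanced units offer it.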
Our policy was that all local vital functions were on UPS. This included the switcher, the master control tape machines, the NT boxen that handled the automation, the Unix boxen that served the advertisements, the microwave antenna that sent the signal over to the transmitter, one of the small satellite dishes, and probably a couple of things I'm not thinking of. Plus the building had backup power for emergency lights for about 6 hours (after that the flashlights came out...that only happened once during the 2 years before the big generator was installed).
When I started working there, the station itself had a small generator for vital functions that would come on after ~15 minutes of failed power. A couple years in, we installed a bigger generator that could handle most of the normal functions also. It was, I believe, an 800HP diesel generator with a fuel tank big enough for about 30 hours of operation with no power.
The transmitter (this being a UHF station, which requires TONS of power) couldn't run off of a UPS because I don't think they make them big enough, but it had a generator that would automatically start every time there was a brownout and stay on until it was needed, or until the power returned to normal and stayed that way for about half an hour. However, whenever a big storm was coming, the generator at the transmitter would be turned on by the engineer on duty and someone would drive out to be prepared in case the automatic switchover to generator power didn't go smoothly. (Those big transmitter tubes are pretty quirky...it doesn't take much to throw them offline. A big voltage spike can cause them to overheat and shut down. And sometimes they drop offline just for the fun of it.)
The generator at the transmitter was a 1600HP diesel with enough fuel for 4 days (!) of uninterrupted operation with two tubes online (normally, we ran two video tubes and one aural tube at full power...when at half power, one video tube was dropped offline and the aural tube was run at half power). This fuel tank exuberance was, in the words of the Chief Engineer, "In case someone can't get out to the transmitter for a while...nuclear war and plagues of frogs do happen. We'd like to stay on the air."
Oh, yeah, the transmitter was also connected to two power grids via direct lines (not shared with any other buildings and contracted by the two TV stations that were on the tower). We had a direct phone number for a 24-hour on-call power company technician, who could be called when power failed. The power company tends to pay attention when a company with multimillion-dollar-a-year power bills calls.
This probably isn't representative of all TV stations; the station I worked at was in Houston...the 5th largest TV market in the US. But I'm sure most have similar plans and equipment, if not the same excess.
Just thought it might be of interest. I don't really know how network folks handle such big jobs. We've got UPS power for our vital machines and quality surge suppressors on the rest, and that answers our inhouse power needs OK. ;-)
A funny and interesting tidbit for those who stuck it out through this whole post: When a TV station must drop off the air for a few minutes at an unscheduled time (like maybe a tube is failing and needs to be switched out), it will be timed so that no commercials are missed. Now you know who the TV stations are looking out for. ;-)
When the power dives, and the UPS runs out of juice to power the company what do we do?
We retreat to our office, lock the door, and the entire IT staff gathers in the corners and begins to cry!
--
My general plan is to ignore that there are humans anywhere around. They usually do the wrong thing and so they've been taught to keep their hands off the entire thing.
I never use more than 50% of a UPS's rated capacity. That gets better run times out of them and stresses them less. Often the UPSes are the only remaining power at a place and the temperatures in these places are often into the 90s while the UPSes are still doing their thing. Less load means they can effectively withstand more heat before they reach the end of the batteries. Further, things that matter more are more apt to be on their own UPSes, so they last the longest (given similar sized UPSes).
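For anyone who wants to check their own setup against that 50% rule, here's a minimal Python sketch. The function names and the wattage figures in the usage note are hypothetical; use measured draw and the rating from your own UPS.

```python
def ups_load_fraction(device_watts, ups_rated_watts):
    """Fraction of the UPS's rated capacity used by the attached devices."""
    return sum(device_watts) / ups_rated_watts


def within_headroom(device_watts, ups_rated_watts, max_fraction=0.5):
    """True if the total load stays at or below max_fraction of the rating.

    max_fraction=0.5 mirrors the 50% rule of thumb from the post above.
    """
    return ups_load_fraction(device_watts, ups_rated_watts) <= max_fraction
```

With two made-up devices drawing 300 W and 200 W on a UPS rated for 1000 W, `within_headroom([300, 200], 1000)` passes; adding a third 300 W box pushes the load to 80% and it fails.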
I avoid APC UPSes, since several I've had have failed in odd and unexpected ways (in particular boiling batteries so they reach end-of-life way too early). APC is hitting a price point, not a quality point, and has been unfriendly to the OSS movement. I've had very good luck with Best and SOLA.
If you run Unix (or FreeBSD or Linux or...) I suggest UPSD to baby sit your systems. Most of the sites I take care of are effectively 'lights out' (i.e. nobody that is a systems person is there regularly) and UPSD has served me well. Power fails at site X and UPSD emails me. I can then call (being up to a continent away) and manage the problem as need be. Don't forget to UPS the phone system.
I've found that most of the time the outages come in two possible groups. 0-30 minutes and 2.5hrs and more. I make sure that all of the 0-30 kind of events are fully protected. Without generators the 2.5 hr kinds are not practically covered by normal UPSes. If you're prone to lots of power failures, then go look at Home Power Magazine and in particular Trace Engineering systems. The theme is generate your own power 24/7 and be much more reliable as long as one does some minimal maintenance.
Finally, all the UPSes are tested annually for run times and the batteries are replaced at slightly longer intervals than the manuals suggest. Also be sure that the users have flashlights or other emergency lights that work. I test those lights too, since no one else gives a damn about emergency lighting.
You missed the first step - figuring out what your power outages look like.
In my area (Boulder, Colo) I've noticed that nearly every outage has fallen into one of two categories:
1) a momentary glitch which blinks the lights... and takes down any non-UPS'd computer, and
2) major outages (due to snow-laden trees?) that frequently last 8 hours and up.
We're also starting to see a third category, rolling brownouts due to gross undercapacity in the local power grid (gee, didn't anyone at PSC notice that Colorado led the national growth rate for several years running?), but those are still easily predicted because they're tied to unusually hot summer days.
Given this, if the lights go out you count to ten and then start shutting down computers. Only a generator will keep systems up for many hours. Deciding what to do would be far harder if you're in an area where 10-60 minute outages are common. That's long enough that a decent UPS may, or may not, suffice.
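If you keep a log of how long each outage lasted, figuring out what your outages look like is just a matter of bucketing the durations. A minimal Python sketch follows. The bucket names and the one-minute and one-hour cutoffs are my own assumptions for illustration; pick boundaries that match your UPS run time.

```python
from collections import Counter


def classify_outage(duration_seconds):
    """Put one outage into a bucket (cutoffs are illustrative assumptions)."""
    if duration_seconds < 60:
        return "momentary"   # the blink-the-lights kind
    if duration_seconds < 3600:
        return "short"       # a decent UPS may, or may not, suffice
    return "long"            # generator territory


def outage_profile(durations_seconds):
    """Count how many logged outages fall into each bucket."""
    return Counter(classify_outage(d) for d in durations_seconds)
```

A site whose log shows only "momentary" and "long" entries is in the easy situation described above; a fat "short" bucket is the case where the decision gets harder.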
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
1) Learn from other people's mistakes.
2) Learn from your own mistakes.
3) Check your work.
Confidence is great, arrogance is stupid.
(Remember the time you made one simple typo in that autoexec.bat and the machine didn't reboot.)
A couple of years ago we had quite a few power problems as one company or another dug up our street. As a tech support company you tend to be reliant on power.. so when the power went, the servers shut down and we bought Monopoly to while away the hours with. Just make sure you have enough batteries for torches. And remember, UPSes can be used for boiling a kettle... just :)
--
May contain traces of nut.
Biggest backup system I ever saw was back in the early 80s for a mainframe system. These guys were backup CRAZY (something to do with stock trading).
For their UPS system (besides the obligatory 2 independent power feeds from the grid), they had a UPS that could:
1) Run the Computers (2 identical boxes, one a hot backup for the other) for 2 hours while
2) Cranking the generators for those 2 hours (But the generator was supposed to start in less than 5 minutes)
3) AND running the Air conditioning (Remember folks, it's going to get HOT in that server room)
The generators had fuel for a week.
The paranoid part? When you realize that they had 8 other IDENTICAL facilities spread around the world (some in the middle of nowhere), "Just in case the major cities get nuked" (I wish I was kidding)
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
I don't know how much "After Power Failure" up time you need, but one thing a lot of people forget is this - If you have a server farm that NEEDS the room cooled to keep the servers running reliably, you BETTER have a way to run your Air Conditioning unit. The servers will still be getting HOT running on the UPS. If you need real long term up time, you may need to power your own cooling.
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
OK, power's out, this is probably the biggest crisis you can face as a sysadmin or developer. Worse even than your net connection being down: you can't even browse the web cache.
First of all, you need a standby: a magazine or, even better, a stash of back issues can while away hours. And since you "can't do anything without the PC", it needn't be computing-related. In severe circumstances, you may have to resort to opening all those free trade rags that have been building up on your desk for the last year.
It is essential to track the duration of the outage, find the cause and get an estimate of how long it will take to fix ASAP. For example, if there's a bulldozer outside your window with a large bundle of cables in its jaws, flailing around showering sparks everywhere while the driver lies slumped uselessly over the controls slowly turning black, you can probably go home early.
If that's not the case, have a quick wander around the building and make the most of this opportunity. Electronic door locks might be affected, allowing you into places you shouldn't normally be, like the machine room: no one's going to miss the odd server or router if they can't even tell it's up. And the stationery cupboard could be yours for the taking. Loss of lighting is a bonus, providing good cover for: redistribution of office furniture; "failover" to nearest watering hole; sex. But since you work in IT, you're unlikely to get much of the latter.
If you don't like the place you work, or your business urgently requires a substantial insurance claim to bolster profits, remember that careless use of candles and other naked flames often leads to major conflagrations.
One final tip: remember that when the power is out, the phones are often still working. Use this opportunity to establish the precise extent of the outage, by ringing friends who are progressively further away. People will be reassured to know that the outage has not affected distant continents.
Ade_
/
Big Bubbles (no troubles) - what sucks, who sucks and you suck
At work, we have a small UPS on our server. It doesn't need to stay up if the power goes out, but it's great for filtering out power bumps, and shutting the server down safely.
At home, since my PC has an ATX power supply, I set the BIOS to leave the computer off if the power goes out. Very useful considering I have 3 hard drives instead of one huge one.
(There are few things you want to hear less than the PC's fans slowing down for a second, and 1 or 2 hard drives going *SMACK* as the power comes back while they're spinning down.)
(It caused no damage, but it was certainly unpleasant to think about what might have happened.)
#1: Everyone ran to the server room. (Time=0)
:-)
#2: Everyone just stood around for a bit. (Time=+1min)
This shows why every middling sized computing and communication facility needs a well documented procedure in advance of any outage.
In real life, the power outage policy gets written after the first power outage or two.
At least once per year you should test your security policy. Make it a big deal with management, decide on a weekend many months in advance and stack your workload to support that. It can really be an excuse to party over a weekend and play around with stuff you normally can't touch for fear of retribution.
You should cut the power and time how long the UPS boxes keep the servers running. You should see which machines kept running during the switchover, and which machines cleanly shut down in advance of the batteries dying. Then decide on whether you have to all run to the machine room and shut things down in a panic, or if you have 30 minutes to wait for the power to come back before starting to shut down.
Having it written down can really cover your ass if you've tested it.
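The "panic or wait" decision falls straight out of the numbers you measure during the test. A minimal Python sketch, assuming you timed both the UPS run time and how long a clean shutdown of everything takes; the function name and the 5-minute safety margin are hypothetical choices, not anything from a standard.

```python
def wait_budget_minutes(measured_runtime_min, shutdown_time_min,
                        safety_margin_min=5):
    """Minutes you can wait for power to return before shutdown must begin.

    measured_runtime_min: UPS run time observed in your own test.
    shutdown_time_min: time needed to shut everything down cleanly.
    safety_margin_min: slack for aging batteries (assumed value).
    """
    budget = measured_runtime_min - shutdown_time_min - safety_margin_min
    return max(budget, 0)  # never report a negative wait
```

With a measured 30 minutes of run time and 10 minutes needed to shut down, `wait_budget_minutes(30, 10)` leaves 15 minutes of waiting. With only 8 minutes of run time the budget is zero, which is the run-to-the-machine-room case.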
the AC
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
Make sure your cooling system is on when your servers are on. It does no good to have your servers run for an hour on UPS if the temperature in the computer room climbs past the allowable operating temperature.
I ran into a situation where a company I was working with spent God knows how much on a huge rack-mount UPS that would keep their dozen or so servers and network hardware up for three hours in case of a power outage.
Good planning on their part, right?
Unfortunately, the first time they had an outage (July in Florida), the air conditioning to this 14 foot by 18 foot room died. When I opened the door an hour and a half into the outage, I was hit with a blast of hot air like you wouldn't believe.
The room temperature hit 117 degrees F. The internal temperature on some of the machines was certainly over 125F.
They burned out a Cisco 5500, two CPUs and 11 10k RPM drives. I'm surprised they didn't lose more. I know an hour and a half in there would have caused me to cease functioning.
The irony is that the UPS was probably cranking out more heat than anything else in the room.
Moral of the story: Computers are big heaters. Make sure they stay cool.
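To put a number on "computers are big heaters": essentially all the electrical power the equipment draws ends up as heat in the room, and 1 watt is about 3.412 BTU per hour. A minimal Python sketch of the conversion follows. The function name is made up, and the wattages in the usage note are invented for illustration, not figures from the site described above.

```python
WATTS_TO_BTU_PER_HOUR = 3.412142  # 1 watt is about 3.412 BTU/h


def heat_load_btu_per_hour(equipment_watts):
    """Approximate heat the listed equipment dumps into the room.

    Assumes all electrical power drawn is dissipated as heat,
    which is a reasonable approximation for servers and UPSes.
    """
    return sum(equipment_watts) * WATTS_TO_BTU_PER_HOUR
```

A dozen servers at a made-up 400 W each works out to roughly 16,400 BTU/h, and that is what the cooling has to remove for as long as the UPS keeps them running.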
InitZero
At the main office, where we do some accounting processing, and provide some checks & balances against the contractor systems, things are a little less critical. Our computer equipment (including 30-odd desktop PCs) is all hooked into a pair of refrigerator-sized UPSes. When the UPSes give out, there is a backup generator to take over. This setup protects everything during the 3-second blips, and gives us about an hour of UPS power to evaluate what we need to take down.
The first thing we do in a prolonged (more than a minute) outage is get everyone to shut down the PCs and twiddle their thumbs. Then we open the systems room door, and hook up the oscillating fan we use to keep our cubicles from getting too hot, and start pumping out the hot air (the AC is not on the UPS here). Depending on what is going on at the moment (jobs running, deadlines to meet, drawings to be held) we evaluate what we can shut down, and start powering stuff off.
When the power comes back on, the process is reversed, bringing stuff online before we tell the PC users they can log back in.
But if you want real backup, my friend works for a "wholly owned subsidiary" of a big bank, and their main DP center has a pair of giant FLYWHEELS that are under motor at all times during normal power. These things then spin down when there's no outside power and can run the building for days apparently.
Hey geek-boys (and girls) go get yourself an SO. ;-P
A wealthy eccentric who marches to the beat of a different drum. But you may call me "Noodle Noggin."
Quando Omni Flunkus Moritati
Okay this is my question:
:) Is there any way I can get one hooked up between the wall and my box w/o unplugging it?
I'd like to add a UPS to my home server because it keeps getting boinked in brownouts and the occasional outage. But the power has been real good (lately), and I've managed to get something like 250+ days of uptime. I don't wanna swallow my pride and just shut down the machine for five or ten minutes and install a UPS.
--
I think there is a world market for maybe five personal web logs.
We didn't even bother with UPS and went straight to generators. Even individual users' workstations are powered by the generators, as I understand it. (And I recall a tanker truck parked near the building around last New Year's Eve...)
Of course, we also have a pretty good connection into the power grid. While I get brownouts and interruptions at home all the time, at work nothing of the sort (that I know of) happens.
Of course, we have to use generators. If OUR power goes out, many many thousands of people will lose part or all of their telephone service.
---
How am I supposed to fit a pithy, relevant quote into 120 characters?
I used to work at a mainframe center, and every year we would "test the BRS", or Big Red Switch. You know, the master power disconnect, right beside the Halon panel?
We would auction off the rights for pushing The Button. It was a really cool set up, geek heaven. A back-lit red push button about 2" in diameter with two safety covers on it, as well as a tamper-evident seal. All high tech lookin' "Wargames" type stuff. The only thing missing was the two key system ("TURN YOUR KEY, SIR! TURN YOUR KEY!")
The Operations staff always made it a big potluck/BBQ type thing and fed the programmers and other weenies... we found that it kept the whining down to a minimum.
Yes, long-term outages tend to be due to weather problems. It seems to require something on the scale of a weather event to damage many points of a power grid.
It's having many places which need repair that causes such long delays in restoring power. And a weather event which is measured in square miles is the type of thing which is required to cause the damage.
Actually, if the computer wants something such as 400Hz power this arrangement can be cheaper than a solid-state converter.