UPS Setup For a Small/Mid-Size Company?
An anonymous reader writes "We're a small company employing ~30 people, and we are becoming increasingly reliant on virtual servers. Unfortunately, the hosts they run on don't have redundant power supplies because we simply don't have the capacity. We currently have one UPS per rack, which gives us about two minutes of runtime. That may have been enough when the units were put in (they've been there for some time), but it isn't enough time to shut everything down in the event of a failure: the domain controllers alone may take up to 15 minutes. So I'm looking at upgrading to UPSes that would preferably give us around 15 minutes of breathing space and send an email or text alert when a failure is detected. Something that could trigger shutdowns automatically would also be nice. Of course, cost is a key factor too. So, given all of the above, what does Slashdot recommend?"
This is sort of off topic, but when was the last time you tested the UPS units that were installed "some time ago"? UPS batteries lose capacity as they age and eventually fail. You had better check what you have ASAP; you may need to replace them sooner than you think.
I can't remember the brand, but some of the higher-end UPS units I have used came with monitoring software. The software polled the UPS and started the shutdown as soon as a power failure caused the switchover to battery.
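The polling approach described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's software: `read_status` and `shutdown` are hypothetical hooks you would wire to your own UPS interface and shutdown command, and the status words "ONLINE"/"ONBATT" are borrowed from the STATUS field reported by apcupsd.

```python
import time
from typing import Callable, Optional


def watch_ups(read_status: Callable[[], str],
              shutdown: Callable[[], None],
              poll_seconds: float = 5.0,
              max_polls: Optional[int] = None) -> bool:
    """Poll a UPS and call shutdown() as soon as it reports running on battery.

    read_status is a hypothetical hook returning a status string such as
    "ONLINE" or "ONBATT". max_polls=None means poll forever.
    Returns True if a shutdown was triggered, False if polling ended first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        # The status may contain several space-separated flags.
        if "ONBATT" in read_status().split():
            shutdown()
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False
```

Shutting down on the very first on-battery reading is the simplest policy: it gives up runtime in exchange for safety, which suits a UPS that only lasts a couple of minutes.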
HTH.
putting the 'B' in LGBTQ+
Get a generator that can run on natural gas (or another available fuel).
That way, when the power goes out, the generator kicks on within seconds, and the UPSes only have to keep power available until the generator is ready.
Mod me down, my New Earth Global Warmingist friends!
It's not about the number of people, servers, or a fixed amount of time to preserve power. First and foremost, you need to identify the critical systems that need to be protected. These may include the VM farms, NAS storage, obviously the underlying network infrastructure, and at the very least some management terminals that can be used in the event of a failure. Once you have identified these systems, look up their electrical input/output specifications. If possible, measure the real requirements in production with inline monitors or passive taps. After you have built your requirement set (you may decide that a few small UPSes are better than one very large one), work out what needs to stay up and for how long, and build yourself a model. There are dozens of UPS manufacturers and tens of thousands of combinations for any size of company. Once you have an outline of the systems and their individual power requirements, coupled with your own requirements for how long each needs protected power, it will be relatively easy to build a good level of protection on a small budget. UPS units can also often be found on the second-hand market thanks to company refreshes, datacenter closures, etc. Many can be easily recertified, either directly by the manufacturer or by one of the many third-party vendors who specialize in this type of infrastructure.
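The "what needs to be up, and for how long" model can be sketched as a list of loads, each with its own measured draw and protection window. This is a rough sizing aid under stated assumptions, not a vendor sizing tool: the inventory is hypothetical, and the flat 90% inverter efficiency and 80% usable battery fraction are assumptions you should replace with your UPS's datasheet figures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Load:
    name: str
    watts: float    # measured draw, ideally from an inline meter
    minutes: float  # how long this system must stay up on battery


def required_battery_wh(loads: Iterable[Load],
                        inverter_efficiency: float = 0.9,
                        usable_fraction: float = 0.8) -> float:
    """Estimate the nominal battery energy (Wh) needed to carry every load
    for its own protection window.

    Assumed, not measured: a flat inverter efficiency and the fraction of
    nominal capacity that is really usable (battery age, depth of discharge).
    """
    delivered_wh = sum(load.watts * load.minutes / 60 for load in loads)
    return delivered_wh / inverter_efficiency / usable_fraction


# Hypothetical inventory; replace with your own measurements.
inventory = [
    Load("VM hosts", watts=1200, minutes=15),
    Load("NAS", watts=300, minutes=15),
    Load("core network", watts=150, minutes=30),
]
print(f"{required_battery_wh(inventory):.0f} Wh")
```

With these made-up numbers the loads need 450 Wh delivered, which works out to roughly 625 Wh of nominal battery capacity. Giving each load its own window lets the network stay up longer than the VM hosts without paying to keep everything up for 30 minutes.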
No matter how much battery capacity you have, it will eventually run out. If your site truly needs availability, you have to get a diesel generator.
I have two 3000 W APC Smart-UPS units per rack. They support notification over both serial and USB. With about 25 servers per rack, I get around 25 to 40 minutes of runtime for each server. Each rack also has a small PC running CentOS that monitors its two UPSes over serial, with apcupsd installed and configured in multi-UPS mode. On each server, I simply install apcupsd (there is a Windows version) and tell it which UPS it is on. I also configure the appropriate shutdown parameters (shut down with 20 minutes of battery left for non-critical servers, 15 for domain controllers, and 5 for other critical servers). I also hooked each UPS monitor into Nagios and Munin, so I can track each one's power output and time remaining. So far, it has worked great through two "brownouts" and one total power failure (a test where I simply tripped the appropriate breakers).
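The per-role thresholds described above (shut down at 20, 15 or 5 minutes of battery left) can be illustrated with a small Python sketch that reads the text printed by apcupsd's `apcaccess status` command. In apcupsd itself this decision is normally made by the `MINUTES` setting in each client's apcupsd.conf; the role names and sample text below are hypothetical, and the parser only assumes the `KEY : value` layout and the STATUS and TIMELEFT fields.

```python
# Minutes of battery left at which each class of server shuts down, as in the
# post. The role names are hypothetical labels.
SHUTDOWN_AT_MINUTES = {"noncritical": 20, "domain_controller": 15, "critical": 5}


def parse_apcaccess(text: str) -> dict:
    """Turn 'KEY : value' lines into a dict, splitting only on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def should_shut_down(status_text: str, role: str) -> bool:
    """True if the UPS is on battery and TIMELEFT is at or below the role's threshold."""
    fields = parse_apcaccess(status_text)
    if "ONBATT" not in fields.get("STATUS", "").split():
        return False
    minutes_left = float(fields["TIMELEFT"].split()[0])  # e.g. "17.0 Minutes"
    return minutes_left <= SHUTDOWN_AT_MINUTES[role]


sample = "STATUS   : ONBATT\nTIMELEFT :  17.0 Minutes\n"
for role in SHUTDOWN_AT_MINUTES:
    print(role, should_shut_down(sample, role))
```

With 17 minutes left, only the non-critical servers go down; domain controllers and critical servers keep running until their own thresholds are reached, which produces the staggered shutdown described in the post.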
The rationale behind having dedicated UPS monitors is that I don't really care if they lose power while running, so I have them set to never shut down because of UPS activity. I then implemented a script that, when power is restored, issues a netboot command to each server under its control (configured with Puppet for Linux and AD for Windows). That way, the whole system (all servers) automatically shuts down and turns itself back on, even if the servers never really lost power. So far, it has worked flawlessly, and with Nagios I get a text message on my cellphone within a minute or two of a UPS switching to battery (we have two dedicated Internet connections that are on different power sources and different UPSes).
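The post doesn't say exactly what the "netboot command" is. Assuming it is Wake-on-LAN (a common way to power machines back on remotely, and purely our assumption here), the power-restored script could send magic packets like this. A magic packet is 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast to port 9; the MAC addresses below are placeholders.

```python
import socket
from typing import Iterable


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake_all(macs: Iterable[str], broadcast: str = "255.255.255.255",
             port: int = 9) -> None:
    """Send one magic packet per server, e.g. from a power-restored hook."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for mac in macs:
            sock.sendto(magic_packet(mac), (broadcast, port))


# Hypothetical servers on one rack's UPS:
# wake_all(["00:11:22:33:44:55", "66:77:88:99:aa:bb"])
```

The servers' NICs and firmware must have Wake-on-LAN enabled, and broadcasts don't normally cross routers, so sending from a monitor PC on the same network segment is the simplest arrangement.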
I hope this helps!
If a man isn't willing to take some risk for his opinions, either his opinions are no good or he's no good
I use a Su-Kam inverter at home. It powers a whole room, has a clean sine-wave output (unlike many traditional UPSes), and its switchover delay is short enough that the switched-mode power supplies (SMPS) in computers ride through the switch to battery power without a problem.
It stores charge in two large lead-acid multi-cell batteries (roughly car batteries). The last time there was a major power cut, it powered my computer systems for 10 hours (yes, you read that right: 10 hours).
It made me laugh at my old APC UPS, which gave me 10 minutes before I had to power down.
This is in India, by the way, where power cuts are common.
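A rough runtime estimate shows why a large battery bank lasts hours while a small UPS lasts minutes: stored energy is volts times amp-hours. The sketch below uses hypothetical numbers, not the poster's actual batteries or load; the 85% inverter efficiency and 50% usable depth of discharge are assumptions, and the Peukert effect is ignored, so real runtimes at high loads will be shorter.

```python
def runtime_hours(battery_volts: float, amp_hours: float, count: int,
                  load_watts: float, inverter_efficiency: float = 0.85,
                  usable_fraction: float = 0.5) -> float:
    """Rough battery runtime in hours.

    Nominal energy is volts * amp-hours per battery, summed over the bank
    (the same whether identical batteries are wired in series or parallel).
    Efficiency and usable fraction are assumptions; Peukert losses are ignored.
    """
    nominal_wh = battery_volts * amp_hours * count
    return nominal_wh * usable_fraction * inverter_efficiency / load_watts


# Hypothetical: two 12 V 150 Ah batteries feeding a 150 W load.
print(round(runtime_hours(12, 150, 2, 150), 1))
```

The same formula applied to a single hypothetical 12 V, 9 Ah UPS battery carrying 300 W gives only about 9 minutes.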
Banu
Co-locate your equipment at a carrier-grade data center in the nearest major city to your location and get a leased line to your premises. A decent data center will have proper battery backup and generators and know how to handle it. They'll also have the time and manpower to do proper tests, etc.
Learning HOW to think is more important than learning WHAT to think.
You're absolutely right. One place I worked had about 20 employees, 150 servers, but had an income of millions per year. The income averaged out to about $5,700/hr. 12 hours of outages per year could cost almost $70,000 in lost revenue. Is it worth $10k in extra equipment to mitigate that? Obviously.
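The break-even reasoning here is simple arithmetic. This sketch assumes lost revenue scales linearly with downtime and that the equipment actually prevents the outage hours in question; the $5,700/hr, 12 hours and $10k figures come from the post, while the small-shop numbers are hypothetical.

```python
def annual_outage_cost(revenue_per_hour: float, outage_hours_per_year: float) -> float:
    """Revenue lost per year, assuming losses scale linearly with downtime."""
    return revenue_per_hour * outage_hours_per_year


def mitigation_pays_off(revenue_per_hour: float, outage_hours_avoided: float,
                        equipment_cost: float, years: float = 1) -> bool:
    """True if revenue protected over `years` exceeds the one-off equipment cost."""
    protected = annual_outage_cost(revenue_per_hour, outage_hours_avoided) * years
    return protected > equipment_cost


print(annual_outage_cost(5700, 12))           # figures from the post
print(mitigation_pays_off(5700, 12, 10_000))
print(mitigation_pays_off(20, 4, 100))        # hypothetical tiny shop
```

The first case protects $68,400 a year against a $10k outlay; the hypothetical small shop loses only $80 a year, so even $100 of extra equipment doesn't pay for itself in one year.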
Smaller companies have to evaluate their acceptable losses. Sometimes it's not worth $100 to make sure you stay up through power outages.
"Five nines" (99.999%) of availability still leaves about 5.26 minutes of outage per year. Of course, not all of that downtime will be power-related. Redundancy across physically diverse locations can and will help there.
Serious? Seriousness is well above my pay grade.
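Converting a number of nines into allowed downtime is a one-liner; this sketch uses a 365.25-day year. Five nines works out to about 5.26 minutes per year, four nines to about 52.6 minutes, and three nines to roughly 8.8 hours.

```python
HOURS_PER_YEAR = 365.25 * 24


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime for an availability of 1 - 10**-nines (e.g. 5 -> 99.999%)."""
    unavailability = 10.0 ** -nines
    return unavailability * HOURS_PER_YEAR * 60


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} min/year")
```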
It's no act. I am happy to put up photographs of my setup if you want. It's been working well for me the last year, so I don't have issues recommending it. Apart from being a customer, I have no connections to any inverter/battery company. You OTOH are an anonymous coward. Here is my website. Go find more about the shill there.
I think he was talking about in-house servers, but I could be mistaken. It's good to be in a *GOOD* datacenter that has the proper redundancy. Most of the good ones have multiple generators and tens of thousands of gallons of fuel stored. They can stay running indefinitely, assuming they can get fuel delivered before they run out.
I did work in one good one. They had a DC power plant to supply at least 24 hours of power. They also had two diesel turbine generators and something like 10,000 gallons of fuel, enough to provide power for 7 days. The senior techs, who had been there a very long time, told me the generators had kicked on quite a few times, but only once in about 20 years had they needed to refuel. It got touchy. The power was out for about 14 days. It took 6 days to get a refueling truck in, because of a nasty blizzard; all the roads had been closed for days. They were starting to notify customers of a potential power outage when two fuel trucks finally arrived. One refilled their tank, and the second was left parked there in case power wasn't restored in time.
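The figures in this story imply a burn rate and a total autonomy that make it clear why a refuel was unavoidable. This sketch derives the burn rate from "10,000 gallons for 7 days" and, as an assumption, counts the DC plant's 24 hours as extra time after the fuel runs out; real burn rates vary with load.

```python
def implied_burn_rate(gallons: float, days: float) -> float:
    """Gallons per hour implied by 'this much fuel lasts this many days'."""
    return gallons / (days * 24)


def autonomy_hours(fuel_gallons: float, burn_gal_per_hour: float,
                   battery_hours: float = 0.0) -> float:
    """Hours of power from stored fuel, plus battery time assumed to follow it."""
    return fuel_gallons / burn_gal_per_hour + battery_hours


rate = implied_burn_rate(10_000, 7)            # about 59.5 gal/hour
hours = autonomy_hours(10_000, rate, battery_hours=24)
print(f"{rate:.1f} gal/h, {hours / 24:.1f} days of autonomy vs a 14-day outage")
```

About 8 days of autonomy against a 14-day outage leaves no room for a 6-day delivery delay, which is why the second, parked fuel truck mattered.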
That was a huge facility, and they had the power to say "bring us trucks now", and not be put off for larger customers.
I wasn't impressed by the advertised specs of the site. They were good, but it's easy to lie about specs. I *was* impressed by the site when I walked through and was allowed (with an escort) to see their primary data room (many OC-192s), the DC power room, and the generators. I wasn't getting the sales tour. I was getting the tech tour, because the senior guys wanted to tell me all about their stuff, and we had a chance to talk about all of it.
I've been to many datacenters over the years, and many have failed to live up to their advertising. "N+1 generators" can mean a few 11 kW generators out by the dumpsters, or massive industrial generators. Maybe they test them once a year, or once a week. Maybe they work, maybe they don't. It's less than impressive to see the generators sitting outside, covered in rust, looking like they were bought second-hand and hadn't been maintained since 1950.
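A quick way to check whether an "N+1" claim holds on paper: the load must still be covered after the largest single generator is lost. This sketch only checks nameplate capacity; it says nothing about whether the units actually start, which is the real point about testing. The kW figures are hypothetical.

```python
from typing import Sequence


def is_n_plus_one(unit_kw: Sequence[float], load_kw: float) -> bool:
    """True if the remaining units still cover the load after losing the largest one."""
    if len(unit_kw) < 2:
        return False  # a single unit can never provide N+1 redundancy
    return sum(unit_kw) - max(unit_kw) >= load_kw


print(is_n_plus_one([11, 11, 11], 20))   # hypothetical 20 kW load
print(is_n_plus_one([11, 11, 11], 25))   # hypothetical 25 kW load
```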
At one site (again, an impressive one), they had an absolutely huge DC room, and a couple of times while I was there they received phone calls asking them to turn on their generators because the power company needed the extra capacity. A couple of 1 MW generators can make the difference between steady power and widespread brownouts.
The impressive datacenters were way beyond anything I could possibly talk my management into doing in-house.