UPS Setup For a Small/Mid-Size Company?
An anonymous reader writes "We're a small company employing ~30 people and we are becoming increasingly reliant on virtual servers. Unfortunately, the hosts they are on don't have redundant power supplies because we simply don't have the capacity. We currently have one UPS per rack, which gives us about two minutes. This may have been enough time when they were put in (they've been there for some time), but it isn't really enough time to shut everything down in the event of a failure. Domain Controllers alone may take up to 15 minutes. So I'm looking at upgrading the UPSs to ones that would preferably give us around 15 minutes of breathing space and send an email or text alert when a failure is detected. Something that could trigger shutdowns automatically would also be nice. Of course, cost is a key factor too. So given all of the above, what does Slashdot recommend?"
Should serve 2 purposes: give you temp power and keep your IT guys fit
This is sort of off topic, but when was the last time you tested the UPS units that were installed "some time ago"? The batteries can eventually go flat. You'd better check what you have ASAP. You may need to replace them sooner than you think.
I can't remember the brand, but some of the higher-end UPS units I have used came with monitoring software. The software polled the UPS unit and started the shutdown as soon as a power failure caused the switchover to battery.
HTH.
putting the 'B' in LGBTQ+
Get a generator that can power things from natural gas (or other available resource).
So when the power goes out, it will be seconds before the generator kicks on, and the UPSes are just there to keep power available until the generator is ready.
Mod me down, my New Earth Global Warmingist friends!
Not knowing the load required on the UPS makes it very hard to tell what kind of UPS you need. You need to know how many watts are used in the rack to be able to plan some proper UPS capacity.
apcupsd can be networked between machines and can trigger auto shutdowns of all of them, including VM guests.
Some virtual machine systems can also suspend all VMs on shutdown, which could be a better alternative than shutting them down. Again, without knowing which VM system you use, it's hard to get into details.
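For anyone who wants to script around apcupsd rather than rely only on its built-in shutdown, here is a minimal Python sketch that parses the "KEY : VALUE" status text printed by `apcaccess status`. The sample text is made up for illustration, the function names are hypothetical, and the fields shown (STATUS, BCHARGE, TIMELEFT) may vary with UPS model and apcupsd version.

```python
def parse_apcaccess(text):
    """Turn 'KEY : VALUE' lines into a dict of stripped strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps survive.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def minutes_left(status):
    """Return TIMELEFT as a float, e.g. '12.5 Minutes' -> 12.5."""
    return float(status["TIMELEFT"].split()[0])


# Illustrative sample, not captured from a real UPS.
SAMPLE = """\
STATUS   : ONBATT
BCHARGE  : 85.0 Percent
TIMELEFT : 12.5 Minutes
"""

if __name__ == "__main__":
    s = parse_apcaccess(SAMPLE)
    print(s["STATUS"], minutes_left(s))
```

In practice the text would probably come from something like `subprocess.run(["apcaccess", "status"], capture_output=True, text=True).stdout` on a machine running apcupsd.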
It's not about the number of people, servers, or a fixed time limit to preserve power. First and foremost, you need to identify the critical systems that need to be protected. These may include the VM farms, NAS storage, obviously the underlying network infrastructure, and at the very least, some management terminals that can be used in the event of a failure. Once you identify these systems, you need to reference their electrical input/output specifications. If possible, you would want to measure the real requirements in production with inline monitors or passive taps. After you have built your requirement set (mind you, you may decide it's better to have a few small UPSes vs. one very, very large one), you need to explore what needs to be up, and for how long, and build yourself a model. There are dozens of UPS manufacturers, and tens of thousands of combinations for any sized company. Once you have an outline of the systems and their individual power requirements, coupled with your own requirements for their availability/protected power, it will be relatively easy to build yourself a good level of protection on a small budget. Mind you, these devices (UPSes) can often be found on the second-hand market due to company refreshes, datacenter closures, etc. Many can be easily re-certified by the manufacturer directly or by a variety of 3rd-party vendors who specialize in this type of infrastructure.
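The "model" mentioned above can start out as a very small inventory. This is only a sketch: the `Load` class, the device names, and every wattage and runtime figure are hypothetical, and the watt-hour total is energy at the load, ignoring inverter losses and battery aging.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    watts: float    # measured draw, ideally from an inline meter
    minutes: float  # how long it must stay up on battery


def summarize(loads):
    """Return (peak watts, watt-hours needed) for a list of loads."""
    peak_watts = sum(l.watts for l in loads)
    watt_hours = sum(l.watts * l.minutes / 60 for l in loads)
    return peak_watts, watt_hours


# Hypothetical figures for illustration only.
loads = [
    Load("VM host", 400, 15),
    Load("NAS", 150, 15),
    Load("core switch", 60, 30),
]

if __name__ == "__main__":
    print(summarize(loads))
```

The peak figure tells you the minimum output the UPS must sustain, and the watt-hour figure gives a rough lower bound on the battery energy to look for in runtime charts.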
No matter how much battery capacity you have, it will eventually run out. If your site truly needs availability, you have to get a diesel generator.
We have had good experiences with the HP R5500 XR. You may require a smaller and cheaper model like the R3000 or R1500 depending on your servers.
These UPS are fully supported by NUT.
The months are just too short. I can count the number of days on one hand.
I have 2 3000 watt APC SmartUPSes per rack. They have both serial and USB notification. Since each rack has about 25 servers, I get around 25 to 40 minutes of runtime for each server. So I have a small PC for each rack that monitors those 2 devices. It connects by serial to the UPSes, and runs CentOS. Then I have APCUPSD installed and configured in multi-UPS mode. On each server, I simply install APCUPSD (there is a Windows version), and tell it which UPS it is on. I also configure the appropriate shutdown parameters (20 minutes of battery left for non-critical servers, 15 for DC, and 5 for other critical servers). I also hooked each UPS monitor into Nagios and Munin, so I can track each one's power output and time remaining. So far, it's worked great over 2 "brownouts" and 1 total power failure (a test where I simply tripped the appropriate breakers).
The rationale behind having dedicated UPS monitors is that I don't really care if they lose power while running, so I have them set to never shut down from UPS activity. Then I simply implemented a script that, on power restore, issues a netboot command to each server under its control (configured with Puppet for Linux, AD for Windows). That way, the whole system (all servers) automatically shuts down and turns itself back on, even if they never really lost power... So far, it's worked flawlessly, and with Nagios, I get a text message on my cellphone within a minute or two of a UPS switching to battery (we have 2 dedicated internet connections that are on different power sources and different UPSes).
I hope this helps!
If a man isn't willing to take some risk for his opinions, either his opinions are no good or he's no good
It's time to break out the calculators and do some math. There are two main factors at work here: UPS load capacity and battery run time. I run a series of research clusters at a university, so only the core systems (landing pads, schedulers, auth, disk arrays) are on UPS and all the compute nodes just die at a power hit.
Retrofitting a datacenter for whole center UPS is a very daunting and expensive task, so odds are good you'll be replacing the current rack mounts with beefier units, either pedestal sized units next to their racks or rack mounted units.
When buying UPS gear for work, I aim to hit either 67% capacity with the planned load, or the smallest VA rating that takes 208V single phase, as long as it's at least 1/3 underutilized for future expansion. That covers the VA rating. As for battery run time, most of the larger units accept external battery packs to increase the run time. I've never used them, since a 5 kVA unit with my load gives me 20 minutes of run time, and if the power isn't back on by then, odds are good it's not coming back any time soon.
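The 67% rule above is easy to turn into a quick check. This is a rough sketch of that one rule only (it ignores the 208V consideration); the function name, the list of ratings, and the 0.9 power factor used to convert watts to VA are all assumptions, so check the datasheets for your actual gear.

```python
def pick_ups(load_watts, ratings_va, power_factor=0.9, target=0.67):
    """Smallest VA rating that keeps the load at or under `target` of capacity.

    power_factor converts watts to VA and is an assumed value, not a
    measured one.
    """
    load_va = load_watts / power_factor
    for rating in sorted(ratings_va):
        if load_va <= rating * target:
            return rating
    return None  # nothing in the list is big enough


if __name__ == "__main__":
    # Hypothetical 1800 W rack against a made-up list of available sizes.
    print(pick_ups(1800, [1500, 2200, 3000, 5000]))
```

With these made-up inputs the load is about 2000 VA, so the 2200 VA unit would be over 90% loaded and the function moves up to the next size.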
Another option for extending UPS run time is to prioritize services/VMs. With the appropriate monitoring software on each host, you can configure each host to shut down when the UPS estimates X minutes of battery time remaining, or there have been Y minutes on battery, or both. Less load, more run time for the really important stuff. Almost every UPS I've used (APC, Tripp Lite, Powerware) comes with off-the-shelf software, or there are open source solutions (apcupsd, NUT) for monitoring the UPS over serial, USB, or SNMP (options vary with manufacturer and model). My shutdown schedule is: after 5 minutes on battery, power down the compute cluster landing pads. With 10 minutes remaining, power down the file servers with the archival data on them. With 6 minutes remaining, power down the primary file servers. With 2 minutes remaining, power down the auth box/network monitor/ILOM control host (this is the only one that can't get powered on/monitored remotely).
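For illustration, the schedule above can be written down as data plus one small function. This is a sketch, not what apcupsd or NUT do internally: the names `SCHEDULE` and `stages_due` are made up, and a real setup would express the same thresholds in each host's monitoring config.

```python
# (stage name, trigger kind, minutes) taken from the schedule in the post.
SCHEDULE = [
    ("compute cluster landing pads", "on_battery", 5),
    ("archival file servers", "remaining", 10),
    ("primary file servers", "remaining", 6),
    ("auth/monitor/ILOM host", "remaining", 2),
]


def stages_due(minutes_on_battery, minutes_remaining):
    """Return the names of stages whose trigger has fired."""
    due = []
    for name, kind, minutes in SCHEDULE:
        if kind == "on_battery" and minutes_on_battery >= minutes:
            due.append(name)
        elif kind == "remaining" and minutes_remaining <= minutes:
            due.append(name)
    return due


if __name__ == "__main__":
    # 12 minutes into an outage with an estimated 6 minutes left.
    print(stages_due(12, 6))
```

Keeping the two trigger kinds separate matters because "minutes on battery" fires even when the runtime estimate is still healthy, while "minutes remaining" tracks the UPS's own estimate as load is shed.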
I use a Su-Kam inverter at home. It powers a whole room, has a clean sine-wave output (unlike traditional UPSes), and its switchover delay is small enough that the SMPS in computers handle the switchover to battery power properly.
It uses two large lead-acid multi-cell batteries (~car batteries) for storing charge. The last time there was a major power cut, it powered my computer systems for 10 hours (yes you read that right... 10 hours.)
I was laughing at the old APC UPS which did 10 minutes before I had to power down.
This is India btw.. power cuts are common.
Banu
There's an awful lot to be said for redundancy. I think he's talking in-house applications, but I'm not positive.
One company I worked for, we maintained equipment in multiple datacenters, that were fully redundant. Normally, we served from all of them (no warm-standby sites). Over the years, we'd lose datacenters for various reasons. Sometimes it was power. Sometimes it was connectivity. Sometimes it was simple things, like our own hardware died. We've all seen where portions of the Internet can't reach other portions. Such redundancy will save you. It's better to have the reputation of "they just always work", rather than "they're down every time there's a problem in [insert area]".
Most users won't say "thank you", but they'll be more than happy to complain when you're down. If you have such a presence, you're probably making money on it, so an hour of downtime can easily cost more than the cost of a couple redundant datacenters. With say 3 datacenters, I always made sure we had capacity at each datacenter, in case we had two sites fail simultaneously. While it seems like an almost unheard of event, we did have it happen a couple times in a decade. The providers will apologize profusely, but that doesn't make up for the money lost during the outage.
Serious? Seriousness is well above my pay grade.
Co-locate your equipment at a carrier-grade data center in the nearest major city to your location and get a leased line to your premises. A decent data center will have proper battery backup and generators and know how to handle it. They'll also have the time and manpower to do proper tests, etc.
Learning HOW to think is more important than learning WHAT to think.
You're absolutely right. One place I worked had about 20 employees, 150 servers, but had an income of millions per year. The income averaged out to about $5,700/hr. 12 hours of outages per year could cost almost $70,000 in lost revenue. Is it worth $10k in extra equipment to mitigate that? Obviously.
Smaller companies have to evaluate their acceptable losses. Sometimes it's not worth $100 to make sure you stay up through power outages.
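The back-of-the-envelope check above, written out. It assumes income is spread evenly across the year, which is a simplification, and the function names are just for illustration.

```python
def outage_cost(revenue_per_hour, outage_hours):
    """Lost revenue, assuming income is spread evenly over time."""
    return revenue_per_hour * outage_hours


def worth_it(revenue_per_hour, outage_hours, mitigation_cost):
    """True if the expected loss exceeds what the mitigation costs."""
    return outage_cost(revenue_per_hour, outage_hours) > mitigation_cost


if __name__ == "__main__":
    # Figures from the post: $5,700/hr, 12 hours of outages, $10k of gear.
    print(outage_cost(5700, 12))
    print(worth_it(5700, 12, 10_000))
```

Plugging in your own hourly figure is the quickest way to see which side of the line a smaller shop falls on.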
"5 9's" of reliability still leaves about 5.3 minutes per year of outages (0.001% of 8,760 hours). Of course, that doesn't assume that it's all power-related outages. Redundancy across physically diverse locations can and will help there.
Serious? Seriousness is well above my pay grade.
It's no act. I am happy to put up photographs of my setup if you want. It's been working well for me the last year, so I don't have issues recommending it. Apart from being a customer, I have no connections to any inverter/battery company. You OTOH are an anonymous coward. Here is my website. Go find more about the shill there.
Banu
I think he was talking about in-house servers, but I could be mistaken. It's good to be in a *GOOD* datacenter that has the proper redundancy. Most of the good ones have multiple generators and tens of thousands of gallons of fuel stored. They can stay running indefinitely, assuming they can get fuel supplied before they run out.
I did work in one good one. They had a DC powerplant to supply at least 24 hours of power. They also had two diesel turbine generators, and something like 10,000 gallons of fuel, which would provide power for 7 days. In talking to the senior techs who had been there an awful long time, they said the generators had kicked on quite a few times. Only once in about 20 years had they needed to refuel. It got touchy. The power was out for about 14 days. It took 6 days to get a refueling truck in, because it was a nasty blizzard, and all the roads had been closed for days. They were starting to notify the customers of a potential power outage, when two fuel trucks finally arrived. One refilled their tank, and the second was left parked there, in case power wasn't restored in time.
That was a huge facility, and they had the power to say "bring us trucks now", and not be put off for larger customers.
I wasn't impressed by the advertised specs of the site. They were good, but it's easy to lie about the specs. I *was* impressed by the site, when I walked through, and was allowed (with an escort) to see their primary data room (many OC192's), the DC power room, and generators. I wasn't getting the sales tour. I was getting the tech tour, because the senior guys wanted to tell me all about their stuff, and we had a chance to talk about all of it.
I've been to many datacenters over the years, and many have failed to be as good as their advertising made them sound. N+1 generators can be a few 11 kW generators out by their dumpsters, or massive industrial generators. Maybe they test them once a year, or once a week. Maybe they work, maybe they don't. It's less than impressive to see the generators sitting outside, covered in rust, and looking like they were purchased 2nd hand and hadn't been maintained since 1950.
At one site (again, an impressive site), they had an absolutely huge DC room, and I was there a couple times when they received phone calls to turn on their generators because the power company needed the extra capacity. A couple 1 MW generators may make the difference between constant power and widespread brownouts.
The impressive datacenters were way beyond anything I could possibly talk my management into doing in-house.
Serious? Seriousness is well above my pay grade.
Here they are: inverter1.jpg, inverter2.jpg
I'm sorry it's pretty dusty, but this has been exposed to the elements for the past year. It has to be kept outside because of the lead-acid batteries, which need to be ventilated. The stand is an old TV stand reused to host this. The inverter is on top. The batteries are at the bottom.
The little yellow alien-looking caps that you see are filters for the acidic fumes from the batteries. Each cap tops a cell. The little stick on top indicates the liquid level inside the cell. After about a month, the levels go down in some of the cells and I call the local shop to come and top up the distilled water. Basically the water evaporates whereas the acid is still there, so they fill in distilled water. I could do this myself, but the local shop does it for about $1 so I just let the experts handle it :).
Banu