Why Power Failures Can Always Lead To Data Loss
bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."
UPS is more than just saving your data.
Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and in many other circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake-up call that you need.
APC is the only UPS maker on the market that has at least spent some small effort so that their UPSs can be properly integrated with a Linux machine. I made the mistake of purchasing an Ultra UPS as it was cheaper than the APC.
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data
Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.
Hint: it's BIG. Batteries are more space efficient, but the chemicals and outgassing make them inappropriate for location INSIDE the computer box.
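For anyone who wants to actually run the numbers, here is a rough sketch of that calculation. The 12 V starting voltage and the 10 V lower limit are my own assumptions (the question only says "a specific usable range"), and it ignores conversion losses entirely, so treat the result as a lower bound.

```python
def capacitance_needed(power_w, seconds, v_start, v_end):
    """Farads needed to deliver power_w for `seconds` while the
    capacitor voltage sags from v_start down to v_end."""
    energy_j = power_w * seconds                        # E = P * t
    usable_j_per_farad = 0.5 * (v_start**2 - v_end**2)  # E = 1/2 * C * (V1^2 - V2^2)
    return energy_j / usable_j_per_farad

# 200 W for 5 minutes; the 12 V -> 10 V window is an assumed example range
farads = capacitance_needed(200, 5 * 60, 12.0, 10.0)
print(f"{farads:.0f} F")  # prints "2727 F"
```

That is a few thousand farads. With the same assumed voltage window, a two-second hold-up at 200 W works out to roughly 18 F, which is still far more than the bulk capacitance found in an ordinary power supply.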
Yes, quite. It can't handle the substantial inrush current needed by the laser printer.
The "click" you hear in the UPS when the laser printer warms up is the UPS noting the drops on the power mains, which gives you some idea just how much current that printer needs.
I have a Samsung ML2150, and have noticed the same thing. Lights flicker, etc. whenever I submit a print job and the printer transitions from standby to active. The various UPSes in my office sense that, and respond with clicks and beeps.
Take the laser printer off the UPS. If you really need printer capability during a power failure, switch to an ink jet.
b.g.
Other than the lack of communication at present between the PSU and the rest of the system (on a hardware and software level), what you're describing really seems to be the computer equivalent of throwing your hands in front of your nuts as you spot the incoming baseball. It helps the immediate problem of data (or testicle) loss, but it's really just a small amount of damage control.
This is why a proper UPS that can trigger a full system shutdown once you hit a certain power remaining threshold is far preferable. Granted I'd rather have a controlled crash than the risky nonsense that would come from the power cord being yanked, but (right now) computers can only go so far to help themselves in a couple-second window.
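A minimal sketch of what that threshold logic might look like. The field names (STATUS, BCHARGE, TIMELEFT) follow the "KEY : value" layout that apcupsd's apcaccess tool prints, but the sample text, the thresholds and both function names are made up for illustration. A real daemon such as apcupsd makes this decision for you; nothing here actually shuts anything down.

```python
def parse_status(text):
    """Parse 'KEY : value' lines into a dict (layout assumed, see above)."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status

def should_shut_down(status, min_charge_pct=30.0, min_minutes=5.0):
    """True when running on battery and either threshold has been crossed."""
    on_battery = "ONBATT" in status.get("STATUS", "")
    charge = float(status.get("BCHARGE", "100").split()[0])
    minutes = float(status.get("TIMELEFT", "999").split()[0])
    return on_battery and (charge <= min_charge_pct or minutes <= min_minutes)

# Hypothetical status report: on battery, 27% charge left
sample = """STATUS   : ONBATT
BCHARGE  : 27.0 Percent
TIMELEFT : 8.4 Minutes"""
print(should_shut_down(parse_status(sample)))  # prints True
```

The point of checking both charge and time remaining is that either one can be the first to say "you are about to run out", depending on the load.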
How are sites slashdotted when nobody reads TFAs?
Always? Maybe if you are using Linux. Not if you are using an OS that runs ZFS filesystems.
--AC
This morning we had a planned shutdown of 100 servers for electricity works, all were on the same 40 kVA UPS. All went fine, we shut down all servers to be safe, and kept some stuff online for monitoring and the like, then main power was shut off. The UPS gladly took the load, with an estimated battery life of 75 minutes, more than what was needed for the electrical work. Once this was done, the electrician put the main power back on, and... the UPS shut down!
Since all servers were stopped already we didn't lose anything, but we had to put the UPS in bypass mode for a while, then back on, and now we hope for the best waiting for the UPS to be repaired, crossing most of our fingers because of the holidays...
In summary: testing that the UPS can handle the power coming back is as important as testing for it to be able to handle the power shutting down.
Vote green: crap in the ballot box!
Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.
good uptime for a laptop. got a second battery? (I know I do)
Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)
that's because it just has to invert it before it can step it up or down. If you supply DC you are actually introducing another necessary step. It gets hard to cram 2x the electronics into the PS. Inverters are definitely the way to go.
We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.
Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.
Might want to rethink how easy it is to get a truck in during a hurricane. ;) Unless it's more of a boat, think Katrina.
Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up we find 2 failed hard drives in the RAID 5 on the email server. Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.
um, ouch?
Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost data (or anything else).
Was going to say, all of the above is moot if an EF5 rolls through town. Better add "offsite backup" to your list if it's not already there. With the EF5 that ran through here last month, some people got their backups turned into "offsite" backups. (maintenance guy was here last week, said they are still looking for their dump truck)
I work for the Department of Redundancy Department.
Any professional server or data center setup that does not include a UPS for a graceful shutdown... is almost by definition NOT professional.
The typical small UPS system has some amount of surge protection built-in. But it's typically only good for at most a couple thousand joules. But then, if you get a spike that is big enough to blow a varistor, you also get to buy a new UPS.
A better solution is to put a "whole house" surge protector on the circuit-breaker panel. It protects everything, with a much higher number of joules. Five or six pounds of varistors can absorb a lot more shock than one ounce of varistors. They cost about $100, and can be found at most big hardware stores or electrical supply houses. That doesn't eliminate the need for a UPS. It does protect the UPS, along with the other equipment, from most voltage spikes.
Last year, lightning hit the power pole 20 feet from my house. We know where it hit because the pole caught fire. My next-door neighbors on both sides lost every single piece of electrical equipment -- not just computers, TV's, and stereos, but also fridge, microwave, water heater, and range. All of it was damaged beyond repair. We barely noticed the hit, except for the bright flash of light, and had no damage at all.
Real text editors will recover gracefully from such situations. :-)
(I'm thinking along the lines of @UEDIT on OS2200 which saves its entire virtual memory state to disk periodically and can recover it with ease at the next startup, or the old EDT editor on VMS which saved the commands one entered and could replay them when a recovery was specified).
I'm surprised more text editors don't have a similar feature. I think vim does, tho...?
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
Are you sure your disks are in write-through mode? Have you checked? Brad Fitzpatrick (of LiveJournal, memcached, OpenID, etc. fame) discovered that many disks lie about being in write-through mode, and wrote a utility to check it.
Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.
####@johncash:~$ time sync
real 0m0.004s
user 0m0.004s
sys 0m0.000s
That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.
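To make that concrete: what protects an individual application is not a global sync at the last second but writing its own files so that a crash at any point leaves either the old version or the new one. A rough sketch of the usual write-then-rename pattern follows. The function name is made up, it assumes a POSIX system, and it cannot help if the drive lies about flushing its cache.

```python
import os
import tempfile

def durable_write(path, data: bytes):
    """Replace `path` with `data` so that a crash leaves either the
    old file or the new file, never a half-written one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ask the kernel to push the contents to the device
        os.replace(tmp, path)      # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)         # clean up the temp file on failure
        raise
    # fsync the directory so the rename itself is persisted (POSIX only)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A database does something much more elaborate than this, but the idea is the same: it has to finish or abandon its own in-flight writes, and a bare sync gives it no chance to do that.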
The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.
Pretty sure that's wrong. It used to be (20 years ago) that hard drives losing power in this way had a chance of the heads crashing against the platters (the fabled "hard drive crash"). To solve this, modern drives are very sensitive to the power input. As soon as power fails the drives extract power from the spinning platters to move the heads over to the parked position. Regardless of what the DMA controller thinks it should be doing, the hard drive is busy parking the heads.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Actually when power drops the "power good" line from the power supply goes low, which causes a system reset and locks everything up.
This is also how the computer knows how long to keep the reset line engaged on startup: it stays asserted until the power supply says the power is good and everything has proper voltage.
Just because it CAN be done, doesn't mean it should!
I don't think this has been true since... maybe 8-10 years now? Definitely since MR drives came on the market (ages ago).
Modern drives have:
It does NOT go writing crap all over whatever's between your data and the parked position, unless the drive is a defective design. The emergency park is a fairly brutal affair, and you'll typically see the datasheet list a maximum number that's notably lower than the max power cycles.
It's also essential these days because:
Normally that holds true. I've seen some drives (1.0" and 1.8" miniature ones) which suffered from head-on-platter but that was due to misdesign in the power supply feeding it (e.g voltage rails going slightly negative, draining the cap early).
But anyway, the worst you'll get with the power going out is a partially written sector, which will then be marked bad, probably permanently. Or maybe a bunch of sectors. Or maybe bad in a different order to what the OS sent due to caching.
If you had a drive and/or RAID fail due to power outage, you should get a refund. You might lose a tiny amount of data, not the whole lot.
What I have is a Tripp-Lite SB-2000, which is an oldie but a goodie. Only link I can find now is here. It runs on 24v external power, so I just set two car batteries on top of it. Picked it up years ago for a song on eBay.
That unit though really is meant to have massive batteries on it. (looks like 24v golf cart batteries maybe, it has large binding posts on it for the external battery, there is no internal battery)
You can't just hook a car battery up to some old APC you have sitting around. It may run on it, but there are two factors to keep in mind:
1) UPS's are designed with cooling in mind. Sure you can put a monster battery on it so it has a runtime (at max output) of an hour instead of 10 minutes, but is it going to catch on fire or just plain overheat and shut down at 30 minutes in?
2) if it runs off the batteries, it has to charge them back up. The charge circuit faces the same limitations as the inverter in terms of capacity and cooling. Your UPS may run fine for 45 minutes, but then when power comes back, the charge circuit may fry after an hour of continuous load trying to bring the battery back up to full.
and of course 3) installing a larger battery doesn't affect your maximum output (watts), it only affects your maximum uptime (watt-hours)
I suppose also 4) is worth considering... not all hardware LIKES to run off a UPS. The power tends to be kinda nasty. I don't even want to know what my old Tripp-Lite puts out for power but I'm pretty sure it's very dirty. Fortunately all the hardware that's on it doesn't seem to mind. (yet) The longer you run something on a UPS, the more likely you are to damage it if it's not tolerant. I once tried placing a harmonic filter on my Tripp-Lite. Worked like a charm, put out a nearly perfect and clean sine wave. For about 6 minutes. Then it smoked. The power was simply too nasty for it to filter. Newer UPSs of course do better here. They usually advertise a "modified sine wave", same as you see stamped on inverters.
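To put rough numbers on point 3, here is a back-of-the-envelope sketch. The 50 Ah capacity, the 85% inverter efficiency, the 50% usable depth of discharge and the 1000 W inverter rating are all assumed figures for illustration, not specs of any real unit, and real lead-acid batteries deliver less than this at high discharge rates.

```python
def runtime_minutes(battery_volts, amp_hours, load_watts,
                    inverter_efficiency=0.85, usable_fraction=0.5):
    """Rough runtime estimate; ignores capacity loss at high discharge rates."""
    usable_wh = battery_volts * amp_hours * usable_fraction * inverter_efficiency
    return 60.0 * usable_wh / load_watts

def can_carry(load_watts, inverter_rating_watts):
    """Maximum output depends on the inverter, not on the battery size."""
    return load_watts <= inverter_rating_watts

# Two 12 V batteries in series give 24 V; 50 Ah is an assumed capacity
print(round(runtime_minutes(24, 50, 300)))   # prints 102
# A bigger battery stretches the runtime but does not raise the output limit
print(round(runtime_minutes(24, 100, 300)))  # prints 204
print(can_carry(1500, 1000))                 # prints False
```

Doubling the battery doubles the minutes, but a 1500 W load on an inverter rated for 1000 W is still a no-go however much lead you bolt on.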
Final note: no, you cannot stack UPS's. The line filters on modern UPS's don't like the power coming from a UPS and will switch on when the upstream UPS turns on.
I work for the Department of Redundancy Department.