Why Power Failures Can Always Lead To Data Loss
bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."
Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions? That's some good admin work there, Lou -- if only there were some sort of law that covered the tendency of things that can go wrong to go wrong...
Next week: Fires can make things warm, floods can make things wet.
Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
I remember a discussion on the PostgreSQL hackers list about recoverability and transaction logs.
You can't make a system that will not lose data; you can only make a system that knows the last save point of 100% integrity.
There are too many variables and too much randomness on a cold hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.
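To make the "last save point" idea concrete, here is a minimal sketch of an append-only journal where every record carries a checksum, so recovery can stop at the first torn write. The record layout and the names `append_record` and `last_good_record` are made up for illustration and are not how PostgreSQL does it. Note that `os.fsync` only asks the OS to flush; whether the drive's own write cache honors that is outside what this sketch can promise.

```python
import os
import struct
import zlib

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, then payload.
HEADER = struct.Struct("<II")


def append_record(path, payload):
    """Append one checksummed record and ask the OS to flush it to disk."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())


def last_good_record(path):
    """Return the payload of the last intact record, or None if there is none."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None

    last = None
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        # A short or corrupted payload means the write was interrupted: stop here.
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        last = payload
        offset = start + length
    return last
```

After a crash you call `last_good_record` and carry on from whatever it returns. Anything written after that point is gone, which is exactly the point of the comment above.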
This is a weak spot in the design of most computers.
Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data, and operating systems and critical apps should be able to handle an emergency shutdown and save critical data in very short order.
This is old hat in embedded systems.
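On the software side, an emergency save might look roughly like the sketch below: a signal handler that writes the critical state to a temporary file, syncs it, and renames it over the old copy. The names `save_state`, `make_handler`, and `install` are hypothetical, and it assumes the shutdown path delivers SIGTERM to the process, which is typical on Unix-like systems but not guaranteed by anything in this thread.

```python
import json
import os
import signal
import sys


def save_state(path, data):
    """Write data to a temp file, sync it, then rename it over the old copy."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a reader sees old or new, never half.
    os.replace(tmp, path)


def make_handler(path, get_state):
    """Build a signal handler that saves the current state and exits."""
    def handler(signum, frame):
        save_state(path, get_state())
        sys.exit(0)
    return handler


def install(path, get_state):
    """Register the handler for SIGTERM (assumed to be the shutdown signal)."""
    signal.signal(signal.SIGTERM, make_handler(path, get_state))
```

A program would call something like `install("critical_state.json", lambda: state)` once at startup. The whole thing only helps if the save finishes before the power actually goes, so the state has to stay small.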
"I'd rather die on my feet than live forever on my knees!"
The funny part is someone had to have thought they were safe without a UPS for this to become news.
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
In my company, everything is behind UPSs. Our SAN is even behind 2 separate UPSs. We thought everything was configured properly, but you'd be surprised what comes to roost when you test everything.
We recently had a test night where all we did was test the UPS system and shutdown procedures, and there were a couple of gotchas. Interestingly, the APC PowerChute app we were using defaulted to shutting down the UPS completely after the [first] server went down - not good. This was buried fairly deeply in the configuration.
Just as important as any protection measure, be it RAID, power protection, or whatever, is testing!
I really can't understand people who don't have a UPS. Don't you care about your data? At all? A UPS is not very expensive (my BackUPS 900 is very nice and only $100), and will last a long time (you just replace the batteries now and then). Once you are on a UPS, you can stop worrying about any power issues, journalling file systems, crash recovery, and all that. The computer will never fail due to power. If you run Linux, it will also never fail due to the OS. If you are a normal user, that means your computer will never fail, period. Seriously, there is no excuse for not having a UPS. Go and get one right now!
WTF is wrong with your power installations, guys? Flickering lights, brownouts whenever the printer warms up, voltage spikes every day? Is your electricity produced by hamsters in wheels and delivered through bell wire? Perhaps you should stop sprinkling the place with UPSs and pay someone to redo your electrical installation instead.
If you're not at the machine, or don't know how to shut down without a CRT, the disk can get messed up when the UPS runs out of power. Unless you only have a desktop machine with no network applications writing to disk (no BitTorrent); then you might be OK if you just walk away from your keyboard and let the system become quiescent before it loses power.
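The usual answer to the nobody-at-the-machine problem is to let software make the call. Here is a rough sketch of that decision, assuming the UPS monitor prints status as `KEY : value` lines with `STATUS` and `TIMELEFT` fields, which is the layout apcupsd's `apcaccess` uses as far as I know. The sample text and the five-minute threshold are made up, and the function names are hypothetical.

```python
def parse_status(text):
    """Turn 'KEY : value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def should_shut_down(fields, min_minutes=5.0):
    """Return True when on battery with no more than min_minutes of runtime left."""
    on_battery = "ONBATT" in fields.get("STATUS", "")
    parts = fields.get("TIMELEFT", "").split()
    # A missing runtime is treated as zero, so we err toward shutting down.
    minutes_left = float(parts[0]) if parts else 0.0
    return on_battery and minutes_left <= min_minutes


# Illustrative status text, not captured from a real unit.
sample = "STATUS   : ONBATT\nTIMELEFT : 3.5 Minutes\n"
if should_shut_down(parse_status(sample)):
    print("runtime low: start a clean shutdown now")
```

Actually triggering the shutdown is left out on purpose, since how you do that depends on the OS and on whatever your UPS software already provides.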
1) You build a RAID5 array
2) You back up
3) You test your backups
4) You plug your server DIRECTLY INTO THE WALL?!?!
Ummm DUH! Of course you need a UPS - what kind of yutz does 1-3 and then powers the server off of unconditioned wall power?
---- "Logoff! That cookie shit makes me nervous!" - A. Soprano
I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.
The house was, at the time, "protected" with two Cutler-Hammer CHSP suppressors (MOV). After the incident, their "protection working fine" LED was still lit! The only room with no damage was my recording studio, which had Equitech balanced-power panels; the ginormous-hunk-of-iron transformer probably saved me there. The power company had no reports of direct lightning strikes, other than one hit that took out a transformer (and since my power didn't go out, I apparently wasn't on that circuit).
I recall doing some reading about lightning arrestors, ground grids, and such, and eventually came to the conclusion that (a) surge suppressors are fairly useless, because they don't always present the quickest path to ground, and (b) it would be 10x cheaper to let stuff die and replace it than to set up a proper lightning protection system.