Slashdot Mirror


Why Power Failures Can Always Lead To Data Loss

bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."

85 of 456 comments

  1. Well no shit, Sherlock by Skyshadow · · Score: 5, Insightful

    Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions? That's some good admin work there, Lou -- if only there was some sort of law that covered the tendency of things that can go wrong to go wrong...

    Next week: Fires can make things warm, floods can make things wet.

    --
    Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
    1. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Funny

      I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

    2. Re:Well no shit, Sherlock by Skyshadow · · Score: 5, Funny

      I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

      As a former sysadmin, I would think that any machine reliant on 'happy thoughts' would be the most crash-prone system in the history of computing.

      --
      Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
    3. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Informative

      Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.

    4. Re:Well no shit, Sherlock by Midnight+Thunder · · Score: 2, Funny

      if only there was some sort of law that covered the tendency of things that can go wrong to go wrong.

      I hear Murphy might have one :)

      --
      Jumpstart the tartan drive.
    5. Re:Well no shit, Sherlock by Timothy+Brownawell · · Score: 5, Funny
      No, it really does have some interesting observations, with some very scary implications:

      One of the first things that will happen, is that the memory DIMMs will no longer be refreshed properly (DRAM needs to be refreshed constantly otherwise it will loose it's data) and very rapidly, the memory will contain only garbage. The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

      However, we've recently seen that RAM holds state well enough to preserve crypto keys thru a power cycle. This has very scary implications: the RAM knows what's happening, and behaves differently (loses data immediately on power-off or remembers it for several seconds) in order to cause the most difficulty for the owner of the machine.

      Not only are computer components intelligent and self-aware, they're also out to get us!

    6. Re:Well no shit, Sherlock by NFN_NLN · · Score: 4, Funny

      My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

      I like this setup, but please, tell me more about this cotton candy technology. Is it superior?

    7. Re:Well no shit, Sherlock by MightyMartian · · Score: 3, Insightful

      My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

      That is until the 10,000 volt spike when the power company improperly brings the grid back up bakes the RAM, the battery, RAID controller and the hard drives.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    8. Re:Well no shit, Sherlock by ArsonSmith · · Score: 2, Funny

      We just need to get that guy that declared Pluto is no longer a planet to declare that electricity no longer causes data loss.

      Side note: He also declared that north is no longer a direction, blue is no longer a color, and your sister is no longer a virgin.

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    9. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Funny

      I can offer you a Happy Thought UPS. It's a box of puppies. Be careful though, it only has 500 puppy Amps of capacity.

    10. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Funny

      Your mom loves you and pays for the electricity. That doesn't mean that your servers run on love.

    11. Re:Well no shit, Sherlock by ArsonSmith · · Score: 4, Funny

      Except the server that runs http://youporn.com/

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    12. Re:Well no shit, Sherlock by fbjon · · Score: 3, Funny

      You do now.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    13. Re:Well no shit, Sherlock by RobertM1968 · · Score: 2, Insightful

      Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.

      Any Server Admin who didn't realize that isn't really a server admin. And the rest of the world probably doesn't care or need to know.

      Just a thought... ;-)

    14. Re:Well no shit, Sherlock by Evro · · Score: 3, Interesting

      That's why any datacenter worth putting your servers in pipes its power through a flywheel or some other electricity "cleaner". A 1-ton lead ball spinning at 10,000 RPMs isn't going to speed up that much on a spike like that.

      --
      rooooar
    15. Re:Well no shit, Sherlock by Darkk · · Score: 2, Insightful

      I lost an entire RAID 5 disk array due to bad RAM. It was running Windows 2003 64-bit server, and one day I turned the screen on and noticed some artifacts and a completely locked-up machine. To be sure it wasn't just a freeze in the GUI, I tried accessing the shares, which didn't respond.

      So I was like, OK, great... time to do a hard shutdown and reboot. Well, when it came back up I noticed my RAID array was no longer showing up in the shares or in disk manager. I was like... aww, crap. I tried to rebuild the array via the built-in tools of the RAID controller and it didn't work. Somehow it had totally FUBARed the disk array tables to the point that everything on my five 320 GB disks was trashed. Good thing the OS runs on a separate non-RAID hard disk right off the motherboard's disk controller.

      Nothing was wrong with the RAID controller or the drives. Just at the point of writing stuff to the drives, the RAM had to take a dump and totally froze the server.

      Needless to say, I swapped out the RAM modules with known good ones and have never had a problem since. Luckily I regularly make backups of my critical stuff to another set of hard drives elsewhere.

      I follow this moral code as my second religion, "Don't put all your eggs in one basket!"

    16. Re:Well no shit, Sherlock by supersat · · Score: 5, Informative

      Are you sure your disks are in write-through mode? Have you checked? Brad Fitzpatrick (of LiveJournal, memcached, OpenID, etc. fame) discovered that many disks lie about being in write-through mode, and wrote a utility to check it.

    17. Re:Well no shit, Sherlock by deraj123 · · Score: 3, Insightful

      I'm curious...how can one opto-isolate server components from the power source?

    18. Re:Well no shit, Sherlock by mweather · · Score: 4, Funny

      I tried one of those. You gotta keep adding food to it or it stops working after a week or two. Starts stinking, too.

    19. Re:Well no shit, Sherlock by Phroggy · · Score: 2, Insightful

      Any Server Admin who didn't realize that isn't really a server admin. And the rest of the world probably doesn't care or need to know.

      Just a thought... ;-)

      The fact that they're not really server admins doesn't stop them from running servers, though!

      --
      $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
      $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    20. Re:Well no shit, Sherlock by Frank+T.+Lofaro+Jr. · · Score: 2, Informative

      Actually when power drops the "power good" line from the power supply goes low, which causes a system reset and locks everything up.

      This is also how the computer knows how long to keep the reset line engaged on startup, it stays asserted until the power supply says the power is good, and everything has proper voltage.

      --
      Just because it CAN be done, doesn't mean it should!
  2. Illiteracy by carou · · Score: 5, Funny

    From TFA:

    (DRAM needs to be refreshed constantly otherwise it will loose it's data)

    Fly, little data! Be free!

    1. Re:Illiteracy by Ngarrang · · Score: 2, Funny

      Get off my lawn, you little bits!

      --
      Bearded Dragon
  3. can always lead to data loss? by internerdj · · Score: 5, Funny

    Definitely maybe?

  4. UPS - more than just a backup. by Zebadias · · Score: 4, Informative
    UPS smooths out all those nasty spikes as well as stopping your servers from going down to a 1 second power cut.

    UPS is more than just saving your data.

    1. Re:UPS - more than just a backup. by linuxpyro · · Score: 4, Informative

      It's also important to get a decent UPS if you're using it for something like a server. I think the cheapy ones basically just use a transfer relay, whereas the higher-end ones actually run the hardware off of the battery via the inverter all the time. While I would think that with the former (called "standby" UPSs maybe?) the transfer time wouldn't be enough to cause too many problems, you still don't have the buffer that you'd get with a true uninterruptible power supply.

      I think a lot of the cheaper ones don't put out a true sine wave either, though for their intended purpose of letting you shutdown your desktop cleanly again they're probably fine.

      --
      Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
    2. Re:UPS - more than just a backup. by Anonymous Coward · · Score: 2, Informative

      >UPS smooths out all those nasty spikes as well as stopping your servers from going down to a 1 second power cut.

      A true UPS smooths out the spikes. Most of today's UPSes (at least consumer models) are off-line supplies. The batteries don't kick in unless the power is out. Worse than that, the cheap ones don't output sine waves, they output square waves. These UPSes also take some time to switch to batteries, leaving your computer without power for that time.

      Now, some of those UPSes have filtering technology like you find in expensive powerbars, sure. But it isn't the same as an always-on UPS at all.

    3. Re:UPS - more than just a backup. by SuperQ · · Score: 4, Informative

      Yup the 3 major types of battery UPSs I know of:

      Offline - Relay or simple failover. (APC Back-UPS)

      Line Interactive - Can correct line over/under voltage to a point (APC Smart-UPS)

      Online - Full AC -> DC -> AC conversion. (APC Symmetra, Liebert, anything that doesn't suck)

      Basically outside of home use you want an online type UPS.

      There are other systems like motor/generator flywheel types, but they need a very fast backup generator to sustain anything more than 30 seconds of outage. But they're great for smoothing out some types of line issues.

  5. Duh! by mlwmohawk · · Score: 4, Insightful

    I remember a discussion on the PostgreSQL hacker's list about recoverability and transaction logs.

    You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

    There are too many variables and too much randomness on a cold hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.

    1. Re:Duh! by sm62704 · · Score: 3, Insightful

      You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

      Gotta love these slashvertisements, I wonder whose UPSes they're pimping? It's not like we don't all know you need a UPS. What's next, a FA about how you need fire insurance?

      --
      mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
  6. Well of course you need UPSs, but by pembo13 · · Score: 5, Informative

    APC is the only UPS maker on the market that has at least spent some small effort so that their UPSs can be properly integrated with a Linux machine. I made the mistake of purchasing an Ultra UPS as it was cheaper than the APC.

    --
    "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
    1. Re:Well of course you need UPSs, but by bruceg · · Score: 2, Interesting

      been there, and done that! We recently moved a few servers this way. Just be careful, and go slow.

  7. What this really points out... by JesseL · · Score: 2, Insightful

    is a weak spot in the design of most computers.

    Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data, and operating systems and critical apps should be able to handle an emergency shutdown and save critical data in very short order.

    This is old hat in embedded systems.

    --
    "Prefiero morir de pie que vivir siempre arrodillado!"
    1. Re:What this really points out... by mlwmohawk · · Score: 4, Informative

      Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data

      Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.

      Hint: it's BIG. Batteries are more space efficient, but the chemicals and outgassing make them inappropriate for location INSIDE the computer box.
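
      For anyone who wants to plug in numbers, here is a back-of-the-envelope sketch in Python. It assumes an ideal capacitor with no conversion losses, and the 12 V rail sagging to 10.8 V is my assumption, not a figure from any spec. The usable energy between two voltages is ½·C·(V_start² − V_min²), so the required capacitance is 2·P·t / (V_start² − V_min²).

```python
def required_capacitance(power_w, seconds, v_start, v_min):
    """Farads needed to supply `power_w` for `seconds` while the capacitor
    discharges from v_start down to v_min (ideal capacitor, no losses)."""
    energy_j = power_w * seconds
    return 2.0 * energy_j / (v_start ** 2 - v_min ** 2)


# 200 W for 5 minutes on an assumed 12 V rail allowed to sag to 10.8 V
five_minutes = required_capacitance(200, 5 * 60, 12.0, 10.8)

# Same load and rail, but only 5 seconds of hold-up
five_seconds = required_capacitance(200, 5, 12.0, 10.8)
```

      With those assumptions the five-minute case works out to roughly 4,400 F and the five-second case to about 73 F.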

    2. Re:What this really points out... by JesseL · · Score: 4, Insightful

      Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.

      --
      "Prefiero morir de pie que vivir siempre arrodillado!"
    3. Re:What this really points out... by Macman408 · · Score: 5, Interesting

      This is old hat in embedded systems.

      Yes, but embedded systems usually have lower power requirements, or at the very least, a smaller range of power requirements. You can't add 3 PCIe cards, a few extra drives, and a few more GB of RAM to most embedded systems.

      I worked on the design of an embedded system a few years ago that had a holdup spec - I think it was supposed to survive for 50 ms with no power. So a 50 ms power interruption would result in continued operation, while an outage longer than that was allowed to reset the board. However, the power draw on the board was around 200 Watts; being able to supply that much power for that long in a fairly compact form factor was a huge hurdle. It also caused airflow problems, because the giant capacitors would prevent air from getting to other components on the board, like the CPU. In the next version of the spec, I believe the holdup requirement was eliminated - apparently we weren't the only ones having trouble meeting that requirement.

    4. Re:What this really points out... by Locklin · · Score: 3, Insightful

      Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

      ####@johncash:~$ time sync

      real 0m0.004s
      user 0m0.004s
      sys 0m0.000s

      --
      "Knowledge is the only instrument of production that is not subject to diminishing returns" -Journal of Political Econom
    5. Re:What this really points out... by natoochtoniket · · Score: 2, Interesting

      The problem is that different application systems have different amounts of state that must be saved. An RT app usually only has a memory buffer that can be written in a small number of I/Os. Many business apps have relatively large amounts of data, in non-contiguous buffers, that require hundreds of I/Os to store. Many business systems have hundreds of such apps running in the machine at the same time. Some systems can have gigs of data, in thousands of buffers, in their write-behind cache. And some businesses have systems that must not shut down, except for actual emergencies like fire or flood.

      How does the hardware designer of a general-purpose computer guess what kinds of apps will run in that machine? He/she cannot.

      The external power supply (aka, the UPS) can be configured to accommodate the needs of the application. An application that needs lots of power for a long time can be configured with a big UPS. And, an app that doesn't need it, doesn't have to pay for it.

    6. Re:What this really points out... by Firehed · · Score: 3, Informative

      Other than the lack of communication at present between the PSU and the rest of the system (on a hardware and software level), what you're describing really seems to be the computer equivalent of throwing your hands in front of your nuts as you spot the incoming baseball. It helps the immediate problem of data (or testicle) loss, but it's really just a small amount of damage control.

      This is why a proper UPS that can trigger a full system shutdown once you hit a certain power remaining threshold is far preferable. Granted I'd rather have a controlled crash than the risky nonsense that would come from the power cord being yanked, but (right now) computers can only go so far to help themselves in a couple-second window.
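
      As a sketch of that threshold logic, assuming the UPS is monitored with NUT (Network UPS Tools) and that you feed in the "key: value" text its upsc command prints. The variable names ups.status and battery.charge and the OB/OL/LB status flags are NUT's; the function names and the 30% default are mine, and in practice NUT's own upsmon normally makes this decision for you.

```python
def parse_upsc(output: str) -> dict:
    """Parse 'key: value' lines as printed by NUT's `upsc` into a dict."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values: dict, min_charge_pct: float = 30.0) -> bool:
    """True when the UPS is on battery and either reports low battery or has
    dropped below the chosen charge threshold."""
    flags = values.get("ups.status", "").split()
    if "OB" not in flags:  # OB = on battery; OL = on line power
        return False
    if "LB" in flags:  # LB = low battery, as reported by the UPS itself
        return True
    charge = float(values.get("battery.charge", "100"))
    return charge < min_charge_pct
```

      The point of the threshold is to start the shutdown while there is still enough runtime left to finish it, rather than waiting for the UPS to give up.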

      --
      How are sites slashdotted when nobody reads TFAs?
    7. Re:What this really points out... by droopycom · · Score: 2, Insightful

      You mean, the battery location on my laptop is not appropriate ?

      I know laptops and servers are very different but still, if my laptop can run 2 or 3 hours on a battery (including the LCD), it should not be that difficult to use the same technology to power a server for 5 minutes (with no screen needed).

    8. Re:What this really points out... by JesseL · · Score: 2, Interesting

      I think you're making it more complicated than it needs to be.

      If the system gets a signal that power is going away very, very soon, drops everything else, and just devotes its last seconds to getting things in order - it should be doable in a few seconds and be vastly preferable to the alternative of just having power go away without warning.

      Obviously a UPS is an even better option, but not every place that could use a UPS is ever going to get one, and it would be good if we could work on the problem from the other end too. Most PCs and casual servers are way more vulnerable to momentary power outages than they ought to be. 10-20 Farads worth of 5V caps and some thoughtful programming would make things a lot less delicate.
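
      How long a bank like that lasts depends a lot on the load and on how far the voltage is allowed to drop. A quick Python sketch, assuming ideal capacitors; the 50 W load and both cutoff voltages are assumptions picked for illustration, not measurements.

```python
def holdup_seconds(capacitance_f, v_start, v_min, load_w):
    """Seconds an ideal capacitor bank can feed `load_w` while discharging
    from v_start to v_min (usable energy = 1/2 * C * (v_start^2 - v_min^2))."""
    usable_j = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_j / load_w


# 20 F at 5 V feeding an assumed 50 W load directly, allowed to sag to 4.5 V
direct = holdup_seconds(20, 5.0, 4.5, 50)

# Same bank behind an assumed DC-DC converter that keeps working down to 2.5 V
with_converter = holdup_seconds(20, 5.0, 2.5, 50)
```

      Under those assumptions the direct case gives about 0.95 s and the converter case 3.75 s, so the regulation in front of the load matters as much as the Farads.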

      --
      "Prefiero morir de pie que vivir siempre arrodillado!"
    9. Re:What this really points out... by jimicus · · Score: 4, Informative

      Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

      ####@johncash:~$ time sync

      real 0m0.004s
      user 0m0.004s
      sys 0m0.000s

      That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.

    10. Re:What this really points out... by Bender0x7D1 · · Score: 2, Interesting

      I agree with you.

      My point was that just because a battery can power a laptop for several hours doesn't mean a single battery can supply a server for 5 minutes. So, the GP was claiming that because: (laptop power consumption) * (2-3 hours) == (server power consumption) * (5 minutes) it shouldn't be hard for the same battery to power both. The point I was trying to make is that a device that provides a certain range of performance, (in this case the car at 70 MPH), doesn't mean it is easy for it to perform well outside that range, (operating at 420 MPH).

      --
      Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
  8. It happened to someone by Joebert · · Score: 4, Insightful

    The funny part is someone had to have thought they were safe without a UPS for this to become news.

    --
    Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    1. Re:It happened to someone by Verteiron · · Score: 4, Funny

      Yes. My first reaction upon reading the summary was.. "Duh?" What, did they have it plugged into the wall before that? A UPS becomes MORE critical, not less, as the cost of hardware (RAID arrays are expensive) goes up.

      --
      End of lesson. You may press the button.
  9. Don't for get to test people, TEST! by sco_robinso · · Score: 5, Insightful

    In my company, everything is behind UPSs. Our SAN is even behind 2 separate UPSs. We thought everything was configured properly, but you'd be surprised what comes to roost when you test everything.

    We recently had a test night where all we did was test the UPS system and shutdown procedures, and there were a couple of gotchas. Interestingly, the APC PowerChute app we were using defaulted to shutting down the UPS completely after the [first] server went down - not good. This was buried fairly deeply in the configuration.

    Equally important to any protection measure, be it RAID, Power Protection, whatever - is testing!

    1. Re:Don't for get to test people, TEST! by Darkk · · Score: 4, Interesting

      I 100% agree with the idea of testing under controlled conditions. The oops you guys discovered is a good thing to have caught early on. I can imagine the look on your support team's faces when the UPS suddenly turned itself off while the remaining servers were still trying to perform a safe shutdown. I'm sure the secondary UPS was left running as a precaution until the test was successful.

      I have seen a screw-up where somebody cut into a live power cord, thinking it was a tie wrap, and caused a major short in the PDU. The guy thought he was safe until he discovered that whoever installed the servers hadn't double-checked the power connections and loads, so it created a cascade failure in several racks and lost several tons of data. Recovery took a while.

      Needless to say, it was not a good day.

  10. On the other hand.. by m0i · · Score: 2, Interesting

    you can recover your RAM minutes after losing power.. no kidding! http://citp.princeton.edu/memory/

    --
    have you been defaced today?
  11. Get a UPS by Chemisor · · Score: 3, Insightful

    I really can't understand people who don't have a UPS. Don't you care about your data? At all? The UPS is not very expensive (My BackUPS 900 is very nice and only $100), and will last a long time (you just replace the batteries now and then). Once you are on UPS, you can stop worrying about any power issues, journalling file systems, crash recovery, and all that. The computer will never fail due to power. If you run Linux, it will also never fail due to the OS. If you are a normal user, that means your computer will never fail, period. Seriously, there is no excuse for not having a UPS. Go and get one right now!

    1. Re:Get a UPS by LBArrettAnderson · · Score: 2, Insightful

      Unless your PSU breaks...

    2. Re:Get a UPS by GuldKalle · · Score: 2, Interesting

      Depends on where you live. Here in Denmark I've only experienced two power outages in my lifetime. One was in a house in the middle of nowhere, during a winter storm; the other was due to an unpaid bill. Under those circumstances I've got a lot of other stuff to spend $100 on.
      If we were talking about a datacenter, then yes, UPS on everything important. But for home use, nah.

      --
      What?
  12. Is this bring your kid to work day? by alta · · Score: 4, Funny

    Ok, now everyone has something to give to your kid for the sysadmin-in-training class.

    For the rest of us... back to work, nothing here you didn't learn your first year.

    For the poster... Shame shame... Turn in your card.

    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  13. Re:Not me! by sm62704 · · Score: 4, Funny

    If there's clouds in your server room, your server's probably been slashdotted and is on fire!

    --
    mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
  14. Carefully proofredded article by Intron · · Score: 2, Funny

    "3.2. (Ecrypted) file systems"

    Please tell me more about these ecrypted file systems. Do they also do gurnalling?

    --
    Intron: the portion of DNA which expresses nothing useful.
  15. Re:That's what I always say sometimes by alta · · Score: 5, Interesting

    Rule #1.

    NEVER plug a laser printer into a UPS. The power that the fuser draws is WAY too much.

    Look at some of the cheap office units, they show little pictures on them, notice the printer icon is on the surge side, NOT battery/surge side.

    If the power goes out, you should NOT be trying to print.

    http://articles.techrepublic.com.com/5100-10878_11-6085460.html See #6

    http://arstechnica.com/guides/other/ups.ars/3

    http://www.jetcafe.org/npc/doc/ups-faq.html#0405 see 04.05

    Would you put a space heater on a UPS? Shredder? Vacuum? Table Saw? If you put a laser printer on it, you may as well.

    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  16. And this is what ZFS looks out for by E-Lad · · Score: 3, Interesting

    ...by design. TFA doesn't delve into too much detail, but a sudden power loss on such software RAID systems is a condition that ZFS accounts for. Its Copy-on-write (COW) and write-length striping strategy prevents things such as the RAID5 write hole condition, a condition that has the biggest chance of occurring when a power loss event happens.
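
    For anyone who hasn't run into the write hole, here is a toy illustration in Python. It models a three-disk RAID 5 stripe as two data blocks plus XOR parity; the block contents and the helper name xor_blocks are made up, and this says nothing about how ZFS is implemented internally. It only shows why updating data in place without also updating parity is dangerous.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# A consistent three-disk stripe: two data blocks plus parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Power fails after the new d0 hits the disk but before parity is rewritten.
d0_new = b"CCCC"
stale_parity = parity

# Later the disk holding d1 dies and is rebuilt from d0_new and the stale parity.
rebuilt_d1 = xor_blocks(d0_new, stale_parity)
```

    With consistent parity, xor_blocks(d0, parity) gives back d1. After the interrupted update, rebuilt_d1 no longer matches the d1 that was actually on disk, even though d1 was never written to. Copy-on-write avoids this by never overwriting the live stripe in place.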

  17. no, that's not the scary thing by ScentCone · · Score: 2, Funny

    The scary thing is that yet one more person can't freakin' tell the difference between "loose" and "lose." It's becoming an epidemic.

    --
    Don't disappoint your bird dog. Go to the range.
    1. Re:no, that's not the scary thing by JustOK · · Score: 5, Funny

      its not worth loosing you're cool about grammer misteaks and etc.

      --
      rewriting history since 2109
  18. Re:That's what I always say sometimes by bgat · · Score: 2, Informative

    Yes, quite. It can't handle the substantial inrush current needed by the laser printer.

    The "click" you hear in the UPS when the laser printer warms up is the UPS noting the drops on the power mains, which gives you some idea just how much current that printer needs.

    I have a Samsung ML2150, and have noticed the same thing. Lights flicker, etc. whenever I submit a print job and the printer transitions from standby to active. The various UPSes in my office sense that, and respond with clicks and beeps.

    Take the laser printer off the UPS. If you really need printer capability during a power failure, switch to an ink jet.

    --
    b.g.
  19. A UPS is good to have. Even at home. by Forge · · Score: 3, Interesting

    Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

    At work. Power is a whole enterprise within the company I work for.

    Dual gas powered Generators at each location, Rooms full of Batteries for the Telecoms gear (most is straight DC) and Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

    We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

    Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

    Which means the phone network is a lot more reliable than the Power grid where I live.

    As for Data loss. I have over the years done a lot of recovery work. "Morfy" of "Murfy's Law" fame isn't a guy or a girl. He is a demon from the darkest pits of hell sent to torment the souls of IT workers everywhere.

    Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up, we find 2 failed hard drives in the RAID 5 on the email server.

    Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.

    Eventually we sent the failed drives off to a Data recovery company in Florida because

    #1. The customer can afford it.
    #2. Simply "skipping" a few days of Email is not an option for a bank (hence the ability to afford data recovery).

    So yeah. A UPS is essential. Just like RAID, Clustering and Backups but in the end it can all fail.

    Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost Data (or anything else).

    --
    --= Isn't it surprising how badly I spell ?
  20. Other reasons to run a UPS by rwa2 · · Score: 3, Interesting

    UPS units are relatively cheap, it's well worthwhile to invest in one, not just to protect from data loss:

    * Hardware loss: I've seen a lot of hardware blown up from power interruptions. Do you trust your power company that much to provide clean power to you? Sure surge protectors help a bit, but a decent UPS costs maybe twice as much as a good surge protector.

    * Time lost restoring your session after blackouts / brownouts: OK, maybe you're used to restarting your computer every morning anyway. But I like to leave things open and return to my desktop just the way I left it arranged.

    * Stats: Using NUT and Munin, you get to monitor and log your power, so you can see things like exactly when your electricity went out and for how long, what load your PC is drawing after that last upgrade, etc. e.g.: http://hairball.bumba.net/cgi-bin/nut/upsstats.cgi?host=apc@localhost

    * Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

    Frankly, I'm a little surprised a backup battery isn't built into PC power supplies already, so they'd work a bit more like laptops. Same with networking gear.

    1. Re:Other reasons to run a UPS by Darkk · · Score: 2, Interesting

      It's human nature. We tend to not think about the future until "oh shit" happens THEN something is done about it. It happens with everything these days.

      Years ago a UPS used to be a very expensive item, and it was not the norm for a home user to actually own one. Now they are becoming more affordable, but the same users who couldn't afford a UPS back then think, "Well, I've been without one for years, so why should I need one now?" The same logic applies to what I said above.

    2. Re:Other reasons to run a UPS by v1 · · Score: 2, Informative

      What I have is a Tripp-Lite SB-2000, which is an oldie but a goodie. Only link I can find now is here. It runs on 24v external power, so I just set two car batteries on top of it. Picked it up years ago for a song on ebay.

      That unit though really is meant to have massive batteries on it. (looks like 24v golf cart batteries maybe, it has large binding posts on it for the external battery, there is no internal battery)

      You can't just hook a car battery up to some old APC you have sitting around. It may run on it, but there are two factors to keep in mind:

      1) UPS's are designed with cooling in mind. Sure you can put a monster battery on it so it has a runtime (at max output) of an hour instead of 10 minutes, but is it going to catch on fire or just plain overheat and shut down at 30 minutes in?

      2) if it runs off the batteries, it has to charge them back up. The charge circuit faces the same limitations as the inverter in terms of capacity and cooling. Your UPS may run fine for 45 minutes, but then when power comes back, the charge circuit may fry after an hour of continuous load trying to bring the battery back up to full.

      and of course 3) installing a larger battery doesn't affect your maximum output (watts), it only affects your maximum uptime (watt-hours)

      I suppose also 4) is worth considering... not all hardware LIKES to run off a UPS. The power tends to be kinda nasty. I don't even want to know what my old tripp-lite puts out for power but I'm pretty sure it's very dirty. Fortunately all the hardware that's on it doesn't seem to mind. (yet) The longer you run something on a UPS, the more likely you are to damage it if it's not tolerant. I once tried placing a harmonic filter on my tripp-lite. Worked like a charm, put out a nearly perfect and clean sine wave. For about 6 minutes. Then it smoked. The power was simply too nasty for it to filter. Newer UPSs of course do better here. They usually advertise a "modified sine wave", same as you see stamped on inverters.

      Final note: no, you cannot stack UPS's. The line filters on modern UPS's don't like the power coming from a UPS and will switch on when the upstream UPS turns on.
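      Point 3 above (watts versus watt-hours) can be put into a rough calculation. This is a minimal sketch, not a sizing tool: the function names, the 85% inverter efficiency, and every number in the usage note are assumptions chosen for illustration, and a real battery delivers less than its rated capacity at high discharge rates.

```python
def can_carry_load(load_w, max_output_w):
    """The inverter's rated output decides WHETHER the load can run at all."""
    return load_w <= max_output_w


def estimated_runtime_minutes(battery_wh, load_w, inverter_efficiency=0.85):
    """The battery's energy decides HOW LONG the load can run.

    battery_wh: usable battery energy in watt-hours (assumed figure)
    load_w: power drawn by the attached equipment in watts
    inverter_efficiency: assumed fraction of battery energy reaching the load
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return battery_wh * inverter_efficiency / load_w * 60


if __name__ == "__main__":
    # Hypothetical unit: 600 W maximum output, 300 W of equipment attached.
    for label, wh in (("small internal battery", 100), ("two car batteries", 1200)):
        minutes = estimated_runtime_minutes(wh, load_w=300)
        print(f"{label}: about {minutes:.0f} minutes at 300 W")
    # A bigger battery does nothing for a load above the rated output.
    print("900 W load supported:", can_carry_load(900, max_output_w=600))
```

      The larger battery stretches the runtime, but the answer to "can it carry 900 W?" stays the same. The estimate also says nothing about points 1 and 2: whether the inverter and charger survive running that long is a separate question.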

      --
      I work for the Department of Redundancy Department.
  21. Our Tandem by PIPBoy3000 · · Score: 5, Interesting

    This reminds me of my favorite power loss story. The facility was doing a generator test, where we were supposed to switch over from city power to the generator. Unfortunately it didn't happen smoothly and the UPS kicked in. Sadly it turned out that so many servers had been added since the original design that the UPS was really only good for fifteen minutes or so. The final problem was that our operator didn't notice the issue quickly enough, and so the next thing everyone in IT knew was that our main data center had just lost power.

    We spent most of the day getting our servers back up from various states of disrepair (confirming the article, power loss is superbad). It turns out that our main medical software ran on a Tandem. Though the drives and such lost power, the CPU had a backup of D-batteries and survived the power loss just fine. Needless to say, we stopped making fun of their seemingly primitive emergency backup power.

    1. Re:Our Tandem by Alioth · · Score: 2

      This is my favorite power loss story.

      It's great.

      http://www.alioth.net/tmp/vaxen.html

  22. It can be done! by GameboyRMH · · Score: 3, Funny

    ...If you're a Mac fanboy running a network of Apple computers. If anything goes wrong, it's an artistic expression and anyone who criticizes the problem is a closed-minded square who "doesn't get it." Then you sit back in self satisfaction listening to alternative pop, thinking about how hip and different and enlightened you are.

    Happy thoughts power supply: Dead stable.

    Linux networks can run on happy thoughts as well as long as you run on electricity during the setup and installation stages and then switch to happy thoughts once everything's running properly...you just have to make sure you never, ever run emacs, vi, or Gpaint.

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
  23. Choose your UPS carefully, and TEST it the hard way by jalet · · Score: 4, Informative

    This morning we had a planned shutdown of 100 servers for electrical work; all were on the same 40 kVA UPS. All went fine: we shut down all servers to be safe, kept some stuff online for monitoring and the like, and then main power was shut off. The UPS gladly took the load, with an estimated battery life of 75 minutes, more than what was needed for the electrical work. Once this was done, the electrician put the main power back on, and... the UPS shut down!

    Since all servers were stopped already we didn't lose anything, but we had to put the UPS in bypass mode for a while, then back on, and now we hope for the best waiting for the UPS to be repaired, crossing most of our fingers because of the holidays...

    In summary: testing that the UPS can handle the power coming back is as important as testing that it can handle the power shutting down.

    --
    Votez ecolo : Chiez dans l'urne !
  24. Re:A UPS is good to have. Even at home. by v1 · · Score: 2, Informative

    Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

    good uptime for a laptop. got a second battery? (I know I do)

    Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

    that's because it just has to invert it before it can step it up or down. If you supply DC you are actually introducing another necessary step. It gets hard to cram 2x the electronics into the PS. Inverters are definitely the way to go.

    We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

    Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

    Might want to rethink how easy it is to get a truck in during a hurricane. ;) Unless it's more of a boat, think Katrina.

    Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up, we find 2 failed hard drives in the RAID 5 on the email server. Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.

    um, ouch?

    Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost data (or anything else).

    Was going to say, all of the above is moot if an EF5 rolls through town. Better add "offsite backup" to your list if it's not already there. With the EF5 that ran through here last month, some people got their backups turned into "offsite" backups. (maintenance guy was here last week, said they are still looking for their dump truck)

    --
    I work for the Department of Redundancy Department.
  25. Thanks Captain Obvious by bravecanadian · · Score: 2, Informative

    Any professional server or data center setup that does not include a UPS for a graceful shutdown... is almost by definition NOT professional.

  26. Voltage Spikes by natoochtoniket · · Score: 5, Informative

    The typical small UPS system has some amount of surge protection built in, but it's typically only good for a couple thousand joules at most. And if you get a spike that is big enough to blow a varistor, you also get to buy a new UPS.

    A better solution is to put a "whole house" surge protector on the circuit-breaker panel. It protects everything, with a much higher number of joules. Five or six pounds of varistors can absorb a lot more shock than one ounce of varistors. They cost about $100, and can be found at most big hardware stores or electrical supply houses. That doesn't eliminate the need for a UPS. It does protect the UPS, along with the other equipment, from most voltage spikes.

    Last year, lightning hit the power pole 20 feet from my house. We know where it hit because the pole caught fire. My next-door neighbors on both sides lost every single piece of electrical equipment -- not just computers, TV's, and stereos, but also fridge, microwave, water heater, and range. All of it was damaged beyond repair. We barely noticed the hit, except for the bright flash of light, and had no damage at all.

    1. Re:Voltage Spikes by Jay+L · · Score: 2, Insightful

      I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.

      The house was, at the time, "protected" with two Cutler-Hammer CHSP suppressors (MOV). After the incident, their "protection working fine" LED was still lit! The only room with no damage was my recording studio, which had Equitech balanced-power panels; the ginormous-hunk-of-iron transformer probably saved me there. The power company had no reports of direct lightning strikes, other than one hit that took out a transformer (and since my power didn't go out, I apparently wasn't on that circuit).

      I recall doing some reading about lightning arrestors, ground grids, and such, and eventually came to the conclusion that (a) surge suppressors are fairly useless, because they don't always present the quickest path to ground, and (b) it would be 10x cheaper to let stuff die and replace it than to set up a proper lightning protection system.

    2. Re:Voltage Spikes by natoochtoniket · · Score: 4, Informative

      The path-to-ground is really important, as is the quality of the ground. The length of the path is the reason why whole-house devices are installed at the service entrance panel. But, that assumes that your service-entrance ground is a good ground.

      If your ground is not good, shorting to ground won't do much good. A lot of houses around here are grounded to plumbing pipe that is buried just 12" deep. During a dry spell a few years ago, I detected variable voltage where it shouldn't have been. The voltage problems cleared up after I added an 8-foot vertical ground rod to the system.

      The thing that kills a surge protector is too many amps for too long. If it shorts the power to ground (low-resistance), but the ground is not really well-grounded, then the whole thing can float close to line-voltage. In that case, that voltage can destroy your other devices, while the surge unit never gets enough current to burn the varistors.

    3. Re:Voltage Spikes by russotto · · Score: 2, Informative

      I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.

      Common-mode spike. The power line was fine, but your ground got knocked up to a few kilovolts by a nearby strike.

    4. Re:Voltage Spikes by natoochtoniket · · Score: 2, Informative

      Neutral and ground are supposed to be bonded at the service entrance panel, and not anywhere else. If the ground is actually grounded, with a big copper wire to a big copper spike that goes deeper than the water table, that will normally provide the path of least resistance for the electricity to follow.

      A lot of houses don't have a good ground connection. Most building codes (and the NEC) allow 25 ohms resistance on the ground connection. But it's hard to measure, so the building inspectors don't measure it. In order to measure it, you have to install an additional 8-foot spike ten feet away from the ground connection you want to measure.

      Plumbing systems used to be metal pipe, so a connection to plumbing was an adequate ground. But, now, most new plumbing is plastic, an insulator. A few years ago they tore up the streets in my neighborhood to install new water pipes (plastic of course). After they did that, the only ground on my house was the short length of metal pipe that ran from the house to the meter. And that pipe was less than 12 inches deep, in dry sandy soil.

      The easy way to be sure that you have a good ground is to install two new 8-foot spikes, at least 10 feet apart (from each other, and from any existing ground spike). Measure the ohms between them to be sure you have less than 25 ohms. Then bond BOTH of them to the existing ground at your service-entrance panel using bronze clamps and 6-gauge or larger copper wire. Costs less than $100, and can be done in just an hour or two.

  27. Re:Don't forget the simple case... by Richard+Steiner · · Score: 2, Informative

    Real text editors will recover gracefully from such situations. :-)

    (I'm thinking along the lines of @UEDIT on OS2200, which saves its entire virtual memory state to disk periodically and can recover it with ease at the next startup, or the old EDT editor on VMS, which saved the commands one entered and could replay them when a recovery was specified).

    I'm surprised more text editors don't have a similar feature. I think vim does, tho...?

    --
    Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
    The Theorem Theorem: If If, Then Then.
  28. He forgot UPS-triggered shutdown by SleptThroughClass · · Score: 5, Insightful
    The author did not mention having the system set up to have the UPS trigger an automatic shutdown.

    If you're not at the machine, or don't know how to shut down without a CRT, the disk can get messed up when the UPS runs out of power. Unless you only have a desktop machine with no network applications writing to disk (no BitTorrent); then you might be OK if you just walk away from your keyboard and let the system become quiescent before it loses power.
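    As a rough illustration of what a UPS-triggered shutdown involves, here is a minimal sketch built on the `upsc` client from NUT (mentioned elsewhere in this thread). NUT ships its own monitoring daemon, `upsmon`, for exactly this job, so treat the code as an explanation of the logic rather than a replacement. The UPS name `apc@localhost`, the 30% threshold, and the function names are assumptions; `ups.status` and `battery.charge` are standard NUT variables, but not every driver reports both.

```python
import subprocess


def parse_upsc(text):
    """Turn `upsc` output ("name: value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values, min_charge=30):
    """Decide whether to start a clean shutdown.

    Shut down only when the UPS is on battery (OB) AND either reports
    low battery (LB) or has dropped to the assumed charge threshold.
    """
    flags = values.get("ups.status", "").split()
    on_battery = "OB" in flags
    low_battery = "LB" in flags
    try:
        charge = float(values.get("battery.charge", "100"))
    except ValueError:
        charge = 100.0  # unreadable value: rely on the LB flag instead
    return on_battery and (low_battery or charge <= min_charge)


def read_ups(ups_name="apc@localhost"):
    """Query the UPS through the `upsc` command-line client."""
    result = subprocess.run(
        ["upsc", ups_name], capture_output=True, text=True, check=True
    )
    return parse_upsc(result.stdout)


if __name__ == "__main__":
    if should_shut_down(read_ups()):
        # A real setup would invoke the system's shutdown command here.
        print("UPS on battery and running low: shut down now")
```

    Run from cron or a small loop, this covers the unattended case: the machine goes down cleanly while the battery still has charge, instead of losing power mid-write.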

  29. I don't get it by bizitch · · Score: 2, Insightful

    1) You build a RAID5 array
    2) You back up
    3) You test your backups
    4) You plug your server DIRECTLY INTO THE WALL?!?!

    Ummm DUH! Of course you need a UPS - what kind of yutz does 1-3 and then powers the server off of unconditioned wall power?

    --
    ---- "Logoff! That cookie shit makes me nervous!" - A. Soprano
  30. Mmmm! Puppies!!! by Gription · · Score: 4, Interesting

    Less filling but tastes great!


    Ok back on subject
    A UPS isn't even a panacea... I had a server lose 3 out of 4 HDs in a 4 hour period. (The 3rd drive went at 4:57 PM Thursday Dec 11th 1997. Not that I would remember...) When I looked at the service history on it, it had been losing drives for 8 months at an accelerating rate.

    Turns out that the 3000va rack mount wonder UPS from that big, well known vendor was the problem. The switching unit in it was sending spikes into the equipment.

    They wouldn't warranty it so I ended up putting a Triplite ISObar surge suppressor between it and the server in our test environment and it was in service for years after that.

    Never trust any piece of equipment...

    1. Re:Mmmm! Puppies!!! by maglor_83 · · Score: 4, Funny

      They wouldn't warranty it so I ended up putting a Triplite ISObar surge suppressor between it and the server in our test environment and it was in service for years after that.

      Never trust any piece of equipment...

      You mean like a Triplite ISObar surge suppressor?

    2. Re:Mmmm! Puppies!!! by Gription · · Score: 2, Interesting

      I have had hundreds of APC UPSes that never had a problem. The one that ate my server just happened to be the one for the core database running a mail order company... 14 days before Christmas. At that point we were doing $90k a day.

      The reason I remember the exact minute it failed was I had my bag in hand and was walking toward the door when the server alarm went off.

      18 hours later I found out from the backup software vendor that there was a bug in the software that meant it wouldn't restore any rights information so the server configuration was totally lost.

      The backup that saved us was a DOS batch file that copied everything down to a PC. 43 hours later I was able to actually go home.

      After the blowup they finally approved the request for a secondary server.

  31. What I did by kilodelta · · Score: 2

    I was heavily involved in the planning for moving our I.T. infrastructure to a different place.

    It went from what was essentially a closet in a basement with a single AC unit and individual UPS's on each server.

    So I decided redundancy was key. We had redundant AC, but the best part was power.

    All servers (70 of them at last reckoning) are attached to an APC Symmetra that nominally gives 40 minutes of battery power. The Symmetra in turn is backed up by a 125kW natural-gas fired generator that spools up within 10 seconds.

    It was decided we could suffer a brief AC outage so that was simply attached to the generator. There were two 2 ton AC units in place.

    Even had the foresight to extend a tendril out to the MDF in the building so that our telecom and ISP could plug their UPS into the generator circuit.

    And what was the fly in the ointment? Our DNS services were provided by an outside entity. So one day we had a power failure that hit a very large swath of the city and included us and the entity that provided DNS services.

    So while everything in our shop was running, nobody from the outside could see our public services, and nobody inside could get out.

    We actually got hold of the DNS zone and had our own after that.

  32. Ah, that's easy by jcochran · · Score: 5, Funny

    All you need to do is have the grid power feed some high wattage light bulbs. And near the light bulbs are some solar cells. The output from the solar cells is used to charge batteries which feed an inverter that actually powers the computer. Of course there is some power loss in the conversion process, and you need to have some (ok, a lot) of the input power to the system committed towards running a cooling unit to keep things at a reasonable temperature. But the resulting device provides clean power with no possibility of any surges getting thru to the protected equipment.

    Of course, if you go to this level of trouble for your power source, then I'd also suggest opto-isolating all signal lines to and from the server. And enclose the server in a well grounded Faraday cage. And it wouldn't be a bad idea to have a dedicated comm link to a duplicate server located elsewhere. Preferably on a different tectonic plate.

    1. Re:Ah, that's easy by rcw-work · · Score: 3, Interesting

      All you need to do is have the grid power feed some high wattage light bulbs. And near the light bulbs are some solar cells.

      You now have a 1% efficient power supply.

      A slightly more practical option (with better isolation than a standard electromagnetic transformer, but unfortunately also some inductive effects) would be to couple two motors with an insulative shaft.

  33. Wrong, I think. by Spazmania · · Score: 2, Informative

    The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

    Pretty sure that's wrong. It used to be (20 years ago) that hard drives losing power in this way had a chance of the heads crashing against the platters (the fabled "hard drive crash"). To solve this, modern drives are very sensitive to the power input. As soon as power fails the drives extract power from the spinning platters to move the heads over to the parked position. Regardless of what the DMA controller thinks it should be doing, the hard drive is busy parking the heads.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  34. Re:Losing power while writing to a HD == lost HD by pslam · · Score: 2, Informative

    I don't think this has been true since... maybe 8-10 years now? Definitely since MR drives came on the market (ages ago).

    Modern drives have:

    • A capacitor that stores enough charge to "emergency park" the heads.
    • Low voltage detection that kicks in, disables the head, and dumps the capacitor into the seek coil.

    It does NOT go writing crap all over whatever's between your data and the parked position, unless the drive is a defective design. The emergency park is a fairly brutal affair, and you'll typically see the datasheet list a maximum number that's notably lower than the max power cycles.

    It's also essential these days because:

    • The head should (of course) never touch the platter.
    • The drive can't actually spin up if the head is resting on the platter.
    • So the drive is designed with the assumption the head NEVER touches the platter in its lifetime.

    Normally that holds true. I've seen some drives (1.0" and 1.8" miniature ones) which suffered from head-on-platter but that was due to misdesign in the power supply feeding it (e.g. voltage rails going slightly negative, draining the cap early).

    But anyway, the worst you'll get with the power going out is a partially written sector, which will then be marked bad, probably permanently. Or maybe a bunch of sectors. Or maybe bad in a different order from what the OS sent, due to caching.

    If you had a drive and/or RAID fail due to power outage, you should get a refund. You might lose a tiny amount of data, not the whole lot.
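    The drive catches a partially written sector with its own error checking; software that keeps its own log can use the same idea one level up. The following is a minimal sketch of that technique, not anything described in the comment above: each record carries its length and a CRC32, so a record cut short by a power failure is detected and ignored on the next read. The record layout and function names are assumptions for illustration.

```python
import struct
import zlib

# Assumed record layout: 4-byte payload length, 4-byte CRC32, then payload.
HEADER = struct.Struct("<II")


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(blob: bytes):
    """Return every intact record, stopping at the first torn one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(blob):
        length, crc = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        payload = blob[start:start + length]
        # A short read or a checksum mismatch means the write never finished.
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        offset = start + length
    return records


if __name__ == "__main__":
    log = frame(b"first record") + frame(b"second record")
    torn = log[:-5]  # simulate power loss partway through the last write
    print(read_records(log))
    print(read_records(torn))
```

    This only tells you which records are usable after the fact. It does not bring back the data that was in flight, which is consistent with the point above: you might lose a tiny amount of data, not the whole lot.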