Slashdot Mirror


Why Power Failures Can Always Lead To Data Loss

bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."

456 comments

  1. Well no shit, Sherlock by Skyshadow · · Score: 5, Insightful

    Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions? That's some good admin work there, Lou -- if only there was some sort of law that covered the tendency of things that can go wrong to go wrong...

    Next week: Fires can make things warm, floods can make things wet.

    --
    Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
    1. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Funny

      I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

    2. Re:Well no shit, Sherlock by dreamchaser · · Score: 1

      Yes, where is the 'Duh' tag when you really need it? Or maybe slownewsday...

    3. Re:Well no shit, Sherlock by Skyshadow · · Score: 5, Funny

      I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

      As a former sysadmin, I would think that any machine reliant on 'happy thoughts' would be the most crash-prone system in the history of computing.

      --
      Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
    4. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Informative

      Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.

    5. Re:Well no shit, Sherlock by Midnight+Thunder · · Score: 2, Funny

      if only there was some sort of law that covered the tendency of things that can go wrong to go wrong.

      I hear Murphy might have one :)

      --
      Jumpstart the tartan drive.
    6. Re:Well no shit, Sherlock by Timothy+Brownawell · · Score: 5, Funny
      No, it really does have some interesting observations, with some very scary implications:

      One of the first things that will happen, is that the memory DIMMs will no longer be refreshed properly (DRAM needs to be refreshed constantly otherwise it will lose its data) and very rapidly, the memory will contain only garbage. The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

      However, we've recently seen that RAM holds state well enough to preserve crypto keys thru a power cycle. This has very scary implications: the RAM knows what's happening, and behaves differently (loses data immediately on power-off or remembers it for several seconds) in order to cause the most difficulty for the owner of the machine.

      Not only are computer components intelligent and self-aware, they're also out to get us!

    7. Re:Well no shit, Sherlock by NFN_NLN · · Score: 4, Funny

      My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

      I like this setup, but please. Tell me more about this cotton candy technology? Is it superior?

    8. Re:Well no shit, Sherlock by MightyMartian · · Score: 3, Insightful

      My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

      That is until the 10,000 volt spike when the power company improperly brings the grid back up bakes the RAM, the battery, RAID controller and the hard drives.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    9. Re:Well no shit, Sherlock by ArsonSmith · · Score: 2, Funny

      We just need to get that guy that declared Pluto is no longer a planet to declare that electricity no longer causes data loss.

      Side note: He also declared that north is no longer a direction, blue is no longer a color, and your sister is no longer a virgin.

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    10. Re:Well no shit, Sherlock by linuxpyro · · Score: 1

      My servers run on Electricity

      My servers run on Love.

      --
      Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
    11. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Funny

      I can offer you a Happy Thought UPS. It's a box of puppies. Be careful though, it only has 500 puppy Amps of capacity.

    12. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 5, Funny

      Your mom loves you and pays for the electricity. That doesn't mean that your servers run on love.

    13. Re:Well no shit, Sherlock by sootman · · Score: 1

      If ever there was an article that deserved to be tagged 'duh,' this is it. And even so, it managed to skip over two key points: even if you could perfectly restore a system and not lose a byte of data, unexpectedly cold-rebooting a server 1) means downtime and 2) means restoring, which is a pain in the ass. Sometimes less painful, sometimes more painful, but a pain nonetheless. UPSs are very cheap insurance.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    14. Re:Well no shit, Sherlock by yukk · · Score: 1

      Well known fact. The Reg has been carefully tracking this phenomenon for quite a while now.

      --
      "The trouble with the rat race is that even if you win, you're still a rat." Lily Tomlin
    15. Re:Well no shit, Sherlock by ArsonSmith · · Score: 4, Funny

      Except the server that runs http://youporn.com/

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    16. Re:Well no shit, Sherlock by dkeisling · · Score: 1

      I've shipped my data with UPS many times and they've never lost anything.

    17. Re:Well no shit, Sherlock by alta · · Score: 1

      I, once again, ask for the ability for us to be able to Mod stories in addition to comments.

      Any story that reaches -1 (as this would have nearly instantly) will come off the front page.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    18. Re:Well no shit, Sherlock by frank_adrian314159 · · Score: 1

      Tell me more about this cotton candy technology? Is it superior?

      Yes it is... quite superior. However, you also need teams of ponies and a rainbow to power the cotton candy machine. Providing these might cause a slight delay in implementation. Well worth it, though...

      --
      That is all.
    19. Re:Well no shit, Sherlock by neurovish · · Score: 1

      My servers run on Electricity but the RAID controller has battery backed up RAM so any cached data will persist a power failure and the disks are in writethrough mode.

      I like this setup, but please. Tell me more about this cotton candy technology? Is it superior?

      Only before 11am. Once the end of the day is starting to roll around, the happy thoughts tend to take up the slack though.

    20. Re:Well no shit, Sherlock by NatasRevol · · Score: 0, Redundant

      What about http://youboob.com/ !!

      --
      There are two types of people in the world: Those who crave closure
    21. Re:Well no shit, Sherlock by clone53421 · · Score: 1

      Does it have automatic garbage cleanup? I imagine those puppies cause a lot of data leakage.

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    22. Re:Well no shit, Sherlock by Intron · · Score: 1

      "the RAID controller has battery backed up RAM"

      And does it also have telepathy to capture the data in memory when power goes down? You won't lose a journaling filesystem, but you can lose the data in any file open for write or update.

      --
      Intron: the portion of DNA which expresses nothing useful.
    23. Re:Well no shit, Sherlock by BigGar' · · Score: 1

      Hence the mandatory Ecstasy dose for everyone upon arrival at his workplace.

      --


      Shop smart, Shop S-Mart.
    24. Re:Well no shit, Sherlock by Fred_A · · Score: 1

      Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions?

      Bah, I'll just tell Igor to go whip the children who pedal on the generator downstairs more often. Big deal.

      And it's all green !

      --

      May contain traces of nut.
      Made from the freshest electrons.
    25. Re:Well no shit, Sherlock by clone53421 · · Score: 1

      I don't have a sister, you insensitive clod!

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    26. Re:Well no shit, Sherlock by AJWM · · Score: 1

      My servers run on Electricity

      My servers run on Love.

      Hah, mine run on Steam. Now excuse me while I check the oil and replace some worn gears.

      --
      -- Alastair
    27. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      My server runs on Hookers and Blackjack, oh, and forget the blackjack...

    28. Re:Well no shit, Sherlock by The+Clockwork+Troll · · Score: 0

      Opto-isolation FTW

      --

      There are no karma whores, only moderation johns
    29. Re:Well no shit, Sherlock by fbjon · · Score: 3, Funny

      You do now.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    30. Re:Well no shit, Sherlock by Xcruciate · · Score: 1

      My servers are run by leprechauns riding stationary bikes attached to generators and the server processors are liquid cooled with unicorn piss.

      --
      It's like "looking busy" at your employment - it's actually easier to do real work than to fake it. - bmo
    31. Re:Well no shit, Sherlock by juiceboxfan · · Score: 1

      No, it really does have some interesting observations, with some very scary implications:

      One of the first things that will happen, is that the memory DIMMs will no longer be refreshed properly (DRAM needs to be refreshed constantly otherwise it will lose its data) and very rapidly, the memory will contain only garbage. The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

      What's really scary is that these hardware "facts" that are being used as a proof are based on an email from some random web site.

      All systems I have used in the past 5-10 years, even the cheap ones, had built in monitoring (lm_sensors). It's hard to believe that these voltage monitoring chips are not driving either an interrupt or the main reset line on the motherboard when one of the voltages goes below a certain threshold.

    32. Re:Well no shit, Sherlock by strelitsa · · Score: 1

      Can BOFHs even legally have puppies? (Except of course as the main course at dinner)?

      --
      No mod points, no meta-moderating/Firehose/all the other free work Slashdot wants me to do.
    33. Re:Well no shit, Sherlock by QuantumRiff · · Score: 1

      My servers run on Love.
      Ahh, so nice to see a Gentoo user!

      --

      What are we going to do tonight Brain?
    34. Re:Well no shit, Sherlock by Amouth · · Score: 1

      this seems to me to be a basic duh.. i was stopped and had to think.

      when was the last time i saw a place that had raid and good backups but NOT a ups?? never..

      normally they will stick a UPS on it and then say screw it with raid or backups or both..

      a UPS is the first thing you do.. before you even bother for RAID and backups...

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    35. Re:Well no shit, Sherlock by RobertM1968 · · Score: 0

      Most real servers (not the off-the-rack, PC in a box called a server stuff) can easily, on their own (with no UPS), handle a 10,000 volt spike (of course, most still require a UPS to handle power outages greater than a few seconds...).

      Or perhaps I am too used to using IBM servers instead of some fast PC workstation being called a server because it has some server-like components in it (like SCSI).

    36. Re:Well no shit, Sherlock by RobertM1968 · · Score: 0, Redundant

      My servers run on Electricity

      My servers run on Love.

      LoL! Remind me to never touch your servers... You can "love" them all you want ;-)

    37. Re:Well no shit, Sherlock by RobertM1968 · · Score: 2, Insightful

      Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.

      Any Server Admin who didn't realize that isn't really a server admin. And the rest of the world probably doesn't care or need to know.

      Just a thought... ;-)

    38. Re:Well no shit, Sherlock by SlipperHat · · Score: 1

      There's always enough cotton candy to burn, but who's your supplier of happy thoughts. I've had to make do with broken dreams for the past couple of months. Sure, it burns less cleanly, but the prices are rock-bottom!

    39. Re:Well no shit, Sherlock by Evro · · Score: 3, Interesting

      That's why any datacenter worth putting your servers in pipes its power through a flywheel or some other electricity "cleaner". A 1-ton lead ball spinning at 10,000 RPMs isn't going to speed up that much on a spike like that.

      --
      rooooar
    40. Re:Well no shit, Sherlock by cbreaker · · Score: 1

      I guess this guy's general idea is that it doesn't matter WHAT the controller is - if the system RAM is read at the split-microsecond before turning off completely, there's a chance that it will take garbage from that RAM read, and commit it to the SCSI controller.

      While yes - you can get data corruption from a power failure - it's becoming increasingly rare. There are a lot of different levels of protection:

      - UPS systems
      - Battery backed RAID RAM
      - Journaling filesystems
      - Transactional/Journaling databases
      - Backups
      - Sometimes: Real-time backups

      It's been a looooooong time since I've had to repair an Exchange database or a MySQL database because of a power failure.

      I wouldn't worry about it too much, honestly. Much more important things to worry about.

      --
      - It's not the Macs I hate. It's Digg users. -
    41. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0
    42. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 1, Insightful

      Why use a UPS when your servers are clearly powered by an endless supply of smug?

      Seriously, get over yourself. Commodity servers are powering the internet as we know it, and your IBM dinosaurs are filling landfills.

    43. Re:Well no shit, Sherlock by Darkk · · Score: 2, Insightful

      I lost an entire RAID 5 disk array due to bad ram. It was running Windows 2003 64bit server and one day I turned the screen on and noticed some artifacts and a completely locked up machine. To be sure it wasn't some freeze up in the GUI I tried accessing the shares which didn't respond.

      So I was like ok great..time to do a hard shut down and reboot. Well, when it came back up I noticed my RAID array was no longer showing up in the shares or in disk manager. I was like..aww crap. I tried to rebuild the array via the built-in tools of the raid controller and it didn't work. Somehow it totally FUBARed the disk array tables, to the point that everything on my five 320 GB disks was trashed. Good thing the OS runs on a separate non-raid hard disk right off the motherboard's disk controller.

      Nothing wrong with the raid controller and the drives. Just at the point of writing stuff to the drives RAM had to take a dump and totally froze the server.

      Needless to say, I swapped out the RAM modules for known good ones and have never had a problem since. Luckily I regularly make backups of my critical stuff to another set of hard drives elsewhere.

      I follow this moral code as my second religion, "Don't put all your eggs in one basket!"

    44. Re:Well no shit, Sherlock by RobertM1968 · · Score: 0

      No smugness intended. Nor are my "IBM dinosaurs" or others like them filling landfills.

      I should perhaps have expanded that post. Servers, like UPS's are often purchased at the whim of some "IT Manager"

      Heck, I sell "servers" of the same sort (but choose to run better ones), as most people won't or can't spend the money on server class hardware.

      My point was, a server built from true server class hardware, is usually designed to handle a 10KV spike all by itself.

      Maybe you are a smug person in real life... but don't push your normal intents on me, simply because I could have worded my post a little better.

    45. Re:Well no shit, Sherlock by LWATCDR · · Score: 1

      What I like is.
      "Can always"
      Sort of like jumbo shrimp.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    46. Re:Well no shit, Sherlock by supersat · · Score: 5, Informative

      Are you sure your disks are in write-through mode? Have you checked? Brad Fitzpatrick (of LiveJournal, memcache, OpenID, etc. fame) discovered that many disks lie about being in write-through mode, and wrote a utility to check it.
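
      One rough way to get a hint (this is not the utility mentioned above, just a minimal sketch under stated assumptions) is to time how long a write followed by `fsync` takes. On a spinning disk that really commits each write, a sync can hardly finish much faster than a fraction of a platter rotation, so very fast syncs suggest a cache is answering instead. The function names, the 7200 RPM default and the "ten times faster than half a rotation" threshold are all hypothetical choices for illustration; SSDs and battery-backed controller caches will legitimately look fast, and only a real pull-the-plug test proves anything.

```python
import os
import tempfile
import time


def mean_fsync_seconds(directory, rounds=50, payload=b"x" * 4096):
    """Average wall-clock time of one write-then-fsync on a file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push this file's data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / rounds


def looks_cached(mean_seconds, rpm=7200):
    """Heuristic only: flag syncs far quicker than half a platter rotation."""
    half_rotation = 60.0 / rpm / 2  # average rotational latency in seconds
    return mean_seconds < half_rotation / 10  # hypothetical threshold


if __name__ == "__main__":
    mean = mean_fsync_seconds(tempfile.gettempdir())
    print(f"mean fsync time: {mean * 1000:.3f} ms, suspiciously fast: {looks_cached(mean)}")
```

      Run it against a directory on the disk you care about rather than the temp directory, and treat the result as a reason to investigate, not as a verdict.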

    47. Re:Well no shit, Sherlock by Darkk · · Score: 1

      I've shipped my data with UPS many times and they've never lost anything.

      Yet.....

    48. Re:Well no shit, Sherlock by Vanders · · Score: 1

      the RAID controller has battery backed up RAM

      What could possibly go wrong with that?

    49. Re:Well no shit, Sherlock by deraj123 · · Score: 3, Insightful

      I'm curious...how can one opto-isolate server components from the power source?

    50. Re:Well no shit, Sherlock by BronsCon · · Score: 1

      And I can personally verify the claim regarding her virginity.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    51. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      If you are wondering how you can lose data if you lose power then you shouldn't have keys to the server room - hell, you shouldn't be allowed to know what floor it is on.

    52. Re:Well no shit, Sherlock by bhalter80 · · Score: 1

      It's great that you have that cache memory, but it ends up entirely unused for writes because you run in write-through mode. Writes are the slow operations and reads are significantly faster, so by running in write-through mode you're unnecessarily compromising the performance of your array. Most RAID controllers with battery backup are good for 3+ days of backup; if you're concerned about being without power for longer than that, I suggest you need a generator, my friend.

    53. Re:Well no shit, Sherlock by AK+Marc · · Score: 1

      Haven't you heard all the people talk about RAID on here? I've seen multiple people say things to the effect that you don't need backups if you have RAID because RAID is a backup. Of course, I point out that corrupted data is quite common and something that RAID will faithfully "backup" for you, but that doesn't mean it is useful. Things like this need to be said because of the people that claim RAID is useful for anything other than uptime. That's all it was created for, and that's all it should be used for (well, if you really want to set RAIDs for performance improvements, I've heard of people doing that too, but that's like selling a $10,000 motorcycle that does 0-60 in 4 seconds for a $500,000 supercar that does 0-60 in 3.99 seconds).

    54. Re:Well no shit, Sherlock by clone53421 · · Score: 1

      ...or "guaranteed savings of up to 40%"

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    55. Re:Well no shit, Sherlock by mweather · · Score: 4, Funny

      I tried one of those. You gotta keep adding food to it or it stops working after a week or two. Starts stinking, too.

    56. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      I am your sister, you insensitive clod!

    57. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      OH SHIT! I hope it doesn't flood.

    58. Re:Well no shit, Sherlock by moderatorrater · · Score: 1

      I hope it can also use self-loathing, otherwise there's a large segment of the youporn population that's not contributing.

    59. Re:Well no shit, Sherlock by Phroggy · · Score: 2, Insightful

      Any Server Admin who didn't realize that isn't really a server admin. And the rest of the world probably doesn't care or need to know.

      Just a thought... ;-)

      The fact that they're not really server admins doesn't stop them from running servers, though!

      --
      $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
      $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    60. Re:Well no shit, Sherlock by myowntrueself · · Score: 1

      As a former sysadmin, I would think that any machine reliant on 'happy thoughts' would be the most crash-prone system in the history of computing.

      I can verify this.

      At one time I had two computers, a Linux one and one running Windows 98.

      The Linux one I named "Linus"

      The Windows one I named "Bill".

      It wasn't until I changed its name that the Windows 98 machine started being reliable.

      --
      In the free world the media isn't government run; the government is media run.
    61. Re:Well no shit, Sherlock by Bryansix · · Score: 1

      Link? What kind of IBM servers? I like IBM but I've looked at their regular 1U servers and they don't seem much different from something I can buy from SuperMicro for way less.

    62. Re:Well no shit, Sherlock by RobertM1968 · · Score: 1

      Link? What kind of IBM servers? I like IBM but I've looked at their regular 1U servers and they don't seem much different from something I can buy from SuperMicro for way less.

      Any Netfinity or x Series that is true server class (ie: not their Netfinity/Intellistation duplicate labelled - or their x Series/Intellistation duplicate labelled).

      To find the appropriate info, you need to check out each's Hardware Maintenance Manual or User Guide or Technical Specifications Manual. As a for instance, a Netfinity 7000 M10 is designed/rated to handle 10KV. The power supplies seem to do some level of filtering as well, and store 10-15 seconds of charge to do that. Of course, they also do auto switching in the event of loss of power on one PS (or in case of PS failure).

      I think the key in your statement is "I like IBM but I've looked at their regular 1U servers..."

      Like everyone else's offerings, they too sell low end server solutions that are not necessarily "server class" hardware. Their larger ones are highly redundant (and VERY expensive as well - I prefer picking up a 2 year old model that's been refurbished and recertified... but that's not an option for many of our clients who want "new yet cheap, can't afford redundancy").

      Besides, a good backup solution makes such protection redundant.

      But, if you or I or whatever server admin had their choice, I am pretty sure we would all choose (1) a highly redundant, internally over protected server and (2) a very good backup scenario including high end filtered, sine wave corrected, full conversion, always run from battery UPS and a generator option that put out clean stable power as its backup.

      But we can hope... :-)

    63. Re:Well no shit, Sherlock by Kent+Recal · · Score: 1

      Probably not for many database servers because bypassing the write-cache is *the* surefire way to kill performance. It's usually much cheaper to rely on UPS, battery backed controllers and a good backup strategy than to run twice as many servers in write-through mode. Furthermore you need the (offsite) backup-strategy anyways for the usual "nuke over datacenter" scenarios...

    64. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      So true. Don't worry so much about the voltage, but think carefully about the amps! Those puppies will kill you!

    65. Re:Well no shit, Sherlock by Chris+Burke · · Score: 1

      Side note: He also declared that north is no longer a direction, blue is no longer a color, and your sister is no longer a virgin.

      Well one out of three ain't bad.

      --

      The enemies of Democracy are
    66. Re:Well no shit, Sherlock by Frank+T.+Lofaro+Jr. · · Score: 2, Informative

      Actually when power drops the "power good" line from the power supply goes low, which causes a system reset and locks everything up.

      This is also how the computer knows how long to keep the reset line engaged on startup, it stays asserted until the power supply says the power is good, and everything has proper voltage.

      --
      Just because it CAN be done, doesn't mean it should!
    67. Re:Well no shit, Sherlock by russotto · · Score: 1

      However, we've recently seen that RAM holds state well enough to preserve crypto keys thru a power cycle. This has very scary implications: the RAM knows what's happening, and behaves differently (loses data immediately on power-off or remembers it for several seconds) in order to cause the most difficulty for the owner of the machine.

      How long have you been using computers? Anyone who has debugged a few nontrivial problems already knows that the computer will behave in order to cause the most difficulty for the user -- gremlins are the usual explanation.

      Anyway, I find the "DMA controller writing bad RAM to disk" scenario somewhat unlikely. The drive must have protection against writing while the disk isn't spinning at the right speed (or every "yank the plug while writing" event would result in significant corruption), and the disk is going to start spinning down on power loss long before the DRAM starts losing its memory. Which isn't to say there couldn't be other problems.

    68. Re:Well no shit, Sherlock by somersault · · Score: 1

      Power losses can cause data loss

      Gah! I knew I was overlooking something! Well, at least I have the Backup Tape, and tape deck to play HHGTTG music to remind everyone not to panic in a disaster scenario..

      --
      which is totally what she said
    69. Re:Well no shit, Sherlock by somersault · · Score: 1

      The mystery of the frosty piss is finally explained! Thank you. You should be more careful of it leaking around slashdot summaries in future.

      --
      which is totally what she said
    70. Re:Well no shit, Sherlock by Bryansix · · Score: 1

      Well quite frankly I'd like to co-locate all my servers in a data center and have them worry about power and fire protection and I worry about running the servers. The Internet just isn't fast enough yet though to do that when you need to write large amounts of data to the servers.

    71. Re:Well no shit, Sherlock by supersat · · Score: 1

      As they've discovered at least a few times, UPSes can be instantly killed with an Emergency Power Off switch (aka "The Big Red Button"). Their particular problem was that the RAID controller did all of the write caching, and was backed up by a separate battery. Since the RAID controller did the write caching, it made little sense for the disks to do write caching as well, and once the drives told the RAID controller the data was committed, the controller no longer kept the data in its RAM. Unfortunately, the drives lied about committing the data, and they were down for half a day trying to repair the corruption.

    72. Re:Well no shit, Sherlock by rubycodez · · Score: 1

      mine runs on Hatred, pure and deep.

    73. Re:Well no shit, Sherlock by MikeBabcock · · Score: 1

      I'm still confused about how this made Slashdot at all. Who here doesn't know they could lose data in a power failure? Okay, hopefully those with their hands up don't run data centers.

      Moving along, UPSs are cheap, especially for home usage. You can buy a decent UPS with excellent surge protection for a home PC for under $100.

      --
      - Michael T. Babcock (Yes, I blog)
    74. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      Actually when power drops the "power good" line from the power supply goes low, which causes a system reset and locks everything up.

      Hey, yeah that's right. So how does this blogger rant rate the front page of /.?

      The whole idea is just wrong!

    75. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      I don't know about you, but my servers run on the power of cotton candy and happy thoughts.

      So you're running a Windows system?

    76. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      Any database system that's worth a damn actually flushes uncommitted buffers all the way through to the physical disk medium as transactions are committed. When the lights go out the worst that should ever happen is for any incomplete transactions to be rolled back.

      Yes you can lose data if you buffer writes, or act retarded and ignore flush() at any level to 'optimize' performance but this result would just be a reflection of your own stupidity.

      TFA fails to mention most dedicated RAID systems have internal battery backups to safely handle power failures.
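
      For anyone wondering what "flushing all the way through" looks like from application code, here is a minimal sketch, assuming a plain append-only file rather than any particular database. `durable_append` is a hypothetical helper name; `os.fsync` is the real call that asks the operating system to push the file's buffered data to the device. It cannot help if the drive itself lies about having committed the data, and on POSIX systems a newly created file may also need an `fsync` on its directory.

```python
import os


def durable_append(path, record):
    """Append `record` (bytes) and return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush OS buffers for this file to the device
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    durable_append("commits.log", b"tx 1 committed\n")
```

      The trade-off is exactly the one argued about in this thread: every commit now waits for the disk, which is why people reach for battery-backed write caches.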

    77. Re:Well no shit, Sherlock by Bu11etmagnet · · Score: 1

      Hah, mine run on Steam.

      But how do you keep the WCC out ?

      --
      Life is complex, with real and imaginary parts.
    78. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      His point is correct, but the explanation is bad. DRAM stores bits as charges, which slowly leak away and during refresh very little of it has got to stay there to restore your porn, so you'll get all of it back if you stop playing with the power switch. However you need more charge left there to get the data out of the chip, so your uber RAID controller overwrites everything with garbage it gets from memory. All your porn are belong to sdfrzzz...

    79. Re:Well no shit, Sherlock by The_reformant · · Score: 1

      It depends what you're looking at. If you're looking at a system of geographically dispersed machines then in a correctly designed app a power outage shouldn't lead to any data loss. A transactional model puts a strict precondition on participating resources asserting that they must be able to commit or roll back; this effectively means that everything is logged before it is committed, meaning a power outage at any stage in the transaction is recoverable.
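
      The "log it before you commit it" idea can be sketched in a few lines. This is an illustrative single-machine toy, not a distributed transaction manager; `log_transaction`, `recover` and the JSON-lines log format are hypothetical names and choices made for the example. Each transaction writes its intended updates, syncs, then writes a commit marker and syncs again. On restart, anything without a commit marker is treated as rolled back.

```python
import json
import os


def log_transaction(log_path, txid, updates):
    """Append the intended updates, then a commit marker, syncing after each."""
    with open(log_path, "a", encoding="utf-8") as log:
        for key, value in updates.items():
            log.write(json.dumps({"tx": txid, "key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # updates are on disk before the commit marker
        log.write(json.dumps({"tx": txid, "commit": True}) + "\n")
        log.flush()
        os.fsync(log.fileno())


def recover(log_path):
    """Rebuild state from the log, ignoring transactions with no commit marker."""
    pending, state = {}, {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final line from an interrupted write
            if entry.get("commit"):
                state.update(pending.pop(entry["tx"], {}))
            else:
                pending.setdefault(entry["tx"], {})[entry["key"]] = entry["value"]
    return state
```

      If power fails between the two syncs, `recover` finds the updates but no marker and discards them, which is the rollback described above. It still relies on the storage stack honouring `fsync`.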

      --
      I have discovered a truly remarkable sig which this post is too small to contain.
    80. Re:Well no shit, Sherlock by FireFury03 · · Score: 1

      However, we've recently seen that RAM holds state well enough to preserve crypto keys thru a power cycle.

      Reading from DRAM is destructive - if you don't try to read then your data might be safe for a while, but if you can't refresh it after a read then the data will be trashed.

      I'm not entirely convinced by the article's assertion that the hard disk will still be writing data after the DRAM refresh stops (remember - it doesn't matter that you are still DMAing data to the hard drive if the drive is never going to write it), but I wouldn't want to test it on my data.

    81. Re:Well no shit, Sherlock by clone53421 · · Score: 1

      You must've been really drunk... because that was my brother (you insensitive clod!)...

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    82. Re:Well no shit, Sherlock by BronsCon · · Score: 1

      Insensitive I may be, but I'll vouch for his sensitivity.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    83. Re:Well no shit, Sherlock by clone53421 · · Score: 1

      It wasn't until I changed its name that the Windows 98 machine started being reliable.

      Was that also when you upgraded to SP2? ;)

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    84. Re:Well no shit, Sherlock by Cerberus7 · · Score: 1

      This is now my sig. Would that you weren't A/C, I'd credit you, but now I can steal it for my own and nobody need know! Muahahahahaha!

      --
      I don't know about you, but my servers run on the power of cotton candy and happy thoughts. -Anonymous Coward
    85. Re:Well no shit, Sherlock by clone53421 · · Score: 1

      Not with that face, you're not!

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    86. Re:Well no shit, Sherlock by Anonymous Coward · · Score: 0

      Off on a tangent... but Murphy's Law is originally slightly different. It doesn't say "Whatever can go wrong, will go wrong"; it says "If there is a possibility to do something the wrong way, somebody will do it". The difference in attitude there is subtle but significant. He didn't mean it to be pessimistic but constructive: encouragement to design systems that can only be used the right way -- asymmetric cable connectors and so on -- leading to better methodologies and products.

  2. Not me! by Anonymous Coward · · Score: 0

    What if your data's on the cloud?

    First!

    1. Re:Not me! by sm62704 · · Score: 4, Funny

      If there's clouds in your server room, your server's probably been slashdotted and is on fire!

      --
      mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
  3. Illiteracy by carou · · Score: 5, Funny

    From TFA:

    (DRAM needs to be refreshed constantly otherwise it will loose it's data)

    Fly, little data! Be free!

    1. Re:Illiteracy by Anonymous Coward · · Score: 0

      Thank you!

      To the author of TFA, it's "lose," dammit, not "loose"! Oh, and the previous sentence shows how to properly use "it's".

    2. Re:Illiteracy by Ngarrang · · Score: 2, Funny

      Get off my lawn, you little bits!

      --
      Bearded Dragon
    3. Re:Illiteracy by Anonymous Coward · · Score: 0

      Loose data... do they mean porn?

    4. Re:Illiteracy by Anonymous Coward · · Score: 1, Funny

      I don't know about you, but I don't want my precious data to be loose, gallivanting around with the wrong kind of crowd. I'm looking at you, float32.

    5. Re:Illiteracy by JediN8 · · Score: 1

      Information wants to be free yo!

    6. Re:Illiteracy by Anonymous Coward · · Score: 0

      The singular of 'data' is 'datum,' for future reference. When you're being extremely pedantic, don't forget to apply the same standards to yourself.

    7. Re:Illiteracy by carou · · Score: 1

      Exactly which of my words implied that anything was in the singular form?

  4. can always lead to data loss? by internerdj · · Score: 5, Funny

    Definitely maybe?

    1. Re:can always lead to data loss? by jpellino · · Score: 1

      No, it's "maybe definitely."
      Sloppy title either way.
      Perhaps a better headline might be "Pray to God but row towards shore."

      --
      "Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
    2. Re:can always lead to data loss? by Neodudeman · · Score: 1

      Almost always!

    3. Re:can always lead to data loss? by holden+caufield · · Score: 1

      It's not simply sloppy, it's unintelligible. Conditionally, this will always happen?

      Next slashdot headline: "Why breathing can always lead to Death"

      --
      I'll create an amusing sig when I have something meaningful to post.
    4. Re:can always lead to data loss? by Anonymous Coward · · Score: 0

      'definitely' or 'maybe' maybe?

    5. Re:can always lead to data loss? by Trogre · · Score: 1

      It does sound funny, but it's not as tautological as it seems. It's saying that when you have a power outage, there is /always/ a nonzero /risk/ of losing data.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  5. UPS - more than just a backup. by Zebadias · · Score: 4, Informative
    UPS smooths out all those nasty spikes as well as stopping your servers from going down to a 1 second power cut.

    UPS is more than just saving your data.

    1. Re:UPS - more than just a backup. by linuxpyro · · Score: 4, Informative

      It's also important to get a decent UPS, if you're using it for something like a server. I think the cheapy ones basically just use a transfer relay, whereas the higher end ones actually run the hardware off of the battery via the inverter all the time. While I would think that with the former (called "standby" UPSs maybe?) the transfer time wouldn't be enough to cause too many problems, you still don't have the buffer that you'd get with a true uninterruptible power supply.

      I think a lot of the cheaper ones don't put out a true sine wave either, though for their intended purpose of letting you shut down your desktop cleanly they're probably fine.

      --
      Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
    2. Re:UPS - more than just a backup. by Anonymous Coward · · Score: 2, Informative

      >UPS smooths out all those nasty spikes as well as stopping your servers from going down to a 1 second power cut.

      A true UPS smooths out the spikes. Most of today's UPSes (at least consumer models) are off-line supplies. The batteries don't kick in unless the power is out. Worse than that, the cheap ones don't output sine waves, they output square waves. These UPSes also take some time to switch to batteries, leaving your computer without power for that time.

      Now, some of those UPSes have filtering technology like you find in expensive powerbars, sure. But it isn't the same as an always-on UPS at all.

    3. Re:UPS - more than just a backup. by Sandbags · · Score: 1

      Exactly. Even more important than simple power backup is AVR. The microfractures that can be created in system chips by voltage as little as +/- 10 volts, over time, will cause systems to malfunction.

      I can't find the article, but at one point when I was a reseller for APC (maybe it was Liebert), their marketing material used to state that 95% of all system component failure was due to voltage irregularity, and that properly filtered line voltage could extend the life of electronics threefold.

      You should not only have your systems and servers on UPS, but at this point, all your Home theater equipment too, TVs and all. Battery backup is not so important for these other devices, but AVR and line conditioning is.

      He's right about drives continuing to write bad data. Sure, you can restore from backup, but you still lose what was created since that point. Critical databases can be replicated and mirrored in real time, and come close to preventing 100% of data loss, but not so for most other system uses, and realtime record-level syncing is out of the budget for most companies.

      Beyond UPS, not a bad idea to have an on-site generator too...

      --
      There is no contest in life for which the unprepared have the advantage.
    4. Re:UPS - more than just a backup. by SuperQ · · Score: 4, Informative

      Yup the 3 major types of battery UPSs I know of:

      Offline - Relay or simple failover. (APC Back-UPS)

      Line Interactive - Can correct line over/under voltage to a point (APC Smart-UPS)

      Online - Full AC -> DC -> AC conversion. (APC Symmetra, Liebert, anything that doesn't suck)

      Basically outside of home use you want an online type UPS.

      There are other systems like motor/generator flywheel types, but they need a very fast backup generator to sustain anything more than 30 seconds of outage. But they're great for smoothing out some types of line issues.

    5. Re:UPS - more than just a backup. by rcw-work · · Score: 1

      Worse than that, the cheap ones don't output sine waves, they output square waves.

      filtering technology like you find in expensive powerbars

      If you get these pieces separately, do not plug them into each other! The capacitors/inductors on the filter will have to eat about one-third of the UPS's output power, and chances are they don't have heat sinks capable of it. One third of a 1500W UPS is 500W, which is the same thing as one kitchen stove burner on medium-high.

    6. Re:UPS - more than just a backup. by ProfessionalCookie · · Score: 1
      I've always thought servers and even desktops should have at least 10 seconds of buffered power along with a decent voltage regulator. At the very least it should be a common option.

      I thought it was notable that power flickered off for about a half second last spring and everything in the house reset except my iMac (White, C2D), which didn't even blink. Anyone else have a similar experience?

    7. Re:UPS - more than just a backup. by aaarrrgggh · · Score: 1

      Great points, but the best solution is a mix of technologies-- my preference is a DC flywheel in parallel with batteries on a double-conversion UPS.

  6. Duh! by mlwmohawk · · Score: 4, Insightful

    I remember a discussion on the PostgreSQL hacker's list about recoverability and transaction logs.

    You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

    There are too many variables and too much randomness on a cold hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.

    1. Re:Duh! by sm62704 · · Score: 3, Insightful

      You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

      Gotta love these slashvertisements, I wonder whose UPSes they're pimping? It's not like we don't all know you need a UPS. What's next, a FA about how you need fire insurance?

      --
      mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
    2. Re:Duh! by GXTi · · Score: 1

      The goal of transactions is to make the window of data loss on the database's side infinitesimally small. PostgreSQL's default configuration will not tell you it has committed a transaction until it can guarantee that nothing short of lying hardware will cause that transaction to be lost. Hence, it's up to the software to handle errors (like the database server disappearing) by informing the user that their action failed, or to put the work aside until the database comes back. That said, the probability of losing data is very, very small but still not zero (primarily due to lying hardware like RAID controllers), so you don't want to take any chances. Unless you're running MySQL, a few minutes is all you need to cleanly stop the database and shut down.

    3. Re:Duh! by Anonymous Coward · · Score: 0

      You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

      Redundant power supplies in 90% of modern servers. RAID in 99% and offsite backups performed by any good admin. I fail to see any point in your comment.

    4. Re:Duh! by smbarbour · · Score: 1

      Power supply failure - Redundant power supplies
      Hard Drive failure - RAID array (Though losing multiple drives at the same time... Very bad, unless you are using an "exotic" RAID level such as RAID 6 or 5+1 (Striped set with double distributed parity and mirrored striped set with distributed parity, respectively))

      Building burns down - You have bigger problems than just losing some data.

    5. Re:Duh! by Domint · · Score: 1

      You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

      Redundant power supplies, RAID, off-site replication.

      But thanks for playing.

    6. Re:Duh! by Anonymous Coward · · Score: 0

      I remember a discussion on the PostgreSQL hacker's list about recoverability and transaction logs.

      You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

      To me, that means not losing data. The database doesn't report to the application that the transaction completed until (in your words) it is in a "save point of 100% integrity". That is part of ACID compliance.

      Of course, if you're using a toy database like mySQL, you might have issues.

      There are too many variables and too much randomness on a cold hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.

      Of course you should. UPSes are cheap.

    7. Re:Duh! by Prof.Phreak · · Score: 1

      You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

      How does one define `lose data'?

      If something is -in- the database (committed), then it will not get lost, no matter how abruptly you pull the plug on the box.

      Similarly, unless it's committed, you can't consider it to be -in- the database---so, it's not something that you can lose (as it's not in the database).

      If by losing data they mean stuff that's partly in the database while it crashes, then... d0h, the source of the data should be responsible for maintaining a copy before it gets a commitment confirmation.

      Ie: you can (and many do) build reliable systems that will not lose data no matter what.

      --

      "If anything can go wrong, it will." - Murphy

    8. Re:Duh! by turbidostato · · Score: 1

      "If something is -in- the database (committed), then it will not get lost, no matter how abruptly you pull the plug on the box."

      1. Transaction committed
      2. Abruptly plugged off machine
      3. Corrupted filesystem
      4. Last backup is from last night

      There: something was -in- the database but it's still lost.

    9. Re:Duh! by Prof.Phreak · · Score: 1

      2. Abruptly plugged off machine
      3. Corrupted filesystem

      Why would you still be using FAT32? :-/

      Modern file systems (unless they have bugs) should be as robust as databases when it comes to transactions and recovery from sudden power failures. ie: data is either written or not. There's no corruption.

      If data does get lost or corrupted, you can be sure it's the result of some bug, or lack of transactions---which doesn't fall under the category of ``system cannot be built to not lose data''.

      --

      "If anything can go wrong, it will." - Murphy

    10. Re:Duh! by Frank+T.+Lofaro+Jr. · · Score: 1

      If you are running MySQL, you don't care about data integrity anyway.

      --
      Just because it CAN be done, doesn't mean it should!
    11. Re:Duh! by kv9 · · Score: 1

      You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.

      that's correct. but I've experienced power failures much more often than fried PSUs, drive crashes and fucking fires.

    12. Re:Duh! by jeremyp · · Score: 1

      No. The goal of transactions is to keep the data that is in the database consistent.

      In fact, using transactions will increase the amount of lost data, assuming that the database is the only persistent data store in your system. Let's say you are inserting five records into your database and the power fails just after writing the second record. If there is no transaction, on reboot, the database will contain the two records written. If the five writes are wrapped in a transaction, on reboot, the database will contain the two records successfully written, but the DBMS will roll the database back to just before the first record was written, thus deliberately destroying some data.

      The loss of data is a price that the developer chooses to pay because the integrity of the data is more important.
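
      The five-record scenario above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `items` table and the `insert_records` helper are made up, and a raised exception stands in for the power failure (after a real outage, the rollback would happen during journal recovery when the database is next opened).

```python
import sqlite3


def insert_records(conn, records, fail_after=None):
    """Insert records in one transaction; simulate a crash after `fail_after` rows."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, value in enumerate(records):
                if fail_after is not None and i == fail_after:
                    raise RuntimeError("simulated power failure")
                conn.execute("INSERT INTO items(value) VALUES (?)", (value,))
    except RuntimeError:
        pass  # the partial work has already been rolled back


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items(value TEXT)")
conn.commit()

# Two rows are written, then the "power" fails before the third.
insert_records(conn, ["a", "b", "c", "d", "e"], fail_after=2)
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # prints 0: the two written rows were discarded with the transaction
```

      Without the transaction wrapper, the table would be left holding two of the five rows, which is exactly the inconsistent state the rollback is there to prevent.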

      --
      All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
    13. Re:Duh! by enoz · · Score: 1

      I couldn't decide whether to mod you Insightful, Troll or just plain Funny.

    14. Re:Duh! by sam0737 · · Score: 1

      TFA isn't talking about losing the last minute of incoming data, but about the fact that integrity could be doomed and the file system corrupted (meaning you potentially need to mkfs again). This is the case even if you have the FS journal turned on, and/or are using database transactions, and/or are using RAID.

      The most interesting idea is that the DMA controller and hard drive could live milliseconds longer than your RAM, so garbage in the RAM (because if there is no refresh current for more than a second, everything in the RAM turns to garbage) could make its way onto the hard drive, corrupting the journal.

    15. Re:Duh! by RegularFry · · Score: 1

      You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

      Even that is *much* harder than it looks if you've got to worry about the validity of the process that checks the integrity of a save point that's just been written. At some point, you've just got to hit and hope.

      --
      Reality is the ultimate Rorschach.
    16. Re:Duh! by mlwmohawk · · Score: 1

      You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.

      Even that is *much* harder than it looks if you've got to worry about the validity of the process that checks the integrity of a save point that's just been written. At some point, you've just got to hit and hope.

      The PostgreSQL MVCC and the WAL stuff is amazingly well done. It took a while to get there, but I'd call it one of the best implementations I've seen in open source.

  7. Of course by Naurgrim · · Score: 1

    As my nieces would say, Durrrr! Yes, of course - you need a UPS. Next question please.

    --
    .......You Are,
    ...What You Do,
    When It Counts.
  8. So, big HD writes by Trigun · · Score: 1

    into a huge cache on the drive don't get written permanently if the power quits? Why didn't somebody tell me about this before?

  9. Silly Me! by Anonymous Coward · · Score: 0

    I always thought Gremlins caused data loss.

    Since when did power have anything to do with it?

  10. Well of course you need UPSs, but by pembo13 · · Score: 5, Informative

    APC is the only UPS maker on the market that has at least spent some small effort so that their UPSs can be properly integrated with a Linux machine. I made the mistake of purchasing an Ultra UPS as it was cheaper than the APC.

    --
    "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
    1. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      You only need a UPS if the chance of a power failure is bigger than the chance of your UPS failing. The decision isn't always as clear cut as in the US.

    2. Re:Well of course you need UPSs, but by pembo13 · · Score: 1

      Well I am studying in the US, and there have been quite a few power surges that have been subdued by the UPS.

      --
      "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
    3. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      I dunno about that. Just about 4:30 every morning, the power flickers for about half a second where I live. If it weren't for the ups, every morning my server would be rebooting.

    4. Re:Well of course you need UPSs, but by raddan · · Score: 1, Interesting

      Actually, UPS devices are useful for other kinds of things as well. Need to distribute load more evenly across your circuits? If you have the machine plugged into a UPS, you simply unplug the UPS and plug it into the other circuit. Heck, you could even do something really dumb like physically move the machine while it's running if you had it connected to a UPS.

    5. Re:Well of course you need UPSs, but by bruceg · · Score: 2, Interesting

      been there, and done that! We recently moved a few servers this way. Just be careful, and go slow.

    6. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      MGE Systems supports the NUT project which supports a number of different UPSs. They now own APC, but didn't when they started supporting NUT.
      Since buying APC they don't seem to be selling MGE branded units in the US, but I bought a couple of Nova 1100s a while back and they seem to work OK.

    7. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      The best (indeed only) real use my cheapo UPS ever got was in the big blackout of a couple of years ago that took down power in Ontario and some northeastern states for a couple of days - we plugged a clock radio into it and were able to keep up to date on the news (we had a battery powered radio, but loaned it to our neighbours who didn't).

    8. Re:Well of course you need UPSs, but by kriston · · Score: 1

      Yes, but APC (American Power Conversion) only did that after watching us reverse-engineer their data ports for 8 years.

      --

      Kriston

    9. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      I have a line interactive opti-ups at home configured to cleanly shut down a debian linux server. It was less expensive than an offline apc-ups.

    10. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      I moved a Frogger game like that a few years ago myself. Almost got it home, too. Unfortunately it got hit by a truck as I crossed the street with it.

    11. Re:Well of course you need UPSs, but by VanessaE · · Score: 1

      I'm sure APC is good, but they're not the only one. Powercom also makes some decent ones, and they come with a very simple userspace Linux driver and control scripts on the CD. Mine's a Black Knight BNT-800AP (800VA) line interactive type with a serial cable for the data connection. Like any decent UPS, it can order the computer to do a proper shutdown after the time limit you give it, and then it will even shut itself off a few minutes after that if you want it to. Surge protection for power and my DSL feed, power step up/down (-13% to +15%) as needed, etc. etc. I believe even the battery is user-replaceable. The documentation claims about an hour of runtime at typical loads, though I've never tested this. It claims a spike handling of 480 joules over 2ms, though I've thankfully never tested that. Power-off, low-battery, and overload alarms, soft on/off, etc etc. Seems to have everything one should need in a UPS, really. The only thing I found lacking is the grammar as found in the driver's messages and the manual - someone needs to learn proper English (as if mine's any better).
      .
      Er.. Can you tell I'm pleased with it?

    12. Re:Well of course you need UPSs, but by Anonymous Coward · · Score: 0

      FWIW, we're running a MGE Pulsar EXtreme that works well (enough) with the serial cable and NUT.

      I do however strongly second that you should investigate the possibilities of hooking up the UPS to your system(s) before purchasing it.

  11. What this really points out... by JesseL · · Score: 2, Insightful

    is a weak spot in the design of most computers.

    Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data, and operating systems and critical apps should be able to handle an emergency shutdown and save critical data in very short order.

    This is old hat in embedded systems.

    --
    "Prefiero morir de pie que vivir siempre arrodillado!"
    1. Re:What this really points out... by mlwmohawk · · Score: 4, Informative

      Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data

      Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.

      Hint: it's BIG. Batteries are more space efficient, but the chemicals and outgassing make them inappropriate for location INSIDE the computer box.
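
      For anyone who wants to run the numbers, here is a rough back-of-the-envelope sketch. It assumes an ideal capacitor on a 12 V rail that is allowed to sag to 11 V (both voltages are assumptions for illustration, not from the post) and ignores conversion losses and internal resistance.

```python
def required_capacitance(power_w, seconds, v_start, v_min):
    """Farads needed to supply power_w for `seconds` while sagging from v_start to v_min."""
    energy_j = power_w * seconds                          # E = P * t
    usable_j_per_farad = 0.5 * (v_start**2 - v_min**2)    # E = 1/2 * C * (V1^2 - V2^2)
    return energy_j / usable_j_per_farad


# 200 W for 5 minutes on the assumed 12 V -> 11 V window.
five_minutes = required_capacitance(200, 300, 12.0, 11.0)
# Same load for only 5 seconds, for comparison.
five_seconds = required_capacitance(200, 5, 12.0, 11.0)

print(round(five_minutes))  # prints 5217 (farads)
print(round(five_seconds))  # prints 87 (farads)
```

      A wider usable voltage window (for example, with a regulator after the capacitor) shrinks these figures, but the five-minute case stays in the thousands of farads under these assumptions.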

    2. Re:What this really points out... by JesseL · · Score: 4, Insightful

      Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.

      --
      "Prefiero morir de pie que vivir siempre arrodillado!"
    3. Re:What this really points out... by Macman408 · · Score: 5, Interesting

      This is old hat in embedded systems.

      Yes, but embedded systems usually have lower power requirements, or at the very least, a smaller range of power requirements. You can't add 3 PCIe cards, a few extra drives, and a few more GB of RAM to most embedded systems.

      I worked on the design of an embedded system a few years ago that had a holdup spec - I think it was supposed to survive for 50 ms with no power. So a 50 ms power interruption would result in continued operation, while an outage longer than that was allowed to reset the board. However, the power draw on the board was around 200 Watts; being able to supply that much power for that long in a fairly compact form factor was a huge hurdle. It also caused airflow problems, because the giant capacitors would prevent air from getting to other components on the board, like the CPU. In the next version of the spec, I believe the holdup requirement was eliminated - apparently we weren't the only ones having trouble meeting that requirement.

    4. Re:What this really points out... by Locklin · · Score: 3, Insightful

      Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

      ####@johncash:~$ time sync

      real 0m0.004s
      user 0m0.004s
      sys 0m0.000s
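
      The same measurement can be sketched from a script. This assumes a Unix-like system, since `os.sync()` is not available elsewhere, and `timed_sync` is just an illustrative helper name. How long it takes depends entirely on how much dirty data is waiting at that moment, so one fast run on an idle box says little about a busy server.

```python
import os
import time


def timed_sync():
    """Flush all dirty filesystem buffers and return the elapsed seconds."""
    start = time.perf_counter()
    os.sync()  # Unix only; requests the same system-wide flush as the `sync` command
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sync took {timed_sync():.3f} s")
```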

      --
      "Knowledge is the only instrument of production that is not subject to diminishing returns" -Journal of Political Econom
    5. Re:What this really points out... by Anonymous Coward · · Score: 0

      If only there were some kind of...battery-powered...device that might perform this function, allowing machines to run for a time after electric mains failed. Hmm...

    6. Re:What this really points out... by mlwmohawk · · Score: 1

      Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.

      I'm not sure what your system is, but for this to be a general purpose device, it needs to work within the realm of real life systems. Have you ever typed "sync" on a busy system and had it go away for a minute or more?

      5 minutes is a "safe" number. It takes time to detect a power failure more than a mere "spike." You don't want to start a shut down and suddenly have the power come back on. How would you know to restart the system?

      If this were a real product, it would need a hell of a lot more than just a big capacitor.

    7. Re:What this really points out... by mlwmohawk · · Score: 1

      A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

      As I said in another post, it is very much more complicated than just a few seconds.

    8. Re:What this really points out... by natoochtoniket · · Score: 2, Interesting

      The problem is that different application systems have different amounts of state that must be saved. An RT app usually only has a memory buffer that can be written in a small number of IOs. Many business apps have relatively lots of data, in non-contiguous buffers, that require hundreds of IOs to store. Many business systems have hundreds of such apps running in the machine at the same time. Some systems can have gigs of data, in thousands of buffers, in their write-behind cache. And some businesses have systems that must not shut down, except for actual emergencies like fire or flood.

      How does the hardware designer of a general-purpose computer guess what kinds of apps will run in that machine? He/she cannot.

      The external power supply (aka, the UPS) can be configured to accommodate the needs of the application. An application that needs lots of power for a long time can be configured with a big UPS. And, an app that doesn't need it, doesn't have to pay for it.

    9. Re:What this really points out... by JesseL · · Score: 1

      I'm not saying that UPS are completely unnecessary, I'm saying that most computers are excessively vulnerable.

      --
      "Prefiero morir de pie que vivir siempre arrodillado!"
    10. Re:What this really points out... by Firehed · · Score: 3, Informative

      Other than the lack of communication at present between the PSU and the rest of the system (on a hardware and software level), what you're describing really seems to be the computer equivalent of throwing your hands in front of your nuts as you spot the incoming baseball. It helps the immediate problem of data (or testicle) loss, but it's really just a small amount of damage control.

      This is why a proper UPS that can trigger a full system shutdown once you hit a certain power remaining threshold is far preferable. Granted I'd rather have a controlled crash than the risky nonsense that would come from the power cord being yanked, but (right now) computers can only go so far to help themselves in a couple-second window.

      --
      How are sites slashdotted when nobody reads TFAs?
    11. Re:What this really points out... by droopycom · · Score: 2, Insightful

      You mean, the battery location on my laptop is not appropriate ?

      I know laptops and servers are very different, but still, if my laptop can run 2 or 3 hours on a battery (including the LCD), it should not be that difficult to use the same technology to power a server for 5 minutes (with no screen needed).

    12. Re:What this really points out... by JesseL · · Score: 2, Interesting

      I think you're making it more complicated than it needs to be.

      If the system gets a signal that power is going away very, very soon, drops everything else, and just devotes its last seconds to getting things in order - it should be doable in a few seconds and be vastly preferable to the alternative of just having power go away without warning.

      Obviously a UPS is an even better option, but not every place that could use a UPS is ever going to get one, and it would be good if we could work on the problem from the other end too. Most PCs and casual servers are way more vulnerable to momentary power outages than they ought to be. 10-20 Farads worth of 5V caps and some thoughtful programming would make things a lot less delicate.
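
      A rough sketch of what 10-20 Farads at 5V would buy, using the ideal capacitor energy formula E = 1/2 * C * V^2. The 200 W load and the 4.5 V minimum usable rail voltage are assumptions chosen for illustration, not figures from the comment above, and real converters add losses on top.

```python
def capacitor_energy_joules(capacitance_f, v_start, v_min=0.0):
    """Energy released while an ideal capacitor discharges from v_start to v_min."""
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)


def holdup_seconds(capacitance_f, v_start, v_min, load_watts):
    """How long that energy can feed a constant load (no converter losses)."""
    return capacitor_energy_joules(capacitance_f, v_start, v_min) / load_watts


if __name__ == "__main__":
    LOAD_W = 200.0   # assumed load, for illustration only
    V_FLOOR = 4.5    # assumed lowest voltage the rail tolerates
    for farads in (10, 20):
        total = capacitor_energy_joules(farads, 5.0)
        usable = capacitor_energy_joules(farads, 5.0, V_FLOOR)
        print(f"{farads} F: {total:.1f} J stored, {usable:.2f} J above {V_FLOOR} V, "
              f"{holdup_seconds(farads, 5.0, V_FLOOR, LOAD_W) * 1000:.0f} ms at {LOAD_W:.0f} W")
```

      The gap between stored and usable energy is why the voltage floor matters as much as the raw capacitance.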

      --
      "Prefiero morir de pie que vivir siempre arrodillado!"
    13. Re:What this really points out... by caluml · · Score: 1

      You only really need the time to sync the file systems, flush the unsaved data to disk, and maybe park the heads.

    14. Re:What this really points out... by caluml · · Score: 1

      Just noticed the same comment further down the thread. Meh.

    15. Re:What this really points out... by mlwmohawk · · Score: 1

      You mean, the battery location on my laptop is not appropriate ?

      Not in a server environment, no.

    16. Re:What this really points out... by lluBdeR · · Score: 1

      You mean, the battery location on my laptop is not appropriate ?

      Your laptop doesn't use lead acid batteries, my UPSes do and frankly, I wouldn't have it any other way. Beats the hell out of Li-* blowing up my basement or several short outages (the norm here) causing memories in Ni-* batteries.

    17. Re:What this really points out... by clone53421 · · Score: 1

      You don't want to start a shut down and suddenly have the power come back on. How would you know to restart the system?

      Um, yes you do. The alternative is an immediate harsh shutdown. If you can figure out how to make it boot up again when the power comes back, more power to you, but that's icing on the cake.

      An in-box power failsafe wouldn't be intended to smooth out brief power losses, and it wouldn't be a take-your-time-and-shut-down sort of thing. It would be a cold, hard, quick shut down that does the bare minimum to avoid data loss. That might be as simple as dumping the RAM to flash. People who want to run smoothly through a 2 minute power outage or perform a take-it-easy shutdown can get a battery backup UPS like they've already been doing.

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    18. Re:What this really points out... by Kingrames · · Score: 1

      5 seconds? pussy. pull the plug.

      --
      If you can read this, I forgot to post anonymously.
    19. Re:What this really points out... by Chabil+Ha' · · Score: 1

      It's not a weak spot, it's an abstraction. The components expect DC, and the power supply acts as an interface to get it. It's fulfilling its purpose. If you really wanted to be technical about it, the responsibility for providing power ought to trace back to its provider, the nearest being the breaker, then the main from the outside. If you want to put your reliability anywhere, it'd be there.

      --
      We're all hypocrites. We all have hidden parts, it's the contrast between them that make us more a hypocrite than others
    20. Re:What this really points out... by Bender0x7D1 · · Score: 1, Insightful

      Apples and oranges. I'll use a car analogy since they are always appropriate.

      If my car can run 6 or 7 hours at 70 MPH, it should not be that difficult to use the same car to run at 420 MPH for 1 hour.

      --
      Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
    21. Re:What this really points out... by drsmithy · · Score: 1

      I know laptops and servers are very different, but still, if my laptop can run 2 or 3 hours on a battery (including the LCD), it should not be that difficult to use the same technology to power a server for 5 minutes (with no screen needed).

      It's not. They're called UPSes.

    22. Re:What this really points out... by fmaresca · · Score: 1

      No, you can't only do a sync. In most production environments you need to stop services before syncing your disks, and this will take much longer. What's the point of syncing if the apps will continue dirtying buffers after the sync completes?

    23. Re:What this really points out... by jimicus · · Score: 4, Informative

      Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

      ####@johncash:~$ time sync

      real 0m0.004s
      user 0m0.004s
      sys 0m0.000s

      That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.
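
      For reference, the same measurement can be sketched from Python with os.sync(), which makes the same request as the sync command (Unix only). As noted above, this only flushes what is dirty at the moment of the call; nothing here stops other processes from writing again immediately afterwards.

```python
import os
import time


def timed_sync():
    """Flush the kernel's dirty buffers and return how long that took, in seconds."""
    start = time.perf_counter()
    os.sync()  # same request as the sync command; available on Unix only
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sync took {timed_sync():.4f} s")
```

      The result depends entirely on how much dirty data is cached at that moment, so a near-zero time on an idle machine says little about a busy server.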

    24. Re:What this really points out... by kayditty · · Score: 1

      It shouldn't be that difficult for your car to have the appearance of running at 420mph while you're under the influence of certain substances.

    25. Re:What this really points out... by Anonymous Coward · · Score: 0

      you forgot to post anonymously, dumbass.

    26. Re:What this really points out... by XnavxeMiyyep · · Score: 1

      Five seconds ought to be enough for anybody!

      --
      I put the 't' in electrical engineering.
    27. Re:What this really points out... by FlyingBishop · · Score: 1

      The most serious problem pointed out is the possibility that the disk may keep writing garbage from memory. If the power supply could buy enough time to throw a full kernel panic, that should be sufficient to let any journalling prevent corruption, even if a kernel panic is all the system can do before power disappears.

    28. Re:What this really points out... by rcw-work · · Score: 1

      However, the power draw on the board was around 200 Watts; being able to supply that much power for that long in a fairly compact form factor was a huge hurdle.

      Of course I don't know the voltage tolerances you were working with in your design, but many PCs survive even 200ms just fine, although they store most of their power in (fairly small) inductors. If your design voltage is 5V and your draw is 40A, then you need 12.5mH to store 10 joules (although you'd need several times more than that because you need power within tolerances for that entire 50msec) and a freewheeling diode to source those electrons from ground. One bad thing about inductors is they need special consideration to prevent voltage spiking or arcing on load disconnect.
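
      The inductor figure above can be checked with the ideal energy formula E = 1/2 * L * I^2. This is only a sketch of the arithmetic: it uses the 200 W and 50 ms figures from the thread, assumes a 5 V rail, and ignores the voltage tolerance and losses the comment warns about.

```python
def inductor_energy_joules(inductance_h, current_a):
    """Energy stored in an ideal inductor carrying current_a amps."""
    return 0.5 * inductance_h * current_a ** 2


def required_inductance_h(energy_j, current_a):
    """Inductance needed to store energy_j joules at current_a amps."""
    return 2.0 * energy_j / current_a ** 2


if __name__ == "__main__":
    power_w, rail_v, holdup_s = 200.0, 5.0, 0.050
    current_a = power_w / rail_v      # 40 A
    energy_j = power_w * holdup_s     # about 10 J
    henries = required_inductance_h(energy_j, current_a)
    print(f"{current_a:.0f} A, {energy_j:.1f} J -> {henries * 1000:.1f} mH")
```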

    29. Re:What this really points out... by DamnStupidElf · · Score: 1

      If my car can run 6 or 7 hours at 70 MPH, it should not be that difficult to use the same car to run at 420 MPH for 1 hour.

      Well, except for wind resistance and kinetic energy increasing with the square of speed instead of linearly, you can make basically the same argument. If you took 6 or 7 copies of your car, took all the engines out and mounted them in one beefed up car, you could probably get an incredible speed increase for an hour on one tank of gas. Sticking a bunch of parallel laptop batteries and an inverter on a server would run it just fine.

    30. Re:What this really points out... by Frank+T.+Lofaro+Jr. · · Score: 1

      So a special system call that disables task switching, followed by a sync, would avoid those issues.

      --
      Just because it CAN be done, doesn't mean it should!
    31. Re:What this really points out... by Macman408 · · Score: 1

      It's true (at least anecdotally) that PCs can survive a couple hundred milliseconds... The main constraint we had was size. Any capacitors we added took away from board space we could use for useful logic. I believe the input was 48V, and the power supply would work with input down to maybe 32V or so - so we had to store almost twice the energy that we were going to be needing. Not to mention, finding capacitors with a high enough voltage rating and low enough height limits you to a much lower capacitance than is ideal. And even then, as I mentioned, it caused airflow problems.

    32. Re:What this really points out... by rcw-work · · Score: 1

      Not to mention, finding capacitors with a high enough voltage rating and low enough height limits you to a much lower capacitance than is ideal.

      That's true. You can run them in series, but it's the same energy density either way (and getting too close to the voltage limit will shorten capacitor life).

    33. Re:What this really points out... by multipartmixed · · Score: 1

      Sun T3s (storage arrays) use batteries inside the power supply. These are a ~4U box with 9 disks on fibre.

      Effing expensive batteries ($1200), one in each power supply, and they have to be replaced every three years. *grumble*

      --

      Do daemons dream of electric sleep()?
    34. Re:What this really points out... by Anonymous Coward · · Score: 0

      >Hint: it's BIG. Batteries are more space efficient, but the chemicals and outgassing make them inappropriate for location INSIDE the computer box.

      Ever heard of a laptop computer?

    35. Re:What this really points out... by JesseL · · Score: 1

      According to my figuring (correct me if it's wrong) 50ms * 200 watts = 10 Joules = .8 Farads @ 5V.

      Even 5 farad 5V supercaps aren't all that big or expensive any more.
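
      The 0.8 Farad figure follows from C = 2E / V^2, which assumes the capacitor can be drained all the way to 0 V. A small sketch of the same calculation with a voltage floor added; the 4.5 V floor is an assumption for illustration, not a number from the thread.

```python
def required_capacitance_f(energy_j, v_start, v_min=0.0):
    """Capacitance needed to deliver energy_j while dropping from v_start to v_min."""
    return 2.0 * energy_j / (v_start ** 2 - v_min ** 2)


if __name__ == "__main__":
    energy_j = 200.0 * 0.050  # 200 W for 50 ms, about 10 J
    ideal = required_capacitance_f(energy_j, 5.0)
    floored = required_capacitance_f(energy_j, 5.0, 4.5)  # assumed 4.5 V floor
    print(f"drained to 0 V: {ideal:.2f} F; held above 4.5 V: {floored:.2f} F")
```

      With that assumed floor the requirement grows to a little over 4 F, which is still within the 5 farad supercap range mentioned above.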

      --
      "Prefiero morir de pie que vivir siempre arrodillado!"
    36. Re:What this really points out... by Anonymous Coward · · Score: 0

      Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.

      Have a look at some of the new supercapacitors -- they're getting to be capable of enormous storage.

    37. Re:What this really points out... by Bender0x7D1 · · Score: 2, Interesting

      I agree with you.

      My point was that just because a battery can power a laptop for several hours doesn't mean a single battery can supply a server for 5 minutes. The GP was claiming that because (laptop power consumption) * (2-3 hours) == (server power consumption) * (5 minutes), it shouldn't be hard for the same battery to power both. The point I was trying to make is that just because a device performs well within a certain range (in this case the car at 70 MPH) doesn't mean it is easy for it to perform well outside that range (operating at 420 MPH).
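
      To put rough numbers on the energy-versus-power distinction, here is a minimal sketch. Every figure in it (a 20 W laptop, a 300 W server, a 50 Wh battery) is a made-up assumption for illustration. The point is that the energy totals can be comparable while the discharge rate demanded of the battery differs by more than an order of magnitude.

```python
def energy_wh(power_w, hours):
    """Energy consumed by a constant load over the given time."""
    return power_w * hours


def discharge_rate_c(load_w, capacity_wh):
    """Load expressed as a multiple of battery capacity per hour (the 'C rate')."""
    return load_w / capacity_wh


if __name__ == "__main__":
    BATTERY_WH = 50.0                  # assumed laptop battery capacity
    LAPTOP_W, SERVER_W = 20.0, 300.0   # assumed loads
    print(f"laptop, 2.5 h: {energy_wh(LAPTOP_W, 2.5):.0f} Wh "
          f"at {discharge_rate_c(LAPTOP_W, BATTERY_WH):.1f}C")
    print(f"server, 5 min: {energy_wh(SERVER_W, 5 / 60):.0f} Wh "
          f"at {discharge_rate_c(SERVER_W, BATTERY_WH):.1f}C")
```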

      --
      Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
    38. Re:What this really points out... by mlwmohawk · · Score: 1

      Effing expensive batteries ($1200), one in each power supply, and they have to be replaced every three years. *grumble*

      I bet the safety and environmental requirements are huge.

    39. Re:What this really points out... by jewelises · · Score: 1

      That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.

      How about this?

      killall5 -19; sync

      killall5 sends a signal to all running processes other than the current session's. Signal 19 is a SIGSTOP, it immediately freezes a process and cannot be caught or ignored.
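
      A minimal sketch of the same idea for a single process, assuming a POSIX system. It uses signal.SIGSTOP rather than the literal 19, since the number differs on some architectures. The sleeping child is a hypothetical stand-in for a busy service; the killall5 one-liner above applies this to every process outside the current session.

```python
import os
import signal
import subprocess
import sys


def freeze_then_sync(child_pid):
    """Stop one child process with SIGSTOP, then flush dirty buffers to disk."""
    os.kill(child_pid, signal.SIGSTOP)  # cannot be caught or ignored
    # WUNTRACED makes waitpid report a stopped child, not just an exited one.
    _, status = os.waitpid(child_pid, os.WUNTRACED)
    os.sync()
    return os.WIFSTOPPED(status)


def demo():
    """Freeze a sleeping child, sync, and report whether it really stopped."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        return freeze_then_sync(child.pid)
    finally:
        child.kill()  # SIGKILL still works on a stopped process
        child.wait()  # reap the child so no zombie is left behind


if __name__ == "__main__":
    print("child stopped before sync:", demo())
```

      Freezing a process this way only stops it from issuing new writes. It says nothing about whether the data it had already written was in a consistent state.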

    40. Re:What this really points out... by enoz · · Score: 1

      We could name this system call 'shutdown'.

      Then maybe if we knew the power was about to drop out it could call 'shutdown now'.

      Brillant.

    41. Re:What this really points out... by jimicus · · Score: 1

      That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.

      How about this?

      killall5 -19; sync

      killall5 sends a signal to all running processes other than the current session's. Signal 19 is a SIGSTOP, it immediately freezes a process and cannot be caught or ignored.

      Doesn't work.

      Postgres, for instance (and I doubt it's the only program to which this applies - I suspect the same is true of most databases) doesn't guarantee that its own data files are consistent unless it's been cleanly shut down. While you will flush any writes which are in progress and prevent any further writes, you won't guarantee that the end result will contain consistent data.

      Any sensible database will have a mechanism to recover from this, but that mechanism may result in you losing the last transaction which wasn't fully committed. All the atomicity in ACID means is "you won't start up the database after a crash and find your data doesn't make any sense because only half a transaction was committed".

      It doesn't mean "As soon as you hit enter a transaction is guaranteed to commit successfully even if the power fails during the commit process".
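
      The distinction can be sketched at the file level: a commit should only be acknowledged after the write has been pushed to storage, so a power failure before that point loses the transaction but never reports a success that did not happen. This is an illustrative sketch, not how Postgres or any particular database implements commits; commit_record is a hypothetical helper.

```python
import os


def commit_record(path, record):
    """Append record (bytes) to path; return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        # Ask the OS to push this file's data to the storage device. A drive
        # with a volatile write cache can still lose it if it lies about this.
        os.fsync(fd)
    finally:
        os.close(fd)
    return True  # only now is it reasonable to tell the client "committed"
```

      If power fails anywhere before the return, the caller never sees a success, which matches the guarantee described above: no half-committed data, but no promise that an in-flight commit survives.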

    42. Re:What this really points out... by Anonymous Coward · · Score: 0

      Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.

      You obviously don't work in a Windows environment.

    43. Re:What this really points out... by evilviper · · Score: 1

      Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are.

      You've never tried that on a server with several GB of RAM. It can take quite a while to flush all the RAM-cached fs data to disk... on the order of a couple of minutes.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    44. Re:What this really points out... by Anonymous Coward · · Score: 0

      Maybe on a single user system.

      On a multi-user system you could have a back end database with a front end web server taking requests from clients wherever. The web server could be running complex SQL statements, potentially causing contention with other users. You can have huge SQL databases with gigs of RAM where a lot of that is just for caching.

      I personally work with a few systems with 32 gigs of RAM that are clustered, so 5 seconds to commit or roll back transactions, sync with the cluster pair, write to disk, sync the file system, etc. is way too optimistic. We run dual UPS, dual power supply, dual power feed unit (can't think of the real name offhand), and dual generator along with server level redundancy just to avoid problems of data loss.

    45. Re:What this really points out... by Frank+T.+Lofaro+Jr. · · Score: 1

      That's not a system call.

      And if you think about the halt system call there is a race condition.

      Doing sync, then halt allows other things to happen between those two calls.

      Unless Linux has a halt syscall that can take an argument to tell it to sync. (and does the right thing, by stopping tasks from running, then sync, then halt)

      --
      Just because it CAN be done, doesn't mean it should!
    46. Re:What this really points out... by Decibel · · Score: 1

      That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.

      That's why you run an ACID database and not a toy database if you care about your data. If the database says "yes, it's committed" then the data damn well better be committed (unless your hardware or OS is lying about when data hits non-volatile storage, in which case there's nothing the database can do).

    47. Re:What this really points out... by jimicus · · Score: 1

      That's why you run an ACID database and not a toy database if you care about your data. If the database says "yes, it's committed" then the data damn well better be committed (unless your hardware or OS is lying about when data hits non-volatile storage, in which case there's nothing the database can do).

      The ACID aspect guarantees that you won't wind up rebooting after a crash to discover that a transaction is half-committed and the numbers in table A don't correspond with the numbers in table B. One obvious way to implement that is that you mark the start and end of each transaction directly to the filesystem as you're going along, and if you come back from a reboot to find a bunch of transactions with start markers but no end markers, you know they didn't commit so you roll them back. With a large database, the commit process could involve updating lots of tables in lots of different files so it's not possible to do them with one OS call.

      Furthermore, sync is an operating system command which will have the operating system process any outstanding filesystem synchronisation tasks. It doesn't contact the database and say "hey, are you through with committing any outstanding transactions?"
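
      The start/end marker scheme described above can be sketched as a toy log replay. This is a deliberately simplified illustration with a made-up record format (tuples of operation, transaction id, and optional key and value). It is not the on-disk format or recovery algorithm of any real database.

```python
def recover(log):
    """Rebuild key/value state from a log, ignoring transactions with no COMMIT."""
    # A transaction counts only if its end marker made it into the log.
    committed = {txid for op, txid, *_ in log if op == "COMMIT"}
    state = {}
    for op, txid, *rest in log:
        if op == "WRITE" and txid in committed:
            key, value = rest
            state[key] = value
    return state


if __name__ == "__main__":
    log = [
        ("BEGIN", 1), ("WRITE", 1, "table_a", 100), ("WRITE", 1, "table_b", 100),
        ("COMMIT", 1),
        ("BEGIN", 2), ("WRITE", 2, "table_a", 50),  # power failed before COMMIT
    ]
    print(recover(log))
```

      Transaction 2 started but never wrote its end marker, so its write to table_a is dropped and the two tables still agree after recovery.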

    48. Re:What this really points out... by IdeaMan · · Score: 1

      It is the hardware and software written under the assumption that it is somehow OK to take more than 1 second to either start up or shut down that is the problem.

      Why do computers take so long to boot up? Require all add-on cards to self test and report good within 100ms of powerup. Use non-volatile memory to automatically back up ram. Computer software and hardware design just stinks these days... and yes I am an embedded systems engineer.

      --
      They ARE out to get you simply because They are in it for themselves and they don't care about you.
  12. It happened to someone by Joebert · · Score: 4, Insightful

    The funny part is someone had to have thought they were safe without a UPS for this to become news.

    --
    Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    1. Re:It happened to someone by Verteiron · · Score: 4, Funny

      Yes. My first reaction upon reading the summary was... "Duh?" What, did they have it plugged into the wall before that? A UPS becomes MORE critical, not less, as the cost of hardware (RAID arrays are expensive) goes up.

      --
      End of lesson. You may press the button.
    2. Re:It happened to someone by thermian · · Score: 1

      Would you believe that a certain major UK university runs its entire computer science dept without either UPS or power spike protection?

      I was surprised, especially since I saw how the regular power spikes blew computer after computer and nothing was done.

      As for trying to run experiments that took more than a day or two to complete, well, can you picture a post grad who's just found that a week's work has once again been wasted because some bean counter refused to pay for department wide UPS?

      Alas I don't have to imagine, and I became quite well versed in the experience.

      --
      A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
    3. Re:It happened to someone by mortonda · · Score: 1

      No kidding. I don't run any computers in this house without a UPS. I just gave a computer back to a friend which I had been working on - the hard drive was shot. I asked if he had power problems, and sure enough, they have brownouts, blinks, etc. and no UPS. Is it any wonder they lost the hard drive?

    4. Re:It happened to someone by Joebert · · Score: 1

      I know I started to take UPS for granted when I went a year or so working almost exclusively with laptops.
      One storm after that stint where I lost a whole day's work snapped things right back into perspective for me though.

      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    5. Re:It happened to someone by Slashdot+Parent · · Score: 1

      (RAID arrays are expensive)

      You are doing something wrong, then.

      Redundant Array of Inexpensive Disks

      --
      They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
  13. Don't forget to test people, TEST! by sco_robinso · · Score: 5, Insightful

    In my company, everything is behind UPSs. Our SAN is even behind 2 separate UPSs. We thought everything was configured properly, but you'd be surprised what comes to roost when you test everything.

    We recently had a test night where all we did was test the UPS system and shutdown procedures, and there were a couple of gotchas. Interestingly, the APC PowerChute app we were using defaulted to shutting down the UPS completely after the [first] server went down - not good. This was buried fairly deeply in the configuration.

    Equally important to any protection measure, be it RAID, Power Protection, whatever - is testing!

    1. Re:Don't forget to test people, TEST! by Darkk · · Score: 4, Interesting

      I 100% agree with the idea of testing under controlled conditions. The oops you guys discovered is a good thing to have caught early on. I can imagine the look on your support team's faces when the UPS suddenly turned itself off while the remaining servers were still trying to perform a safe shutdown. I'm sure the secondary UPS was left running as a precaution until the test was successful.

      I have seen a screw-up where somebody cut into a live power cord, thinking it was a tie wrap, and caused a major short in the PDU. The guy thought he was safe until he discovered that whoever installed the servers hadn't double-checked the power connections and loads, so the short set off a cascade failure in several racks and lost several tons of data. Recovery took a while.

      Needless to say, it was not a good day.

    2. Re:Don't forget to test people, TEST! by raddan · · Score: 1

      It is also important to note that on {many | most} SAN-connected RAID enclosures, there's actually a battery backup unit that writes pending transactions to disk before the unit switches itself off due to power loss. Now, this doesn't help you when one of the SAN clients starts blatting out garbage, but assuming your clients are connected to UPS devices, that shouldn't happen.

    3. Re:Don't forget to test people, TEST! by Anonymous Coward · · Score: 0

      ...2 separate UPSs...

      Hah! I've got a Beowulf cluster of UPSs!

    4. Re:Don't forget to test people, TEST! by Anonymous Coward · · Score: 0

      >In my company, everything is behind UPSs. Our SAN is even behind 2 separate UPSs. We thought
      >everything was configured properly, but you'd be surprised what comes to roost when you test
      >everything.

      If your SAN has those 2 UPSs hooked up in series, you might want to test it. Some major UPS companies don't recommend this, as it can confuse the electronics of the second UPS because of the shape of the waveform coming out of the first UPS.

      It might sound like a good idea on the surface, but unless it's properly designed, it may be worse than just having a larger single UPS.

    5. Re:Don't forget to test people, TEST! by oddaddresstrap · · Score: 1

      lost several tons of data

      You really should look into moving your data to a parchment-based system and dump all those old stone tablets.

    6. Re:Don't forget to test people, TEST! by Darkk · · Score: 1

      Yep, that would explain why Iron Mountain has been complaining the back data is too heavy.

    7. Re:Don't forget to test people, TEST! by sco_robinso · · Score: 1

      No, they're not in series.

      Each UPS is plugged into its own dedicated electrical circuit - no UPS is plugged into another one. Our building's property managers have also done a pretty good job with redundancy in the building's circuits.

      And yes, in a controlled environment, this saved us a lot of potential headaches. Even the particular servers and UPSs I was personally responsible for had a couple of minor issues. Some benign configuration change I made in PowerChute caused the UPS to stop communicating with the servers.

    8. Re:Don't forget to test people, TEST! by clone53421 · · Score: 1

      Obviously he was referring to the 64 terabytes of data they had stored on Fujifilm tapes...

      (joke, but I didn't totally pull that number out of my ass... I guesstimated a 4 GB tape weighs about 4 oz and figured what 2 tonnes would hold... wow, the Firefox spell check knows the word "guesstimated"!)
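
      For the curious, the guesstimate checks out if "2 tonnes" is read as 2 short tons of 2000 lb each. A quick sketch of the arithmetic, using decimal terabytes and the 4 GB / 4 oz tape figures above; with metric tonnes the total comes out nearer 70 TB.

```python
OUNCES_PER_POUND = 16
POUNDS_PER_SHORT_TON = 2000
KG_PER_OUNCE = 0.028349523125


def tapes_to_terabytes(total_ounces, tape_oz=4, tape_gb=4):
    """Capacity in decimal TB of as many whole tapes as fit in total_ounces."""
    tapes = total_ounces // tape_oz
    return tapes * tape_gb / 1000


if __name__ == "__main__":
    short_ton_oz = 2 * POUNDS_PER_SHORT_TON * OUNCES_PER_POUND  # 64,000 oz
    metric_tonne_oz = 2 * 1000 / KG_PER_OUNCE                    # about 70,548 oz
    print(f"2 short tons:    {tapes_to_terabytes(short_ton_oz):.1f} TB")
    print(f"2 metric tonnes: {tapes_to_terabytes(metric_tonne_oz):.1f} TB")
```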

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    9. Re:Don't forget to test people, TEST! by dossen · · Score: 1

      If a server is important enough to use double UPS, would it not be a better idea to have redundant PSUs and connect one UPS to each - that way it would be protected against either of the UPSs or PSUs failing (as long as they fail to "power off") - as well as the mains failing?

    10. Re:Don't forget to test people, TEST! by Anonymous Coward · · Score: 0

      I 100% agree with the idea of testing under controlled conditions.

      Amen to that. I once worked for a jerk who had all the answers. He desperately wanted a new server, but couldn't justify the cost to his management. It was the only large server in a fairly small department and we managed it ourselves.

      I did some calculations based on the estimated time to rebuild the server in case of a total drive failure where we had to acquire all new drives versus the cost of people idled for a couple of days for the outage. It neatly justified the cost of a backup server.

      I built the new one from scratch (OS only -- it was an OS/2 system), then applied the current data from backups.

      We did have the OS backed up, but, since this would be an additional server, we couldn't clone the main one exactly onto it because they wouldn't be able to coexist and idiot child didn't want to make the adjustments that would allow them to coexist.

      As it turned out, the build and data restore didn't work because ACLs from the main system didn't get handled properly by the new one. We manually took care of that and verified all was well.

      At this point, I said, "Well, we'll never have a better chance to test a full disaster recovery than now, before the new server is put into service, so let's scrape the drives and test the restore processing." But no, the throbbing brain decided that we didn't have time for that kind of foolishness.

      We replaced the old server with the new one and relegated the old one to a test bed. Eventually, we did have a series of drive failures on the new one and had to fall back to the old one. Surprise, surprise -- the restore failed in the same way due to authentication problems and it still took a couple of days to get it right. Boo-fcking-hoo -- I told the shithead we should test recovery, but, as always, he had the superior idea -- just charge on.

      Do you have any idea how hard it was to keep from laughing my ass off? Of course it was all my fault. Somehow.

  14. On the other hand.. by m0i · · Score: 2, Interesting

    you can recover your RAM minutes after loosing power.. no kidding! http://citp.princeton.edu/memory/

    --
    have you been defaced today?
    1. Re:On the other hand.. by GuldKalle · · Score: 1

      There's a really big difference between getting 99.9% of your data back, and getting your data back, so using this to get your system back up would be limiting to impossible.
      Personally, I'd much rather go with knowing I've lost data than possibly having wrong data.

      --
      What?
    2. Re:On the other hand.. by Anonymous Coward · · Score: 0

      > after loosing power

      What do you mean by loose power? I'm an EE, and I've never heard that slang before.

    3. Re:On the other hand.. by icegreentea · · Score: 1

      Only if all your RAM sticks are guarded by cans of compressed air/canisters of liquid nitrogen.

    4. Re:On the other hand.. by enoz · · Score: 1

      Hey that would work great to prevent memory theft as well.

      It could trigger the liquid nitrogen with the case-open alarm and the thief would end up with some painfully crunchy fingers.

  15. I know PHB's try to cut costs.... by knarfling · · Score: 1

    I know that PHB's will try to cut costs, and that unnecessary hardware is the first to be cut, but is there ANYONE who believes that a UPS is not needed? Are there really people out there that think, "We don't need the UPS right now. We can wait until we have more money."

    It boggles my mind that there is even a need for such an article.

    --
    Great civilizations have lived and died on false theories. Don't mess up mine with a few facts.
    1. Re:I know PHB's try to cut costs.... by Anonymous Coward · · Score: 0

      Uh yes, there are. Why should I bother with the expense of a UPS? I don't even back up. I migrate to new disks as needed. My subversion repository/media server runs on an old laptop, and so has a built in UPS.

    2. Re:I know PHB's try to cut costs.... by myowntrueself · · Score: 1

      You are not kidding.

      In order to convince management to fund a backup airconditioning unit for our machine room I had to write up a *business case*.

      Management want to replace our entire email infrastructure with something approaching MS Exchange in size and complexity? No written business case, they just go with whatever is shiniest.

      --
      In the free world the media isn't government run; the government is media run.
  16. That's what I always say sometimes by RiffRafff · · Score: 1

    Well, duh. Thank you Captain Obvious.

    Here's a question for you all. I have a cheap Conext (made by APC) IPS. Yes, it's an interruptible power supply. It used to work fine, but once I added a Samsung b/w laser printer, whenever the printer's heating element first comes on, the UPS drops out immediately and the computer restarts. I even put a new battery in it; no help. The printer, btw, is NOT plugged into the UPS. The line voltage appears to get yanked down just momentarily, and the computer ignores it when off the UPS. The UPS, with nothing plugged in to it, always clicks off then back on once during the printer's warm-up cycle. Is the UPS just too small (900 AVR)?

    --
    "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    1. Re:That's what I always say sometimes by CastrTroy · · Score: 1

      Most UPS devices should have a test button. Try pushing the test button when your computer isn't doing anything critical to see if it really can stand up to the load. If you don't have a test button, just yank the cord from the wall (or the back of the UPS unit). If it fails, it means that the UPS can't supply enough power for the devices hooked up to it.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    2. Re:That's what I always say sometimes by Reziac · · Score: 1

      The docs for every UPS I've ever seen say do NOT attach a laser printer to them, for exactly the reasons you've seen -- when the printer comes on, the startup drawdown is just too much (it's generally about four computers worth).

      Your printer should be on a good surge protector, but there is no reason it needs to be attached to the UPS. Some UPSs now have spare plugs for exactly this use -- they provide surge protection but not continuous power.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    3. Re:That's what I always say sometimes by alta · · Score: 5, Interesting

      Rule #1.

      NEVER plug a laser printer into a UPS. The power that the fuser draws is WAY too much.

      Look at some of the cheap office units, they show little pictures on them, notice the printer icon is on the surge side, NOT battery/surge side.

      If the power goes out, you should NOT be trying to print.

      http://articles.techrepublic.com.com/5100-10878_11-6085460.html See #6

      http://arstechnica.com/guides/other/ups.ars/3

      http://www.jetcafe.org/npc/doc/ups-faq.html#0405 see 04.05

      Would you put a space heater on a UPS? Shredder? Vacuum? Table Saw? If you put a laser printer on it, you may as well.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    4. Re:That's what I always say sometimes by bgat · · Score: 2, Informative

      Yes, quite. It can't handle the substantial inrush current needed by the laser printer.

      The "click" you hear in the UPS when the laser printer warms up is the UPS noting the drops on the power mains, which gives you some idea just how much current that printer needs.

      I have a Samsung ML2150, and have noticed the same thing. Lights flicker, etc. whenever I submit a print job and the printer transitions from standby to active. The various UPSes in my office sense that, and respond with clicks and beeps.

      Take the laser printer off the UPS. If you really need printer capability during a power failure, switch to an ink jet.

      --
      b.g.
    5. Re:That's what I always say sometimes by timster · · Score: 1

      You should read his whole comment -- the printer is not plugged in to the UPS.

      --
      I have seen the future, and it is inconvenient.
    6. Re:That's what I always say sometimes by alta · · Score: 1

      Maybe I'm the dumbass here, WTF is an interruptible power supply? And why is it called a UPS, when a UPS is an UNinterruptible power supply?

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    7. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      Hi. Thanks. But, um, in my post I wrote "The printer, btw, is NOT plugged into the UPS."

      I don't understand how the printer could yank the line voltage down so that the UPS faults, and yet a computer plugged directly into the wall can handle it. Unless my computer's power supply buffers better than the UPS.

      Maybe I'm not explaining it well enough. If I plug my computer into the UPS, and plug the printer into a different wall outlet in the room, when I print, I hear the UPS click, and then the computer resets. If I unplug the computer from the UPS, and then plug the computer into yet another wall outlet, when I print the UPS still clicks (with nothing plugged into it), but the computer is fine.

      Printer is plugged into a reasonably high-end Tripp-Lite Isobar suppressor (but acts the same without it).

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    8. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      Thanks for responding; please see my reply to Reziac.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    9. Re:That's what I always say sometimes by natoochtoniket · · Score: 1

      Is the UPS just too small (900 AVR)?

      Duh? You think it might?

      Read the power labels on all of the devices that you intend to plug into that power supply. Add up the volt-amps (volts times amps), or the watts (almost the same thing). The total needs to be smaller than the power supply's rated capacity.

      Even if the capacity numbers look good, batteries lose capacity as they age.
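
      The adding-up described above could be sketched roughly as follows. Every figure here is a made-up placeholder (read the real volts and amps off your own labels), and the 80% headroom factor is an assumption for illustration, not a manufacturer number.

```python
# Hypothetical nameplate figures; read the real ones off each device's label.
devices = {
    "desktop PC": {"volts": 120, "amps": 3.0},
    "LCD monitor": {"volts": 120, "amps": 0.8},
    "network switch": {"volts": 120, "amps": 0.3},
}

UPS_RATING_VA = 900   # assumed rating, borrowed from the "900" in this thread
HEADROOM = 0.8        # assumed safety margin to allow for battery aging


def total_volt_amps(devs):
    """Sum volts * amps over every device plugged into the UPS."""
    return sum(d["volts"] * d["amps"] for d in devs.values())


def fits_on_ups(devs, rating_va, headroom=HEADROOM):
    """True if the summed load stays under the derated UPS capacity."""
    return total_volt_amps(devs) <= rating_va * headroom


load = total_volt_amps(devices)
print(f"load: {load:.0f} VA of {UPS_RATING_VA} VA")
print("fits" if fits_on_ups(devices, UPS_RATING_VA) else "too much load")
```

      Note that this only checks steady-state load. It says nothing about a UPS that faults with nothing plugged into it at all.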

    10. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      If I yank the UPS power cord, it goes to battery with nary a hiccup. There is something about the high current draw of the printer on the line voltage that messes with it. When the printer warms up, the UPS never goes to battery, but just loses power (or voltage level drops too low) to the computer. Very odd.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    11. Re:That's what I always say sometimes by Anonymous Coward · · Score: 0

      Laser printers (in particular, the fuser roller, or in some types, the corona wire) cause momentary very high current draw. This trips an internal protection breaker in most UPSes. Not the surge suppressor, just an internal overcurrent breaker to keep the battery from overheating and catching fire.

      APC's larger systems can handle it, but they're quite expensive. You can tell which ones can handle laser printers by the fact that they're rack-mountable and produce a true sine wave output. Oh, and they cost multiple kilodollars.

      Or, since you already have a decent UPS (900 VA is big enough for a PC and everything typically attached to it), you can just move the laser printer to a non-battery-backed plug. It won't cause problems if you keep it off the battery.

    12. Re:That's what I always say sometimes by Microlith · · Score: 1

      Does the computer reboot when you run a test on the UPS?

      Or when you just yank the UPS's plug from the wall?

    13. Re:That's what I always say sometimes by clone53421 · · Score: 1

      The printer, btw, is NOT plugged into the UPS.

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    14. Re:That's what I always say sometimes by clone53421 · · Score: 1

      NEVER plug a laser printer into a UPS.

      ...He didn't. Go back and read his post again.

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    15. Re:That's what I always say sometimes by Anonymous Coward · · Score: 1, Insightful

      WTF is wrong with your power installations, guys? Flickering lights, brownouts whenever the printer warms up, voltage spikes every day? Is your electricity produced by hamsters in wheels and delivered through bell wire? Perhaps you should stop sprinkling the place with UPSs and pay someone to redo your electrical installation instead.

    16. Re:That's what I always say sometimes by alta · · Score: 1

      Sorry, somehow I missed that part...

      Sounds to me like the UPS is just bad. The printer is causing a brownout when the fuser warms up. The UPS isn't able to cope. The click is it switching to battery.

      1. Plug a digital clock in, or something else with a VERY low draw that you'll notice if it goes off. Don't plug ANYTHING else into it.

      2. Turn on printer.

      If it resets at this minimum load, you have a bad UPS. If it doesn't, you're probably drawing too much.

      3. If it doesn't reset, try adding load. Maybe try 60-watt lamps, as you know what they draw and can add one at a time. Test between each addition.

      At some point you'll have an idea of what sort of wattage your UPS fails at. If it's something really low, you STILL have a bad UPS. If it's something reasonable your UPS is just too wimpy for your load.

      To give you an idea, I have a 750VA APC with a Quad core 2.0GHz processor, 2 drives, 2 high end video cards and a 20" LCD plugged in. And a voip phone. It handles this load without any issue when the power dies.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    17. Re:That's what I always say sometimes by alta · · Score: 1

      Yup, I was thrown off by the order of his statements... I read, "I have a cheap UPS", then "I added a Samsung b/w laser..." My brain shut down after that ;)

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    18. Re:That's what I always say sometimes by QuoteMstr · · Score: 1

      We have a UPS that works fine, but one day I decided to stress-test it by plugging it in and unplugging it repeatedly. If I do it often enough, I can get the thing to drop power long enough for connected computers to die. Design bug?

    19. Re:That's what I always say sometimes by Anonymous Coward · · Score: 0

      It's a defective or too-small UPS.

    20. Re:That's what I always say sometimes by natoochtoniket · · Score: 1

      Oh -- and you have TWO power supplies to worry about.

      One is the UPS. Everything plugged into the UPS must add up to less power than the UPS can supply.

      The other is the circuit breaker. Everything that draws from that circuit breaker must add up to fewer amps than the circuit breaker can supply. In most buildings, one breaker supplies several outlets, and often also supplies some lights.
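
      For the breaker side, a rough check might look like the sketch below. The line voltage, breaker rating, and wattages are all assumptions picked for illustration; dividing watts by volts only approximates the current, and it ignores the brief inrush when a fuser first heats.

```python
LINE_VOLTS = 120     # assumed line voltage
BREAKER_AMPS = 15    # assumed breaker rating; check the panel

# Hypothetical wattages for everything sharing the one breaker.
circuit_loads_watts = {
    "UPS and computer": 400,
    "laser printer (fuser heating)": 1000,
    "room lights": 180,
}


def circuit_amps(loads_watts, volts=LINE_VOLTS):
    """Approximate current draw: total watts divided by line voltage."""
    return sum(loads_watts.values()) / volts


amps = circuit_amps(circuit_loads_watts)
print(f"{amps:.1f} A drawn on a {BREAKER_AMPS} A breaker")
if amps > BREAKER_AMPS:
    print("over the breaker rating")
```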

    21. Re:That's what I always say sometimes by PlusFiveTroll · · Score: 1

      Is your plug/home properly grounded? If you're not grounded at all or the ground's resistance is too high (25 Ohms is the standard for new housing), your UPS (and surge protector) is worthless, as it has nowhere to dump the surge to and will fault (reset) to protect itself. It sounds like you may already have some wiring issues. http://www.howstuffworks.com/surge-protector.htm is always a good read.

    22. Re:That's what I always say sometimes by Reziac · · Score: 1

      Oh, I got the impression that it was. Well, now I'm wondering if there's a bad breaker on the circuit... or a short in the printer... or... something like that.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    23. Re:That's what I always say sometimes by Reziac · · Score: 1

      Somehow I missed that part :) But as someone else says, and as I speculate in another post, now it sounds like something is awry in the house wiring.

      I had something like that here, where stuff plugged into a certain kitchen outlet made the bathroom plug flicker or not work, and v.v. Turned out they're on the same circuit (my house was evidently wired by a contortionist) and the polarity was wrong on the bathroom plug. Fixed that and the problem went away.

      Similarly, the 220 outlet in my kitchen doesn't carry enough amps -- so if I try to use both the oven and the stovetop (on my electric range), it'll pop *half* of the 220 circuit's breaker (it's two breakers wired together, however that works). And then the stove goes thru the motions and looks like it's still powered on, but nothing cooks.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    24. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      Sorry, my attempt at levity. Of course it's a UPS, but since it is allowing the computer to reset whenever I print, it's really more of an IPS. In other words, it's pretty much useless to me right now. ;-)

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    25. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      No, the computer does not reboot when yanking the UPS cord from the wall. In that case it goes to battery, just like it should. It's only when I print, or rather, start to print, and the printer element preheats, that the UPS clicks off and then right back on, and the computer reboots then. Again, printer is plugged into a different wall socket, not into the UPS. The UPS test button functions normally, as does the initial test when you first power up the UPS.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    26. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      I'll look into that. Although I should mention that the UPS and the Tripp-Lite suppressor both have built-in circuit analysis sections, and neither one indicates a problem.

      However, resistance between neutral and NEMA ground is about 50 ohms, so I'll check some other rooms, and maybe tighten the connections on the buss-bar in the breaker box. Thanks.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    27. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      Yes, this is a smaller model, the ML-2571N, but the same (in)famous Samsung current draw.

      BTW, the printer is not plugged into the UPS. If I can find a long enough CAT-5, I'm going to move it into another room and see if that makes any difference. Thanks.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    28. Re:That's what I always say sometimes by jbridge21 · · Score: 1

      My friend has a UPS that is big enough to run a vacuum cleaner on for ten minutes without being plugged in. I think it was an APC 3000VA. If you actually have a larger model, none of the consumer 300VA junk, then plugging a printer into it really should not be a problem at all.

    29. Re:That's what I always say sometimes by Anonymous Coward · · Score: 0

      Someone in my office once put a vacuum on a UPS. There was a muffled bgrzzt sound, a puff of acrid smoke, and the network went down -- all of it, since the UPS was the one powering the main bank of switches in the network rack. Fortunately it's a rather small network.

      All unused UPS power outlets now have clear packing tape with the word "NO" in extrabold permanent marker over top of them.

    30. Re:That's what I always say sometimes by WuphonsReach · · Score: 1

      I have a Samsung ML2150, and have noticed the same thing. Lights flicker, etc. whenever I submit a print job and the printer transitions from standby to active. The various UPSes in my office sense that, and respond with clicks and beeps.

      If the lights are flickering when a laser printer spins up, then you (very likely) have overloaded circuits (and probably/possibly overloaded cabling). Spend the money and get the electricians to run a separate 20A (or 30A) run for that laser printer.

      (Otherwise, you're likely risking more catastrophic issues down the road. Such as fire.)

      --
      Wolde you bothe eate your cake, and have your cake?
    31. Re:That's what I always say sometimes by Anonymous Coward · · Score: 0

      Back in the dark ages I worked for a very small ISP. One day one of the owners decided he didn't like all the dust on something in the server room and decided to vacuum it up (OK so far). He grabbed the vacuum, walked into the server room, looked around for an available power outlet, plugged the vacuum in and went to town.

      Guess where that outlet drew power from?

    32. Re:That's what I always say sometimes by dereference · · Score: 1

      I had exactly this problem you describe, with two different Conext UPS devices. As far as I can tell, the AVR system simply doesn't work, even with a good fully-charged battery. Yes, AVR is supposed to smooth out exactly these kinds of low voltage situations, where you have another high-draw device on the same circuit. Both UPS devices ended up giving a hardware fault LED/beep code after about two years, and the only answer from Conext (really APC) was to buy a replacement unit (with some nominal credit if I returned the Conext). I've never had any problems with any (of the many) APC devices I've used over the years, beyond changing batteries more often than I'd like. I always try to get the devices with AVR; these don't click over to battery during these low voltage situations, which seems (anecdotally) to keep the batteries in better shape for longer.

      The "solution" is to get a real APC, ideally a model with AVR, or a workaround is to plug the printer into a totally different circuit (often meaning another room, assuming you're talking about a residence). I've found that Conext devices are a lot less expensive for a very good reason.

    33. Re:That's what I always say sometimes by Brianwa · · Score: 1

      Voltage fluctuations are pretty normal. Half of my house is an addition, it's all powered through a 100 foot or so conduit running to the main breaker. Turning on a CRT, anything with power capacitors, or power tools will cause the lights to flicker briefly. There's nothing wrong, just a long cable run and basic physics at work.

    34. Re:That's what I always say sometimes by toddestan · · Score: 1

      I don't understand how the printer could yank the line voltage down so that the UPS faults, and yet a computer plugged directly into the wall can handle it. Unless my computer's power supply buffers better than the UPS.

      Many computers can withstand several hundred millisecond power drops without a problem. So what happens is your printer causes a short drop that the PC doesn't notice*, but causes the UPS to do its thing. I've seen a similar thing at work, where a bunch of desktops have UPSes that are too small for them, so whenever the power flickers those UPSes go into panic mode and immediately tell the computer to shut down/hibernate, while those without the UPS are generally fine.

      *This is actually pretty hard on the power supply and hard drives in the PC even though it may not reset or crash, so it's best to have the computer on a good UPS.

    35. Re:That's what I always say sometimes by alta · · Score: 1

      Well, I thought that at first, but a google search for interruptible power supply shows TONS of them for sale!

      Like this model:
      http://www.geek.com/ultra-products-interruptible-power-supply/

      Too weird.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    36. Re:That's what I always say sometimes by alta · · Score: 1

      Yeah, but SHOULD you plug that in? We have a few 2200VA rack mounts that can handle a lot of stuff. Some things are an unnecessarily high draw (printer) or just produce a lot of interference? feedback? Anything with a motor comes to mind. We had a somewhat decent 750 blown when the cleaning guy plugged his vacuum into it.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    37. Re:That's what I always say sometimes by alta · · Score: 1

      We have kids... Lots of those little plastic outlet covers. They're starting to get to the point where they're not needed, I think I'll use them for that.
      Thanks!

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    38. Re:That's what I always say sometimes by The_reformant · · Score: 1

      My table saw is mission critical...*especially* during a power outage.

      --
      I have discovered a truly remarkable sig which this post is too small to contain.
    39. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      Makes sense. Thanks.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    40. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      I do believe you're right. And this thing is about two years old. I guess it's time for a new one. Maybe it should have been named the "Conext 900 POS." ;-)

      Thanks

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
    41. Re:That's what I always say sometimes by RiffRafff · · Score: 1

      Is the UPS just too small (900 AVR)?

      Duh? You think it might?

      Uh, duh, nooo, I don't. If you had read a bit more carefully, you would have noticed that I said the UPS clicked and faulted even with nothing plugged into it. I.E., no load.

      So, your snarkiness isn't quite as effective when you don't pay attention.

      --
      "I might have made a tactical error in not going to a physician for 20 years." -- Warren Zevon
  17. Get a UPS by Chemisor · · Score: 3, Insightful

    I really can't understand people who don't have a UPS. Don't you care about your data? At all? The UPS is not very expensive (My BackUPS 900 is very nice and only $100), and will last a long time (you just replace the batteries now and then). Once you are on UPS, you can stop worrying about any power issues, journalling file systems, crash recovery, and all that. The computer will never fail due to power. If you run Linux, it will also never fail due to the OS. If you are a normal user, that means your computer will never fail, period. Seriously, there is no excuse for not having a UPS. Go and get one right now!

    1. Re:Get a UPS by Reziac · · Score: 1

      I always tell people the same thing. For about $100 for a decent home-type UPS, you will never have your hardware trashed by power spikes and sags, and you'll never have your work rudely interrupted or destroyed by a power outage.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    2. Re:Get a UPS by LBArrettAnderson · · Score: 2, Insightful

      Unless your PSU breaks...

    3. Re:Get a UPS by GuldKalle · · Score: 2, Interesting

      Depends on where you live. Here in Denmark I've only experienced two power outages in my lifetime. One was in a house in the middle of nowhere, during a winter storm, the other was due to an unpaid bill. Under those circumstances I've got a lot of other stuff to spend 100$ on.
      If we were talking about a datacenter, then yes, UPS on everything important. But for home use, nah.

      --
      What?
    4. Re:Get a UPS by Trashman · · Score: 1

      In addition, when selecting one, make sure you buy one with a VA rating that suits your setup. A UPS will not save you if you have too many devices hooked up that draw more power than the battery can provide.

      --
      Do not read this .sig
    5. Re:Get a UPS by Anonymous Coward · · Score: 0

      You are absolutely correct. It's good thing that hard drives never fail!

    6. Re:Get a UPS by Chemisor · · Score: 1

      > It's good thing that hard drives never fail!

      They don't! Over the last fifteen years of running a computer every day, I have only had one hard drive failure. One. That particular drive was bought used, on eBay, because it was cheap. I think that I can safely say that my chances of experiencing another drive failure any time soon are very very low.

    7. Re:Get a UPS by josh82 · · Score: 1

      "It's good thing that hard drives never fail!"

      "They don't!"

      Tell that to anyone who was lucky enough to purchase a DeathStar.

    8. Re:Get a UPS by tehcyder · · Score: 1

      The UPS is not very expensive (My BackUPS 900 is very nice and only $100), and will last a long time (you just replace the batteries now and then).

      What batteries, I thought it ran off the mains? Uh oh...

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    9. Re:Get a UPS by Anonymous Coward · · Score: 0

      Only $100? Where? I'm seeing $200 for that model.

  18. Is this bring your kid to work day? by alta · · Score: 4, Funny

    Ok, now everyone has something to give to your kid for the sysadmin-in-training class.

    For the rest of us... back to work, nothing here you didn't learn your first year.

    For the poster... Shame shame... Turn in your card.

    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  19. Uhm, no..? by TheDarkener · · Score: 1

    If you back up religiously, assuming you have the backups on some sort of removable media, why would recovering from them be impossible when data loss via electrical outage occurs?

    Dur-durdur!

    --
    It is pitch black. You are likely to be eaten by a grue.
    1. Re:Uhm, no..? by clone53421 · · Score: 1

      If you back up religiously

      Ah, so your denomination takes "pray^H^H^H^Hback up without ceasing" very literally, I take it? :p

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    2. Re:Uhm, no..? by HikingStick · · Score: 1

      Have you ever had to restore from backup? From time to time, things simply fail.

      --
      I use irony whenever I can, but my shirts are still wrinkled...
    3. Re:Uhm, no..? by TheDarkener · · Score: 1

      Have you ever had to restore from backup? From time to time, things simply fail.

      As a computer tech for over 10 years, yes. I've had to restore from backups before. That's what verification of your backups is for - to make sure it happened.

      --
      It is pitch black. You are likely to be eaten by a grue.
    4. Re:Uhm, no..? by SevenDigitUID · · Score: 1

      Because data changes between backups and outages. Even if you are running incremental backups every 15 minutes, that is 14 minutes' worth of data you might lose. In a high transaction database environment, 14 minutes is a lot of data. If you are using some sort of "any time data changes back it up" system, you still have to deal with the fact that you aren't getting your backups onto removable media immediately. If your power outage ruins a disk, and the backup server doesn't write its cache out fast enough, you just had a loss. If the outage ruins a disk and a disk on your backup storage, you had a loss. Good backup practices go a long way, but they don't make you bulletproof.
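
      To put rough numbers on that window, here is a back-of-the-envelope sketch. The write rates are invented for illustration, and it takes the pessimistic case where the failure hits just before the next backup would have run.

```python
def worst_case_loss(backup_interval_min, writes_per_second):
    """Upper bound on writes lost if the failure hits just before a backup."""
    window_seconds = backup_interval_min * 60
    return window_seconds * writes_per_second


# Hypothetical write rates, from a quiet box to a busy database.
for rate in (1, 50, 500):
    lost = worst_case_loss(15, rate)
    print(f"{rate:>4} writes/s -> up to {lost:,} writes at risk")
```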

    5. Re:Uhm, no..? by clone53421 · · Score: 1

      Hey! That's what I said! :p

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    6. Re:Uhm, no..? by HikingStick · · Score: 1

      I've been in the field for 15 years. I've run data verifications, and even facilitated physical restoration tests for contingency testing. I've still seen some media failures and data corruption even when data has been previously verified and all data handling practices were followed. Such events are rare, but they can happen.

      When you get to smaller businesses, the risk of damage to the backup media is amplified, since media are often accessible to many hands and exposed to a variety of environmental factors. The human element (e.g., spilled sugary coffee) can instantly render a backup tape useless. In such situations, it is rare for someone to come forward and admit the problem when it happens. It's more likely to be noticed when someone wants you to pull that ancient file from the tape that was made 11 months ago--something sticky this way comes.

      --
      I use irony whenever I can, but my shirts are still wrinkled...
  20. Carefully proofredded article by Intron · · Score: 2, Funny

    "3.2. (Ecrypted) file systems"

    Please tell me more about these ecrypted file systems. Do they also do gurnalling?

    --
    Intron: the portion of DNA which expresses nothing useful.
    1. Re:Carefully proofredded article by clone53421 · · Score: 1

      Of course. That's a gnu feature...

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
  21. Don't forget the simple case... by s31523 · · Score: 1

    So a UPS is needed, really. Working on a long block of code haven't hit save in a while and no autosave is on... Bam, power is out and you just lost 100 lines of code you spent hours on. Go get a UPS.

    1. Re:Don't forget the simple case... by Richard+Steiner · · Score: 2, Informative

      Real text editors will recover gracefully from such situations. :-)

      (I'm thinking along the lines of @UEDIT on OS2200 which saves its entire virtual memory state to disk periodically and can recover it with ease at the next startup, or the old EDT editor on VMS which saved the commands one entered and could replay them when a recovery was specified).

      I'm surprised more text editors don't have a similar feature. I think vim does, tho...?
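
      The command-replay idea can be sketched in a few lines. This is only a toy illustration of the general technique, not how EDT or @UEDIT actually work; the `JournaledBuffer` class and its JSON journal format are invented for the example.

```python
import json
import os
import tempfile


class JournaledBuffer:
    """Toy line editor that logs each command before applying it."""

    def __init__(self, journal_path):
        self.lines = []
        self.journal_path = journal_path

    def _apply(self, cmd):
        if cmd["op"] == "insert":
            self.lines.insert(cmd["at"], cmd["text"])
        elif cmd["op"] == "delete":
            del self.lines[cmd["at"]]

    def execute(self, cmd):
        # Append the command to the journal and ask the OS to flush it
        # before touching the in-memory buffer.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(cmd)

    @classmethod
    def recover(cls, journal_path):
        """Rebuild the buffer by replaying every journaled command."""
        buf = cls(journal_path)
        with open(journal_path, encoding="utf-8") as f:
            for line in f:
                buf._apply(json.loads(line))
        return buf


journal = os.path.join(tempfile.mkdtemp(), "edit.journal")
session = JournaledBuffer(journal)
session.execute({"op": "insert", "at": 0, "text": "int main(void) {"})
session.execute({"op": "insert", "at": 1, "text": "}"})
session.execute({"op": "insert", "at": 1, "text": "    return 0;"})

# Pretend the power died here: the in-memory buffer is gone,
# but the journal on disk is enough to rebuild it.
recovered = JournaledBuffer.recover(journal)
print("\n".join(recovered.lines))
```

      The original file is never touched; only the journal grows. Whether a journaled command really survives a power cut still depends on the write actually reaching the disk, which is the kind of thing this whole discussion is about.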

      --
      Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
      The Theorem Theorem: If If, Then Then.
    2. Re:Don't forget the simple case... by Cro+Magnon · · Score: 1

      Been there, done that. Except it wasn't a power outage. I had been coding like crazy most of the day on a mainframe project. I was still a newbie and hadn't gotten into the habit of frequent saves. Suddenly the network went down hard. If you heard a loud screaming noise from the Midwest US in the late 80's, that was me!

      --
      Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
    3. Re:Don't forget the simple case... by Anonymous Coward · · Score: 0

      Couple of points:

        - What text editors don't have autosave enabled by default these days?
        - If you've been working on code for a while you're going to have wanted to run it (compile it, whatever), which means it's already been saved a lot.

      Not to say that the GP doesn't have a point in general, though.

    4. Re:Don't forget the simple case... by Anonymous Coward · · Score: 0

      Working on a long block of code haven't hit save in a while and no autosave is on... Bam, power is out and you just lost 100 lines of code you spent hours on.

      ... a while ... hours...??? Make up your mind. If you lost hours of code because you didn't save, you deserve to lose not only your code but your job as well.

      Years ago, while looking for a job, my wife took a typing class at the local JC. It wasn't a "learn to type" class -- it was for proficient typists to do drills to improve their speed and accuracy.

      The class was conducted on PCs using that IBM cross-platform abortion of a word processor whose name I have mercifully forgotten. At one point, some moron walked by her desk and kicked the plug out. While she was rebooting, the instructor told the class that, for occasions like this, they should save every fifteen minutes. My reaction when I heard this was, "The dumb ass should be fired for criminal incompetence!". Fifteen minutes??? Maybe if you're typing from a hard copy, where it's completely recoverable. If you're doing anything creative, five minutes is likely too long if you get a sudden flash of insight that you might not easily be able to re-create.

      Frankly I'm such a poor typist (yeah, I should have learned to touch type thirty years ago) that I save every couple of lines of code.

    5. Re:Don't forget the simple case... by Richard+Steiner · · Score: 1

      To address your points one at a time:

      (1) Autosave changes the original source file. Editors which save their internal editing context don't -- they maintain their own working copy of the editing workspace(s) independently of the target source files. That way, a user doesn't have to actually modify their original file in order to maintain the integrity of the editing environment in case of a power outage.

      Autosave is a simplistic alternative. An editor like @UEDIT maintains all edited files, search strings, change strings, and other internal environment variables including command history. When you recover, you get it *ALL* back.

      (2) Probably true. :-)

      --
      Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
      The Theorem Theorem: If If, Then Then.
  22. And this is what ZFS looks out for by E-Lad · · Score: 3, Interesting

    ...by design. TFA doesn't delve into too much detail, but a sudden power loss on such software RAID systems is a condition that ZFS accounts for. Its Copy-on-write (COW) and write-length striping strategy prevents things such as the RAID5 write hole condition, a condition that has the biggest chance of occurring when a power loss event happens.
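
    The basic copy-on-write idea, never overwriting live data in place, can be illustrated at the file level with a write-new-copy-then-switch pattern. This is only an analogy, not how ZFS is implemented; `atomic_write` is a name made up for the sketch.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path's contents without ever modifying the old copy in place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the new copy out first
        os.replace(tmp_path, path)   # then switch over in one step
    except BaseException:
        os.unlink(tmp_path)
        raise


target = os.path.join(tempfile.mkdtemp(), "config.txt")
atomic_write(target, b"version 1\n")
atomic_write(target, b"version 2\n")
with open(target, "rb") as f:
    print(f.read())
```

    If the power goes at any point before the final switch, the old file is still intact; at worst a stray temporary file is left behind.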

    1. Re:And this is what ZFS looks out for by X0563511 · · Score: 0, Troll

      What's with all the ZFS spam lately? Did I miss something?

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    2. Re:And this is what ZFS looks out for by Just+Some+Guy · · Score: 1

      Yeah - ZFS. It really is that nice. Check out the FreeBSD wiki for an example of how cool it can be.

      --
      Dewey, what part of this looks like authorities should be involved?
    3. Re:And this is what ZFS looks out for by X0563511 · · Score: 1

      Hmm, have to give it a test-drive. I've been using JFS lately, mostly because of the dynamic inodes.

      Pardon my trollish response earlier, anything that people start randomly sticking in slashdot stories like that immediately flags as suspect (in my mind).

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  23. no, that's not the scary thing by ScentCone · · Score: 2, Funny

    The scary thing is that yet one more person can't freakin' tell the difference between "loose" and "lose." It's becoming an epidemic.

    --
    Don't disappoint your bird dog. Go to the range.
    1. Re:no, that's not the scary thing by clone53421 · · Score: 1

      I've searched through the parent, the grandparent, the summary, and even TFA (gag)... the only misuse I can find is this statement in the article:

      DRAM needs to be refreshed constantly otherwise it will loose it's data

      Is this what you were referring to? All the other uses of lose, loses, and loss have been correct...

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    2. Re:no, that's not the scary thing by Anonymous Coward · · Score: 0

      Oh, loosen up.

    3. Re:no, that's not the scary thing by ScentCone · · Score: 1

      Nope, that's it. I'm just pitching a fit over that thing, right there. But it's probably the 10th time I've seen it today, in print, in e-mails, and in numerous discussion threads. Just gets under my skin, since it suggests that, as usual, people don't actually think about what they're typing. It's not just a typo - it's right up there with people who say "I could care less about..." when they actually mean the exact opposite. It's just laziness and thoughtless communication, that's all.

      --
      Don't disappoint your bird dog. Go to the range.
    4. Re:no, that's not the scary thing by Anonymous Coward · · Score: 0

      loose it's

      Head asplode.

    5. Re:no, that's not the scary thing by clone53421 · · Score: 1

      Ho hum. I'm actually pretty irritated with my own laziness... I used to notice every little detail like that. I still can if I have my eyes open, but too often I "read the meaning" and miss the misuse of grammar or the typo. :(

      I'm personally very forgiving of "I could care less about...". I feel that it's merely an expression of sarcasm, and I'm very tolerant of sarcasm... it is, after all, one of the highest and most noble forms of expression available to civilized mankind. ;)

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    6. Re:no, that's not the scary thing by JustOK · · Score: 5, Funny

      its not worth loosing you're cool about grammer misteaks and etc.

      --
      rewriting history since 2109
    7. Re:no, that's not the scary thing by Culture20 · · Score: 1
      At the risk of sending you to the emergency room, the person who wrote

      loose it's data

      used the wrong "its" too.

    8. Re:no, that's not the scary thing by clone53421 · · Score: 1

      executioner(jury(judge(JustOK)));

      :p

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    9. Re:no, that's not the scary thing by moranar · · Score: 1

      mmmh, misteak. ssssluuurp.

      --
      "I think it would be a good idea!"
      Gandhi, about Internet Security
    10. Re:no, that's not the scary thing by somersault · · Score: 1

      it's right up there with people who say "I could care less about..."

      I think I love you! :)

      But ceriously. Losen up. Appearantly their are more to life then grammer nutsies.

      --
      which is totally what she said
    11. Re:no, that's not the scary thing by Anonymous Coward · · Score: 0

      Losen up!

  24. A UPS is good to have. Even at home. by Forge · · Score: 3, Interesting

    Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

    At work, power is a whole enterprise within the company I work for.

    Dual gas powered Generators at each location, Rooms full of Batteries for the Telecoms gear (most is straight DC) and Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

    We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

    Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

    Which means the phone network is a lot more reliable than the Power grid where I live.

    As for data loss, I have over the years done a lot of recovery work. "Murphy" of "Murphy's Law" fame isn't a guy or a girl. He is a demon from the darkest pits of hell sent to torment the souls of IT workers everywhere.

    Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up, we find 2 failed hard drives in the RAID 5 on the email server.

    Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.

    Eventually we sent the failed drives off to a Data recovery company in Florida because

    #1. The customer can afford it.
    #2. Simply "skipping" a few days of Email is not an option for a bank (hence the ability to afford data recovery).

    So yeah. A UPS is essential. Just like RAID, Clustering and Backups but in the end it can all fail.

    Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost data (or anything else).

    --
    --= Isn't it surprising how badly I spell ?
  25. Other reasons to run a UPS by rwa2 · · Score: 3, Interesting

    UPS units are relatively cheap, it's well worthwhile to invest in one, not just to protect from data loss:

    * Hardware loss: I've seen a lot of hardware blown up by power interruptions. Do you trust your power company that much to provide clean power to you? Sure, surge protectors help a bit, but a decent UPS costs maybe twice as much as a good surge protector.

    * Time lost restoring your session after blackouts / brownouts: OK, maybe you're used to restarting your computer every morning anyway. But I like to leave things open and return to my desktop just the way I left it arranged.

    * Stats: Using NUT and Munin, you get to monitor and log your power, so you can see things like exactly when your electricity went out and for how long, what load your PC is drawing after that last upgrade, etc. e.g.: http://hairball.bumba.net/cgi-bin/nut/upsstats.cgi?host=apc@localhost

    * Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.
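
    To give an idea of the stats side: NUT's upsc client prints one "name: value" pair per line, which is easy to scrape. A minimal Python sketch, assuming a NUT server is running and the UPS is called apc@localhost as in the URL above; the sample values and the helper names parse_upsc and read_ups are invented for illustration.

```python
import subprocess


def parse_upsc(text: str) -> dict[str, str]:
    """Turn upsc's 'name: value' lines into a dict, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def read_ups(ups: str = "apc@localhost") -> dict[str, str]:
    """Ask a running NUT server for every variable of one UPS (needs upsc installed)."""
    result = subprocess.run(["upsc", ups], capture_output=True, text=True, check=True)
    return parse_upsc(result.stdout)


# Made-up sample output, so the parsing can be tried without a UPS attached.
sample = """battery.charge: 100
battery.runtime: 1820
ups.load: 23
ups.status: OL"""

stats = parse_upsc(sample)
# NUT reports "OL" while on line power and "OB" while on battery.
on_battery = "OB" in stats["ups.status"].split()
print(stats["ups.load"], on_battery)
```

    Something like read_ups() run from cron and appended to a log would already answer "when did the power go out and for how long"; Munin just makes the graphs prettier.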

    Frankly, I'm a little surprised a backup battery isn't built into PC power supplies already, so they'd work a bit more like laptops. Same with networking gear.

    1. Re:Other reasons to run a UPS by Darkk · · Score: 2, Interesting

      It's human nature. We tend to not think about the future until "oh shit" happens THEN something is done about it. It happens with everything these days.

      Years ago a UPS used to be a very expensive item, and it was not the norm for a home user to actually own one. Now they're becoming more affordable, but the same users who couldn't afford a UPS back then think, "Well, I've been without one for years, so why should I need one now?" The same logic applies to what I said above.

    2. Re:Other reasons to run a UPS by Anonymous Coward · · Score: 0

      Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

      Does this assume your buddies (and the intervening) are on your UPS? Or at least an alternatively-powered network?

    3. Re:Other reasons to run a UPS by v1 · · Score: 1

      Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

      or if they're local, you get to see them timeout one after another and disappear as you are saving your documents and committing your databases.

      The main unit in the basement holds for over two hours. (has two car batteries to munch on) but that's longer than my ISP evidently.

      --
      I work for the Department of Redundancy Department.
    4. Re:Other reasons to run a UPS by droopycom · · Score: 1

      Frankly, I'm a little surprised a backup battery isn't built into PC power supplies already, so they'd work a bit more like laptops. Same with networking gear.

      Price? At least for the stuff I use at home, it's pretty cheap. A battery would jack up the price.

      Also, batteries stop charging after a while and need to be replaced. It's probably more efficient to have one battery for all your stuff.

    5. Re:Other reasons to run a UPS by Darkk · · Score: 1

      Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

      or if they're local, you get to see them timeout one after another and disappear as you are saving your documents and committing your databases.

      The main unit in the basement holds for over two hours. (has two car batteries to munch on) but that's longer than my ISP evidently.

      That brings up an interesting question about using car batteries, since they all run at 12 volts. So is it safe to actually hook one up to a standard UPS, provided the cables are rated for it? A typical UPS battery is only about 7 amp-hours, while a car battery is 40+ amp-hours.

      Just curious.

    6. Re:Other reasons to run a UPS by Anonymous Coward · · Score: 0

      That's because human nature is to get things done. Preparing for all the possible 'oh shit' things is a waste of time. So everybody chooses which to prepare for and skips the rest. It is a calculated choice even if you don't agree with their valuations. Why don't you just go buy yourself some meteor insurance?

    7. Re:Other reasons to run a UPS by caluml · · Score: 1

      Hmm - you should be careful - it looks like your voltage is only half what it should be. I'm amazed anything is running at all.

    8. Re:Other reasons to run a UPS by NeoSkandranon · · Score: 1

      It is absolutely hilarious to be proven the only one in an office capable of reading the labels on a UPS's plugs (e.g., battery enabled and plain surge protection) and thus being bathed in the pleasant glow of an LCD when the rest of the office goes dead dark to much profanity :)

      --
      If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
    9. Re:Other reasons to run a UPS by rcw-work · · Score: 1

      That brings up an interesting question about using car batteries, since they all run at 12 volts. So is it safe to actually hook one up to a standard UPS, provided the cables are rated for it? A typical UPS battery is only about 7 amp-hours, while a car battery is 40+ amp-hours.

      I wouldn't stick significantly larger batteries on any UPS that doesn't have external battery connections. The reason is that the heat sinks on the charger and inverter may be designed to only handle full load for a certain amount of time - giving them more battery power to chew on may cause them to overheat. You may be able to make it safer by adding active cooling, but it's still at your own risk.

      Anything with external battery connections most likely has continuous-duty parts inside.

    10. Re:Other reasons to run a UPS by v1 · · Score: 2, Informative

      What I have is a Tripp-Lite SB-2000, which is an oldie but a goodie. Only link I can find now is here. It runs on 24v external power, so I just set two car batteries on top of it. Picked it up years ago for a song on ebay.

      That unit though really is meant to have massive batteries on it. (looks like 24v golf cart batteries maybe, it has large binding posts on it for the external battery, there is no internal battery)

      You can't just hook a car battery up to some old APC you have sitting around. It may run on it, but there are two factors to keep in mind:

      1) UPS's are designed with cooling in mind. Sure you can put a monster battery on it so it has a runtime (at max output) of an hour instead of 10 minutes, but is it going to catch on fire or just plain overheat and shut down at 30 minutes in?

      2) if it runs off the batteries, it has to charge them back up. The charge circuit faces the same limitations as the inverter in terms of capacity and cooling. Your UPS may run fine for 45 minutes, but then when power comes back, the charge circuit may fry after an hour of continuous load trying to bring the battery back up to full.

      and of course 3) installing a larger battery doesn't affect your maximum output (watts), it only affects your maximum uptime (watt-hours)

      I suppose also 4) is worth considering... not all hardware LIKES to run off a UPS. The power tends to be kinda nasty. I don't even want to know what my old tripp-lite puts out for power but I'm pretty sure it's very dirty. Fortunately all the hardware that's on it doesn't seem to mind. (yet) The longer you run something on a UPS, the more likely you are to damage it if it's not tolerant. I once tried placing a harmonic filter on my tripp-lite. Worked like a charm, put out a nearly perfect and clean sine wave. For about 6 minutes. Then it smoked. The power was simply too nasty for it to filter. Newer UPSs of course do better here. They usually advertise a "modified sine wave", same as you see stamped on inverters.

      Final note: no, you cannot stack UPS's. The line filters on modern UPS's don't like the power coming from a UPS and will switch on when the upstream UPS turns on.
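
      To put rough numbers on point 3, here is a back-of-the-envelope sketch in Python using the 12 V, 7 Ah and 40 Ah figures from upthread. The 100 W load and 85% inverter efficiency are assumptions picked for illustration, and the function name runtime_minutes is made up; real lead-acid batteries deliver noticeably less than this ideal figure when drained quickly.

```python
def runtime_minutes(volts: float, amp_hours: float, load_watts: float,
                    inverter_efficiency: float = 0.85) -> float:
    """Idealised runtime: stored energy (Wh) times efficiency, divided by load (W)."""
    watt_hours = volts * amp_hours
    return watt_hours * inverter_efficiency / load_watts * 60


LOAD = 100  # watts; an assumed load, not a measurement

stock = runtime_minutes(12, 7, LOAD)   # typical small UPS battery
car = runtime_minutes(12, 40, LOAD)    # car-sized battery

print(f"7 Ah: {stock:.0f} min, 40 Ah: {car:.0f} min")
# The bigger battery stretches the runtime, but the inverter's maximum
# output in watts is exactly what it was before.
```

      The runtime scales with the amp-hours, which is precisely why the cooling and charger concerns in points 1 and 2 matter: the electronics end up working several times longer than they were designed for.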

      --
      I work for the Department of Redundancy Department.
    11. Re:Other reasons to run a UPS by Darkk · · Score: 1

      Ok, that makes sense and pretty much figured it's not that simple or safe to do it.

      I have seen large UPS systems that use standard car or marine batteries and the single inverter is a monster in size. It required large amounts of cooling to keep it from blowing up. It always made me nervous to stand close to it.

      Thanks for the insight.

    12. Re:Other reasons to run a UPS by Brianwa · · Score: 1

      While it would work, it's not the best idea to run a UPS off a car battery. Car batteries are designed to put out massive current for a very short period of time (starting your engine), but will not survive being deeply discharged and recharged many times at all. You also have liquid battery acid and greater hydrogen gas output to worry about. Get some proper deep cycle batteries.

    13. Re:Other reasons to run a UPS by rwa2 · · Score: 1

      Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

      Does this assume your buddies (and the intervening) are on your UPS? Or at least an alternatively-powered network?

      Oh, I'm a geek, so I mean my /online/ buddies in IRC and so forth. So yeah, the latter.

    14. Re:Other reasons to run a UPS by rwa2 · · Score: 1

      Meh, if price was that much of an issue, they'd sell cheap 'n' light plug-in "laptops" without batteries. Most of my laptops usually turn into that anyway once the batteries get old and fail :P

      Plus, if the battery was built into the switch / router / PC power supply, you wouldn't need an inverter to go from DC to AC only to go back to DC again. /Plus/, all of the environmental sensors that go with the UPS for voltage, load, battery charge, temperature, etc. could be directly interfaced to the motherboard, rather than having to connect a separate USB / serial cable.

      So there's still room for this kind of improvement in the PC segment. Of course, all this stuff is already done in laptops, and it probably isn't too much longer until laptops / tablets somehow manage to make traditional PCs completely obsolete.

    15. Re:Other reasons to run a UPS by Renraku · · Score: 1

      Do you trust your power company that much to provide clean power to you?

      The power company provides power that's almost perfect when it comes out of the plants, in theory. However, there's a lot of line equipment between the generators and the consumers. Miles and miles and miles of copper in the form of lines, bus bars, transformers, etc. There's almost no way for them to clean up the power before it reaches your house other than making every single transformer a power regulator as well.

      --
      Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
    16. Re:Other reasons to run a UPS by rwa2 · · Score: 1

      Heh, thanks for the concern, but 120VAC @ 60Hz is the norm here in the U.S., my good chap. /am envious of European plugs, though

    17. Re:Other reasons to run a UPS by v1 · · Score: 1

      It's also important to see the difference between a UPS and an inverter with a battery. The UPS has two important additions:

      1) automatic cutover from the mains
      2) recharges the battery when the mains are back

      This is why UPSs are more expensive than inverters. Some UPS's have the additional feature of being able to "boost" or otherwise regulate mains power. So if the power co is providing too high or low of voltage, or line noise, it tweaks the power coming in without cutting over to the inverter.

      The cutover is also an important difference between UPSs. Good ones only cut over when needed, and the cutover is totally transparent to the equipment they power.

      Most modern UPSs also have a USB cable to hook to your computer to talk with the UPS software. There's an industry standard on how this communication works, (huge surprise, imho) and macs have the software built into the OS to automatically manage controlled shutdown when the batteries are about to exhaust.

      --
      I work for the Department of Redundancy Department.
  26. Mine run on evil thoughts and hatred by Joce640k · · Score: 1

    And I bet they have a longer uptime than yours....

    --
    No sig today...
  27. Our Tandem by PIPBoy3000 · · Score: 5, Interesting

    This reminds me of my favorite power loss story. The facility was doing a generator test, where we were supposed to switch over from city power to the generator. Unfortunately it didn't happen smoothly and the UPS kicked in. Sadly it turned out that so many servers had been added since the original design that the UPS was really only good for fifteen minutes or so. The final problem was that our operator didn't notice the issue quickly enough, and so the next thing everyone in IT knew was that our main data center had just lost power.

    We spent most of the day getting our servers back up from various states of disrepair (confirming the article, power loss is superbad). It turns out that our main medical software ran on a Tandem. Though the drives and such lost power, the CPU had a backup of D-batteries and survived the power loss just fine. Needless to say, we stopped making fun of their seemingly primitive emergency backup power.

    1. Re:Our Tandem by clone53421 · · Score: 1

      Shiny new D batteries for the tandem, eh? That deserves more than just a pat on the head... er, case.

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    2. Re:Our Tandem by ydrol · · Score: 1

      Tandem just got a mention over at The Daily WTF

    3. Re:Our Tandem by Alioth · · Score: 2

      This is my favorite power loss story.

      It's great.

      http://www.alioth.net/tmp/vaxen.html

  28. It can be done! by GameboyRMH · · Score: 3, Funny

    ...If you're a Mac fanboy running a network of Apple computers. If anything goes wrong, it's an artistic expression and anyone who criticizes the problem is a closed-minded square who "doesn't get it." Then you sit back in self satisfaction listening to alternative pop, thinking about how hip and different and enlightened you are.

    Happy thoughts power supply: Dead stable.

    Linux networks can run on happy thoughts as well as long as you run on electricity during the setup and installation stages and then switch to happy thoughts once everything's running properly...you just have to make sure you never, ever run emacs, vi, or Gpaint.

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
    1. Re:It can be done! by barnackle · · Score: 1

      ...If you're a Mac fanboy running a network of Apple computers. If anything goes wrong, it's an artistic expression and anyone who criticizes the problem is a closed-minded square who "doesn't get it." Then you sit back in self satisfaction listening to alternative pop, thinking about how hip and different and enlightened you are.

      It's funny... I use Linux to demonstrate how different and enlightened I am. Apple is played out. That scene is spent.

    2. Re:It can be done! by fahrbot-bot · · Score: 1

      ...make sure you never, ever run emacs, vi, or Gpaint.

      Umm... M-x happy-thoughts-mode
      And for your ~/.emacs file:

      (autoload 'happy-thoughts-mode "htfuncs"
      "Major mode for Happy Thoughts compatible operation."
      t nil)

      --
      It must have been something you assimilated. . . .
    3. Re:It can be done! by telbij · · Score: 1

      No doubt! With marketshare above 5% Apple is lamer than the Seattle music scene. But Linux has been trendy for years too. The best bet to ensure a feeling of superiority for years to come is Plan 9 baby!

    4. Re:It can be done! by mikael_j · · Score: 1

      Of course, the problem with Plan 9 is that even on the most generic of modern hardware it seems to be quite moody, and don't even think of dual-booting it!

      It does get bonus points for the arcane magic required to get the installer to boot though. :)

      /Mikael

      --
      Greylisting is to SMTP as NAT is to IPv4
    5. Re:It can be done! by Anonymous Coward · · Score: 0

      No, you don't need electricity. All you need is a previously set up Linux system and a Windows user there to watch you use it and drool.

      "You can do that? Srsly?!"

      That phrase alone would be enough to power the entire 10 hours that it takes to completely set up and configure my Linux machines.

    6. Re:It can be done! by telbij · · Score: 1

      Are you kidding? What hipster doesn't love moody?

  29. Duh! by slashname3 · · Score: 1

    Post this under most obvious thing ever!

    I guess the author wasn't worried about any events or transactions that were in the process of being committed. Nor has he managed any production databases.

    Next thing you know there will be an article about not being able to surf the web when the Internet connection is down.

  30. He didn't mention RAID batteries by Percy_Blakeney · · Score: 1

    He fails to mention battery packs for RAID cards. They maintain power to the disk cache memory on the card in the event of a power failure, which allows the card to finish writing the cache to disk once main power is up again. That's one of the arguments for a hardware RAID solution.

  31. ZFS by Anonymous Coward · · Score: 1, Informative

    Always? Maybe if you are using Linux. Not if you are using an OS that runs ZFS filesystems.

    --AC

  32. don't try this at home, kids by v1 · · Score: 1

    Linux software RAID, and any RAID basically, needs to know if the disks of the array are still properly matched to eachother when the array is initialized. When power fails, or when you press reset, they will be in a "dirty" state, and the system may need to recreate the array. That is, if it can. I've never tried it, but I can imagine that a RAID0 can be completely destroyed by a power failure. But, don't take my word for that...

    One way is if the partition table and drivers on one slice gets trashed, and the first few meg of the data partition (directory mostly) get trashed on the other slice, by the same event that also happens to hang the computer.

    You have no idea how unpleasant it is to reconstruct a partition table from scratch and reinstall firewire driver partitions using dd. It didn't BOOT, but I was able to bribe it to mount and copy the data off.

    --
    I work for the Department of Redundancy Department.
  33. lcd's dont degauss by scharkalvin · · Score: 1

    I never thought of the problem with the degaussing coil in the monitor. But then again, who still uses a CRT anymore? (Well I do at work since my company is too cheap to buy me a new monitor). Point is you can leave the monitor powered by the UPS if it is an LCD type.

    1. Re:lcd's dont degauss by Alioth · · Score: 1

      I still use one. My 21 inch Trinitron still works fine and has a nicer picture than all but the most expensive LCD panels. Its only drawback is its enormous weight, leaving a permanent bow in my desk!

  34. My personal favorite experience... by Loco3KGT · · Score: 1

    At my company's NOC the UPS failed... so everything failed, except the fancy new generator they had just installed.

    Big problem though: when the UPS totally crapped itself, power from the generator couldn't get through the UPS to any of the devices plugged into it. Whoops.

    --
    Blessed be he who reads this post, Cursed be he who tells my boss.
    1. Re:My personal favorite experience... by Darkk · · Score: 1

      Yep, that is a biiiiiiiiiig whoops. Hopefully you guys have a UPS service contract to get that sorted out fast.

    2. Re:My personal favorite experience... by Anonymous Coward · · Score: 0

      Your experience is not atypical. For the data center I've managed for 15 years, we have had only two power outages, but more than a dozen outages caused by UPS problems. If you have a decent power company you can expect your power to be out maybe once every five years. We have Duke Power in a city in NW South Carolina. Even with our expensive Liebert online UPS's we have trouble that causes downtime every two years or so.

      For better reliability I have our secondary DNS and several backup systems connected directly to utility power without being connected to a UPS or generator.

    3. Re:My personal favorite experience... by myowntrueself · · Score: 1

      Mine would be...

      We had a whole rack full of nice shiny Dells all with dual power supplies.

      Some very bright and intelligent guy figured that it would be a great idea to hook one side of their power supplies up to the mains and the other side to the UPS.

      One day we had a power failure. The UPS immediately overloaded and failed. The whole server room went dark.

      See, the load on the UPS from the Dell rack was normally distributed between mains and UPS...

      Mains fails, all power gets drawn from UPS, UPS chokes and dies due to the sudden load.

      Hilarity ensues!
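
      The arithmetic behind that failure is short enough to write down. A hedged Python sketch: the server count, per-server draw and UPS rating below are invented numbers (the story gives none), and ups_load is a made-up helper.

```python
def ups_load(server_watts: list[int], ups_share: float) -> float:
    """Total watts the UPS must supply, given the fraction of each server it feeds."""
    return sum(server_watts) * ups_share


servers = [400] * 10   # assumed: ten servers drawing 400 W each
UPS_CAPACITY = 3000    # assumed UPS rating in watts

normal = ups_load(servers, 0.5)  # dual PSUs share the load: half comes from the UPS
outage = ups_load(servers, 1.0)  # mains gone: the UPS-fed PSUs carry everything

print(f"normal: {normal:.0f} W, outage: {outage:.0f} W, capacity: {UPS_CAPACITY} W")
print("UPS survives the outage:", outage <= UPS_CAPACITY)
```

      A UPS that looks comfortably loaded on a normal day can be well over its rating the instant the mains drops, so it has to be sized for the full load rather than the shared one.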

      --
      In the free world the media isn't government run; the government is media run.
  35. Choose your UPS carefully, and TEST it the hard way by jalet · · Score: 4, Informative

    This morning we had a planned shutdown of 100 servers for electrical work; all were on the same 40 kVA UPS. All went fine: we shut down all servers to be safe, kept some stuff online for monitoring and the like, and then main power was shut off. The UPS gladly took the load, with an estimated battery life of 75 minutes, more than what was needed for the electrical work. Once this was done, the electrician put the main power back on, and... the UPS shut down!

    Since all servers were stopped already we didn't lose anything, but we had to put the UPS in bypass mode for a while, then back on, and now we hope for the best waiting for the UPS to be repaired, crossing most of our fingers because of the holidays...

    In summary: testing that the UPS can handle the power coming back is as important as testing that it can handle the power going away.

    --
    Votez ecolo : Chiez dans l'urne !
  36. All it took was a Panel... by Anonymous Coward · · Score: 1, Interesting

    I worked for a respectable insurance company. The other day a "well-known" H/W maker came to our place to upgrade the hardware for a mainframe, in our computer room.

    They unscrewed the mainframe's panels and put them aside, on the large thingy right beside it.

    That thingy aside happened to be the UPS, which started to heat up, having its vents blocked by the panels. At some point, it gave up, sending a massive "shutdown now" command to all connected computers, including most of the web infrastructure...

    It's been more than 2 days now, and we are still struggling to bring all the pieces together...

  37. Re:A UPS is good to have. Even at home. by v1 · · Score: 2, Informative

    Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

    good uptime for a laptop. got a second battery? (I know I do)

    Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

    that's because it just has to invert it before it can step it up or down. If you supply DC you are actually introducing another necessary step. It gets hard to cram 2x the electronics into the PS. Inverters are definitely the way to go.

    We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

    Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

    Might want to rethink how easy it is to get a truck in during a hurricane. ;) Unless it's more of a boat, think Katrina.

    Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up, we find 2 failed hard drives in the RAID 5 on the email server. Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.

    um, ouch?

    Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost data (or anything else).

    Was going to say, all of the above is moot if an EF5 rolls through town. Better add "offsite backup" to your list if it's not already there. With the EF5 that ran through here last month, some people got their backups turned into "offsite" backups. (maintenance guy was here last week, said they are still looking for their dump truck )

    --
    I work for the Department of Redundancy Department.
  38. Battery Backup by MattW · · Score: 1

    TFA has no mention of battery-backed write caches, which is insane, since not only can one improve reliability, it also lets the controller report a write as successful before the data actually makes it to disk, leading to significant write performance gains in many circumstances.
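
    The same "reported as written" versus "actually on disk" gap exists on the software side. A minimal Python sketch of a write that tries to be durable, assuming a POSIX-style filesystem; durable_write and the file name mail.db are made-up names for illustration. Note that fsync only pushes data as far as the device: a drive or controller with a volatile cache can still acknowledge early, which is exactly the hole a battery is meant to plug.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace path so that after a crash it holds either the old or the new data."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> OS page cache
            os.fsync(tmp.fileno())  # OS page cache -> storage device
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also sync the directory so the rename itself survives a crash (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


# Try it out in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "mail.db")
    durable_write(target, b"first version")
    durable_write(target, b"second version")
    with open(target, "rb") as f:
        print(f.read())
```

    Every fsync costs a round trip to the disk, which is why a cache that can safely say "done" early makes such a difference to write performance.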

  39. Thanks Captain Obvious by bravecanadian · · Score: 2, Informative

    Any professional server or data center setup that does not include a UPS for a graceful shutdown... is almost by definition NOT professional.

  40. Voltage Spikes by natoochtoniket · · Score: 5, Informative

    The typical small UPS system has some amount of surge protection built in, but it's typically only good for a couple thousand joules at most. And if you get a spike that is big enough to blow a varistor, you also get to buy a new UPS.

    A better solution is to put a "whole house" surge protector on the circuit-breaker panel. It protects everything, with a much higher number of joules. Five or six pounds of varistors can absorb a lot more shock than one ounce of varistors. They cost about $100, and can be found at most big hardware stores or electrical supply houses. That doesn't eliminate the need for a UPS. It does protect the UPS, along with the other equipment, from most voltage spikes.

    Last year, lightning hit the power pole 20 feet from my house. We know where it hit because the pole caught fire. My next-door neighbors on both sides lost every single piece of electrical equipment -- not just computers, TV's, and stereos, but also fridge, microwave, water heater, and range. All of it was damaged beyond repair. We barely noticed the hit, except for the bright flash of light, and had no damage at all.

    1. Re:Voltage Spikes by Darkk · · Score: 1

      I wonder if it is standard practice in Florida homes to have a varistor system installed in the electrical panel?

      If I am ever going to build a house anywhere I'd definitely have it done since it sounds cheap to do.

    2. Re:Voltage Spikes by mhall119 · · Score: 1

      I wonder if it is standard practice in Florida homes to have a varistor system installed in the electrical panel?

      It's not.

      --
      http://www.mhall119.com
    3. Re:Voltage Spikes by Darkk · · Score: 1

      I wonder if it is standard practice in Florida homes to have a varistor system installed in the electrical panel?

      It's not.

      I guess if something is not required by code, they won't do it.

    4. Re:Voltage Spikes by clone53421 · · Score: 1

      Mod parent insightful? or maybe redundant... ;)

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    5. Re:Voltage Spikes by Jay+L · · Score: 2, Insightful

      I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.

      The house was, at the time, "protected" with two Cutler-Hammer CHSP suppressors (MOV). After the incident, their "protection working fine" LED was still lit! The only room with no damage was my recording studio, which had Equitech balanced-power panels; the ginormous-hunk-of-iron transformer probably saved me there. The power company had no reports of direct lightning strikes, other than one hit that took out a transformer (and since my power didn't go out, I apparently wasn't on that circuit).

      I recall doing some reading about lightning arrestors, ground grids, and such, and eventually came to the conclusion that (a) surge suppressors are fairly useless, because they don't always present the quickest path to ground, and (b) it would be 10x cheaper to let stuff die and replace it than to set up a proper lightning protection system.

    6. Re:Voltage Spikes by natoochtoniket · · Score: 4, Informative

      The path-to-ground is really important, as is the quality of the ground. The length of the path is the reason why whole-house devices are installed at the service entrance panel. But, that assumes that your service-entrance ground is a good ground.

      If your ground is not good, shorting to ground won't do much good. A lot of houses around here are grounded to plumbing pipe that is buried just 12" deep. During a dry spell a few years ago, I detected variable voltage where it shouldn't have been. The voltage problems cleared up after I added an 8-foot vertical ground rod to the system.

      The thing that kills a surge protector is too many amps for too long. If it shorts the power to ground (low-resistance), but the ground is not really well-grounded, then the whole thing can float close to line voltage. In that case, that voltage can destroy your other devices, while the surge unit never gets enough current to burn the varistors.
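
      To put rough numbers on that last point, here is a minimal sketch that treats the protector and the ground connection as two resistors in series. It is only a model under stated assumptions: the surge voltage and both resistances are made-up illustrative values, and a real surge involves inductance and timing that this ignores.

```python
def bus_voltage(surge_volts: float, protector_ohms: float, ground_ohms: float) -> float:
    """Voltage left on the protected bus relative to true earth.

    Simple series voltage divider: the surge drops partly across the
    conducting protector and partly across the ground connection.
    """
    return surge_volts * ground_ohms / (protector_ohms + ground_ohms)


# All values below are hypothetical, chosen only to show the trend.
SURGE_VOLTS = 6000.0
PROTECTOR_OHMS = 1.0  # assumed resistance of the protector while conducting

for ground_ohms in (25.0, 5.0, 0.5):
    volts = bus_voltage(SURGE_VOLTS, PROTECTOR_OHMS, ground_ohms)
    print(f"ground {ground_ohms:>4} ohm -> bus floats at about {volts:.0f} V")
```

      The worse the ground, the larger the share of the surge that stays on the bus your equipment is connected to.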

    7. Re:Voltage Spikes by Darkk · · Score: 1

      When I installed my satellite dish I installed an 8-foot copper rod into the ground and bonded it. Most home satellite installs don't do this, and many are not even grounded at all.

      It's rare for lightning to hit the dish, but when it happens all of that juice is gonna go straight to the receiver and then the TV along with everything else connected to it, and hopefully it won't arc across the room towards you.

      I'm sure the wife wouldn't be too happy.

    8. Re:Voltage Spikes by russotto · · Score: 2, Informative

      I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.

      Common-mode spike. The power line was fine, but your ground got knocked up to a few kilovolts by a nearby strike.

    9. Re:Voltage Spikes by Jay+L · · Score: 1

      Oh, that's the part I forgot - we did test the ground quality, and it wasn't transmitter-quality, but it was better-than-average for a residence - IIRC, we'd buried some deep copper rods for it, since the recording studio would need a solid ground. (I wish I could remember the actual numbers.) So that wasn't it either...

    10. Re:Voltage Spikes by Jay+L · · Score: 1

      Common-mode spike

      That makes perfect sense. And I suspect it also helps explain the lack of damage to the equipment that was on balanced power; I bet the Equitech balances hot against ground, and doesn't touch neutral at all. Or something like that. (I'm a software guy.) Thanks!

      Is there any way to protect against common-mode spikes? Do some other surge protectors offer that?

    11. Re:Voltage Spikes by somersault · · Score: 0, Offtopic

      My employer just modded me redundant because I spend all my time reading slashdot, you insensitive clod!

      --
      which is totally what she said
    12. Re:Voltage Spikes by hedwards · · Score: 1

      If you're going to go to that sort of trouble, you may as well go the whole distance and upgrade from the whole house surge protector to a whole house line conditioner.

      It's going to cost more, probably significantly more, but that way you know that any power in the building is nice and clean.

      But either way, you're still stuck with a UPS on any and all non-laptops anyways, so you're going to have to be using those as well.

    13. Re:Voltage Spikes by unitron · · Score: 1

      Those two Tivo modems were probably fried by a voltage spike on the telephone line itself, which means nothing you did or could do to your house wiring would have made a difference. Tivo modems are notorious for suffering damage under those circumstances.

      Your telephone line (or to be more accurate, anything you connect to your telephone line) needs protection. In addition to commercial solutions, such as surge suppressors and UPSes that have modular phone jacks in addition to the 120 Volt part, tie a knot in each telephone line near each end. It won't have much inductive reactance at audio or even DSL frequencies, but the rise-time of a lightning-induced voltage spike is so short as to be the same as a very high frequency, and as frequency increases, the reactance of any given inductance does the same.
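
      The frequency dependence mentioned above is the standard formula X_L = 2 * pi * f * L. A small sketch of the arithmetic follows; the inductance value is a pure assumption for illustration (not a measurement of a knotted phone cord), so the absolute numbers mean little and only the scaling with frequency is the point.

```python
import math


def inductive_reactance(frequency_hz: float, inductance_h: float) -> float:
    """Return the inductive reactance X_L = 2 * pi * f * L in ohms."""
    return 2 * math.pi * frequency_hz * inductance_h


# Hypothetical inductance, chosen only to illustrate the scaling.
ASSUMED_INDUCTANCE_H = 1e-6

for label, freq_hz in [("voice, 3 kHz", 3e3),
                       ("DSL, 1 MHz", 1e6),
                       ("fast edge, 100 MHz", 1e8)]:
    ohms = inductive_reactance(freq_hz, ASSUMED_INDUCTANCE_H)
    print(f"{label}: {ohms:.4f} ohm")
```

      Reactance grows linearly with frequency, so whatever impedance the inductance offers at voice frequencies, it offers tens of thousands of times more to the fastest components of a spike.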

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

    14. Re:Voltage Spikes by leuk_he · · Score: 1

      A small UPS is more likely to absorb a couple hundred joules (check it; that is a factor of 10...). You find out the difference when the computer blows out smoke to protect the fuse from blowing out.

    15. Re:Voltage Spikes by evilviper · · Score: 1

      A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), [...] The house was, at the time, "protected" with two Cutler-Hammer CHSP suppressors (MOV). After the incident, their "protection working fine" LED was still lit!

      It has already been pointed out that this was very likely due to common/ground spike. But more to the point, this illustrates my biggest pet peeve with surge protectors and (home) UPSes...

      Surge protectors are all exactly the same, no matter how much you pay for them. From the "free after mail-in rebate" to the bulky, major brand-name, $100 gold-plated wonders with the $500 million insurance policy. They all do exactly the same thing... They buy a 2"sq. varistor package for $2 and connect it inline on the "hot" wire. Neutral and Ground lines are passed through, entirely unfiltered and unprotected.

      And you're not the only person to get screwed over by this fact. Just about a year ago a friend of mine went through a similar incident... An electrified fence, some distance away, shorted to ground, and absolutely destroyed every small electrical device within a square city block. The responsible party was required to pay for much of the damage, but the point remains: surge protectors are COMPLETELY useless in such a scenario, a scenario which occurs fairly frequently. For the price of 3 fuses, which can be found in any home hardware store, any power strip can be modified to be practically impervious to a surge of any size, over any of the 3 power lines. Companies that make surge protectors and home UPSes opt, instead, to offer damn near no protection, while fooling the public into believing they're somehow being made safe.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    16. Re:Voltage Spikes by natoochtoniket · · Score: 1

      Surge protectors are all exactly the same, no matter how much you pay for them. ... They buy a 2"sq. varistor package for $2 and connect it inline on the "hot" wire. Neutral and Ground lines are passed through, entirely unfiltered and unprotected.

      Not true. In cheap 110V units, the single varistor is connected BETWEEN the hot and the ground. MOV varistors have high resistance at low voltage, and low resistance at high voltage. If the varistor burns out, the electricity still goes to the load, but there is no protection.

      In surge protectors that are designed to be installed to an electrical subpanel, there are SIX (6) varistors. They are connected between hot+/ground, hot+/neutral, hot-/ground, hot-/neutral, hot+/hot-, and ground/neutral.

      In protectors that are designed to be connected to a service-entrance panel, only three (3) are needed, because the ground and neutral are bonded at the service entrance. They are connected between hot+/ground, hot-/ground, and hot+/hot-.

      And, in all cases, the units that have higher joule ratings and higher amp ratings are made with higher-rated varistors.

      It is important to note that all of the wires pass through. There are no 3/0 gauge wires inside a whole-house surge. The unit just connects to the breaker bus (via a double breaker) and to the ground and neutral buses with little 14 gauge wires. (Those little wires are enough to carry several thousand amps for a few milliseconds without melting.) If a varistor gets fried, the electricity still gets to the load, and there is no protection.

      The bus configuration means that all of the surge protectors in a phase work together to absorb the energy. If you have ten little 800-joule surge protectors on a phase, the loads on that phase are protected up to a big fraction of 8000 joules, depending on the length of the wires. (If a load is closer to the service entrance than some of the surge protectors, it might get burned during the nanoseconds before those protectors short. This is why it is better to put the protection at the service entrance.)

    17. Re:Voltage Spikes by natoochtoniket · · Score: 2, Informative

      Neutral and ground are supposed to be bonded at the service entrance panel, and not anywhere else. If the ground is actually grounded, with a big copper wire to a big copper spike that goes deeper than the water table, that will normally provide the path of least resistance for the electricity to follow.

      A lot of houses don't have a good ground connection. Most building codes (and the NEC) allow 25 ohms resistance on the ground connection. But it's hard to measure, so the building inspectors don't measure it. In order to measure it, you have to install an additional 8-foot spike ten feet away from the ground connection you want to measure.

      Plumbing systems used to be metal pipe, so a connection to plumbing was an adequate ground. But, now, most new plumbing is plastic, an insulator. A few years ago they tore up the streets in my neighborhood to install new water pipes (plastic of course). After they did that, the only ground on my house was the short length of metal pipe that ran from the house to the meter. And that pipe was less than 12 inches deep, in dry sandy soil.

      The easy way to be sure that you have a good ground is to install two new 8-foot spikes, at least 10 feet apart (from each other, and from any existing ground spike). Measure the ohms between them to be sure you have less than 25 ohms. Then bond BOTH of them to the existing ground at your service-entrance panel using bronze clamps and 6-gauge or larger copper wire. Costs less than $100, and can be done in just an hour or two.

    18. Re:Voltage Spikes by Dare+nMc · · Score: 1

      voltage problems cleared up after I added an 8-foot vertical ground rod to the system.

      Living in AZ, I can tell you all our radio towers, after installed, are then prepared with a large bag of salt then soaked deep into the ground with several waterings to get this good ground (and repeated every few years.)

      However you should realize that current takes the path of least resistance. So by having the best ground path, I hope you have the best surge suppressor close by (i.e. not 100' of 14-gauge ground wire away from your ground), 'cause more of any surge is now coming your way (IMHO). The earth ground is mostly there to protect people, not equipment. I.e. if your computer has no earth ground path nearby, it won't care if there is a brief 1000-volt offset between earth ground and the computer ground, as long as the computer ground and line stay 120 volts apart. Now if a person is touching a metal desk, touching the computer, your body has a capacitance that would love to be charged by any spike... So having the best ground will make your house safer for people. But it may make your house the "lightning rod" of the neighborhood for your electronics.

    19. Re:Voltage Spikes by Dare+nMc · · Score: 1

      and not even grounded at all.

      Every working small-dish is grounded, if they didn't bond outside or place it on a metal surface, then yeah the receiver is providing the ground, very bad if the dish builds a sudden charge (lightning lights up the coax cable.)
      Definitely SAT gets a much better signal, with a better ground. However to best protect the people/equipment I would think you would want something else to be your lightning rod. IE another structure higher up, with a better ground. Making your sat the best grounded path just intuitively seems like something to avoid when lightning is around.

    20. Re:Voltage Spikes by HTH+NE1 · · Score: 1

      A co-worker lost a lot of equipment in a lightning strike recently. Anything with coaxial cable or network cables got fried.

      And his battery-operated alarm clock spontaneously reset as well.

      Once you're dealing with charged air allowing current to jump the air gap between clouds and ground, it doesn't much matter what you're doing with your little metal wires.

      Me, I have a transmitter tower a few blocks north of my house tall enough to take the lightning and far enough away not to affect my home.

      --
      Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
    21. Re:Voltage Spikes by natoochtoniket · · Score: 1

      You are quite right that there are also other electromagnetic effects. Radio receivers, because of their induction coils, can be damaged even when they are not plugged in at all. There's not much we can do about that. The EE's who design such systems usually try to manage those issues, until their PHB's cut cost by eliminating the "extra" capacitor.

      And electricity can use telephone or cable-TV wire just as easily as it can use power wire. In many communities, the phone and cable wires are hung on the poles below the power wires, and the neutral/ground wire is at the top of the pole. That arrangement usually protects the phone and TV wiring, because the power wiring is higher and better grounded.

      The way the electrical system is set up, there are lots of grounds in a neighborhood. The power company puts a ground at every third or fourth pole, and every structure is supposed to have a ground. And all of those grounds are supposed to be bonded to the neutral. So there are lots of different paths that the electrons can take to get from the sky to the ground. The easiest paths are usually via the neutral wire at the top of the pole to the various ground connections in the neighborhood.

      When there is a lightning strike on that neutral at the top of the pole, the electrons flood the neutral/ground conductors and use all of the nearby paths to ground. When there is a strike on one of the hot wires, it also takes all available paths. The paths that have low resistance get more of the current than the paths that have high resistance.

      So, the idea is just to make sure that the paths that go through my expensive equipment have higher resistance than some of the other available paths. This is done by providing lower-resistance paths (mainly surge protectors, and good grounds). Of course, some of the other available paths go through other people's houses, so my ground connection doesn't have to actually carry the entire current of the lightning strike.
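
      The "lower resistance gets more of the current" idea is the ordinary current-divider rule for parallel resistors. Here is a rough sketch of it; the path names, resistances, and total current are all invented for illustration, and a real strike is nowhere near this tidy.

```python
def split_current(total_amps: float, path_ohms: dict) -> dict:
    """Divide a total current among parallel resistive paths.

    Each path carries a share proportional to its conductance (1 / R).
    """
    conductances = {name: 1.0 / ohms for name, ohms in path_ohms.items()}
    total_conductance = sum(conductances.values())
    return {name: total_amps * g / total_conductance
            for name, g in conductances.items()}


# Hypothetical paths to ground and their resistances in ohms.
paths = {
    "my ground rods": 5.0,
    "neighbor A ground": 25.0,
    "neighbor B ground": 25.0,
    "through my equipment": 500.0,
}

for name, amps in split_current(10_000.0, paths).items():
    print(f"{name}: {amps:.0f} A")
```

      Every path carries something, which is why the goal is to make the path through the equipment a small share rather than to expect it to be zero.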

      Turns out that a 6-gauge wire can carry a ludicrous number of amps for a small fraction of a second without melting. A larger wire is better of course. Most of the lightning-rod systems that I have seen use 2-gauge or larger wire.

      All of the above is assuming that your power distribution is on poles. If you have underground power, then the situation is reversed. In that case, the easiest path for lightning to take is often to hit a house. When that happens, the wiring in that house gets the full current, with no distribution to other structures. If I lived in such a neighborhood, I would seriously consider adding a lightning-rod system.

  41. I run all my computers on UPSs by EmbeddedJanitor · · Score: 1

    The major reason for doing this is that I live in a rural area where bad weather can make the power glitchy. One of the neatest things about using UPSs is that I can unplug stuff from the wall (e.g. if I need to move cables to a different power socket) and keep the computers alive.

    --
    Engineering is the art of compromise.
  42. He forgot UPS-triggered shutdown by SleptThroughClass · · Score: 5, Insightful
    The author did not mention having the system set up to have the UPS trigger an automatic shutdown.

    If you're not at the machine, or don't know how to shut down without a CRT, the disk can get messed up when the UPS runs out of power. Unless you only have a desktop machine with no network applications writing to disk (no BitTorrent); then you might be OK if you just walk away from your keyboard and let the system become quiescent before it loses power.
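
    The decision such a setup has to make is simple enough to sketch. This is only an illustration of the logic, not any real UPS tool's API: the function name, its parameters, and the 60-second margin are all hypothetical, and in practice you would let the monitoring software that comes with the UPS or the OS do this.

```python
def should_shut_down(on_battery: bool,
                     runtime_left_s: float,
                     shutdown_duration_s: float,
                     safety_margin_s: float = 60.0) -> bool:
    """Decide whether to start a clean shutdown now.

    Shut down while there is still enough battery to finish the shutdown
    itself, plus a margin for an optimistic runtime estimate.
    """
    if not on_battery:
        return False
    return runtime_left_s <= shutdown_duration_s + safety_margin_s


# Hypothetical readings: on battery, 150 s left, shutdown takes 120 s.
print(should_shut_down(True, runtime_left_s=150.0, shutdown_duration_s=120.0))
```

    The point is to trigger on "not enough time left to shut down cleanly" rather than on "battery empty", because by then it is too late.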

  43. once again ZFS shows it's ahead of the game by Anonymous Coward · · Score: 0

    ZFS has end-to-end checksums for every block, for data and for metadata, so this problem will never arise for it.

    http://opensolaris.org/os/community/zfs/faq/
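
    For anyone wondering what a per-block checksum buys you, here is a toy sketch of the idea. It is not how ZFS is implemented; the function names are made up and SHA-256 is simply a convenient stand-in. Note that it shows detection of a damaged block on read; it cannot bring back data that never reached the disk.

```python
import hashlib


def write_block(data: bytes):
    """Return the block together with a checksum stored alongside it."""
    return data, hashlib.sha256(data).digest()


def read_block(data: bytes, checksum: bytes) -> bytes:
    """Return the block only if it still matches its stored checksum."""
    if hashlib.sha256(data).digest() != checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data


block, digest = write_block(b"order #1: paid in full")

# Simulate a torn write: only the first 8 bytes made it to disk intact.
torn = block[:8] + b"\x00" * (len(block) - 8)

try:
    read_block(torn, digest)
except IOError as err:
    print("detected:", err)
```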

  44. wow by BigJClark · · Score: 0, Troll


    People, this is such a non-issue. Are there really places out there that run prod-level systems without battery backup? I bet those sysadmins got their degree from a cereal box.

    --

    Hi, I Boris. Hear fix bear, yes?
  45. What about data gains? by OrangeTide · · Score: 1

    Everyone is worried about data loss, but sometimes I wonder what sort of data gains happen without our knowledge or consent?

    --
    “Common sense is not so common.” — Voltaire
  46. Reason by RAMMS+EIN · · Score: 1

    For those who want to know what TFA says without actually reading it, it boils down to:

    1. What you think you have saved and what has actually been written to stable storage may not be the same. In particular, things may still sit in DRAM, waiting to be written to disk.

    2. What gets written to stable storage after the power failure may not be what was intended to be written. You could end up with corrupt data.

    3. That's the hardware side of the story; software introduces many more hazards by lengthening the path between your actions and stable storage.

    --
    Please correct me if I got my facts wrong.
  47. Recondition those RAID cache batteries! by Anonymous Coward · · Score: 0

    And remember to recondition those RAID controller cache batteries! Nine out of ten servers I ran into at my last job had a daily-recurring syslog entry saying the RAID battery was shot, because no one had ever bothered to recondition it, because of the (relatively minor) performance hit of turning the cache off.

    If I remember correctly, LiveJournal had a MAJOR data corruption issue where they had to reformat and restore off of tape to repair because their cache batteries had gone tits up and their wonky drivers lacked the verbiage to remind them.

  48. Why keep forcing disk syncs? by Mr.+Arbusto · · Score: 1

    I'm confused by a number of his recommendations. LVM storage is default for most distros now. There are a few things that he suggests...like keep forcing disk syncs, that slow down the processes and still allow for files to change as you read them.

    2005 called, they wanted to let you know snapshots are stable.

  49. Encrypted file systems? by thomasdn · · Score: 1
    From the article:

    According to the Gentoo Wiki, you are even more susceptible to data loss in the event of a power failure when using an encrypted file system. I have to admit that I can't think of the reason why this would be so, because as I explained, after a power failure, everything that is written to disk is garbage anyway, whether it passes through some encryption pipeline or not. But, it's something you want to keep in mind.

    Can anyone please explain why encrypted file systems should be more susceptible to data loss? (if it is true, of course. If not, please confirm that it isn't)

    1. Re:Encrypted file systems? by X0563511 · · Score: 1

      Most disk ciphers are block ciphers. What this means is that even if you only lost 1/8 of a block as it was ciphered and written to disk, any data that has a bit in that block is now damaged. Because of the nature of disk encryption, there's no connection between filesystem blocks and the cipher blocks used by the algorithm, so you may lose more than one filesystem block as a result of a damaged cipher block.

      Wikipedia has a good article on block ciphers.
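
      If you want to see the "one damaged bit ruins the whole block" effect for yourself, here is a toy demonstration. To be clear about the assumptions: this is a home-made Feistel construction built on SHA-256 purely so it runs with the standard library. It is not a real disk cipher, it is not secure, and the 32-byte block size is arbitrary.

```python
import hashlib

BLOCK = 32          # toy block size in bytes (arbitrary, not a real cipher's)
HALF = BLOCK // 2
ROUNDS = 4


def _round(key: bytes, index: int, half: bytes) -> bytes:
    """Round function: hash the key, round number, and one half-block."""
    return hashlib.sha256(key + bytes([index]) + half).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


key = b"toy key"
plain = b"A" * BLOCK
cipher = encrypt_block(key, plain)

# Flip a single bit in the last byte of the ciphertext, as a stand-in
# for a partially written block.
damaged = cipher[:-1] + bytes([cipher[-1] ^ 0x01])
garbled = decrypt_block(key, damaged)

changed = sum(1 for a, b in zip(plain, garbled) if a != b)
print(f"{changed} of {BLOCK} plaintext bytes changed after a 1-bit error")
```

      With unencrypted data the same single-bit error would damage exactly one byte.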

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  50. WTF?!?! by Anonymous Coward · · Score: 0

    What kind of idiot would run a server without a UPS?!?! Perhaps some pimply faced 15 year old... but you wouldn't catch me dead without my 80 KVA UPS!!!!

  51. I dont get it by bizitch · · Score: 2, Insightful

    1) You build a RAID5 array
    2) You backup
    3) You test your backups
    4) You plug your server DIRECTLY INTO THE WALL?!?!

    Ummm DUH! Of course you need a UPS - what kind of yutz does 1-3 and then powers the server off of unconditioned wall power?

    --
    ---- "Logoff! That cookie shit makes me nervous!" - A. Soprano
  52. Power failures don't lead to loss of money... by dpbsmith · · Score: 1

    ...in financial transactions. Database transactions are interlocked in such a way that if $1000 is transferred from an account in bank A to an account in bank B, then no matter what happens, come hell or high water, when the dust settles the $1000 has either been moved to bank B or remains in bank A. There cannot be $0 in both or $1000 in both.

    If file systems aren't designed to work this way, it's not because of any intrinsic limitation on what is or is not possible, it's because system designers have made a conscious effort to favor speed over reliability.

    Even in supposedly mission-critical servers.
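
    A minimal sketch of that all-or-nothing behaviour, using SQLite from the Python standard library. The assumptions are mine: one database stands in for both banks (a real inter-bank transfer needs a distributed protocol on top), and the table and function names are invented for the example.

```python
import sqlite3


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src))
        (left,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if left < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 1000), ("B", 0)])
conn.commit()

transfer(conn, "A", "B", 1000)
print(balances(conn))

try:
    transfer(conn, "A", "B", 1)   # fails part-way through and is rolled back
except ValueError as err:
    print("rejected:", err)
print(balances(conn))
```

    The second transfer has already debited A when it fails, yet the rollback leaves both balances as they were, which is the property file systems usually trade away for speed.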

  53. Thoughts by RAMMS+EIN · · Score: 1

    I have thought about this matter, and I think it is important to factor in how much data is an acceptable loss.

    My /home is backed up every night, and backups are kept for 12 months. That means there are three ways I can lose data:

    1. I lose it before it's been backed up. This applies to, at most, the last 24 hours of work I do.
    2. I lose data after it's been backed up, and I don't notice for at least 12 months.
    3. The data disappears from both the live system and the backup, before I can recover it.

    2 is fairly unlikely. 3 is one that worries me, but I'm working on that by arranging my backup to be duplicated off-site. 1 is the one that bothers me the most. I imagine my harddisk failing on the day I just finished a large project, thus replacing the joy of having finished it by the pain of just having lost the final pieces.

    What TFA talks about (losing data before it has been written to disk) doesn't worry me so much. I doubt I'd lose more than a few minutes of work that way. And I'd know that something had failed, so I would expect the data loss.

    It gets more worrisome when you provide services to remote users. Imagine that you run an e-commerce website, and a customer has just placed an order and received a confirmation, and then your machine goes *poof* before the order has been committed to stable storage. The customer would not be pleased, especially if they had already paid for the order.

    I think, in cases like the e-commerce example above, you would want to make sure that changes had been recorded before telling anyone that they had. And now comes a question: is there any way I, as a programmer, can verify that something has been written to stable storage? Can I tell the system (library/operating system/database/whatever) "write this down and don't return to me before you've actually written it"? And preferably without writing _everything_ to stable storage.
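
    On POSIX systems the usual answer is fsync(), which blocks until the operating system reports the file's data as flushed to the device. A minimal sketch, assuming a POSIX system (the directory fsync does not work on Windows); the function name and file name are made up for the example. One caveat: a drive with a volatile write cache can still acknowledge the flush before the data is physically on the platter.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after the OS reports it flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write only part
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # flush this file's data
    finally:
        os.close(fd)

    # Also flush the directory so the new file name itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


# Usage: only confirm the order to the customer after this returns.
order_path = os.path.join(tempfile.mkdtemp(), "order-0001.txt")
durable_write(order_path, b"order 0001: paid\n")
print("order recorded, safe to send confirmation")
```

    Because only this one file is synced, you get the "write this down before returning" behaviour for the order without forcing everything else on the system to stable storage.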

    --
    Please correct me if I got my facts wrong.
    1. Re:Thoughts by RockMFR · · Score: 1

      I used to back up my home every night, too, but the cost was ridiculous. Sure, it was easy enough to call up Countrywide and get a new mortgage every day (they think my income is "hundreds of billions"), but it's not so easy for the furniture. And ordering Russian brides to fill those houses is no walk in the park.

  54. Mmmm! Puppies!!! by Gription · · Score: 4, Interesting

    Less filling but tastes great!


    Ok back on subject
    A UPS isn't even a panacea... I had a server lose 3 out of 4 HDs in a 4 hour period. (The 3rd drive went at 4:57 PM Thursday Dec 11th 1997. Not that I would remember...) When I looked at the service history on it it had been losing drives for 8 months at an accelerating rate.

    Turns out that the 3000va rack mount wonder UPS from that big, well known vendor was the problem. The switching unit in it was sending spikes into the equipment.

    They wouldn't warranty it so I ended up putting a Triplite ISObar surge suppressor between it and the server in our test environment and it was in service for years after that.

    Never trust any piece of equipment...

    1. Re:Mmmm! Puppies!!! by Kent+Recal · · Score: 1

      They wouldn't warranty it

      Hmm, that's funny. So you bought a midrange UPS (which must've been quite expensive in 1997) to protect your equipment and then that very UPS goes nuts and destroys your equipment? And then the vendor has the balls to refuse RMA, much less to cover the destroyed hardware?! Which UPS-vendor was that again?

    2. Re:Mmmm! Puppies!!! by sexconker · · Score: 1

      We all know it's APC.
      And APC products are actually pretty damned crappy, sadly.
      The software is absolutely horrible, to boot.
      From what I hear the warranties and damage protection are basically lies.

    3. Re:Mmmm! Puppies!!! by Anonymous Coward · · Score: 0

      A major one.

    4. Re:Mmmm! Puppies!!! by maglor_83 · · Score: 4, Funny

      They wouldn't warranty it so I ended up putting a Triplite ISObar surge suppressor between it and the server in our test environment and it was in service for years after that.

      Never trust any piece of equipment...

      You mean like a Triplite ISObar surge suppressor?

    5. Re:Mmmm! Puppies!!! by hedwards · · Score: 1

      That's sad, I've been using a couple of cheapo no-name UPSes for a couple of years and have yet to have a problem with either of them. I haven't had a single HDD go bad in that time, compared with like 4 in the 6 months prior to installation, and apart from not having a utility for non-Windows OSes, I have no complaints.

      Funny thing is that I spent like a total of $100 on both of them, and have probably saved more money than that on HDD replacements.

    6. Re:Mmmm! Puppies!!! by Gription · · Score: 2, Interesting

      I have had hundreds of APC UPSes that never had a problem. The one that ate my server just happened to be the one for the core database running a mail order company... 14 days before Christmas. At that point we were doing $90k a day.

      The reason I remember the exact minute it failed was I had my bag in hand and was walking toward the door when the server alarm went off.

      18 hours later I found out from the backup software vendor that there was a bug in the software that meant it wouldn't restore any rights information so the server configuration was totally lost.

      The backup that saved us was a DOS batch file that copied everything down to a PC. 43 hours later I was able to actually go home.

      After the blowup they finally approved the request for a secondary server.

    7. Re:Mmmm! Puppies!!! by orangesquid · · Score: 1

      Pfff. There's nothing that a large capacitor bank or stored-inertia motor-generator pair can't do. Who needs some tiny little solid-state UPS? If you run over it with a truck, it stops working! And, forget this DIMM memory. A large cabinet of core memory is not only economical and practical (keeps your server room warm, also functions as an EMF detector and cosmic ray detector---four uses in one!) but is inherently non-volatile and stylish to boot. Lost power so long that your stored energy systems got exhausted? No need to fear, thanks to magnetic hysteresis technology, core memory deteriorates quite slowly when kept in good field isolation!

      --
      --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
    8. Re:Mmmm! Puppies!!! by sexconker · · Score: 1

      Sure, you can have tons that don't fail.

      But APC is notorious for overestimating their capacities, not delivering on warranties/damage reimbursements, having HORRENDOUS documentation, support, and software, and having generally shitty surge protection and power filtering.

    9. Re:Mmmm! Puppies!!! by Gription · · Score: 1

      . . .

      ... and having generally shitty surge protection and power filtering.

      I really don't like to consider an APC UPS a power filter. I just don't trust it, especially the desktop models...
      If I am seriously protecting something (like my home theater) I put an ISObar in front of it.

    10. Re:Mmmm! Puppies!!! by sexconker · · Score: 1

      I'm talking about rack solutions, where they advertise power filtering.

  55. can always ??? by baomike · · Score: 1

    How often is this?
    is it equivalent to "possibly always" or "always sometimes"?

  56. What I did by kilodelta · · Score: 2

    I was heavily involved in the planning for moving our I.T. infrastructure to a different place.

    The old setup was essentially a closet in a basement with a single AC unit and individual UPSes on each server.

    So I decided redundancy was key. We had redundant AC, but the best part was power.

    All servers (70 of them at last reckoning) are attached to an APC Symmetra that nominally gives 40 minutes of battery power. The Symmetra in turn is backed up by a 125kW natural-gas fired generator that spools up within 10 seconds.

    It was decided we could suffer a brief AC outage so that was simply attached to the generator. There were two 2 ton AC units in place.

    Even had the foresight to extend a tendril out to the MDF in the building so that our telecom and ISP could plug their UPS into the generator circuit.

    And what was the fly in the ointment? Our DNS services were provided by an outside entity. So one day we had a power failure that hit a very large swath of the city and included us and the entity that provided DNS services.

    So while everything in our shop was running, nobody from the outside could see our public services, and nobody inside could get out.

    We actually got hold of the DNS zone and had our own after that.

    1. Re:What I did by mweather · · Score: 1

      70 servers with individual generator-backed UPSes and not a one was for your DNS? Not to mention only ONE DNS? Tsk, tsk. Multiple DNS servers, geographically separated, is the way to go.

    2. Re:What I did by kilodelta · · Score: 1

      Absolutely. I had made this exact point when we were planning the move but the then I.T. Director didn't want to hear about managing a DNS zone. I told him I'd done so in many cases without issue using BIND of all things.

      After that power failure the tune changed. Got a Linux box with BIND installed and configured for our zone and all was well.

  57. Ah, that's easy by jcochran · · Score: 5, Funny

    All you need to do is have the grid power feed some high wattage light bulbs. And near the light bulbs are some solar cells. The output from the solar cells is used to charge batteries which feed an inverter that actually powers the computer. Of course there is some power loss in the conversion process, and you need to have some (ok, a lot) of the input power to the system committed towards running a cooling unit to keep things at a reasonable temperature. But the resulting device provides clean power with no possibility of any surges getting thru to the protected equipment.

    Of course, if you go to this level of trouble for your power source, then I'd also suggest opto-isolating all signal lines to and from the server. And enclose the server in a well-grounded Faraday cage. And it wouldn't be a bad idea to have a dedicated comm link to a duplicate server located elsewhere. Preferably on a different tectonic plate.

    1. Re:Ah, that's easy by rcw-work · · Score: 3, Interesting

      All you need to do is have the grid power feed some high wattage light bulbs. And near the light bulbs are some solar cells.

      You now have a 1% efficient power supply.

      A slightly more practical option (with better isolation than a standard electromagnetic transformer, but unfortunately also some inductive effects) would be to couple two motors with an insulative shaft.

    2. Re:Ah, that's easy by ColdWetDog · · Score: 1

      Umm, wouldn't batteries be a tad easier... But, I like your style.

      --
      Faster! Faster! Faster would be better!
    3. Re:Ah, that's easy by jcochran · · Score: 1

      Did I say anything about efficiency?
      And I can see that my tongue in cheek wasn't obvious enough.

    4. Re:Ah, that's easy by rcw-work · · Score: 1

      And I can see that my tongue in cheek wasn't obvious enough.

      Nor was mine. :) Although I haven't seen either myself, I'm sure they've both actually been implemented somewhere in the world (for things besides the classic desk calculator).

    5. Re:Ah, that's easy by somersault · · Score: 1

      Batteries don't stop power surges

      --
      which is totally what she said
    6. Re:Ah, that's easy by lga · · Score: 1

      You may jest but the old Wang computer at the place where I work had a bank of three 'frequency converters' powering it. All they are is a motor and a generator set up to take 50Hz up to 60Hz.

      More of a problem is that removing them to make space for more racks of servers would involve 5000 pounds worth of electrical work. The result is they stay wired in and taking up floor space and electricity. (They are on the same circuit breaker as the PC servers.)

  58. Guaranteed to work every time, 60% of the time... by raftpeople · · Score: 1

    Anchorman

  59. MOD PARENT UP! by afabbro · · Score: 1

    The grade-school English used by Slashdot's "editors" is so sad.

    --
    Advice: on VPS providers
  60. Re:A UPS is good to have. Even at home. by noidentity · · Score: 1

    Good you could continue working (posting to Slashdot), but I think there's one small problem: the UPS is causing excessive newlines to be inserted into your outgoing stream.

  61. 4 words by mistahkurtz · · Score: 1

    battery backed write cache (usually only costs a couple/few hundred extra on a decent server)

    --
    not only is time travel possible, it's irrelevant.
  62. Servers should have a few seconds of power by davidwr · · Score: 1

    This is one reason why servers should have power supplies that have enough reserve power to write out journaled data before it gets corrupted, along with alarm mechanisms when the voltage begins to drop and when it is about to drop so low that data is compromised.

    They should also have power-conserving circuits so "non-essential" components get cut off completely so the reserve power isn't used up too fast.

    With good capacitors, holding 2-4 seconds of power should be trivial, holding 10-20 should be very doable at some cost.

    Desktops can have this or not depending on cost and market demand.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:Servers should have a few seconds of power by m.dillon · · Score: 1

      Er, no. Not at the rate a HD consumes power, particularly while writing. 1/100 of a second is plenty doable (enough to finish a full-track write). Several seconds is not really doable. The HD can only handle a voltage drop of ~1V (if that) on the +12V bus, and less on the +5V bus.

      -Matt
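      To put rough numbers on the hold-up question: the energy a capacitor can deliver while sagging from V1 to V2 is E = 1/2 * C * (V1^2 - V2^2), so the capacitance needed to carry a load P for t seconds is C = 2*P*t / (V1^2 - V2^2). The sketch below is only illustrative. The 10 W drive load is an assumed figure, not something stated in this thread; the ~1 V allowed drop on the +12V bus is the number quoted above. Converter losses are ignored.

```python
def holdup_capacitance(power_w, seconds, v_start, v_min):
    """Capacitance in farads needed to supply power_w for `seconds`
    while the rail sags from v_start down to no lower than v_min.

    Based on E = 1/2 * C * (v_start^2 - v_min^2); losses are ignored.
    """
    if v_min >= v_start:
        raise ValueError("v_min must be below v_start")
    usable_energy_per_farad = 0.5 * (v_start ** 2 - v_min ** 2)
    return power_w * seconds / usable_energy_per_farad


# Assumed load: one drive drawing 10 W from the +12V rail (hypothetical figure).
for t in (0.01, 2.0, 10.0):
    c = holdup_capacitance(power_w=10.0, seconds=t, v_start=12.0, v_min=11.0)
    print(f"{t:>5} s hold-up -> {c:.4f} F")
```

      Under those assumptions, 1/100 of a second needs under 10 millifarads, while 2 seconds needs well over a farad per drive on the 12 V rail. A supply stage that tolerates a wider input swing would change the numbers considerably.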

  63. Wrong, I think. by Spazmania · · Score: 2, Informative

    The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

    Pretty sure that's wrong. It used to be (20 years ago) that hard drives losing power in this way had a chance of the heads crashing against the platters (the fabled "hard drive crash"). To solve this, modern drives are very sensitive to the power input. As soon as power fails the drives extract power from the spinning platters to move the heads over to the parked position. Regardless of what the DMA controller thinks it should be doing, the hard drive is busy parking the heads.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  64. Spelling Lesson by Anonymous Coward · · Score: 0

    Indefinitely
    Murphy
    Advice
    Lose

    And generally "Demon" when you're not talking about Linux.

    And I won't even get started on your capitalization or grammar.

  65. Laptops are the key... by Anonymous Coward · · Score: 0

    This is why our entire IT infrastructure is based on laptops. Built in protection against power failure.

  66. Idiocracy by JWSmythe · · Score: 1

    You have to love someone who posts what almost reads as authoritative, and then puts crap in it like:

    > "When power fails, or when you press reset, they will be in a
    > "dirty" state, and the system may need to recreate the array.
    > That is, if it can. I've never tried it, but I can imagine"

          MD devices generally recover fine from a power loss. And, I've tried it. A lot. I've had quite a few machines (say hundreds) which have had various events in the past, which caused them to lose power. Here's an example now. Someone accidentally pulled the power cord out on this machine today. It wasn't intentional, they thought they were pulling the one above it.

    cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md1 : active raid5 hdd2[1] hda2[2]
                351421696 blocks level 5, 4k chunk, algorithm 2 [3/2] [_UU]

    md0 : active raid1 hdd1[1] hdc1[0] hda1[2]
                40064 blocks [3/3] [UUU]

    unused devices: <none>

          One of the drives didn't come up. Not surprising, this machine has been up for a long time, under heavy load. I'm pretty sure we're beyond MTBF on most of the components. It will be replaced soon anyways. We swapped drives, and ran:

    raidhotadd /dev/md1 /dev/hdc2

        Now it's rebuilding. There's no noticeable performance impact. No data was lost. The only "downtime" was for the tech to realize they pulled the wrong cord, and put it back in.

    cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md1 : active raid5 hdc2[3] hdd2[1] hda2[2]
                351421696 blocks level 5, 4k chunk, algorithm 2 [3/2] [_UU]
                [>....................] recovery = 0.0% (54992/175710848) finish=159.6min speed=18330K/sec

    md0 : active raid1 hdd1[1] hdc1[0] hda1[2]
                40064 blocks [3/3] [UUU]

    unused devices: <none>
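
    For anyone who would rather have a script spot the "[3/2] [_UU]" state than eyeball it, here is a minimal sketch that parses mdstat-style text and reports arrays with missing members. The sample text is copied from the first output above; the function name is made up for illustration, and the regexes only cover the line shapes shown in this comment.

```python
import re

# Sample taken from the /proc/mdstat output quoted above.
SAMPLE = """\
md1 : active raid5 hdd2[1] hda2[2]
      351421696 blocks level 5, 4k chunk, algorithm 2 [3/2] [_UU]

md0 : active raid1 hdd1[1] hdc1[0] hda1[2]
      40064 blocks [3/3] [UUU]
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected_members, active_members)} for every
    array whose active member count is below the expected count."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like "[3/2] [_UU]": expected/active, then one flag per member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = (expected, active)
    return result


print(degraded_arrays(SAMPLE))  # {'md1': (3, 2)}
```

    On a live box you would feed it the contents of /proc/mdstat instead of the sample string.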

    --
    Serious? Seriousness is well above my pay grade.
  67. Captain Obvious says... by Anonymous Coward · · Score: 0

    "Power loss can cause data corruption? Duh... "

    What a novel idea. Yes - you need a UPS and you need to test its operation.

  68. transaction tracking .. by rs232 · · Score: 1

    Didn't Novell sort this out years ago with something called 'transaction tracking', or was that some kind of parallel universe?

    "NetWare includes a transaction-monitoring feature called the Transaction Tracking System (TTS(TM))"

    --

    You seem to be using Microsoft Internet Explorer. Although this site is made up of valid HTML 4.01 and CSS2, when using Internet Explorer, the layout will be messed up, because Microsoft deliberately sabotages the CSS standard (among other things). I have no intention of including all sorts of work arounds for this. Do yourself a favor, get Firefox. It is more secure and much more convenient. Any other normal browser will do as well...

    --
    davecb5620@gmail.com
    1. re: transaction tracking .. by rs232 · · Score: 1

      Guess the concept don't grok around here and reading the responses was almost a complete waste of time, look mom, I'm on slashdot .. :)

      irrelevant keywords:

      ups, raid5, copy-on-write, blocked vents, batteries ...

      --
      davecb5620@gmail.com
  69. Losing power while writing to a HD == lost HD by m.dillon · · Score: 1

    If a hard drive loses power while it is writing to the platter there is a very high chance that you will lose the HD. Not only is the HD likely doing a full-track write, and thus touching sectors that aren't even part of the computer-directed write operation, but HD manufacturers skimp on the parts (basically just a few big capacitors) to detect the loss of power, finish the write out, and prevent the HD's heads from writing a swath of garbage across half the disk as they whip over to their parked position.

    The result? Data corruption and even physical damage.

    Probably five out of the last six drives I've lost were due to uncontrolled power failures. One was even from a UPS, which worked just fine but the computers failed to get the shutdown signal and were still writing when the UPS finally ran out of battery time.

    The really funny thing about this is that your typical RAID system is doing parallel writes. You can wind up losing several disks at once and if that happens you probably won't be recovering anything off that RAID. Oops!

    This has nothing to do with the computer's DMA still running and passing corrupted data.

    -Matt

    1. Re:Losing power while writing to a HD == lost HD by pslam · · Score: 2, Informative

      I don't think this has been true since... maybe 8-10 years now? Definitely since MR drives came on the market (ages ago).

      Modern drives have:

      • A capacitor that stores enough charge to "emergency park" the heads.
      • Low voltage detection that kicks in, disables the head, and dumps the capacitor into the seek coil.

      It does NOT go writing crap all over whatever's between your data and the parked position, unless the drive is a defective design. The emergency park is a fairly brutal affair, and you'll typically see the datasheet list a maximum number that's notably lower than the max power cycles.

      It's also essential these days because:

      • The head should (of course) never touch the platter.
      • The drive can't actually spin up if the head is resting on the platter.
      • So the drive is designed with the assumption the head NEVER touches the platter in its lifetime.

      Normally that holds true. I've seen some drives (1.0" and 1.8" miniature ones) which suffered from head-on-platter but that was due to misdesign in the power supply feeding it (e.g voltage rails going slightly negative, draining the cap early).

      But anyway, the worst you'll get with the power going out is a partially written sector, which will then be marked bad, probably permanently. Or maybe a bunch of sectors. Or maybe bad sectors in a different order from what the OS sent, due to caching.

      If you had a drive and/or RAID fail due to power outage, you should get a refund. You might lose a tiny amount of data, not the whole lot.

    2. Re:Losing power while writing to a HD == lost HD by m.dillon · · Score: 1

      Sorry, you're wrong. It's a nice fantasy but I've lost enough drives to know that powering off a drive while it is writing has a good chance of destroying it. I'm not talking ancient drives here, I'm talking drives made in the last year.

      If you think otherwise then go right ahead, try writing to your drive and pulling the power. Keep doing it, I doubt the drive will last more than a few power cycles before it becomes completely corrupted and you have bad sectors coming out your ears.

      You actually believe that at most you'd lose one sector? Good luck with that!

      -Matt

    3. Re:Losing power while writing to a HD == lost HD by pslam · · Score: 1

      I think hard drive manufacturers would have shockingly high return rates if removing power during a write had a high chance of destroying the entire drive.

      It's not like it's an unlikely scenario, and they HAVE thought of it. Perhaps you can shed light on which manufacturer you used so we can all avoid buying their defective designs? All the drives I've used in making various consumer electronic devices have data that says they handle it fine. Perhaps you bought an IMB or Hatichi drive?

  70. not their job by Eil · · Score: 1

    Journaling filesystems are not meant to prevent data loss, they only (help) prevent the filesystem from becoming trashed if the disk loses power in the middle of a write. No amount of software can change that.

  71. Part of this seems like bad advice! by Anonymous Coward · · Score: 1

    From the article:

    "The surge protection on UPSes also often includes protection for ethernet and/or telephone networks. I really advice against using those. When there is a surge, the MOVs temporarily short the line containing the surge with the safety earth, but it will also connect the data networks to it. This safety earth, however, does not have infinitely low impedance, and therefore it's possible that some of the excess current will travel up the network, as opposed to down the safety earth. The exact details of this are more complex than this, but as always, the internet is your tool should you want to find out more."

    While this may be true, what happens if a power surge instead comes in through a telephone line, or an Ethernet cable (possibly by way of a cable modem or something)? If, for example, a lightning strike hits the cable or phone lines or something similar, that power surge can come right through and fry everything plugged into your fancy expensive surge protector or UPS.

    What you really want to do if you're protecting your equipment from power surges is to create a barrier between everything you have plugged in and the outside world. A high quality surge suppressor is that barrier.

  72. I'm shocked! by mindstrm · · Score: 1

    Do people actually consider running anything the least bit important without a UPS?

    Is it common for anyone to run production equipment in the US without power protection?

  73. B/S by Anonymous Coward · · Score: 0

    Synchronous I/O with IBM HACMP/XD clusters. (Now PowerHA).

    No data loss.

    Nuff said.

  74. I'm always late for the party by denalione · · Score: 1

    but here it goes anyway.

    You mean there are sysadmins out there who would spend hundreds to thousands on RAID and not at least buy a cheap UPS for their server?

  75. Such a huge issue - Most info is wrong by bradgoodman · · Score: 1
    This is such a colossal issue, and, well, most of the stuff even posted here is wrong or misdirected.

    There are a billion parts of the system in which data could be lost:

    Applications write to disks, even the apps could do some caching, or could partially write files - corrupting them if they leave files half-written. This could be corrected if apps always did "safe-writing" for proper recovery, but some do, some don't.

    A kernel or OS will usually implement some buffering or caching of data. This could be bypassed, but at the severe expense of performance.

    Filesystems control the writing of data blocks and filesystem metadata to the disks. Oftentimes, if one piece of this gets written but another doesn't, corruption could occur. Things like journaling exist, but most of the time (for example) ext3 journaling prevents the filesystem metadata from getting corrupted, not the file data.

    RAID controllers often have battery-backed caches. Often, these batteries are dead, and you may not even know. As someone who has worked extensively in this area, I can assure you there is no way of knowing the health of the battery without completely draining it, recharging it, and looking at its charge/discharge capacity. Trust me, your RAID controller does not do this. If it did, you'd be completely vulnerable to data loss while this test was in progress and the battery was depleted. Does your RAID controller have two batteries? I didn't think so.

    Your RAID controller sucks. You might feel all warm and fuzzy that you have a RAID-5 array from a name-brand vendor, but until you've pushed that card, and tested it in all the potential edge conditions, you don't know how many glaring issues it really has. Really, I'm serious.

    So here's the issue: Power outage is only one of several (billion) things that could go wrong rendering the system inoperable, and causing data loss. Correcting the issue means understanding the problems at all of these layers and fixing them. The chain is only as strong as its weakest link.

    Are there implementations that go the whole nine yards and do all this? Yes:

    Databases (at the application layer) tend to be anal about how they write to the disk/filesystem, journaling, safe-writing, etc. These are sometimes even mirrored at this level (mysql cluster) to prevent problems.

    Filesystems (like ext3) can journal actual data (not just metadata) at the severe expense of performance. This is so bad, you probably want to handle things at the application layer above.

    High-end systems can use (typically external) active-active RAID controllers with mirrored caches. Pricey. You put it all together - and you're talking a system which is well integrated, purpose-built, and very, very, very well tested at the extremes of any edge conditions. This is what separates very expensive high-end solutions from cheap things thrown together with mix-and-matched commodity hardware. Not to say "commodity" hardware isn't okay - but you have to really know what it's doing - and how the pieces interact.

    So to the topic - having a UPS is like pissing in a dark blue suit - it makes you feel all warm and comfy, but no one really sees that you're really just covered in piss.

    It will protect against one issue. If you're serious about protection, you need more.

    So like everything in life, everything comes at a cost. No, your $200 UPS is not the magic bullet to protect your data that is sooo critical to you.

    So now how important is that data really to you? How much you got to spend? ;-)
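
    For the "safe-writing" mentioned at the application layer above, one common pattern is: write the new contents to a temporary file in the same directory, fsync it, then atomically rename it over the original. The sketch below is my illustration of that pattern, not any particular application's code. The function name is made up, the directory fsync assumes a POSIX system, and none of it helps if the drive lies about its write cache.

```python
import os
import tempfile


def safe_write(path, data: bytes):
    """Replace `path` with `data` so that a crash leaves either the old
    contents or the new contents, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file data out of the OS cache
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Writing or renaming failed: remove the leftover temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself reaches the disk (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

    Usage is just safe_write("settings.conf", b"new contents"). The cost is an extra fsync or two per save, which is the usual speed versus safety trade.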

  76. Anyone ever watch what happens in Brown Outs? by TheNetAvenger · · Score: 1

    Just brown outs alone permanently degrade circuits, let alone most power losses.

    This is one reason you will find even old laptops survive longer than a desktop counterpart that is not running on regulated power. (Except for the few laptops with GPU/CPUS that can cook eggs and people don't clean the vents out.)

    Seriously, if this concept is new to anyone, run to buy a UPS with uber-fast switching or looped continuous power...

    As for good old power losses, nothing is coded to be completely impervious, although some out there do a better than expected job, especially when it comes to data loss situations. The tricks do help, like a 'good' RAID and journaling and an OS that expects people to be stupid enough to pull the plug at any time. Here is an area where Windows and OS X tend to be a bit better, as the OS and software integration is designed around users that think unplugging the unit or flipping a power switch is normal. And between the two, I give a nod to Vista because of NTFS and its journaling, until Apple gets around to ZFS.

  77. Still trivial by raftpeople · · Score: 1

    That applies to every moment in time, whether there is a power outage or not

    1. Re:Still trivial by Trogre · · Score: 1

      Yes, any time your data is spinning at nx10^3 rpm, or even if it isn't, there is a finite chance of data loss. Your data centre could be hit by a comet at any time of course. However I think that's nitpicking a bit. Let me reword it a bit: a power failure has a /significant/ chance of /causing/ data loss.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  78. EMC by Anonymous Coward · · Score: 0

    So that is what those SPS things were for, all of this time I just thought they were shocking homeless people.

    You like being homeless now? How about now you filthy bitch!? How does getting a job and working sound about now!? How about some health insurance? How about some responsibility! Uh oh, order another CX300, this shocker is out of homeless motivational juice.

    I dunno though, data loss, or motivational torture. With all of the shades of grey, who can say what is right and wrong anymore.

  79. Re: Fast generators by Anonymous Coward · · Score: 0

    "but they need a very fast backup generator to sustain anything more than 30 seconds of outage."

    Crap, that is an eternity for a backup generator. Cheap automatic ones can be online in 5 seconds or less. Expensive ones with air start can be online in half a cycle, or 1/120th of a second. With the fast generators, you really don't need a UPS, though prudence dictates otherwise.

  80. so 'always can' then? by Langfat · · Score: 1

    wouldn't that be more appropriate?

  81. Fly wheels... by klubar · · Score: 1

    A more common, and efficient way of isolating from the power grid is via a flywheel. The grid runs a motor that's connected to a moderate sized flywheel and then the flywheel is connected to a DC (or AC with a converter) generator that charges the UPS batteries. This provides excellent isolation from the grid and not much loss of power efficiency. If there is a spike/lightning strike the motor/generator set can ride it out without any problems. If there is a short (less than 2 seconds) dropout the flywheel will keep everything going.

    Motor/generator sets are off the shelf technology that have been proven for many years in data centers. And besides they look really cool.

    See http://www.pscpower.com/pages/industrial%20motor%20generator%20rt.htm "The Series IMG-RT Ride Thru Motor Generator integrates state-of-the-art controls, a single-shaft motor generator and a mechanical flywheel into a power conditioning system that can deliver up to 5 seconds of ride-thru during an interruption of power. The typical induction-synchronous MG set delivers this ride-thru with a maximum frequency drop at full load of 1%, or 0.6Hz on a 60Hz system. For sensitive applications where no frequency variation is acceptable, a synchronous-synchronous MG set is available. Series IMG-RT ride thru motor generator systems are sized and customized to meet a wide range of customer driven application criteria. MG sets are available in size ranges up to 2500 kVA for low voltage applications up to 600V. For medium voltage applications, please consult the factory."

    See also http://www.pscpower.com/pages/series%20xc.htm for up to 10,000 kVA (parallel modules and "ring bus" configuration), claims to have 20 year service life.

    Caution--some serious high voltage/current here. Do not attempt at home.

  82. Shucks, by DrSkwid · · Score: 1

    Where some = those not suitable for coping with a power loss scenario, quel surprise.

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  83. On database engine behaviour.. by Anonymous Coward · · Score: 0

    Why do you think that a real database doesn't count a transaction as committed until the disk reports the relevant parts of the transaction log *written to disk* rather than sitting in a cache on the way there?

    That's the ideal way of dealing with this. But it requires the hardware to never lie, the OS to never lie, the database to be designed to cope with this (and allow for multiple outstanding requests otherwise performance will suffer.)

    Of course, hardware often does lie. (SCSI gear is better than ATA gear due to command queuing; in theory SATA with NCQ can be as good as SCSI).

    Note: Running sync() won't deal with the situation if your hard drive lies and claims the sector is written to disk when it's really sitting in the HDD cache.
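
    As a toy illustration of "don't count it as committed until the log is on disk" (a sketch of the general idea, not how any real database engine is written), here is an append-only log whose commit() only returns after fsync() has returned. The class and method names are invented for the example. As noted above, fsync() is only as honest as the OS and the drive underneath it.

```python
import os


class TinyLog:
    """Append-only log; commit() returns only after fsync() has returned."""

    def __init__(self, path):
        # O_APPEND makes every write land at the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def commit(self, record: bytes):
        payload = record + b"\n"
        written = 0
        # os.write may write fewer bytes than asked, so loop until done.
        while written < len(payload):
            written += os.write(self.fd, payload[written:])
        # Ask the OS to flush the file to the device before reporting success.
        os.fsync(self.fd)
        return True

    def close(self):
        os.close(self.fd)
```

    A real engine batches several transactions per fsync to keep throughput up, which is the "multiple outstanding requests" point made above.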

  84. Home computer users... by Anonymous Coward · · Score: 0

    Need a UPS too! I am amazed that someone will spend anywhere from $250 to several thousand on a new system, but they will not spend $70 on a decent UPS to protect it. I live in a town of about 25,000 in Southeast Iowa. We seem to get a lot of short (less than 1 second) power drops. Usually several per day, sometimes dozens. The power company gets many complaints, but is not interested in fixing the problem. In such a situation, I consider a UPS essential. And not that certain cheap brand that a popular chain of department stores sells (or used to)... I went through 3 of them in less than a year. I now have a more reliable name brand UPS that has lasted 2 1/2 years so far, with no problems.

    I also consider frequent backups of my data to be essential.

  85. Get Real. by notnAP · · Score: 1

    C'mon. Show of hands... Who here runs a server with redundant power supplies, RAID configured storage, and confirmed backups without plugging the whole kit and kaboodle into a UPS?

    Bueller? Bueller?

  86. Australian Government by Anonymous Coward · · Score: 0

    You know an article is obvious when you see Airport Security already doing it.
    All their computers and hell, even their explosive dust sniffers run on UPSs.

  87. Please stop spreading lies! by dermoth666 · · Score: 1

    While the author of this article does have some points, half of it is misconceptions or just plain nonsense.

    I started laughing at the 2nd paragraph: did he say unrefreshed DRAM garbage being written to disk??? Regardless of the fact that DRAM keeps its content seconds, sometimes even minutes after power goes off - even when removed from the system, can he explain how, if there's no more power to refresh the RAM, his DMA controller will be copying data? How the disk controller will send the data through the wire? How the data will be written when there's no power to spin the platters and move the heads?

    Sure there used to be systems with power fail interrupt. That was the SGI's using an old version of XFS _without_ journaling. The PSU was loaded with big capacitors and upon triggering that interrupt the system would flush cache to disk before the power was out.

    There's also a misconception about databases - at least MySQL. I work with it in cluster environments. In my testing, I was routinely (and automatically through network boot bars) shutting off current to the active node, causing a hard shutdown, and letting resources fail over to the passive one. Did that hundreds - maybe thousands - of times on a well loaded replicated slave cluster without a single glitch. No forced InnoDB recoveries. No replication problems.

    His "disk cache" issue is nonsense too - at least the way he presents it. The proper way to demonstrate it would rather be doing the sync, then upon sync returning shutting off the computer, because there lies the problem. My MySQL cluster above was able to recover because on every fsync (and there were hundreds _per second_!) it knew the data was hard on disk (in the battery-backed RAID controller cache actually). The problem lies with consumer-level hardware. IDE/SATA drives have their write cache enabled by default. In some cases it can't even be turned off.

    So instead of suggesting to buy UPSes to "patch the problem", data reliability should start with decent hardware components: ECC RAM, SCSI/SAS drives, etc. Sometimes that's also a tradeoff between speed and reliability, as you often get the choice. And BTW in about 5 years I've seen whole datacenters lose power on 4 occasions, and not the cheapest/smallest ones (three times UPS failures, and once a generator failed to kick in. That was in two different datacenters in US and Canada). I've also had a UPS failure from an expensive APC SMART-UPS. You can't only rely on UPSes.

    1. Re:Please stop spreading lies! by dermoth666 · · Score: 1

      I just read about the so called "write intent bitmap" feature, that according to the author is the only way to keep software raids safe...

      The only purpose of this thing is to avoid a full resync after failures. The performance impact is high though, and won't help if you have disk write cache problems anyways. In the other case, the resync will just use data of the first disk which should be good if FS and applications properly fsync'ed their critical data.

      I wouldn't recommend it unless you have a slow multi-terabyte array that would take days to sync.

  88. If it's good enough for us... by Anonymous Coward · · Score: 0

    Take a lesson from commercial aircraft design. Power inputs are characterized extensively and the devices on the power bus have built-in capacitance to hold up the processor/devices long enough to commit all data before the device loses power. In addition, critical devices have redundancy so if one device dies, the other device can complete the critical operation.

    I suppose if this is good enough for our asses, it is good enough for some non-living data!
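
    As a rough back-of-the-envelope for the hold-up capacitance mentioned above: the usable energy in a capacitor discharging from a start voltage to the minimum voltage the electronics tolerate is 0.5 * C * (V_start^2 - V_min^2), and dividing by the load power gives the hold-up time. The Python sketch below assumes a constant-power load with no converter losses, and the numbers in the usage example are hypothetical, not taken from any real avionics design.

```python
def holdup_time_s(capacitance_f: float, v_start: float, v_min: float, load_w: float) -> float:
    """Seconds a capacitor can carry a constant-power load before dropping to v_min."""
    # Energy released while the capacitor discharges from v_start to v_min.
    usable_energy_j = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_energy_j / load_w


# Hypothetical example values: 10,000 uF, 28 V bus, 18 V dropout, 10 W load.
example_s = holdup_time_s(0.01, 28.0, 18.0, 10.0)
```

    With those made-up numbers the usable energy is 2.3 J, so the load is carried for about 0.23 s, which is the sort of window in which a device has to finish committing its data.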

  89. UPSes are a must for home users by Loconut1389 · · Score: 1

    Forget about outages for a minute, think about lightning, surges, sags, etc. I lost some network gear to lightning before I had money to put UPSes everywhere, and I've seen modems with chipsets that have huge holes ripped in them from a strike. Currently, my TV stack (tivo, cable box, cable amp, cable modem, router, etc) is all plugged into a 1500 VA UPS, all of our computers each have UPSes which guarantee at least 10 minutes runtime (depends on number of disks, etc), and the tivo in the other room is on a 550VA UPS. The 1500VA UPS will carry our network gear, tivo, etc for about 2 hours and 20 minutes. Most of our outages are either brief cuts lasting 2 seconds or less, or they last a couple of hours. Around 4:30 a.m. the other night, we had an outage affecting some 3000 houses. Before the power went out completely, it reset 3 or 4 times as the grid tried to reroute around the fault and failed. This would have been murder on a disk if the system was set to restore to previous state. The other dozen or so blips every year would also be bad on disks. Anyway, the power finally goes out, my tivo is still recording my shows, and the only light in the house is the LCDs, which I use to shut down the systems and immediately power the screens down to increase runtime. I eventually had to shut off the smaller UPS with the extra tivo, but everything else was shut down cleanly and didn't have any bobbles in power. After sitting around in the dark for a while, unable to sleep, I plugged a lamp into the 1500VA UPS on my main desktop (which was offline) and read a book for a couple hours until the power came back on. It's pretty damn weird when you think you're maybe the only person for a couple miles with a light on, let alone a tivo or a computer. I second what others have said about laptops, it's pretty nice to be able to keep working for a while with the power out. UPSes are a must for many reasons.

  90. Sigh, well-meaning, but lacking in factual basis by tjrw · · Score: 1

    Fundamentally, any time you lose power, you WILL lose some data that is cached. Period. The important part is not whether or not you lose data, but whether you KNOW what got written and what didn't. That's the whole point of transactions.

    Filesystem journals are (mostly) concerned with metadata integrity/consistency (ZFS is somewhat exceptional in this area). Unless you do full-data journalling (and the performance penalty means you don't want to), it is up to the application to implement transactional behaviour. This is a key feature of databases.
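
    One common way for an application to KNOW what got written is to frame each log record with its length and a checksum, and on recovery to stop at the first record that doesn't check out. A minimal Python sketch, assuming a made-up record layout (4-byte length plus CRC32 header); real databases use their own, more elaborate formats:

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame a payload so a torn write can be detected later."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> list[bytes]:
    """Return the records that were completely written; stop at the first torn one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # truncated or corrupted tail: treat it as never committed
        records.append(payload)
        offset = start + length
    return records
```

    The half-written record at the tail is lost, as expected, but the application can tell exactly where the good data ends.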

    Part 2, RAM can fail (refresh) before the hard drive stops doing DMA. Frankly, I don't believe it. Data gets DMA'd to buffers on the drive. That's the cheap (powerwise) part. Moving the heads around and writing the data is the expensive bit. The drive is not going to tell you it has written something unless it has done so, and modern drives won't try to write unless they have enough power to do so.

    Part 3, you can lose cached data. Clue. The same problem happens if the drive dies or starts erroring out. If you don't do synchronous writes, you (the app) are responsible for checking to see that the writes actually got committed.

    Part 3.2. This has NOTHING to do with bit flipping. The Gentoo Wiki is talking about enabling write-caching on the drives. This is incredibly dangerous (fatal) without a UPS. Basically, enabling write-caching on the drive allows the drive to say "yes this is committed to stable store" when it isn't, and if the cache is not protected, all bets are off. True, it's much more practical to do hard-luck recovery of unencrypted data in this case, but fundamentally, this is a no-no. A lot of RAID controllers offer writeback caching and most are smart enough to disallow this unless the cache is battery-backed.

    Part 3.3 RAID. RAID 0 isn't worse than a single disk. RAID 1 potentially requires a full resync, but is recoverable. Similar rules for other RAID levels. This is all known/handled.

    Part 3.4. VERY WRONG. All I can say is read up on ACID. PostgreSQL is fully ACID-compliant (http://en.wikipedia.org/wiki/ACID), as are most/any databases worthy of the name these days.
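
    For those who haven't seen the "A" in ACID in action, here is a small Python sketch. It uses the stdlib sqlite3 module with an in-memory database rather than PostgreSQL, purely so it runs anywhere, and the table and function names are invented. It shows atomicity only (a failed transaction leaves no partial update); durability obviously can't be demonstrated in memory.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically; return False if the transfer was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

    An overdrawn transfer credits the destination first and then fails on the debit, yet afterwards neither balance has changed. A crash in the middle of a transaction is handled on the same principle: the half-done work is discarded on recovery.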

    I won't argue that power failures are healthy or desirable. Hardware stress and failure are the most obvious issues here, but most of the "issues" brought up in the article are simply incorrect. Of course, there are an unbelievably large number of "applications"/programs out there that don't implement the necessary journalling/transactions to correctly deal with power outages/crashes etc. But that's another story. Badly-written applications are as old as the computer.

  91. the server alarm went off .. by rs232 · · Score: 1

    "The reason I remember the exact minute it failed was I had my bag in hand and was walking toward the door when the server alarm went off"

    You actually had a server alarm - cool !!!

    --
    davecb5620@gmail.com
  92. "Don't kill -9 postmaster" is NOT about power loss by tgl · · Score: 1

    I lost faith that this guy knows what he's talking about when I read this:

    > The PostgreSQL mailinglist doesn't have "don't kill -9 the postmaster!" as a standard signature to list messages for nothing.

    Indeed that is standard advice, but it has NOTHING WHATSOEVER to do with power failure recoverability. The reason you're not supposed to do it is that hard-killing the postmaster doesn't get rid of its subprocesses or shared memory segment, which could make a subsequent attempt to restart the postmaster hazardous. But those things won't survive a system crash due to power loss (or any other reason).

    I'm not really qualified to evaluate all the other statements in the article, but the fact that the one statement I do know about is hogwash doesn't make me feel good about the others.

  93. ZeroSurge .. To The Rescue!! by Anonymous Coward · · Score: 0

    http://www.zerosurge.com/

    (No dies-in-2-years MOVs: a series reactor instead.)

    Been around for many years, & the only thing that can stop mega-spikes:
    when nearby-lightning has killed all the normal UPSs in the region,
    if YOUR UPSs are protected by these, they'll probably be fine.

    no, I'm not affiliated, but prefer infrastructure to continue-working, for OUR benefit.

    They have a 5A panel-mount unit, for sticking on appliances, too :)

  94. Wrong by Anonymous Coward · · Score: 0

    You CAN make a system that won't lose data. The FIX protocol (Financial Information Exchange) is an example of this. The data is sent, received, written, and then CONFIRMED RECEIVED is sent back before it is deleted/processed at the source. There are all kinds of redundancy built into FIX. This of course wastes a lot of cycles but in the financial world this is what is done because data corruption could equal actual money lost. Imagine if your bank balance was being transferred and was lost!
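
    The general pattern described above (keep the message at the source until the other side confirms it has stored it) can be sketched in a few lines of Python. This is NOT the actual FIX message format or session logic; the classes, the sequence numbers and the `lose_ack` knob are all made up to illustrate store-then-acknowledge with resend.

```python
class Receiver:
    """Stores each message before acknowledging it."""

    def __init__(self) -> None:
        self.stored = {}  # stands in for durable storage

    def deliver(self, seq: int, payload: str, lose_ack: bool = False):
        # Idempotent: a resend of the same sequence number overwrites the same slot.
        self.stored[seq] = payload
        # Returning None models an acknowledgement lost in transit.
        return None if lose_ack else seq


class Sender:
    """Keeps every message until the matching acknowledgement arrives."""

    def __init__(self) -> None:
        self.pending = {}
        self.next_seq = 1

    def submit(self, payload: str) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload
        return seq

    def flush(self, receiver: Receiver, lose_ack_for=()) -> None:
        # sorted() returns a list, so deleting from the dict while looping is safe.
        for seq, payload in sorted(self.pending.items()):
            ack = receiver.deliver(seq, payload, lose_ack=seq in lose_ack_for)
            if ack == seq:
                del self.pending[seq]  # only now may the source forget it
```

    If an acknowledgement goes missing, the sender simply still has the message and sends it again on the next flush; the sequence number lets the receiver recognise the duplicate.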

    1. Re:Wrong by RegularFry · · Score: 1

      Without knowing any details of FIX (although now I'm tempted to go and look it up) it's still possible to lose data if the CONFIRMED RECEIVED message is sent erroneously. This is far from impossible if various parts of a system are powering down at different speeds.

      --
      Reality is the ultimate Rorschach.
  95. Yank the plug. by IdeaMan · · Score: 1

    I wish they would do this randomly at Microsoft and Intel... that and have the computers randomly fail and give bad calculation results. Maybe then we would have fault-tolerant computing and truly robust backup schemes.

    The space shuttle computers could then be just normal computers.

    --
    They ARE out to get you simply because They are in it for themselves and they don't care about you.
  96. UPS's aren't golden eggs either ... by freaker_TuC · · Score: 1

    My company had a power failure one year ago after our local electricity company finished their work in the street. The electricity went out-on-out-on, for 9 cycles. Not only were there big problems with the devices not behind a UPS, but also big problems with the servers and other equipment behind UPSes.

    The current somehow managed to reach the serial port of 2 APC's, killing half of the server infrastructure in that rack.

    Needless to say, I switched immediately to the higher series with online power. I hope to prevent that kind of damage.

    Every UPS has only a limited surge-handling capacity, expressed in joules. Once the power problem exceeds that amount, the UPS will fail at its job, and eventually its safety precautions will fail too. The higher series have separate filtering and protection for this.

    --
    --- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..