Slashdot Mirror


Can Maintenance Make Data Centers Less Reliable?

miller60 writes "Is preventive maintenance on data center equipment not really that preventive after all? With human error cited as a leading cause of downtime, a vigorous maintenance schedule can actually make a data center less reliable, according to some industry experts. 'The most common threat to reliability is excessive maintenance,' said Steve Fairfax of 'science risk' consultant MTechnology. 'We get the perception that lots of testing improves component reliability. It does not.' In some cases, poorly documented maintenance can lead to conflicts with automated systems, he warned. Other speakers at the recent 7x24 Exchange conference urged data center operators to focus on understanding their own facilities, and then evaluating which maintenance programs are essential, including offerings from equipment vendors."

5 of 185 comments

  1. Security updates by bjb_admin · · Score: 5, Informative

    Sometimes I get the feeling that security updates cause more problems than the issues they are meant to fix.

    I can think of many occasions when a security update has broken a server/router/etc. Obviously the lack of a security update can lead to a bigger headache in the future. But the typical user doesn't understand and has the attitude "IT broke the server again".

    If a virus or hacker causes an issue, the attitude is "I hope they fix that soon. I hate viruses/hackers" (obviously this is a huge generalization).

  2. Maintenance took down Chernobyl by ExtremeSupreme · · Score: 3, Informative

    That being said, it was because their procedures were shit, not because they were doing maintenance.

    1. Re:Maintenance took down Chernobyl by crankyspice · · Score: 5, Informative

      That being said, it was because their procedures were shit, not because they were doing maintenance.

      Actually, no, the Chernobyl disaster was sparked by a 'live' test of a new, untested mechanism for powering reactor cooling systems in the event of a disaster that brought down the power grid. http://en.wikipedia.org/wiki/Chernobyl_disaster#The_attempted_experiment (And even that test was delayed several hours, into a shift of workers who weren't properly prepared to conduct the test.)

      --
      geek. lawyer.
  3. Re:In between maybe? by sphealey · · Score: 3, Informative

    ===
    Back in the early 90s, I inherited from a friend a fear of rebooting, turning off, or performing maintenance on a computer. Half the time he opened the case, the computer would become unbootable or never turn back on.
    ===

    Neither you nor your friend is alone in thinking that:

    AD-A066579, RELIABILITY-CENTERED MAINTENANCE, Nowlan & Heap, (DEC 1978) [this used to be available for download from the US Dept of Commerce web site; now appears to be behind a US government paywall (!)]

    A more recent summary:

    http://reliabilityweb.com/index.php/articles/maintenance_management_a_new_paradigm/

    sPh

  4. Re:In between maybe? by AliasMarlowe · · Score: 4, Informative

    It lives on also among the DoD's general specifications, and can be downloaded from this page.

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire