Slashdot Mirror


Data Center Power Failures Mount

1sockchuck writes "It was a bad week to be a piece of electrical equipment inside a major data center. There have been five major incidents in the past week in which generator or UPS failures have caused data center power outages that left customers offline. Generators were apparently the culprit in a Rackspace outage in Dallas and a fire at Fisher Plaza in Seattle (which disrupted e-commerce Friday), while UPS units were cited in brief outages at Equinix data centers in Sydney and Paris on Thursday and a fire at 151 Front Street in Toronto early Sunday. Google App Engine also had a lengthy outage Thursday, but it was attributed to a data store failure."

9 of 100 comments

  1. If only you had listened... by BillyMays · · Score: 5, Funny

    I'm guessing that the majority of these were caused by leaks or spilled drinks. If only you guys had listened to me and gotten Zorbeez(tm)[SOAKS UP 10x ITS OWN WEIGHT!].

    -B. Mays

  2. Be Redundant! by drewzhrodague · · Score: 5, Insightful

    Anyone seriously concerned about their web applications will have redundant sites and a way to share the load. Few people pay attention to the fact that DNS requires geographically disparate DNS servers*, such that even in the event of a datacenter fire (or nuclear attack), there will still be an answer for your zone. Couple this with a few smaller server farms in separate places, and there won't be any problems. *I went to look it up on Wikipedia, but didn't find where it is required for authoritative DNS servers to be in separate geographic regions. Where did I read this, DNS and BIND?

    --
    Zhrodague.net - I do projects and stuff too.
    1. Re:Be Redundant! by W3bbo · · Score: 5, Informative

      The DNS RFCs advise that zone nameservers should be in separate subnets. Specifically, RFC 2182 recommends that secondary DNS servers be spread around geographically.
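
      A minimal sketch of the "separate subnets" check, assuming you already have the IPv4 addresses of a zone's nameservers. The function names, the /24 prefix, and the addresses (taken from the reserved documentation ranges) are all illustrative assumptions, not anything from the RFC. Landing in different subnets does not by itself prove geographic separation; it only catches the obvious case of every nameserver sitting on one network.

```python
import ipaddress


def subnets_covered(addresses, prefix=24):
    """Return the set of IPv4 networks (of the given prefix length)
    that the given addresses fall into."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        for addr in addresses
    }


def is_subnet_diverse(addresses, prefix=24):
    """True if the addresses span more than one subnet."""
    return len(subnets_covered(addresses, prefix)) > 1


# Hypothetical nameserver addresses (documentation-only ranges).
same_network = ["192.0.2.10", "192.0.2.11"]
spread_out = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]

print(is_subnet_diverse(same_network))  # False
print(is_subnet_diverse(spread_out))    # True
```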

  3. No preventative maintenance? by Neanderthal+Ninny · · Score: 5, Insightful

    My wild guess is they are deferring preventative maintenance on these data centers, so we are seeing these major outages now. Fire suppression, UPS, transfer switches, generators, distribution panels, transformers, network gear, servers, storage devices and other gear will fail if you don't maintain them properly. As loads increase, the equipment will fail earlier, and my guess is that people have pushed this equipment beyond its lifespan or load rating.

  4. Downside to consolidation by Anonymous Coward · · Score: 5, Insightful

    Surprise surprise...there's a downside to consolidation. Hey morons, the internet was invented as a means to ensure redundant communications paths given nuclear warfare. The old central switch (physical switching) was seen as too cumbersome and vulnerable. Now that we have wonderfully redundant communications, and have done away with most of the downsides of physically distributed systems, morons are building logically centralized systems.

    NEWSFLASH - Redundant communications and physical virtualization do very little for you if you build a logical mainframe.

    Truly distributed systems must be physically AND logically DISTRIBUTED with redundant comms paths in order to gain the full benefits of decentralization. (e.g. Distributed isn't distributed if all your authentication is done at one site or all your traffic must pass through .)

  5. Former critical power field engineer here... by asackett · · Score: 5, Interesting

    ... saying that it's time to reconsider cost cutting measures. In 15 years in the field I never saw a well designed and well maintained critical power system drop its load. I saw many poorly designed and/or poorly maintained systems drop loads, even catching fire in the process. One such fire in a poorly designed and poorly maintained system took the entire building with it, data center and all. The fire suppression system in that one was never upgraded to meet the needs of the "repurposed space" which was originally a light industrial/office space.

    --

    Warning: This signature may offend some viewers.

  6. Even worse... by Anonymous Coward · · Score: 5, Informative

    I'm one of the guys that services the security system in Fisher Plaza. The damn sprinklers killed half my panels near the scene. Turns out they use gas suppression methods in the data centers, not so much in the utility closets. And the city of Seattle REQUIRES sprinklers throughout the building, even right over the precious, precious servers. In defense of the staff there, however, they do not keep them all charged 24/7. Other than that, I have no more info, as they're pretty locked down.

  7. Re:"bad week to be a piece of electrical equipment by Anonymous Coward · · Score: 5, Interesting

    Because out of all of the data centers in the world, there were problems at five? Riiiiight. Good reporting, Slashdot.

    Can I sign up for broken water main notices here, too, or do I need to go to another website?

    100+ million people daily are "serviced" by these 5 data centers.

    Companies such as authorize.net were COMPLETELY unavailable for payments to hundreds of thousands of webmasters' sites (ya know, the people who make money).

    If you don't think this is serious news then you are still living at home.

    Ya that's what I thought.

  8. Re:Rackspace in Dallas by zonky · · Score: 5, Informative

    That isn't quite right, re: their 2007 outage.

    It wasn't a power issue as such, but the way their chillers responded to two quick power fluctuations in succession.

    This is what they said:

    Without notifying us, the utility providers cut power, and at that exact moment we were 15 minutes into cycling up the data center's chillers. Our back up generators kicked in instantaneously, but the transfer to backup power triggered the chillers to stop cycling and then to begin cycling back up again, a process that would take on average 30 minutes. Those additional 30 minutes without chillers meant temperatures would rise to levels that could irreparably damage customers' servers and devices. We made the decision to gradually pull servers offline before that would happen. And I know we made the right decision, even if it was a hard one to make.