Slashdot Mirror


Data Center Power Failures Mount

1sockchuck writes "It was a bad week to be a piece of electrical equipment inside a major data center. There have been five major incidents in the past week in which generator or UPS failures have caused data center power outages that left customers offline. Generators were apparently the culprit in a Rackspace outage in Dallas and a fire at Fisher Plaza in Seattle (which disrupted e-commerce Friday), while UPS units were cited in brief outages at Equinix data centers in Sydney and Paris on Thursday and a fire at 151 Front Street in Toronto early Sunday. Google App Engine also had a lengthy outage Thursday, but it was attributed to a data store failure."

3 of 100 comments

  1. Re:Be Redundant! by W3bbo · · Score: 5, Informative

    The DNS RFCs advise that zone nameservers should be in separate subnets. Specifically, RFC 2182 recommends that secondary DNS servers be spread around geographically.
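
    As a rough illustration of the "separate subnets" part, here is a minimal Python sketch that flags nameserver addresses falling in the same network. The function names, the IPv4-only scope, and the /24 prefix length are assumptions made for this example, not anything RFC 2182 specifies, and the addresses are documentation ranges rather than real servers. Passing this check says nothing about geographic spread.

```python
import ipaddress


def distinct_networks(addresses, prefix_len=24):
    """Return the set of IPv4 networks (at prefix_len) that contain the addresses."""
    # strict=False lets us pass a host address and get its enclosing network.
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }


def nameservers_share_subnet(addresses, prefix_len=24):
    """True if at least two distinct addresses fall in the same network."""
    unique = set(addresses)
    return len(distinct_networks(unique, prefix_len)) < len(unique)


if __name__ == "__main__":
    # Hypothetical nameserver addresses (RFC 5737 documentation ranges).
    same_rack = ["192.0.2.1", "192.0.2.53"]
    spread_out = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]
    print(nameservers_share_subnet(same_rack))
    print(nameservers_share_subnet(spread_out))
```

    Two nameservers on different subnets can still sit in the same building on the same generator, so a check like this is only a first filter.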

  2. Even worse... by Anonymous Coward · · Score: 5, Informative

    I'm one of the guys that services the security system in Fisher Plaza. The damn sprinklers killed half my panels near the scene. Turns out they use gas suppression methods in the data centers, not so much in the utility closets. And the city of Seattle REQUIRES sprinklers throughout the building, even right over the precious, precious servers. In defense of the staff there, however, they do not keep them all charged 24/7. Other than that, I have no more info, as they're pretty locked down.

  3. Re:Rackspace in Dallas by zonky · · Score: 5, Informative

    That isn't quite right, re: their 2007 outage.

    It wasn't a power issue as such, but the way their chillers responded to two quick power fluctuations in succession:

    This is what they said:

    Without notifying us, the utility providers cut power, and at that exact moment we were 15 minutes into cycling up the data center's chillers. Our back up generators kicked in instantaneously, but the transfer to backup power triggered the chillers to stop cycling and then to begin cycling back up again, a process that would take on average 30 minutes. Those additional 30 minutes without chillers meant temperatures would rise to levels that could irreparably damage customers' servers and devices. We made the decision to gradually pull servers offline before that would happen. And I know we made the right decision, even if it was a hard one to make.