Data Center Power Failures Mount

Posted by timothy on Monday July 6, 2009 @11:58AM from the send-money-drugs-and-sealed-lead-acid-batteries dept.

1sockchuck writes "It was a bad week to be a piece of electrical equipment inside a major data center. There have been five major incidents in the past week in which generator or UPS failures have caused data center power outages that left customers offline. Generators were apparently the culprit in a Rackspace outage in Dallas and a fire at Fisher Plaza in Seattle (which disrupted e-commerce Friday), while UPS units were cited in brief outages at Equinix data centers in Sydney and Paris on Thursday and a fire at 151 Front Street in Toronto early Sunday. Google App Engine also had a lengthy outage Thursday, but it was attributed to a data store failure."

Re:Be Redundant! by W3bbo · 2009-07-06 12:54 · Score: 5, Informative

The DNS RFCs advise that a zone's nameservers should be in separate subnets. Specifically, RFC 2182 recommends that secondary DNS servers be spread around geographically.
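
As a rough illustration of the separate-subnets advice, here is a minimal Python sketch that checks whether a zone's nameserver addresses all fall in different networks. The function names, the /24 prefix used as the definition of "subnet", and the sample addresses (taken from the documentation-only ranges) are assumptions for the example; none of them come from the RFC, and passing this check says nothing about geographic spread.

```python
import ipaddress


def distinct_subnets(addresses, prefix=24):
    """Return the set of /prefix IPv4 networks that contain the given addresses."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        for addr in addresses
    }


def well_spread(addresses, prefix=24):
    """True if every nameserver address sits in its own /prefix network."""
    return len(distinct_subnets(addresses, prefix)) == len(addresses)


# Hypothetical nameserver addresses for illustration only.
same_subnet = ["192.0.2.1", "192.0.2.53"]
separate_subnets = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]

print(well_spread(same_subnet))
print(well_spread(separate_subnets))
```

The first list shares one /24, so it fails the check; the second uses three different networks and passes.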
Even worse... by Anonymous Coward · 2009-07-06 13:28 · Score: 5, Informative

I'm one of the guys that services the security system in Fisher Plaza. The damn sprinklers killed half my panels near the scene. Turns out they use gas suppression methods in the data centers, not so much in the utility closets. And the city of Seattle REQUIRES sprinklers throughout the building, even right over the precious, precious servers. In defense of the staff there, however, they do not keep them all charged 24/7. Other than that, I have no more info, as they're pretty locked down.
Rackspace in Dallas by Thundersnatch · 2009-07-06 15:01 · Score: 4, Informative

We're a Rackspace customer in their DFW datacenter. This is the third power-related outage they've had in the last two years at that supposedly world-class facility.
The first wasn't really their fault: truck driver with health condition runs into their transformers. Generators kick in, but chillers don't re-start quickly enough. Temps skyrocket in minutes, emergency shutdowns. Maybe the transformers should have had some $50 concrete pylons surrounding them?
The second outage was the result of a botched generator upgrade.
This latest outage was the result of a botched UPS maintenance.
None of the outages was long enough to trigger our failover policy to our DR site, but our customers definitely noticed.
While their messaging has been very open and honest about the problems, and the SLA credits have been immediate, we pay them nearly $20K per month. Needless to say, we are shopping, and looking into a "multiple cheap colos" architecture instead of "Tier-1 managed hosting". Nothing beats geographic redundancy.
1. Re:Rackspace in Dallas by zonky · 2009-07-06 15:40 · Score: 5, Informative
  
  That isn't quite right, re: their 2007 outage.
  
  It wasn't a power issue as such, but the way their chillers responded to two quick power fluctuations in succession:
  
  This is what they said:
  
  Without notifying us, the utility providers cut power, and at that exact moment we were 15 minutes into cycling up the data center's chillers. Our back up generators kicked in instantaneously, but the transfer to backup power triggered the chillers to stop cycling and then to begin cycling back up again, a process that would take on average 30 minutes. Those additional 30 minutes without chillers meant temperatures would rise to levels that could irreparably damage customers' servers and devices. We made the decision to gradually pull servers offline before that would happen. And I know we made the right decision, even if it was a hard one to make.