Data Center Power Failures Mount
1sockchuck writes "It was a bad week to be a piece of electrical equipment inside a major data center. There have been five major incidents in the past week in which generator or UPS failures have caused data center power outages that left customers offline. Generators were apparently the culprit in a Rackspace outage in Dallas and a fire at Fisher Plaza in Seattle (which disrupted e-commerce Friday), while UPS units were cited in brief outages at Equinix data centers in Sydney and Paris on Thursday and a fire at 151 Front Street in Toronto early Sunday. Google App Engine also had a lengthy outage Thursday, but it was attributed to a data store failure."
Anyone seriously concerned about their web applications will have redundant sites and a way to share the load. Few people pay attention to the fact that DNS requires geographically disparate DNS servers*, so that even in the event of a data center fire (or nuclear attack) there will still be an answer for your zone. Couple this with a few smaller server farms in separate places, and there won't be any problems.
* I went to look it up on Wikipedia, but couldn't find where it says authoritative DNS servers are required to be in separate geographic regions. Where did I read this, DNS and BIND?
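The footnote's source may well be RFC 2182 (BCP 16), which asks operators to place a zone's secondary name servers at topologically and geographically dispersed locations. Geography can't be read off an IP address, so the sketch below uses a rough stand-in: it checks whether a zone's name server addresses fall into at least two distinct network prefixes. The /24 and /48 prefix lengths, the function names, and the example.com/example.net hosts with documentation-range addresses are illustrative assumptions, and resolving the real NS records is left out.

```python
import ipaddress

def prefixes(addresses, v4_len=24, v6_len=48):
    """Map each address to its enclosing network prefix (a crude proxy for 'same site')."""
    nets = set()
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        plen = v4_len if ip.version == 4 else v6_len
        # strict=False lets us pass a host address and get its containing network
        nets.add(ipaddress.ip_network(f"{ip}/{plen}", strict=False))
    return nets

def is_diverse(ns_addresses, minimum=2):
    """True if the zone's name servers span at least `minimum` distinct prefixes.

    ns_addresses: hypothetical mapping of name server host name -> list of IP strings.
    """
    all_addrs = [a for addrs in ns_addresses.values() for a in addrs]
    return len(prefixes(all_addrs)) >= minimum

# Two servers in the same /24: one fire or one upstream failure can take out both.
same_rack = {"ns1.example.com": ["192.0.2.10"], "ns2.example.com": ["192.0.2.11"]}
# Two servers in unrelated networks.
spread = {"ns1.example.com": ["192.0.2.10"], "ns2.example.net": ["198.51.100.53"]}

print(is_diverse(same_rack))
print(is_diverse(spread))
```

Distinct prefixes are only a hint: two prefixes can still sit in the same building, so a real check would also need to know where each server physically lives.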
Zhrodague.net - I do projects and stuff too.
My wild guess is that they are deferring preventive maintenance on these data centers, which is why we are seeing these major outages now. Fire suppression, UPSes, transfer switches, generators, distribution panels, transformers, network gear, servers, storage devices and other gear will fail if you don't maintain them properly. As loads increase, the equipment fails earlier, and my guess is that people have pushed this equipment beyond its load rating and its expected lifespan.
Surprise, surprise... there's a downside to consolidation. Hey morons, the internet was invented as a means to ensure redundant communications paths in the event of nuclear war. The old central switch (physical switching) was seen as too cumbersome and vulnerable. Now that we have wonderfully redundant communications and have done away with most of the downsides of physically distributed systems, morons are building logically centralized systems.
NEWSFLASH - Redundant communications and physical virtualization do very little for you if you build a logical mainframe.
Truly distributed systems must be physically AND logically DISTRIBUTED, with redundant comms paths, in order to gain the full benefits of decentralization. (E.g., distributed isn't distributed if all your authentication is done at one site or all your traffic must pass through .)
"Major" data center or not, the important one is whichever one your current employer is using.
In my experience, data center backup power fails about a third of the time the power is interrupted somewhere.
Servers in an Oakland, California data center were the victims of the loss of one of three power phases, while the monitoring that would have switched over to the diesel generators was looking at the power level of the other phases. The UPS systems ran out of power. An extra level of redundancy in the form of rack-mount UPSes allowed the servers to shut down properly despite the data center's loss of routing.
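The Oakland failure mode is easy to reproduce in miniature. The comment does not say how that site's monitoring was built, so this sketch simply assumes a per-phase voltage threshold check; NOMINAL_V, TOLERANCE, phase_ok and should_transfer are hypothetical names, and real automatic transfer switches implement their own sensing. What it shows is that a check restricted to some phases cannot see the loss of another.

```python
NOMINAL_V = 230.0  # hypothetical nominal phase-to-neutral voltage
TOLERANCE = 0.10   # hypothetical +/-10% acceptable band

def phase_ok(volts):
    """True if a phase's voltage is within the tolerance band around nominal."""
    return abs(volts - NOMINAL_V) <= NOMINAL_V * TOLERANCE

def should_transfer(readings, monitored=("A", "B", "C")):
    """Start the generators if any *monitored* phase is out of tolerance.

    readings: mapping of phase name -> measured voltage.
    monitored: the phases the controller actually looks at.
    """
    return any(not phase_ok(readings[p]) for p in monitored)

readings = {"A": 0.0, "B": 231.0, "C": 229.0}  # phase A has dropped out

# Watching only B and C: the dead phase is invisible, so no transfer happens.
print(should_transfer(readings, monitored=("B", "C")))
# Watching all three phases: the loss of A triggers the transfer.
print(should_transfer(readings))
```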
Data center #2 was the victim of a simple power outage followed by the immediate failure of its main UPS system. According to a security guard I talked to, "it exploded". The diesel backup never had a chance to start.
Then the doubly sourced power distribution unit (PDU) supplying a rack at a third ISP failed in a way that cut off both sources supplying the servers.
Hint: at a minimum, add an extra level of UPS redundancy and safe-shutdown software daemons. Use multiple data centers if you need more nines.
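The "more nines" advice can be put in numbers. Assuming sites fail independently, the chance that at least one of n sites is up is 1 - (1 - a)^n, where a is a single site's availability. The 99.9% per-site figure and the function names below are hypothetical, and the independence assumption is optimistic: a shared utility grid, shared vendor hardware, or shared maintenance habits can take out several sites at once.

```python
import math

def parallel_availability(site_availability, sites):
    """Probability that at least one of `sites` independent sites is up."""
    return 1.0 - (1.0 - site_availability) ** sites

def nines(availability):
    """Express availability as a count of nines, e.g. 0.999 is about 3 nines."""
    return -math.log10(1.0 - availability)

# Hypothetical single-site availability of 99.9% ("three nines").
for n in (1, 2, 3):
    a = parallel_availability(0.999, n)
    print(f"{n} site(s): availability {a:.9f}, about {nines(a):.1f} nines")
```

Under the independence assumption, each additional site adds roughly as many nines as a single site has.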
authorize.net are apparently complete idiots: if they are that large and all their equipment is in one data center, that's bordering on insane. Heck, my little company of under 1k employees has two facilities. Anyone who's running a site with 100k+ customers should know better.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.