Slashdot Mirror


Lightning Strikes Amazon's Cloud (Really)

The Register has details on a recent EC2 outage that is being blamed on a lightning strike that zapped a power distribution unit of the data center. The interruption only lasted around 6 hours, but the irony should last much longer. "While Amazon was correcting the problem, it told customers they had the option of launching new server instances to replace those that went down. But customers were also able to wait for their original instances to come back up after power was restored to the hardware in question."

4 of 109 comments

  1. Re:Inconceivable! by nine-times · · Score: 4, Informative

    Well it does seem like it was pretty resilient:

    While Amazon was correcting the problem, it told customers they had the option of launching new server instances to replace those that went down.

    So basically a set of servers went down, and it took down the particular instances running on those servers. Customers were still able to take the exact same image and start new instances, apparently immediately. Sure, it'd be nice if they worked out some kind of automatic clustering and failover to take care of this sort of thing for you, but when my server goes down at my dedicated host, I don't have the option to start up a new host immediately with the exact same configuration.

  2. Re:Do any of you know how they survived? by KahabutDieDrake · · Score: 4, Informative

    You've never actually worked with enterprise-class gear, have you? It's standard for most of the servers and all of the data storage to have capacitor or battery backup for just such an emergency.

    Typically, the RAID controller will have enough on-board capacity to clear its write cache before losing power entirely, while the drive array will be connected to a decent UPS that can hold for at least a few minutes. Meanwhile, the server itself will also likely be connected to the same UPS, or a different one.

    The real question at hand is: were the UPSes between the power distribution node and the servers, or were they on the other side of the distribution node, and therefore worthless in a case like this? I've seen both configurations, but the latter is rarer. Not because of this particular case, but because of efficiency concerns.

    If there was a failure of design, it was most likely in the building wiring itself. The building was clearly not properly grounded against lightning strikes, because if it were, the surge would never have hit the internal wiring. It might have kicked the building off the grid for a time, but it should never have reached a power distribution node. Although it's likely the outcome would be similar if not identical.

  3. Re:Lightning once struck our office building. by Achromatic1978 · · Score: 4, Informative

    No, they don't. You're either being disingenuous, or idiotic.

    "Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from failure scenarios."

    "you can protect your applications from failure of a single location"

  4. Re:Do any of you know how they survived? by sirsnork · · Score: 4, Informative

    RAID controllers have batteries so they can remember what's in the cache (for about 48 hours), not so they can write that data out to the disks before they power off. When power is returned and the disks come back up, the cache is flushed before any other action, thereby keeping the array in one piece.

    --

    Normal people worry me!