Slashdot Mirror


When the Power Goes Out At Google

1sockchuck writes "What happens when the power goes out in one of Google's mighty data centers? The company has issued an incident report on a Feb. 24 outage for Google App Engine, which went offline when an entire data center lost power. The post-mortem outlines what went wrong and why, lessons learned and steps taken, which include additional training and documentation for staff and new datastore configurations for App Engine. Google is earning strong reviews for its openness, which is being hailed as an excellent model for industry outage reports. At the other end of the spectrum is Australian host Datacom, where executives are denying that a Melbourne data center experienced water damage during weekend flooding, forcing tech media to document the outage via photos, user stories and emails from the NOC."

6 of 135 comments

  1. Re:Don't they have by johnncyber · · Score: 5, Informative

    Dude RTFA (I know, I know, shame on me). The backup generators kicked in, but 25% of the machines in the data center did not receive power before crashing.

  2. Huh? by SlappyBastard · · Score: 2, Informative

    How did I end up in this article? Ah!!!

    --
    I scream. You scream. I assume that means we're both acquainted with the problem. We proceed.

  3. When the Power Goes Out At Google... by binaryseraph · · Score: 3, Informative

    ...a fairy dies.

  4. Re:the worst nightmare of data center peeps by SmilingBoy · · Score: 2, Informative

    First, your servers will shut down ungracefully, and then they will be destroyed with little chance of recovery. You will then have to rebuild your systems and restore the data from the offsite backup. This will of course take time. If this is too much of a risk, you should run an alternate datacentre mirroring your primary databases that can go live within minutes.

  5. Re:floods by DragonWriter · · Score: 2, Informative

    The structure that can withstand a flood has existed for a lot longer than submersible warships - it's called a "hill". If you don't have one conveniently nearby to use you can even build an artificial one.

    An "artificial hill" intended to protect an area from floods is usually called a "levee", and while levees are certainly extremely useful for their intended purpose, they aren't exactly an ironclad guarantee. So having contingency plans for the case where they fail isn't a bad idea.

  6. Re:Generators plus UPS FTMFW by Richard_at_work · · Score: 2, Informative

    In your rush to criticise 'Microsoft land', you must have overlooked his closing statement regarding 'if the backup generator failed to kick in'.

    You cannot have uptime without power. A mains outage coupled with an unexpected generator failure *will* result in downtime - your decision now is whether you wish your servers to be gracefully shut down, or to have the rug pulled from under them, with hours or days of potential angst as a result. Which is it?

    And before you suggest larger UPSes for longer protection, consider why you have both a generator and a UPS in the first place - UPSes cost a lot: they cost a lot to buy, they cost a lot to maintain, and then they cost a lot to replace after only a few years. A generator in comparison costs a lot less all round.