When the Power Goes Out At Google
1sockchuck writes "What happens when the power goes out in one of Google's mighty data centers? The company has issued an incident report on a Feb. 24 outage for Google App Engine, which went offline when an entire data center lost power. The post-mortem outlines what went wrong and why, lessons learned and steps taken, which include additional training and documentation for staff and new datastore configurations for App Engine. Google is earning strong reviews for its openness, which is being hailed as an excellent model for industry outage reports. At the other end of the spectrum is Australian host Datacom, where executives are denying that a Melbourne data center experienced water damage during weekend flooding, forcing tech media to document the outage via photos, user stories and emails from the NOC."
I thought that contracts required Google to disclose the cause and time of their downtime, and this disclosure is part of that.
Right now though, Google is making Microsoft look like they have better uptime for SaaS.
The price is always right if someone else is paying.
What I want to know is, what caused the outage?
The post on the google-appengine group details all the things they did wrong and are going to fix, after the power went out. Fine, I have to plan for outages too. But what caused the unplanned outage?
I've yet to get a satisfactory answer as to exactly what would happen if - say - a water line breaks and floods all the electrical (including the dual redundant UPS systems) in the data center.
Then you are doing a rubbish job at your job! I would never hire somebody who wouldn't ask this question up front before making a hosting decision, and having this decision made before you joined is no excuse. You are another typical Slashdot failure of a sysadmin. Disaster recovery IS a sysadmin's primary job. If your network cannot handle a disaster like this, then your network is rubbish.
Yeah, I just imagined working in raised-floor, climate-controlled rooms. You don't know shit about me, nor could you from a four-word drive-by. You just want to put people down because it does something for you. That behavior demonstrates that you are a pitiful excuse for a decent human being. Congratulations! Piss off.
Besides, I'm unsure why you'd ever need more than those 22 minutes, since that is plenty of time for our on-site staff to gracefully power down any of our major servers if the backup generator fails to kick in.
You consider powering down major servers to be a good option? Smells like an opinion from Microsoft land (where "planned downtime" counts as "uptime", and an "uptime" of 95% is "acceptable"...)