Lightning Strikes Amazon's Cloud (Really)
The Register has details on a recent EC2 outage that is being blamed on a lightning strike that zapped a power distribution unit at the data center. The interruption lasted only around six hours, but the irony should last much longer. "While Amazon was correcting the problem, it told customers they had the option of launching new server instances to replace those that went down. But customers were also able to wait for their original instances to come back up after power was restored to the hardware in question."
Isn't cloud computing supposed to tackle such instances?
What irony?
Maybe I'm just tired, but I'm not sure what irony is being referred to by the poster.
While everyone is talking up the cloud and how resilient it is... this is just yet another example to never put all your eggs in one basket. If your service is so damn important that it can't go down - have it hosted in two places.
Notice, Amazon.com didn't go down... :)
I'm thinking critically because Amazon, EMC, VMware, etc. bill The Cloud as a mystical place where you throw your shit and then it's universally available 100% of the time. Nothing bad happens in The Cloud.
/cloud
So what's the deal with having all copies of these VMs in one datacenter? That's not very The Cloud of them. Maybe they should replicate all of EC2 to GFS. Would The Cloud win then?
Customers being given the option of redeploying their VMs or waiting an unspecified period of time until The Cloud is back online isn't The Cloud we were promised.
I'm thinking critically because Amazon, EMC, VMware, etc. bill The Cloud as a mystical place where you throw your shit and then it's universally available 100% of the time. Nothing bad happens in The Cloud.
No, they don't. You're either being disingenuous, or idiotic.
So what's the deal with having all copies of these VMs in one datacenter? That's not very The Cloud of them.
So you expect Amazon to somehow be running the same VM simultaneously on multiple machines? The point of EC2 is that you have machine images prepared in advance, which you can launch at any time to instantiate a new, ready-to-go VM. The VMs themselves are obviously still running on actual machines, which are (surprise!) still vulnerable to things like lightning strikes and other random hardware failures.
If a few minutes of downtime when something like that happens is unacceptable, then you should be running multiple machines in different availability zones, which is exactly what you'd be doing in a more traditional environment. EC2 just makes it easier to do this in a flexible way. Yes, you pay for that privilege, but it's clearly worth it to some people.
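To make the "multiple machines in different availability zones" point concrete, here is a rough sketch of the client side of it: try one zone, and fall through to the next if it's down. The zone names, the handler functions, and call_with_failover itself are all made up for illustration; this is not an EC2 API, just the general shape of the idea.

```python
from typing import Callable, Dict, Iterable, List


def call_with_failover(zones: Iterable[str],
                       handlers: Dict[str, Callable[[], str]]) -> str:
    """Try each zone in order and return the first successful response."""
    errors: List[str] = []
    for zone in zones:
        try:
            return handlers[zone]()
        except ConnectionError as exc:
            # Remember why this zone failed, then move on to the next one.
            errors.append(f"{zone}: {exc}")
    raise RuntimeError("all zones failed: " + "; ".join(errors))


# Hypothetical stand-ins for real services running in two zones.
def struck_by_lightning() -> str:
    raise ConnectionError("power distribution unit offline")


def healthy() -> str:
    return "200 OK from zone-b"


handlers = {"zone-a": struck_by_lightning, "zone-b": healthy}
result = call_with_failover(["zone-a", "zone-b"], handlers)
print(result)  # 200 OK from zone-b
```

The catch, of course, is that this only works if something is actually running in zone-b before zone-a goes dark, and that part is on you.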
I'm reading between the lines here (it doesn't actually say this in TFA), but it sounds like this was a direct hit, not a power outage, which is a different beast.
A UPS is about as useful in this instance as antibiotics against a virus - it's a solution to a different problem. Surge protectors don't help much either, not unless the strike was a fairly mild and/or remote one. You could switch over to a disconnected UPS system every time there's a thunderstorm on the horizon, but that seems needlessly complicated and expensive.
That being said, the GP referred to an outage, so you've quite correctly answered his question; it's just the wrong question to ask in this instance. And of course I could be misreading (or Amazon could be misrepresenting) the exact nature of the failure - if it were a regular outage, none of the above would apply.
What if they insured with AIG?
:)
Who covers the cost then?
Endless arguing. Did or didn't Amazon say that by using the cloud you "protect your application from failure of a single location"? And did or didn't this happen? Answering the two questions in the right order will explain what the OP meant even to you.
A region consists of multiple datacenters. 99.93% would be for 1 datacenter, not the region.
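For anyone who wants the arithmetic: the thread doesn't say where 99.93% comes from, but one reading (an assumption on my part) is roughly six hours of downtime over a year for a single datacenter. The second function shows what happens if you run in several datacenters, under the further assumption that their failures are independent, which real ones may not be.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_from_downtime(hours_down: float,
                               period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - hours_down / period_hours


def combined_availability(single: float, copies: int) -> float:
    """Availability when the service is up if at least one copy is up.

    Assumes the copies fail independently of one another.
    """
    return 1.0 - (1.0 - single) ** copies


one_dc = availability_from_downtime(6)       # six hours down in a year
two_dc = combined_availability(one_dc, 2)    # same app in two datacenters

print(f"one datacenter:  {one_dc:.2%}")  # one datacenter:  99.93%
print(f"two datacenters: {two_dc:.6%}")
```

So a six-hour hit on one datacenter only translates into 99.93% for your application if that datacenter is the only place your application lives.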
"Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from failure scenarios."
Let's highlight the words that need emphasis.
"provides", "developers", "tools"
Whether the developers actually use them is another matter; it isn't automatic.
"you can protect your applications from failure of a single location"
"can"
Highly available does not mean fault tolerant. Fault tolerance allows an application to continue functioning after a component failure. Regardless of the snake oil that has been thrown around, there is no silver bullet that can automagically make an application multi-node aware with no chance of deadlock or data corruption. You need to program for this. Again, tools are provided, but that doesn't mean everyone will use them. So in the absence of a fault-tolerant application, what the cloud provides is high availability.
You could switch over to a disconnected UPS system every time there's a thunderstorm on the horizon, but that seems needlessly complicated and expensive.
Actually, that's NOT a bad idea at all. If you used fiber to the rack and you had big ugly relays that would open the connections, it might be a useful strategy in lightning country. It shouldn't be too hard to detect when lightning is striking nearby, and open the contacts. You would definitely need to do it per-rack at minimum though, because having a battery in every system is an ecological nightmare.