Slashdot Mirror


Amazon Outage Shows Limits of Failover 'Zones'

jbrodkin writes "For cloud customers willing to pony up a little extra cash, Amazon has an enticing proposition: Spread your application across multiple availability zones for a near-guarantee that it won't suffer from downtime. 'By launching instances in separate Availability Zones, you can protect your applications from failure of a single location,' Amazon says in pitching its Elastic Compute Cloud service. But the availability zones are close together and can fail at the same time, as we saw today. The outage and ongoing attempts to restore service call into question the effectiveness of the availability zones, and put a spotlight on Amazon's failure to provide load balancing between the east and west coasts."

4 of 125 comments (clear)

  1. have your own servers by Dan667 · · Score: 4, Insightful

    or use a completely different company for redundancy. I think that is the lesson here.

    1. Re:have your own servers by rudy_wayne · · Score: 4, Insightful

      This incident illustrates once again why you need to put your stuff on your own servers and not someone else's. All computer systems will fail occasionally. There's no such thing as 100% uptime. However, when your own servers fail you can get your own people working on it right away and it's their number one priority. When your stuff is on someone else's servers, you're at their mercy. It will get fixed when they get around to it, and, they have more customers than just you, so you might not be first on the priority list. Or second. Or third. Or tenth.

  2. philosophical POV by Anonymous Coward · · Score: 2, Insightful

    I'll take the philosophical point of view on this and say failures are the best way to find and diagnose systemic weaknesses. Now Amazon knows the weakness in the AZs and can fix it.

  3. Re:Turning lemons into lemonade or...... by elohel · · Score: 3, Insightful

    Okay, I had to log in simply to comment on the stupidity of this statement. Aside from now being in violation of their own ToS (probably, at least in transgression of up-time guarantees), they're undoubtedly fiscally liable for refunding payment for the period of time in which services were unavailable or degraded. Additionally, this dramatically hurts their brand name - I know if I ever have to host anything on 'the cloud' (I can't believe I said it), this incident will be on my mind when the time comes for me to choose a provider. And before I stop beating this dead horse - think about what kind of liability Amazon would have, fiscally, for intentionally dropping services for revenue producing sites. One would imagine that Amazon would be fiscally liable for revenue losses during that downtime if this outage was intentional. That's no small amount of coin.