Amazon EBS Failure Brings Down Reddit, Imgur, Others
Several readers have sent word of a significant Amazon EBS outage. Quoting:
"Amazon Web Services has confirmed that its Elastic Block Store (EBS) service is experiencing degraded service, leading sites across the Internet to experience downtime, including Reddit, Imgur and many others. AWS confirmed on its status page at 2:11 p.m. ET that it is experiencing 'degraded performance for a small number of EBS volumes.' It says the issue is restricted to a single Availability Zone within the US-East-1 Region, which is in Northern Virginia. AWS later reported that its Relational Database Service (Amazon RDS) and its Elastic Beanstalk application platform also experienced failures on Monday afternoon."
I have to admit, due to this outage I just logged in to Slashdot for the first time in a year. We're experiencing our own outages at work, unrelated to AWS, but I'd hate to be an AWS admin during one of these major outages. This makes me wonder why Reddit, Imgur, etc., don't have presences in multiple availability zones to prevent this kind of outage.
An honest question: why don't these large, big-name sites use the Multi-AZ (multiple Availability Zone) failover that Amazon offers? These AWS outages make for good headlines, but shouldn't any large site be hosted in multiple physical locations to ensure uptime? If they WERE using Multi-AZ, or there is some other technical reason why it wouldn't have helped, I'm really curious to know why...
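To put rough numbers on the question above, here is a back-of-the-envelope sketch of what a second zone buys you on paper. The function name and the 99.9% per-zone figure are made up for illustration, and the math assumes zones fail independently, which a shared control plane or a shared storage dependency can break in practice.

```python
from math import prod

def parallel_availability(zones):
    """Availability when the site stays up as long as ANY one zone is up.

    Assumes zone failures are independent (an optimistic assumption).
    """
    return 1 - prod(1 - a for a in zones)

# Hypothetical per-zone availability, not an AWS figure.
one_zone = parallel_availability([0.999])
two_zones = parallel_availability([0.999, 0.999])

print(f"one zone:  {one_zone:.6f}")
print(f"two zones: {two_zones:.6f}")
```

The gain only materializes if failover actually works and the zones do not share a failure mode, which is probably where the "other technical reason" would come in.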
My first thoughts as well.
A friend was recently telling me about an issue they were having at work ... they host stuff for other people, and have very high-availability SLAs. Unfortunately, the support they have from some of their own internal people is "weekdays 9-5". So when an outage happened, they were dead in the water, because their own people basically said "sorry, we don't do after hours support".
Your SLA is only as good as your weakest link. Granted, some of these sites may not have SLAs, but if you have an external vendor providing some of this stuff, and their service levels suck, then your service level can't be any better.
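The weakest-link point can be shown with some quick arithmetic. This is a minimal sketch with hypothetical availability numbers and made-up function names, assuming the pieces fail independently and that every piece must be up for the service to be up.

```python
from math import prod

def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)

def downtime_hours_per_year(availability, hours_per_year=8760):
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * hours_per_year

# Hypothetical figures: your own stack at 99.99%, the vendor at 99.9%.
own_stack = 0.9999
vendor = 0.999

combined = serial_availability([own_stack, vendor])
print(f"combined availability: {combined:.6f}")
print(f"expected downtime: {downtime_hours_per_year(combined):.2f} h/year")
```

Under these assumptions the combined figure can never be better than the worst component, so a four-nines promise built on a three-nines vendor is really a three-nines promise at best.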
For me, I can't see why companies would be willing to depend on an outside provider like this. The risks are just too high.
Lost at C:>. Found at C.
I'm just glad I moved my hosting away from AWS. It seems they've had a few problems lately in their datacentres. Local Aussie hosting seems to have better bandwidth anyway.