Amazon EBS Failure Brings Down Reddit, Imgur, Others
Several readers have sent word of a significant Amazon EBS outage. Quoting:
"Amazon Web Services has confirmed that its Elastic Block Storage (EBS) service is experiencing degraded service, leading sites across the Internet to experience downtime, including Reddit, Imgur and many others. AWS confirmed on its status page at 2:11 p.m. ET that it is experiencing 'degraded performance for a small number of EBS volumes.' It says the issue is restricted to a single Availability Zone within the US-East-1 Region, which is in Northern Virginia. AWS later reported that its Relational Database Service (Amazon RDS) and its Elastic Beanstalk application platform also experienced failures on Monday afternoon."
Productivity reached a record high this afternoon.
After 3 days without programming, life becomes meaningless
- The Tao of Programming
It's the cloud! It's like never like down, and webscale!
Since no one can go on reddit, they will come back to /. only to find out why reddit is down!
We're experiencing our own outages at work, unrelated to AWS, but I'd hate to be an AWS admin during one of these major outages.
I used to be an admin working on AWS through some of these outages, and it's not pleasant, let me tell you. The amount of redundancy you need to get through this makes putting stuff in the cloud prohibitively expensive, and things are basically out of your hands. When you run your own servers you know how long it will take to replace a piece of hardware or take emergency measures to keep things running. At least you know you have control over the process. Amazon? They recover what they can of your EBS disks in a few days without telling you anything, and in the case of the European outage they actually screwed up the EBS snapshots with a recovery job they ran. Thankfully I ran backups every night that took all data off Amazon's system. The only thing I didn't know was when I could be back up and running.
Using AWS for throwaway computing where you just want some computing power for a few weeks of the year? Yes, fine. Permanently running stuff in it? Nope.
For me, I can't see why companies would be willing to do this kind of thing. The risks are just too high.
That's because you don't have an MBA.
They do. It's a multi-AZ outage, despite what Amazon is saying.
Amazon's multiple availability zones stuff is total bullshit. It has become painfully apparent during every single one of these outages that the so-called availability zones are not separate, because an EBS problem propagates everywhere. No one can actually work out which availability zones are affected either, because what Amazon cunningly does is label zones with different letters for different customers. Availability zone 'a' for one customer might be availability zone 'c' for another, so no one can actually compare notes. That fact alone sent my bullshit meter off the scale. It just seems excessively evasive and sneaky for my taste.
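To make the letter-shuffling concrete, here is a minimal Python sketch of what a per-customer zone labelling could look like. Everything in it is an assumption for illustration: the facility names (`dc-1` etc.), the account IDs, and the hash-seeded shuffle are hypothetical, and this is not a claim about how Amazon actually assigns letters.

```python
import hashlib
import random

# Hypothetical physical facilities behind one region (names invented here).
PHYSICAL_ZONES = ["dc-1", "dc-2", "dc-3"]
LETTERS = ["a", "b", "c"]


def zone_map(account_id):
    """Return a stable, account-specific mapping of zone label -> facility.

    Assumption: the shuffle is seeded from a hash of the account ID, so the
    same account always sees the same labels, while different accounts may
    see the same facility under different letters.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    shuffled = list(PHYSICAL_ZONES)
    random.Random(seed).shuffle(shuffled)
    return {"us-east-1" + letter: zone for letter, zone in zip(LETTERS, shuffled)}


def affected_labels(account_id, failed_zone):
    """Labels this account would see as broken when one facility fails."""
    return [label for label, zone in zone_map(account_id).items()
            if zone == failed_zone]


if __name__ == "__main__":
    # Two made-up customers comparing notes about the same failed facility.
    for account in ("customer-one", "customer-two"):
        print(account, zone_map(account), affected_labels(account, "dc-2"))
```

Under a scheme like this, two customers hit by the same physical failure can report different letters, which is why "zone a is down for me" tells another customer nothing.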
If you want redundancy you are going to have to go to completely geographically separate zones. Keeping those zones in sync is prohibitively expensive for the vast majority. Either that or you have a backup cloud provider, but again you have to be so paranoid and trust Amazon so little that you get your data off Amazon's infrastructure at least nightly and can switch over at a moment's notice. Sorry, but that just doesn't work.