Slashdot Mirror


Amazon EBS Failure Brings Down Reddit, Imgur, Others

Several readers have sent word of a significant Amazon EBS outage. Quoting: "Amazon Web Services has confirmed that its Elastic Block Storage (EBS) service is experiencing degraded service, leading sites across the Internet to experience downtime, including Reddit, Imgur and many others. AWS confirmed on its status page at 2:11 p.m. ET that it is experiencing 'degraded performance for a small number of EBS volumes.' It says the issue is restricted to a single Availability Zone within the US-East-1 Region, which is in Northern Virginia. AWS later reported that its Relational Database Service (Amazon RDS) and its Elastic Beanstalk application platform also experienced failures on Monday afternoon."

9 of 176 comments

  1. Other Victims by Revotron · · Score: 4, Informative

    Coursera is also down as a result.

  2. Single AZ my butt by Anonymous Coward · · Score: 3, Informative

    We are seeing EBS problems across multiple AZs with our services, as are many others. Amazon is downplaying the issue.

    See HN for ongoing discussion as well: http://news.ycombinator.com/

  3. Same region as the storm in June by bill_mcgonigle · · Score: 4, Informative

    Bad luck if you're hosted in the US-East-1 Region, I guess.

    Heh, I should really start advertising the LVS clusters I tend to as 'private clouds with better uptime than Amazon'.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    1. Re:Same region as the storm in June by RulerOf · · Score: 3, Informative

      Real bad luck.

      Desk phones and SIP clients out for 2.5 hours for me. Calls rolled over at the provider level like they were supposed to though. Didn't think I'd have to put that to the test so soon.

      The server qualifies for the free tier, and that's probably why it just went straight unresponsive for two hours. Maybe I should upgrade to a slightly larger paid/reserved instance and..... Wait, I smell conspiracy.

      --
      Boot Windows, Linux, and ESX over the network for free.
  4. Re:Low Availability? by Anonymous Coward · · Score: 4, Informative

    >Reddit, Imgur, etc., don't have presences in multiple availability zones to prevent this kind of outage

    They do. It's a multi-AZ outage, despite what Amazon is saying.

  5. Re:Interestingly enough... by KodaK · · Score: 3, Informative

    All of those things were done here before they were done at reddit. You might want to get a new prescription for your rose colored glasses.

    --
    --J(K) DOS is like Unix in exactly the same way that a pinto is like an aircraft carrier.
  6. Re:Low Availability? by i_hate_robots · · Score: 2, Informative

    Multi AZ IS "completely geographically separate zones" and yes, you can specifically define which ones. Amazon is very clear that US East 1a,b,c,d are all the same physical data center. However, West is not. It's in Oregon (as opposed to VA for East). I've seen no evidence that true Multi AZ instances (as described by Amazon) are down. If you've got some though, I would be interested to see it because I would be pretty concerned.

  7. Re:Low Availability? by hawguy · · Score: 3, Informative

    >Multi AZ IS "completely geographically separate zones" and yes, you can specifically define which ones.

    >Amazon is very clear that US East 1a,b,c,d are all the same physical data center. However, West is not. It's in Oregon (as opposed to VA for East)

    >I've seen no evidence that true Multi AZ instances (as described by Amazon) are down. If you've got some though, I would be interested to see it because I would be pretty concerned.

    Availability Zones are not geographically separate - regions are:

    http://aws.amazon.com/ec2/#features

    Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location. Regions consist of one or more Availability Zones, are geographically dispersed, and will be in separate geographic areas or countries
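    To make the quoted description concrete, here is a minimal sketch of spreading replicas across Availability Zones and checking what survives the loss of one zone. The replica names and the round-robin placement are assumptions for illustration only; nothing here calls a real AWS API, and it says nothing about an outage that spans more than one zone.

```python
from itertools import cycle

def spread(replicas, zones):
    """Assign replicas to zones round-robin; returns {replica: zone}."""
    return dict(zip(replicas, cycle(zones)))

def survivors(placement, failed_zone):
    """Replicas still running if exactly one zone is lost."""
    return sorted(r for r, z in placement.items() if z != failed_zone)

# Zone names follow the "US East 1a,b,c" naming used in the thread;
# the replica names are hypothetical.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
placement = spread(["web-1", "web-2", "web-3", "web-4"], zones)
remaining = survivors(placement, "us-east-1a")
```

    With this placement, losing one zone still leaves replicas in the other two. If everything had been launched in a single zone, the same failure would take out every replica.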

  8. Re:multi AZ? by c0lo · · Score: 3, Informative

    >If they WERE using Multi AZ, or there is some other technical reason why it wouldn't help, I'm really curious to know why...

    Here's your answer: cascading failures.

    In short, cascading failures don't happen because one local failure causes the capacity of the entire network to be exceeded. It is not a case of every node being connected to every other node (O(N^2) connections), so a failure only needs to overload the capacity of the nodes connected to the failing one...
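    A toy model of that argument, as a sketch only: the ring topology, the loads, and the rule that a failed node's load is split evenly among its surviving neighbors are all assumptions chosen for illustration, not a description of how EBS behaves.

```python
from collections import deque

def cascade(neighbors, load, capacity, first_failure):
    """Return the set of failed nodes once load redistribution settles."""
    load = dict(load)  # work on a copy so the caller's data is untouched
    failed = {first_failure}
    queue = deque([first_failure])
    while queue:
        node = queue.popleft()
        alive = [n for n in neighbors[node] if n not in failed]
        if not alive:
            continue  # no surviving neighbor to take the load
        share = load[node] / len(alive)
        load[node] = 0
        for n in alive:
            load[n] += share
            if load[n] > capacity[n]:
                failed.add(n)
                queue.append(n)
    return failed

# Six nodes in a ring: each node talks to two neighbors only, not to all others.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
capacity = {i: 100 for i in range(6)}

light = cascade(ring, {i: 60 for i in range(6)}, capacity, 0)
heavy = cascade(ring, {i: 80 for i in range(6)}, capacity, 0)
```

    In the lightly loaded ring, the two neighbors absorb the extra load and only the first node is lost. In the heavily loaded ring, the same single failure overloads its neighbors, which overload theirs in turn, until every node is down.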

    --
    Questions raise, answers kill. Raise questions to stay alive.