Dark Day In the AWS Cloud: Big Name Sites Go Down

Posted by timothy on Sunday August 25, 2013 @12:02PM from the central-authority-vs-resilience dept.

An outage of one company's servers might only affect that company's customers — but when a major data center for Amazon hits kinks, sites that rely on the AWS cloud services all suffer from the downtime. That's what happened today, when several major sites or online services (like Instagram and Airbnb) were knocked temporarily offline, evidently because of problems at an Amazon data center in Northern Virginia. From TechCrunch's coverage of the outage: "The deluge of tweets that accompanied the services’ initial hiccups first started at around 4 p.m. Eastern time, and only increased in intensity as users found they couldn’t share pictures of their food or their meticulously crafted video snippets. Some further poking around on Twitter and beyond revealed that some other services known to rely on AWS — Netflix, IFTTT, Heroku and Airbnb to name a few — have been experiencing similar issues today."

18 of 182 comments

Re:Say what you will by Anonymous Coward · 2013-08-25 12:12 · Score: 5, Funny

In Soviet Russia, company's customers go down on YOU!
Re:Say what you will by rudy_wayne · 2013-08-25 12:23 · Score: 4, Interesting

One of the features of AWS was supposed to be the ability to reroute everything to a different datacenter if one goes down. I know I read that somewhere back when AWS was first starting up. You don't think they lied, do you?
Re:Say what you will by teknopurge · 2013-08-25 12:30 · Score: 5, Insightful

That's expensive. "Cloud" hosting services cost about 1.5x traditional hosting. When you want multiple locations ("regions" in AWS), you need to pay for resources in each additional region, then pay an additional cost to provide that failover. Cloud hosting is great, but nothing it does is new or cheaper than hosting 10 years ago.

--
Website Hosting
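teknopurge's cost argument can be made concrete with a little arithmetic. A minimal sketch with hypothetical inputs: a traditional hosting bill of 1,000 per month, the 1.5x cloud multiplier from the comment, and an assumed 20% surcharge for the failover layer (load balancing, cross-region replication). None of these figures are AWS prices, and `monthly_cost` is an illustrative helper.

```python
# Hypothetical cost model; every figure here is an illustrative assumption,
# not an actual AWS price.

def monthly_cost(traditional: float, cloud_multiplier: float,
                 regions: int, failover_overhead: float) -> float:
    """Cloud cost when the same footprint is duplicated in every region,
    plus a proportional surcharge for the failover machinery."""
    per_region = traditional * cloud_multiplier
    return per_region * regions * (1 + failover_overhead)


single = monthly_cost(1000.0, 1.5, regions=1, failover_overhead=0.0)
dual = monthly_cost(1000.0, 1.5, regions=2, failover_overhead=0.2)
print(f"one region: {single:.2f}, two regions with failover: {dual:.2f}")
```

With these assumed numbers, one region costs 1,500 and two regions with failover cost 3,600: 2.4 times the single-region figure and 3.6 times the traditional bill.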
Has Rackspace had any outages in 10 years or so? by MillerHighLife21 · 2013-08-25 12:40 · Score: 5, Interesting

I've run servers on both Amazon and Rackspace for several years now and I can't recall a single instance of Rackspace having an outage. On the other hand, Amazon seems to have major issues at least 2 or 3 times a year. Is this stuff tracked anywhere?

--
"Don't teach a man to fish, feed yourself. He's a grown man. Fishing's not that hard." - Ron Swanson
Re:Say what you will by Anonymous Coward · 2013-08-25 12:47 · Score: 5, Insightful

No, they didn't lie. You can set things up that way: simply put your servers in multiple data centers (AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and assume you'll magically never have to worry about downtime again, but a lot of companies act this way.
Somehow cloud hosting is taken as the silver bullet that prevents outages. It isn't. You still have to architect things the way you normally would if you're looking for disaster recovery, high availability, etc.
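Spreading servers over several availability zones and balancing between them yourself can be sketched as a round-robin balancer that skips anything failing its health check. This is a hypothetical sketch: the zone names follow AWS naming, but the `Backend` and `MultiAZBalancer` classes, the addresses, and the boolean health flag are stand-ins for the health checks a real load balancer (such as Elastic Load Balancing) would perform.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    address: str
    zone: str             # availability zone, e.g. "us-east-1a"
    healthy: bool = True  # stand-in for the result of a real health check


class MultiAZBalancer:
    """Round-robin over backends, skipping any marked unhealthy."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self._ring = cycle(backends)

    def pick(self) -> Backend:
        # Walk at most one full lap of the ring looking for a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy backends in any zone")

    def mark_zone_down(self, zone: str) -> None:
        for backend in self.backends:
            if backend.zone == zone:
                backend.healthy = False


pool = [
    Backend("10.0.1.10", "us-east-1a"),
    Backend("10.0.2.10", "us-east-1b"),
    Backend("10.0.3.10", "us-east-1c"),
]
lb = MultiAZBalancer(pool)
lb.mark_zone_down("us-east-1a")  # simulate a single-AZ outage
print([lb.pick().zone for _ in range(4)])
```

After the simulated outage, traffic alternates between the two surviving zones. If every zone were marked down, `pick` would raise instead of silently returning a dead server.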
Re:Say what you will by chrisgeleven · 2013-08-25 12:54 · Score: 5, Informative

Assuming you mean traditional round-robin A records, the timeout(s) you still have to suffer through would kill your latency.
If you're talking about DNS providers (disclaimer: I work for Dyn) with advanced features that detect a failover event and serve only healthy A records, then that is a different story.
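The latency cost of plain round-robin A records can be quantified under an assumption about client behaviour, which in reality varies by resolver and library. Assume the client receives the records in random order, tries them one at a time, and waits a fixed connect timeout on each dead address before moving on. With n records of which k are dead, the expected number of dead addresses tried before the first live one is then k/(n-k+1). The sketch below checks this by enumerating every ordering and compares it with a health-checked answer that omits dead records. The 30-second timeout and the `expected_wasted_timeouts` helper are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def expected_wasted_timeouts(records: list[bool]) -> Fraction:
    """Average number of dead records a client tries before reaching a live one,
    over every order in which the records could be returned.
    records[i] is True if that address is alive."""
    orders = list(permutations(records))
    total = 0
    for order in orders:
        for alive in order:
            if alive:
                break
            total += 1  # one full connect timeout wasted on a dead address
    return Fraction(total, len(orders))


timeout_s = 30                                 # hypothetical connect timeout
round_robin = [True, True, False]              # 3 A records, 1 backend down
health_checked = [r for r in round_robin if r]  # DNS serves only healthy records

rr = expected_wasted_timeouts(round_robin)
hc = expected_wasted_timeouts(health_checked)
print(rr, float(rr * timeout_s))
print(hc, float(hc * timeout_s))
```

With one of three records dead, a third of clients hit the dead address first, adding 10 seconds on average; the health-checked answer adds nothing (once cached answers from before the failure have expired).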
Re:Say what you will by Cyberax · 2013-08-25 13:38 · Score: 5, Interesting

Well, right now I have 500 machines running some heavy calculations in multiple AZs. It works perfectly fine; we noticed the recent problems but simply stopped using the affected region (us-east-1) for the time being, shifting our calculations to other regions.
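Dropping an impaired region and shifting the work elsewhere amounts to redistributing a fixed worker count over the regions that remain. A hypothetical sketch: the region identifiers are real AWS region names, but the `spread_workers` helper, the even-split policy, and the hard-coded impaired set are illustrative assumptions. In practice each target region also needs its own images, capacity limits, and data in place before it can absorb the load.

```python
def spread_workers(total: int, regions: list[str], impaired: set[str]) -> dict[str, int]:
    """Split `total` workers as evenly as possible over regions not marked impaired."""
    usable = [r for r in regions if r not in impaired]
    if not usable:
        raise ValueError("no usable regions left")
    base, extra = divmod(total, len(usable))
    # The first `extra` regions each take one of the leftover workers.
    return {r: base + (1 if i < extra else 0) for i, r in enumerate(usable)}


regions = ["us-east-1", "us-west-2", "eu-west-1"]
print(spread_workers(500, regions, impaired=set()))
print(spread_workers(500, regions, impaired={"us-east-1"}))
```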

AWS is really great at scaling. It's better than anything else on the market, but it does require a lot of work.
Re:Say what you will by Glendale2x · 2013-08-25 13:40 · Score: 4, Interesting

No, you have to manage your own redundancy and failover on AWS. Look at all the effort Netflix has put into programming failover and stress testing and yet they still have frequent outages with AWS.

--
this is my sig
Re:Say what you will by Anonymous Coward · 2013-08-25 13:48 · Score: 5, Informative

either you don't speak English or you need to take your meds. no offense. so i'll try muddling a reply together for you.
There are many ways to set up remote failover systems. Most of them rely on some type of heartbeat system, where the nodes periodically send each other a "heartbeat message", and if the current active node stops responding for too long, the others choose one to take over. So it doesn't matter if they're all in one room connected with a single switch, or spread all over the planet.
The real rub for any mechanism is DNS... if the primary server your FQDN points at drops then you might have redundancy but most people won't be able to take advantage of it. With more manual mechanisms (such as telling users "If our primary site goes down, try here instead!") that's not as much of a concern, just a PITA to keep track of.
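The heartbeat scheme described above can be sketched in a single process under simplifying assumptions: time is passed in explicitly rather than read from a clock, every node sees the same heartbeat table, and the surviving node with the lowest ID becomes active. A real cluster needs a proper election or quorum to avoid split-brain. `HeartbeatMonitor` and its methods are hypothetical names.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat from each node and promotes a standby
    when the active node has been silent for longer than `timeout` seconds."""

    def __init__(self, nodes: list[str], active: str, timeout: float) -> None:
        self.timeout = timeout
        self.active = active
        self.last_seen = {node: 0.0 for node in nodes}

    def beat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def alive(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)

    def elect(self, now: float) -> str:
        live = self.alive(now)
        # Keep the current active node while it is still heartbeating.
        if self.active not in live:
            if not live:
                raise RuntimeError("no live nodes")
            self.active = live[0]  # simple deterministic choice: lowest node ID
        return self.active


mon = HeartbeatMonitor(["node-a", "node-b", "node-c"], active="node-a", timeout=3.0)
for t in (1.0, 2.0):
    for n in ("node-a", "node-b", "node-c"):
        mon.beat(n, t)
print(mon.elect(4.0))  # node-a last beat 2.0s ago, within the timeout
mon.beat("node-b", 5.0)
mon.beat("node-c", 5.0)
print(mon.elect(6.0))  # node-a silent for 4.0s, so node-b takes over
```

This only moves the "active" role. As the comment points out, clients resolving your FQDN keep reaching the old primary until the DNS record is updated and cached answers expire, so the client-visible gap is at least the heartbeat timeout plus the record's TTL.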
Re:Say what you will by Anonymous Coward · 2013-08-25 14:05 · Score: 4, Informative

AWS Status Dashboard?
I know this is /., and people here don't like to read, but did anyone actually read the status dashboard posts?
This issue was limited to a single AZ, affected only a small number of machines, and was specifically an issue with added latency in EBS volumes. And Amazon completely resolved the issue in 4 hours.
So, call me crazy, but didn't they do exactly what they are supposed to do? Also, AWS quite clearly states that any given AZ *might* fail. Hence, if you want any sort of high availability, you replicate across different AZs.
Plus, I have 10+ EC2 instances, and a number of other resources with AWS, and none of them were affected by this outage.
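Replicating across AZs helps because the service stays up as long as any one zone does. If each zone were independently available with probability a, k zones together would be available with probability 1 - (1 - a)^k. Independence is an idealisation (real zone failures can be correlated), and the 99.5% per-zone figure below is a hypothetical example, not an AWS number.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` independent replicas is up."""
    return 1 - (1 - per_zone) ** zones


for k in (1, 2, 3):
    print(k, f"{combined_availability(0.995, k):.7%}")
```

On paper, two zones at 99.5% each give 1 - 0.005^2, or 99.9975%.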
actually, no by Chirs · 2013-08-25 14:09 · Score: 5, Informative

"cloud" is sold as a *convenient* way to compute, where it's quick to add resources when needed so you can start small and scale up (and down) with demand.
It is *not* generally considered a cheap or particularly reliable solution. So far, at least, none of the cloud providers are offering five nines; if you want that, you should (for now at least) be looking at enterprise/telecom gear.
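For scale, here is how much downtime each level of "nines" allows per year, assuming a 365.25-day year. The `downtime_budget_minutes` helper is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for `nines` nines of availability
    (nines=3 means 99.9%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


for n in range(2, 6):
    print(f"{n} nines: {downtime_budget_minutes(n):8.2f} min/year")
```

Five nines works out to about 5.26 minutes a year, so a single 4-hour outage (the duration one commenter gives for this incident) would exceed that budget roughly 45 times over.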
1. Re:actually, no by You're+All+Wrong · 2013-08-25 18:00 · Score: 4, Informative
  
  > It is *not* generally considered a cheap
  
  Quoth Forbes:
  Cost savings… [...] These are the advertised benefits of cloud computing
  
  Quoth Salesforce:
  4. Cap-Ex Free [...] no need for capital expenditure [...] minimal project start-up costs
  
  Quoth Verio:
  Achieve economies of scale [...] Reduce spending on technology infrastructure. [...] Globalize your workforce on the cheap [...] Reduce capital costs.
  
  And those were the first 3 hits for "benefits of cloud computing" (although the first one is meta: it refers to others referring to cost savings).
  
  I hate to shake you from your firmly entrenched world-view, but you have to know that people are touting cloud solutions as ones which have cost benefits. Whether they're valid claims or not is irrelevant, they are undeniably being made.
  
  --
  Your head of state is a corrupt weasel, I hope you're happy.
Everybody that is surprised is stupid... by gweihir · 2013-08-25 14:12 · Score: 4, Insightful

That things like this will happen with a cloud infrastructure is obvious. That the reliability claims made by the cloud providers are fantasy is also obvious. As soon as they start to do "uptime or else" (meaning you get tons of money as downtime compensation), things may be different. But they will not do that. At this time, the only thing you can do is change to a different cloud provider, which will have the same issues. Uptime guarantees without penalties for failing to meet them are worthless.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
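gweihir's point is about what a missed uptime guarantee actually costs the provider. A hypothetical sketch: the 99.95% target and the flat 10% credit are assumptions for illustration (a real provider's SLA defines its own terms), and the 4-hour outage in a 30-day month is borrowed from the duration quoted in another comment. `monthly_uptime_pct` and `service_credit` are illustrative helpers.

```python
def monthly_uptime_pct(downtime_min: float, days: int = 30) -> float:
    """Percentage of the month the service was up."""
    total_min = days * 24 * 60
    return 100 * (total_min - downtime_min) / total_min


def service_credit(bill: float, uptime_pct: float,
                   target_pct: float = 99.95, credit_rate: float = 0.10) -> float:
    """Hypothetical SLA: a flat credit on the monthly bill if uptime misses the target."""
    return bill * credit_rate if uptime_pct < target_pct else 0.0


uptime = monthly_uptime_pct(4 * 60)  # a 4-hour outage in a 30-day month
print(f"{uptime:.3f}%", service_credit(1000.0, uptime))
```

Under a credit scheme shaped like this one, compensation is capped at a fraction of one month's bill however long the outage lasts, which can be far smaller than the business lost during the downtime.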
1. Re:Everybody that is surprised is stupid... by VortexCortex · 2013-08-25 14:36 · Score: 4, Insightful
  
  We built a decentralized network called The Internet, even capable of withstanding global thermonuclear war -- packets rerouted moments after a city disappears from the mesh... And folks use data silos? Protip: Don't centralize services, that's daft in terms of both uptime and congestion.
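Packets rerouting after a node disappears is, at its core, path-finding on a graph that still works when a node is removed. A toy sketch under obvious simplifications: the five-node mesh and its links are made up, and breadth-first search stands in for real routing protocols such as BGP, which learn paths from neighbours' announcements instead of searching a complete map of the network.

```python
from collections import deque
from typing import Optional


def shortest_path(graph: dict[str, set[str]], src: str, dst: str,
                  down: frozenset[str] = frozenset()) -> Optional[list[str]]:
    """Breadth-first search for a hop-minimal path that avoids failed nodes."""
    if src in down or dst in down:
        return None
    prev: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor links back to the source.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in prev and nxt not in down:
                prev[nxt] = node
                queue.append(nxt)
    return None


mesh = {
    "Seattle": {"Denver", "Chicago"},
    "Denver": {"Seattle", "Dallas", "Chicago"},
    "Chicago": {"Seattle", "Denver", "NewYork"},
    "Dallas": {"Denver", "NewYork"},
    "NewYork": {"Chicago", "Dallas"},
}
print(shortest_path(mesh, "Seattle", "NewYork"))
print(shortest_path(mesh, "Seattle", "NewYork", down=frozenset({"Chicago"})))
```

With Chicago gone, traffic takes the longer southern route through Denver and Dallas; only when every alternative is cut does the search return `None`.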
Re:Has Rackspace had any outages in 10 years or so by CritterNYC · 2013-08-25 14:29 · Score: 5, Informative

It depends which data center you're in. PortableApps.com has been hosted at Rackspace for years and we had multiple major outages due to ongoing power issues in the Dallas data center in 2009. The switchover from grid power to UPS was failing and would take the whole wing of the data center out, with every server crashing hard. It would take quite a while to come back up. Then we'd have to wait hours for the Rackspace folks to rebuild our corrupted database (fully managed account on a dedicated server). It happened two weekends in a row in June and one other time, if I recall correctly, basically costing us a full day of downtime each time.

--
Portable versions of Firefox, GIMP, LibreOffice, etc
Wrong terminology? by elfprince13 · 2013-08-25 15:25 · Score: 5, Funny

Shouldn't this, technically speaking, be a "bright day" or a "sunny day"? After all, that's what I call it when the cloud-coverage breaks around here.
Re:Say what you will by Zemran · 2013-08-25 15:59 · Score: 5, Funny

"In Soviet Russia, company's customers go down on YOU!"
Now we know the truth about why Snowden went there...

--
I love stacking my barbecues in the shed at the end of summer - you can't beat a bit of grill on grill action.
Re:Say what you will by Zemran · 2013-08-25 16:02 · Score: 4, Insightful

"nothing it does is new or cheaper than hosting 10 years ago."
Welcome to the wonderful world of marketing. Sell people what they already have for 50% more.

--
I love stacking my barbecues in the shed at the end of summer - you can't beat a bit of grill on grill action.