Dark Day In the AWS Cloud: Big Name Sites Go Down

Posted by timothy on Sunday August 25, 2013 @12:02PM from the central-authority-vs-resilience dept.

An outage of one company's servers might only affect that company's customers — but when a major data center for Amazon hits kinks, sites that rely on the AWS cloud services all suffer from the downtime. That's what happened today, when several major sites or online services (like Instagram and AirBnB) were knocked temporarily offline, evidently because of problems at an Amazon data center in Northern Virginia. From TechCrunch's coverage of the outage: "The deluge of tweets that accompanied the services’ initial hiccups first started at around 4 p.m. Eastern time, and only increased in intensity as users found they couldn’t share pictures of their food or their meticulously crafted video snippets. Some further poking around on Twitter and beyond revealed that some other services known to rely on AWS — Netflix, IFTTT, Heroku and Airbnb to name a few — have been experiencing similar issues today."

11 of 182 comments

Re:Say what you will by Anonymous Coward · 2013-08-25 12:12 · Score: 5, Funny

In Soviet Russia, company's customers go down on YOU!
Re:Say what you will by teknopurge · 2013-08-25 12:30 · Score: 5, Insightful

That's expensive. "Cloud" hosting services cost about 1.5x what traditional hosting does. When you want multiple locations ("regions" in AWS), you need to pay for resources in each additional region, then pay again for whatever provides the failover between them. Cloud hosting is great, but nothing it does is new or cheaper than hosting was 10 years ago.

--
Website Hosting
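To make the cost argument above concrete, here is a rough Python cost model. Only the 1.5x multiplier comes from the comment; the function name `monthly_cloud_cost`, the dollar figures, and the idea of modeling failover as one flat monthly fee are hypothetical, and real pricing depends on instance types, regions, and data transfer.

```python
def monthly_cloud_cost(traditional_cost: float, regions: int,
                       cloud_multiplier: float = 1.5,
                       failover_cost: float = 0.0) -> float:
    """Rough monthly cost of running the same footprint in `regions` regions.

    Assumes every region carries a full copy of the resources, each priced at
    `cloud_multiplier` times the traditional-hosting price, plus a flat fee
    for whatever provides failover between regions (hypothetical model).
    """
    if regions < 1:
        raise ValueError("need at least one region")
    per_region = traditional_cost * cloud_multiplier
    # Failover only costs extra once there is a second region to fail over to.
    extra = failover_cost if regions > 1 else 0.0
    return per_region * regions + extra


# Hypothetical numbers: $1,000/month of traditional hosting,
# and $200/month for the failover layer when running in two regions.
print(monthly_cloud_cost(1000.0, 1))
print(monthly_cloud_cost(1000.0, 2, failover_cost=200.0))
```

Under these made-up numbers, going from one region to two takes the bill from $1,500 to $3,200 a month, which is the "pay for each region, then pay again for failover" effect the comment describes.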
Has Rackspace had any outages in 10 years or so? by MillerHighLife21 · 2013-08-25 12:40 · Score: 5, Interesting

I've run servers on both Amazon and Rackspace for several years now and I can't recall a single instance of Rackspace having an outage. On the other hand, Amazon seems to have major issues at least 2 or 3 times a year. Is this stuff tracked anywhere?

--
"Don't teach a man to fish, feed yourself. He's a grown man. Fishing's not that hard." - Ron Swanson
Re:Say what you will by Anonymous Coward · 2013-08-25 12:47 · Score: 5, Insightful

No, they didn't lie. You can set things up that way: simply put your servers in multiple data centers (AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think that, magically, you'll never have to worry about downtime again. It's foolish, but a lot of companies act this way.
Somehow cloud hosting is taken as the silver bullet that prevents outages; it isn't. You still have to architect things the way you normally would if you're looking for disaster recovery, high availability, etc.
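A minimal sketch of the setup described in the comment above: servers spread across availability zones, with a load balancer that skips any server failing its health check. In practice you would use a managed load balancer rather than writing one; the `Backend` and `ZoneAwareRoundRobin` names, the zone labels, and the assumption that health-check results arrive as a simple boolean are all hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    zone: str  # availability zone the server lives in
    healthy: bool = True  # result of the most recent health check (assumed)


class ZoneAwareRoundRobin:
    """Round-robin over servers spread across zones, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> Backend:
        # Look at each backend at most once per call, so a fully dead
        # pool raises an error instead of looping forever.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy backends in any zone")


# Hypothetical pool: one server in each of two zones.
pool = [Backend("web-1", "us-east-1a"), Backend("web-2", "us-east-1b")]
lb = ZoneAwareRoundRobin(pool)
pool[0].healthy = False  # the us-east-1a server stops passing health checks
print([lb.pick().name for _ in range(3)])
```

With the first zone marked unhealthy, every request lands on the server in the second zone; once it recovers, traffic alternates again. Note that this only protects against losing a zone, not against an outage that takes out the whole region.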
Re:Say what you will by chrisgeleven · 2013-08-25 12:54 · Score: 5, Informative

Assuming you mean traditional round-robin A records, the timeouts you would still have to suffer through would kill your latency.
If you're talking about DNS providers (disclaimer: I work for Dyn) with advanced features that detect a failover event and serve only healthy A records, then that is a different story.
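The difference between plain round-robin A records and health-aware DNS can be sketched in a few lines. This is not how Dyn or any particular provider implements it; the function name `dns_answer`, the health-check map, the answer limit, and the "fail open" choice when every endpoint is down are assumptions for illustration. The IPs come from the 192.0.2.0/24 documentation range.

```python
import random
from typing import Dict, List


def dns_answer(records: Dict[str, bool], max_answers: int = 2) -> List[str]:
    """Pick the A records to return for one query.

    `records` maps each IP address to the result of its latest health check.
    Plain round-robin DNS would rotate through every IP whether or not it is up.
    """
    healthy = [ip for ip, ok in records.items() if ok]
    if not healthy:
        # Assumption: fail open and hand out everything rather than nothing.
        healthy = list(records)
    random.shuffle(healthy)  # vary the order so clients spread their load
    return healthy[:max_answers]


# Hypothetical zone: one of three endpoints is failing its health check.
zone = {"192.0.2.10": True, "192.0.2.11": False, "192.0.2.12": True}
print(dns_answer(zone))
```

Even with health-aware answers, resolvers and clients cache records for the record's TTL, so a short TTL is what keeps failover reasonably fast.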
Re:Say what you will by Cyberax · 2013-08-25 13:38 · Score: 5, Interesting

Well, right now I have 500 machines running some heavy calculations in multiple AZs. It works perfectly fine: we noticed the recent problems and simply stopped using the affected region (us-east-1) for the time being, shifting our calculations to other regions.

AWS is really great at scaling. It's better than anything else on the market, but it does require a lot of work.
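The comment doesn't say how the work was moved, but the core idea of draining a region can be sketched as a scheduler that spreads jobs round-robin over every region not marked unavailable. The function `assign_jobs` and the job names are hypothetical, and no AWS API is called; `us-west-2` and `eu-west-1` are real AWS region names used purely as examples.

```python
from collections import defaultdict


def assign_jobs(jobs, regions, unavailable=frozenset()):
    """Spread jobs round-robin across regions, skipping any marked unavailable."""
    usable = [r for r in regions if r not in unavailable]
    if not usable:
        raise RuntimeError("no usable regions")
    plan = defaultdict(list)
    for i, job in enumerate(jobs):
        plan[usable[i % len(usable)]].append(job)
    return dict(plan)


regions = ["us-east-1", "us-west-2", "eu-west-1"]
jobs = [f"job-{n}" for n in range(6)]
# Take the affected region out of rotation, as the commenter describes.
plan = assign_jobs(jobs, regions, unavailable={"us-east-1"})
print(plan)
```

The hard part in a real system is everything this sketch leaves out: the capacity limits in the remaining regions and getting the input data to them.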
Re:Say what you will by Anonymous Coward · 2013-08-25 13:48 · Score: 5, Informative

Either you don't speak English or you need to take your meds. No offense. So I'll try muddling a reply together for you.
There are many ways to set up remote failover systems. Most of them rely on some type of heartbeat system, in which the nodes periodically send each other a "heartbeat message"; if the current active node stops responding for too long, the others choose one of themselves to take over. So it doesn't matter whether they're all in one room connected by a single switch or spread all over the planet.
The real rub for any mechanism is DNS: if the primary server your FQDN points at drops, you might have redundancy, but most people won't be able to take advantage of it. With more manual mechanisms (such as telling users "If our primary site goes down, try here instead!"), that's not as much of a concern, just a PITA to keep track of.
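The heartbeat scheme described in the comment above can be simulated in a single process. Each node records when its last heartbeat arrived; if the active node has been silent longer than a timeout, the survivors pick a replacement. The class names, the 3-second timeout, and the "lowest name wins" election rule are assumptions chosen so that every survivor reaches the same answer without extra messages.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time the latest heartbeat from this node arrived


@dataclass
class Cluster:
    nodes: list
    active: str
    timeout: float = 3.0  # seconds of silence before a node is presumed dead

    def heartbeat(self, name: str, now: float) -> None:
        """Record that `name` sent a heartbeat at time `now`."""
        for node in self.nodes:
            if node.name == name:
                node.last_heartbeat = now

    def check(self, now: float) -> str:
        """Return the active node, electing a new one if it has gone quiet."""
        alive = [n for n in self.nodes if now - n.last_heartbeat <= self.timeout]
        if any(n.name == self.active for n in alive):
            return self.active
        if not alive:
            raise RuntimeError("no live nodes")
        # Deterministic election: every survivor picks the lowest name.
        self.active = min(n.name for n in alive)
        return self.active


cluster = Cluster(nodes=[Node("a", 0.0), Node("b", 0.0), Node("c", 0.0)], active="a")
cluster.heartbeat("b", 2.0)
cluster.heartbeat("c", 2.0)  # "a" has gone silent since t=0
print(cluster.check(4.0))
```

A naive rule like this can elect two active nodes if a network partition splits the group, with each side seeing the other as dead ("split brain"), which is why production systems usually require a quorum before promoting a new primary.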
actually, no by Chirs · 2013-08-25 14:09 · Score: 5, Informative

"cloud" is sold as a *convenient* way to compute, where it's quick to add resources when needed so you can start small and scale up (and down) with demand.
It is *not* generally considered a cheap or particularly reliable solution. So far, at least, none of the cloud providers offer five nines; if you want that, you should (for now, at least) be looking at enterprise/telecom gear.
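To put "five nines" in numbers: an availability target translates directly into a yearly downtime budget, and five nines leaves only about 5.3 minutes per year. The sketch below also shows the usual formula for redundant replicas, which holds only if their failures are independent; replicas sharing one data center, as in this outage, tend to fail together, so the real figure is lower. Function names are illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target (0 < a <= 1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(a: float, copies: int) -> float:
    """Availability of `copies` redundant replicas, assuming independent failures."""
    return 1.0 - (1.0 - a) ** copies


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%}: {downtime_minutes_per_year(target):.1f} min/year")

# Two independent 99.9% replicas, in theory:
print(f"{parallel_availability(0.999, 2):.6f}")
```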
Re:Has Rackspace had any outages in 10 years or so by CritterNYC · 2013-08-25 14:29 · Score: 5, Informative

It depends on which data center you're in. PortableApps.com has been hosted at Rackspace for years, and we had multiple major outages due to ongoing power issues in the Dallas data center in 2009. The switch from grid power to UPS was failing and would take the whole wing of the data center out, with every server crashing hard. It would take quite a while to come back up. Then we'd have to wait hours for the Rackspace folks to rebuild our corrupted database (fully managed account on a dedicated server). It happened two weekends in a row in June and one other time, if I recall correctly, basically costing us a full day of downtime each time.

--
Portable versions of Firefox, GIMP, LibreOffice, etc
Wrong terminology? by elfprince13 · 2013-08-25 15:25 · Score: 5, Funny

Shouldn't this, technically speaking, be a "bright day" or a "sunny day"? After all, that's what I call it when the cloud cover breaks around here.
Re:Say what you will by Zemran · 2013-08-25 15:59 · Score: 5, Funny

"In Soviet Russia, company's customers go down on YOU!"
Now we know the truth about why Snowden went there...

--
I love stacking my barbecues in the shed at the end of summer - you can't beat a bit of grill on grill action.