Amazon EBS Failure Brings Down Reddit, Imgur, Others

I hope this doesn't affect Facebook. by Anonymous Coward · 2012-10-22 08:05 · Score: 2, Funny

Or else my afternoon is going to totally suck.

Re:I hope this doesn't affect Facebook. by IonOtter · 2012-10-22 08:07 · Score: 1

Nope, FB is alive and well.
To be fair, I find a lot more entertainment in Reddit and Imgur than FB...

--
[End Of Line]
Re:I hope this doesn't affect Facebook. by sortius_nod · 2012-10-22 08:24 · Score: 4, Interesting

I'm just glad I moved my hosting away from AWS. It seems they've had a few problems lately in their datacentres. Local Aussie hosting seems to have better bandwidth anyway.
Re:I hope this doesn't affect Facebook. by helix2301 · 2012-10-22 10:16 · Score: 1

This is the problem with non-self-hosted sites: you're at someone else's mercy.

--
http://www.thetechnologygeek.org
Re:I hope this doesn't affect Facebook. by SpazmodeusG · 2012-10-22 10:26 · Score: 1

Local Aussie hosting is easily double the cost though. I have servers with Crucial Paradigm Australia and Crucial Paradigm USA. The websites appear just as fast to the average user but the USA hosting is 1/3rd the price.
Re:I hope this doesn't affect Facebook. by foniksonik · 2012-10-22 11:40 · Score: 1

If you primarily serve Australia then a local host is fine. If you serve an international audience that host is going to have poor latency for a majority of your visitors.
AWS is an option. You could also use an edge caching service like Akamai. Akamai is likely much more expensive than AWS.

--
A fool throws a stone into a well and a thousand sages can not remove it.
Re:I hope this doesn't affect Facebook. by eWarz · 2012-10-22 13:31 · Score: 1

My afternoon did totally suck, but only because we use AWS.
Re:I hope this doesn't affect Facebook. by darguskelen · 2012-10-22 13:54 · Score: 2

TBH self hosted sites are at the mercy of your ISP or Data center.
Re:I hope this doesn't affect Facebook. by McFadden · 2012-10-22 16:12 · Score: 1

AWS's CloudFront is an edge delivery network.

Productivity up by Phisbut · 2012-10-22 08:05 · Score: 5, Funny

Productivity reached a record high this afternoon.

--
After 3 days without programming, life becomes meaningless
- The Tao of Programming

Re:Productivity up by fustakrakich · 2012-10-22 08:16 · Score: 5, Funny

Should we expect a baby boom in nine months?

--
“He’s not deformed, he’s just drunk!”
Re:Productivity up by Anonymous Coward · 2012-10-22 08:21 · Score: 3, Funny

More blind and hairy hand people, probably.
Re:Productivity up by interkin3tic · 2012-10-22 08:25 · Score: 1

If this lasts all week, and slashdot goes down, who knows what heights I could attain! Maybe two or three levels in borderlands 2!
Re:Productivity up by MichaelSmith · 2012-10-22 08:48 · Score: 4, Funny

Not from reddit users (or slashdotters for that matter).

--
http://michaelsmith.id.au
Re:Productivity up by Gripp · 2012-10-22 09:16 · Score: 1

... I actually spent more time trying to figure out why it wasn't working for me, but was for others, than I normally spend on reddit ... so, not so much!
Re:Productivity up by bigtrike · 2012-10-22 10:07 · Score: 1

With imgur down, don't you mean less blind and hairy people?

But But But by Anonymous Coward · 2012-10-22 08:05 · Score: 5, Insightful

It's the cloud! It's like never like down, and webscale!

Re:But But But by Anonymous Coward · 2012-10-22 08:32 · Score: 0

Unfortunately they don't use MongoDB. If they dod, this would have never happened.
Re:But But But by shentino · 2012-10-22 09:49 · Score: 1

dod
I'm afraid your typo is indefensible.
Re:But But But by Anonymous Coward · 2012-10-22 10:20 · Score: 0

It's the cloud! It's like never like down, and webscale!
Yeah, because local site hosted servers are never down or overloaded.
Re:But But But by thetoadwarrior · 2012-10-23 06:00 · Score: 1

What a shame people couldn't look at cat pictures for a few hours. Apparently they forget the days when it was far more common for sites to be down for days.

Interestingly enough... by Anonymous Coward · 2012-10-22 08:06 · Score: 5, Funny

Since no one can go on reddit, they will come back to /. only to find out why reddit is down!

Re:Interestingly enough... by KodaK · 2012-10-22 08:14 · Score: 1

Confirmed.

--
--J(K) DOS is like Unix in exactly the same way that a pinto is like an aircraft carrier.
Re:Interestingly enough... by maxwell demon · 2012-10-22 08:36 · Score: 2, Funny

Hey, confirming is Netcraft's job!

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Interestingly enough... by Tenareth · 2012-10-22 08:38 · Score: 1

And it worked, too.

--
This sig is the express property of someone.
Re:Interestingly enough... by Anonymous Coward · 2012-10-22 08:43 · Score: 1

Long string of responses? Increasingly off-topic? Heavy use of in-jokes? Yep, these are redditors!
Re:Interestingly enough... by KodaK · 2012-10-22 08:46 · Score: 3, Informative

All of those things were done here before they were done at reddit. You might want to get a new prescription for your rose colored glasses.

--
--J(K) DOS is like Unix in exactly the same way that a pinto is like an aircraft carrier.
Re:Interestingly enough... by Anonymous Coward · 2012-10-22 08:54 · Score: 1

All of those things were done here before they were done at reddit. You might want to get a new prescription for your rose colored glasses.
But you have to give them credit, Reddit is tops when it comes to off-topic meme bullshit and oversized, lousy photoshops in the "me too" reply that every single fucking person on that site has to throw behind a post.
Re:Interestingly enough... by Anonymous Coward · 2012-10-22 09:04 · Score: 1

Long strings of nonsense happen here as well, but they are the exception at slashdot, and the rule at reddit.
Too many stories there contain slews of threads that go on complete tangents once they are 2-3 comments deep. And within those threads, most comments are single-word or single-line, nearly all meme-based.
Methinks CmdrTaco was right when he replaced the karma counter with the generalized karma status. Collecting points becomes a stupid game that undermines conversation.
Re:Interestingly enough... by babywhiz · 2012-10-22 09:18 · Score: 1

If I had points, I'd so give em here...
Re:Interestingly enough... by spazdor · 2012-10-22 09:32 · Score: 0

credit, Reddit.
I love you.
[sing to the tune of Dammit Janet]

--
DRM: Terminator crops for your mind!
Re:Interestingly enough... by Galestar · 2012-10-22 09:35 · Score: 1

Pretty much why I'm here...

--
AccountKiller
Re:Interestingly enough... by Anonymous Coward · 2012-10-22 11:19 · Score: 0

Redditor detector returns a null response. No pun thread found.
Re:Interestingly enough... by Anonymous Coward · 2012-10-22 12:28 · Score: 0

Yup, Slashdot is my friends with benefits reliable booty call
Re:Interestingly enough... by Captain.Abrecan · 2012-10-22 23:41 · Score: 0

It got a lot better when I registered and unsubscribed from everything I didn't like. I haven't seen a meme in about a year now, ymmv
Re:Interestingly enough... by Anonymous Coward · 2012-10-23 03:31 · Score: 0

Pretty sure Reddit occasionally just collapses from the weight of the huge douchiness of its commenters.

Other Victims by Revotron · 2012-10-22 08:06 · Score: 4, Informative

Coursera is also down as a result.

define "leading" ... by magarity · 2012-10-22 08:06 · Score: 4, Funny

/. is working just fine.

Are those karma points in the mail?

Re:define "leading" ... by Anonymous Coward · 2012-10-22 08:22 · Score: 1

Personally I only come to slashdot when reddit is down.

Oblig by sortius_nod · 2012-10-22 08:06 · Score: 4, Funny

It's as if millions of geek voices cried out in terror & were suddenly silenced.

Re:Oblig by Anonymous Coward · 2012-10-22 08:15 · Score: 0

Alternatively with Reddit in mind:
And there was much rejoicing.
Re:Oblig by Anonymous Coward · 2012-10-22 08:20 · Score: 1

How about: andnothingofvaluewaslost?
Re:Oblig by sortius_nod · 2012-10-22 08:23 · Score: 2

Holy shit, when did memes get banned from the internet?
Re:Oblig by jeffmeden · 2012-10-22 08:31 · Score: 3, Funny

Holy shit, when did memes get banned from the internet?
reddit is down, he is expecting to see nothing but NEW shitty in-jokes and hasty photoshops as he takes refuge from the storm... your attempt to re-use old humor would normally earn you a downvote but he can't find the thumb buttons on this jalopy of a website.
Re:Oblig by Bigby · 2012-10-22 08:35 · Score: 2

If a geek cries out in terror and there's no site to read it on, do they really cry out in terror?
Re:Oblig by michrech · 2012-10-22 08:45 · Score: 1

I think Natalie Portman and the Hot Grits had something to do with it...

Holy shit, when did memes get banned from the internet?

--
bork bork bork!
Re:Oblig by Anonymous Coward · 2012-10-22 09:15 · Score: 0

That works too.

Single AZ my butt by Anonymous Coward · 2012-10-22 08:07 · Score: 3, Informative

We are seeing EBS problems across multiple AZs with our services, as are many others. Amazon is downplaying the issue.

See HN for ongoing discussion as well: http://news.ycombinator.com/

Re:Single AZ my butt by SlippyToad · 2012-10-22 08:44 · Score: 1

Yeah, I'm in southern Indiana and Reddit has been down all day.

--
One day I feel I'm ahead of the wheel / the next it's rolling over me / I can get back on / I can get back on
Re:Single AZ my butt by Anonymous Coward · 2012-10-22 08:57 · Score: 0, Funny

If you're in southern Indiana, you have bigger problems than Reddit being down.
Re:Single AZ my butt by tibman · 2012-10-22 09:07 · Score: 1

Like when you ask for Tea and the waitress doesn't know if you mean southerner Sweet Tea or northerner style Unsweetened Tea?

--
http://soylentnews.org/~tibman
Re:Single AZ my butt by babywhiz · 2012-10-22 09:21 · Score: 1

EVERYONE knows that just Tea = Unsweetened Tea. No matter where you go, you have to ask for Sweet Tea to get it Southern style. Oh God. Sorry for just Redditing this thread.
Re:Single AZ my butt by tombeard · 2012-10-22 10:48 · Score: 1

Here in the South, if you just ask for "tea" you will get horribly over sweetened iced tea. You can ask for "unsweet" tea or hot tea. Hot tea is served unsweetened. If self serve dispensers are available it is common for those of reasonable taste to use sweet tea to sweeten their unsweet tea. Don't ask for it mixed though, counter monkeys don't do complicated.

--
The reason we subjugate ourselves to law is to better procure justice. If law does not accomplish this purpose then it m
Re:Single AZ my butt by Anonymous Coward · 2012-10-22 11:32 · Score: 0

Why that low, lying cloud...
More like Amazon Fog.

Same region as the storm in June by bill_mcgonigle · 2012-10-22 08:07 · Score: 4, Informative

Bad luck if you're hosted in the US-East-1 Region, I guess.

Heh, I should really start advertising the LVS clusters I tend to as 'private clouds with better uptime than Amazon'.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Re:Same region as the storm in June by malakai · 2012-10-22 08:44 · Score: 2

According to amazon, it's not an outage, it's a "performance disruption". My guess is, this will negate costly concessions based on SLA's.

--
-Malakai
A Dragon Lives in my Garage
Re:Same region as the storm in June by RulerOf · 2012-10-22 08:58 · Score: 3, Informative

Real bad luck.

Desk phones and SIP clients out for 2.5 hours for me. Calls rolled over at the provider level like they were supposed to though. Didn't think I'd have to put that to the test so soon.

The server qualifies for the free tier, and that's probably why it just went straight unresponsive for two hours. Maybe I should upgrade to a slightly larger paid/reserved instance and..... Wait, I smell conspiracy.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Same region as the storm in June by Anonymous Coward · 2012-10-22 09:01 · Score: 1

According to amazon, it's not an outage, it's a "performance disruption". My guess is, this will negate costly concessions based on SLA's.
And at the Springfield Nuclear Power Plant it wasn't a "meltdown", it was a "fission surplus".
Re:Same region as the storm in June by Patch86 · 2012-10-22 09:21 · Score: 1

Unless they have performance SLAs as well as uptime SLAs. Which they really should. Who the hell would move their system/site to a server hosting business without a performance SLA? I mean, you wanted 23 second page load times on your site, right?
Re:Same region as the storm in June by Third Normal Form · 2012-10-22 10:52 · Score: 1

FWIW, I'm running a free-tier-for-now micro instance in us-east (luckily not using RDS, I'll run my own databases thankyouverymuch), and everything has been normal today.
During the episode with the storms over the summer, I saw my steal% in sar spike considerably- I assume some reddit nodes were moved on to our quiet little hypervisor, and the sheer volume of cat pictures was probably affecting everybody.
Re:Same region as the storm in June by mayko · 2012-10-22 12:02 · Score: 1

Well don't worry about the conspiracy. Here we spend hundreds of thousands of dollars a month with AWS, much of that is reserved instances and nearly all of our US-East instances were affected. Mostly in the "east-1b" AZ, but it was not isolated to that. Anything using RDS in that region is still down.
Re:Same region as the storm in June by dotancohen · 2012-10-22 13:00 · Score: 1

The server qualifies for the free tier, and that's probably why it just went straight unresponsive for two hours. Maybe I should upgrade to a slightly larger paid/reserved instance and..... Wait, I smell conspiracy.
I'm right now hacking away at an EC2 instance with an EBS volume in the affected region, with no disruptions. The EC2 is an "Extra Large Instance" (need it for the IOPS more than the CPU or memory), though I don't think that matters so far as EBS is concerned.

--
It is dangerous to be right when the government is wrong.
Re:Same region as the storm in June by lonecrow · 2012-10-23 04:56 · Score: 1

If memory serves, this was the same AZ that had troubles the other year. My memory should be pretty good since this is the AZ that my server is in and I am trying to understand what I did in a past life to be in the one AZ out of more than 30 that keeps having troubles :(

Low Availability? by mkosmo · 2012-10-22 08:08 · Score: 4, Interesting

I have to admit, due to this outage I just logged in to Slashdot for the first time in a year. We're experiencing our own outages at work, unrelated to AWS, but I'd hate to be an AWS admin during one of these major outages. This makes me wonder why Reddit, Imgur, etc., don't have presences in multiple availability zones to prevent this kind of outage.

Re:Low Availability? by Anonymous Coward · 2012-10-22 08:09 · Score: 4, Informative

>Reddit, Imgur, etc., don't have presences in multiple availability zones to prevent this kind of outage
They do. It's a multi-AZ outage, despite what Amazon is saying.
Re:Low Availability? by ShaggusMacHaggis · 2012-10-22 08:14 · Score: 1

yeah my ec2 instance that is hosted in east-1a is up and the management console tells me it's just east-1d that is down...but i have a hard time believing that
Re:Low Availability? by Anonymous Coward · 2012-10-22 08:15 · Score: 0

They may, but it doesn't seem as if their sites are capable of running without all the zones being up.
Re:Low Availability? by i_hate_robots · 2012-10-22 08:20 · Score: 0

how do you know it's a multi-az outage?
Re:Low Availability? by segedunum · 2012-10-22 08:20 · Score: 5, Interesting

We're experiencing our own outages at work, unrelated to AWS, but I'd hate to be an AWS admin during one of these major outages.
I used to be an admin working on AWS through some of these outages, and it's not pleasant let me tell you. The amount of redundancy you need to get through this makes putting stuff in the cloud prohibitively expensive and things are basically out of your hands. When you run your own servers you know how long it will take to replace a piece of hardware or take emergency measures to keep things running. At least you know you have control over the process. Amazon? They recover what they can of your EBS disks in a few days without telling you anything and in the case of the European outage they actually screwed the EBS snapshots with a recovery job they ran. Thankfully I ran backups every night that took all data off Amazon's system. All I didn't know was when I could be back up and running.

Using AWS for throwaway computing where you just want some computing power for a few weeks of the year? Yes, fine. Permanently running stuff in it? Nope.
Re:Low Availability? by segedunum · 2012-10-22 08:30 · Score: 5, Interesting

They do. It's a multi-AZ outage, despite what Amazon is saying.
Amazon's multiple availability zones stuff is total bullshit. It has become painfully apparent during every single one of these outages that the so-called availability zones are not separate because an EBS problem propagates everywhere. No one can actually work the availability zones out either because what Amazon cunningly does is call zones by different letters for different customers, so availability zone 'a' for one might be availability zone 'c' for another so no one can actually compare. That fact alone sent my bullshit meter off the scale. It just seems excessively evasive and sneaky for my taste.

If you want redundancy you are going to have to go to completely geographically separate zones. Keeping those zones in sync is prohibitively expensive for the vast majority. Either that or you have a backup cloud provider, but again you have to be so paranoid and trust Amazon so little that you have to be able to have your data out and off Amazon's infrastructure at least nightly at a moment's notice. Sorry, but that just doesn't work.
Re:Low Availability? by segedunum · 2012-10-22 08:31 · Score: 4, Insightful

Remember that availability zone 'a' might be 'd' for others. Amazon does not let you work out what availability zones everyone really has.
Re:Low Availability? by Anonymous Coward · 2012-10-22 08:41 · Score: 1

Seems to me that the answer is just to host things yourself, instead of relying on another company's infrastructure.
Re:Low Availability? by eln · 2012-10-22 08:46 · Score: 4, Funny

That's old Web-2.0 thinking. We're in the era of the cloud now, and the cloud is magic. Trust the cloud.
Re:Low Availability? by segedunum · 2012-10-22 08:52 · Score: 4, Interesting

....and in the case of the European outage they actually screwed the EBS snapshots with a recovery job they ran. Thankfully I ran backups every night that took all data off Amazon's system. All I didn't know was when I could be back up and running.
I felt this was worth emphasising. These are EBS snapshots, not just the EBS disks - the ones supposedly stored in S3 and immune to corruption. Your backups, in other words. If you use RDS you rely on these completely for backup.

AWS is OK to get yourself up and running without paying huge amounts up front for hardware, but be aware that you just simply cannot trust this infrastructure.
Re:Low Availability? by Mephistophocles · 2012-10-22 08:54 · Score: 1

It's a multi-AZ outage, despite what Amazon is saying.
And/or AZ's aren't quite as physically isolated as Amazon makes out, which I've suspected for a while.

--
Deja Moo: The distinct feeling that you've heard this bull before.
Re:Low Availability? by i_hate_robots · 2012-10-22 08:55 · Score: 2, Informative

Multi AZ IS "completely geographically separate zones" and yes, you can specifically define which ones. Amazon is very clear that US East 1a,b,c,d are all the same physical data center. However, West is not. It's in Oregon (as opposed to VA for East). I've seen no evidence that true Multi AZ instances (as described by Amazon) are down. If you've got some though, I would be interested to see it because I would be pretty concerned.
Re:Low Availability? by hawguy · 2012-10-22 09:06 · Score: 4, Insightful

Seems to me that the answer is just to host things yourself, instead of relying on another company's infrastructure.
How do you host anything without relying on another company's infrastructure? Do you purchase rights-of-way between your site and all of your customers and string your own fiber? Do you run your own power plant? Do you build your own UPS, right down to the batteries so you don't need to trust a UPS vendor? Do you build and service your own CRACs?
It's impossible for any company to *not* rely on another company's infrastructure even if just for internet connectivity, the only question is where to draw the line - do you really want to rack and stack your own servers? Do you trust a vendor to do periodic preventative maintenance on your generators, or do you use your own staff? Do you certify your own staff to service your fire suppression system, or do you contract out to a vendor? Do you want to own your own network equipment and do your own network admin? Do you want to swap out servers and disk drives when they fail? Do you keep staff electricians on-hand to take care of electrical issues? Do you want to run a 24x7 NOC to monitor and maintain your datacenter?
While a large company may be able to keep many of these tasks in-house, many small companies can't afford the staff it would take to control all of their infrastructure.
Re:Low Availability? by hawguy · 2012-10-22 09:10 · Score: 3, Informative

Multi AZ IS "completely geographically separate zones" and yes, you can specifically define which ones.
Amazon is very clear that US East 1a,b,c,d are all the same physical data center. However, West is not. It's in Oregon (as opposed to VA for East)
I've seen no evidence that true Multi AZ instances (as described by Amazon) are down. If you've got some though, I would be interested to see it because I would be pretty concerned.
Availability Zones are not geographically separate - regions are:
http://aws.amazon.com/ec2/#features

Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location. Regions consist of one or more Availability Zones, are geographically dispersed, and will be in separate geographic areas or countries
Re:Low Availability? by segedunum · 2012-10-22 09:17 · Score: 4, Interesting

Multi AZ IS "completely geographically separate zones" and yes...
Availability zones are not geographically separate nor is there any evidence that they are geographically or even logically separate from the nature of every major EBS outage there has been.

Amazon is very clear that US East 1a,b,c,d are all the same physical data center. However, West is not. It's in Oregon (as opposed to VA for East)
a, b, c and d are availability zones. US East, West etc. are different regions. I'm afraid you're not understanding just what is meant by availability zones or just muddying the waters.

I've seen no evidence that true Multi AZ instances (as described by Amazon) are down. If you've got some though, I would be interested to see it because I would be pretty concerned.
As I've said above, Amazon makes it as difficult as possible to verify availability zone failures because AZ 'a' for one customer might be 'c' for another and 'b' for another, so you can't verify anything with others. However, it becomes very clear when you get on Amazon's forums and look at major sites that have implemented in multiple zones from their perspective that they are down and have EBS problems in different zones they have. You don't get much more evidence than that.

If you're not concerned when looking at that then I smell some apologism I'm afraid.
Re:Low Availability? by Anonymous Coward · 2012-10-22 09:24 · Score: 0

I block amazonaws anyway so it doesn't matter which sites are affected.
Re:Low Availability? by aaarrrgggh · 2012-10-22 09:41 · Score: 2

No, most companies can't take full control of their infrastructure. But, they can diversify across two providers, and try to ensure that there is no major work in common change control windows. In a perfect world, you would have hosted services that support 100% of your peak needs, plus a hot disaster recovery site in your own facility that can handle your full average load.
Unfortunately, I am sure that there is some sorry company out there that says "Let's PM all of our generators at the same time this weekend!"
Re:Low Availability? by Anonymous Coward · 2012-10-22 09:41 · Score: 0

We're experiencing our own outages at work, unrelated to AWS,
Wouldn't it be interesting if it turned out that your work outages *were* related to AWS, and you just didn't know it?
Re:Low Availability? by Anonymous Coward · 2012-10-22 11:48 · Score: 0

I haven't tried it, but this guy claims it is possible to work out the true availability zone:
http://alestic.com/2009/07/ec2-availability-zones
Re:Low Availability? by eWarz · 2012-10-22 13:41 · Score: 2

I'm afraid to say, you guys are doing it wrong. Currently building an eCommerce platform that scales across any server, even if said servers are across multiple providers. Oh and it'll only cost us about a hundred bucks a month. The cloud isn't about throwaway computing, the cloud is about scalable applications. If you use EC2 for static hosting you are doing it wrong.
Re:Low Availability? by petsounds · 2012-10-22 20:35 · Score: 2

Caution: Magic Cloud may suddenly accelerate to dangerous speeds.
Do not taunt Magic Cloud.
Warning: Failure to believe in Magic Cloud may result in a targeted nuclear strike in your availability zone.
Magic Cloud should not be used if you are feeling angry.
Never ask Magic Cloud to play a game.
Magic Cloud: satisfaction guaranteed!*
(*) Except for satisfaction-free areas. Please consult your Service Level Agreement for more information.
Re:Low Availability? by Compaqt · 2012-10-22 21:36 · Score: 1

You forgot one:
Magic cloud has Super Cow Powers (if you believe in it).

--
I'm not a lawyer, but I play one on the Internet. Blog
Re:Low Availability? by Anonymous Coward · 2012-10-23 00:20 · Score: 1

While to you it may be AZ east-1d that is "down", to others it may be another zone. In other words each account has the AZ's labeled differently. This is because most people will use east-1a (or west 1a) instead of 1b, 1c... when deploying instances and possibly overload 1a. So basically all we know is one AZ went down. BTW, the AZ never went down, some of the EBS storage experienced performance "slowness" that would result in instances not responding. This happened to me, a SQL server's data drive was unresponsive thus making SQL unresponsive. I did have another SQL server mirrored in another AZ.
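To make the per-account labelling described above concrete, here is a minimal sketch. Everything in it is hypothetical: the physical zone names, the account IDs, and the hash-seeded shuffle are assumptions for illustration only, not how Amazon actually assigns labels.

```python
import hashlib
import random

# Hypothetical physical zones; the real identifiers are not public.
PHYSICAL_ZONES = ["dc-1", "dc-2", "dc-3", "dc-4"]
LABELS = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]


def label_map(account_id):
    """Return a stable, per-account mapping of AZ label -> physical zone."""
    # Seed a private RNG from the account ID so the mapping never changes
    # for a given account, but differs between accounts.
    seed = int(hashlib.sha256(account_id.encode()).hexdigest(), 16)
    zones = PHYSICAL_ZONES[:]
    random.Random(seed).shuffle(zones)
    return dict(zip(LABELS, zones))


def label_for(account_id, physical_zone):
    """Find which label a given account sees for a physical zone."""
    for label, zone in label_map(account_id).items():
        if zone == physical_zone:
            return label
    raise ValueError(f"unknown zone: {physical_zone}")


if __name__ == "__main__":
    # The same physical zone can show up under a different letter per account.
    for account in ("account-A", "account-B", "account-C"):
        print(account, "sees the failing zone as", label_for(account, "dc-3"))
```

Under a scheme like this, two customers comparing notes on "1d" could be talking about different physical zones, which is why the letters alone don't settle whether an outage is single-AZ or multi-AZ.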
Re:Low Availability? by lonecrow · 2012-10-23 05:00 · Score: 1

What? Like run your server under your desk at home powered by solar cells on your roof so you're not relying on the power company? And getting an uplink... somehow?

Face it, everything is contingent. I run on AWS now and I used to run a dedicated physical server hosted at The Planet, and before that I used to host from under my desk on consumer ADSL. Please explain which of those makes me more or less dependent on services provided by others.

If only there was an alternative... by Anonymous Coward · 2012-10-22 08:09 · Score: 0

http://www.rackspace.com/blog/rackspace-cloud-block-storage-making-progress-towards-a-fall-release/

"Nearing fall release"?!? Help us out!

Bright and Sunny Skies Today! by IonOtter · 2012-10-22 08:09 · Score: 4, Insightful

Do you still think that putting your digital life in the "cloud", without any ability to fall back on a physical hard drive or device, is a good idea?

--
[End Of Line]

Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 08:12 · Score: 0

yes
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 08:15 · Score: 3, Insightful

Because physical servers don't ever fail?
Re:Bright and Sunny Skies Today! by gstoddart · 2012-10-22 08:16 · Score: 4, Interesting

Do you still think that putting your digital life in the "cloud", without any ability to fall back on a physical hard drive or device, is a good idea?
My first thoughts as well.
A friend was recently telling me about an issue they were having at work ... they host stuff for other people, and have very high-availability SLAs. Unfortunately, the support they have from some of their own internal people is "weekdays 9-5". So when an outage happened, they were dead in the water, because their own people basically said "sorry, we don't do after hours support".
Your SLA is only as good as your weakest link. Granted, some of these sites may not have SLAs, but if you have an external vendor providing some of this stuff, and their service levels suck, then your service level can't be any better.
For me, I can't see why companies would be willing to do this kind of thing. The risks are just too high.

--
Lost at C:>. Found at C.
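The "weakest link" point above can be put in numbers. This is a rough sketch, assuming the links fail independently and that all of them must be up at once; the availability figures are made up for illustration and are not anyone's real SLA.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def serial_availability(links):
    """Availability of a chain in which every link must be up at once."""
    return prod(links)


def expected_downtime_hours(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative figures only.
    own_service = 0.9995   # what you could deliver on your own
    vendor = 0.995         # what the external vendor delivers
    combined = serial_availability([own_service, vendor])
    print(f"combined availability: {combined:.4%}")
    print(f"expected downtime: {expected_downtime_hours(combined):.1f} h/year")
```

Because every factor is at most 1, the product can never exceed the smallest one, so the chain is never better than its worst link.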
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 08:23 · Score: 0

Yes
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 08:23 · Score: 5, Funny

For me, I can't see why companies would be willing to do this kind of thing. The risks are just too high.
That's because you don't have an MBA.
Re:Bright and Sunny Skies Today! by Frosty Piss · 2012-10-22 08:31 · Score: 1

So when an outage happened, they were dead in the water, because their own people basically said "sorry, we don't do after hours support".
This is not a system failure, it's a Human Resources failure.

--
If you want news from today, you have to come back tomorrow.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 08:31 · Score: 0

Do you still think that putting your digital life in the "cloud", without any ability to fall back on a physical hard drive or device, is a good idea?
The "cloud" is nothing more than somebody else's physical hard drive or device.
In other words, Somebody Else's Problem.
Re:Bright and Sunny Skies Today! by rrohbeck · 2012-10-22 08:32 · Score: 2

No but you can make them reliable if needed.
In the cloud you're at the mercy of the beancounters at Amazon & co.

--
thegodmovie.com - watch it
Re:Bright and Sunny Skies Today! by TubeSteak · 2012-10-22 08:35 · Score: 1

Your SLA is only as good as your weakest link.
It seems like Amazon's weakest link is Virginia.
I recall from the last Amazon outage thread on /. that Virginia seems to be the epicenter for epic fail.

--
[Fuck Beta]
o0t!
Re:Bright and Sunny Skies Today! by hawguy · 2012-10-22 08:51 · Score: 4, Insightful

Your SLA is only as good as your weakest link. Granted, some of these sites may not have SLAs, but if you have an external vendor providing some of this stuff, and their service levels suck, then your service level can't be any better.
For me, I can't see why companies would be willing to do this kind of thing. The risks are just too high.
Because many companies are not willing to spend what it takes to get availability greater than what they can get at Amazon - especially if they take advantage of multi-AZ or multi-region redundancy.
Sure, having a physical server at the office that you know you can fix by buying parts at the local computer store sounds attractive. Until the day you find that your building has burnt to the ground. Or a truck knocked over the utility pole providing network and electricity to your building. Or you discover that when you looked at the flood maps to make sure you weren't in a flood zone, the maps didn't account for a water main breaking and flooding the basement where your telecom equipment is... or the clogged roof drains that let 20,000 gallons of water build up on the roof during a rainstorm until the roof collapsed and flooded your datacenter. Or the earthquake (or hurricane or tornado or flood or whatever) that takes down your site for days or weeks or even months, and your employees are more concerned with surviving than trying to get your critical systems back online.
Meeting an SLA for your own facility only works when that facility is running, and often the company that rents office space has little control over the facility.
My company has a number of critical services running in one Amazon region with replication to a second region for failover. The second region costs very little, just a single instance to hold data replicated from the primary instance, then if we need to spin up the servers in the secondary region, it takes about 10 minutes to push the data from the local copy to the other servers once we start them up.
We could automate the whole process, but Amazon problems are rare enough that it hasn't been worth it.
We do have a couple servers in us-east-1a but so far those servers appear to be fine, although the AWS management interface has not been working for managing servers in that region/AZ. If we ran servers out of our local office instead of Amazon, we would have had at least 2 instances of complete downtime in the past year - one 3 hour internet outage, and a 48 hour power failure on a weekend when a transformer blew and the power company didn't have an available spare and had to truck it in from out of area.
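For a rough sense of scale, the two office outages mentioned above (3 hours and 48 hours) can be turned into an availability figure. This is only a back-of-the-envelope sketch: it assumes a 365-day year and that those were the only outages, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Return availability as a percentage for the given downtime."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# The two office outages from the comment: a 3 hour internet outage
# and a 48 hour power failure.
office_downtime = 3 + 48
print(f"{availability_percent(office_downtime):.2f}%")  # prints 99.42%
```

In other words, 51 hours of downtime in a year works out to a little over two nines.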
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 08:58 · Score: 1

You are always at the mercy of someone else's beancounters even if you run your own datacenter.
Re:Bright and Sunny Skies Today! by CastrTroy · 2012-10-22 09:05 · Score: 1

You're talking like hosting your own servers on premises or being in the cloud are your only choices. You could also rent space in a high quality data center and replicate your data out to another high quality datacenter where you also rent space in a different geographic location. Then, when your primary data center goes down, you switch over to the other one. Or run off both at the same time if your architecture allows you to do that. That basically covers you in most instances. If both your rented datacenters go out at the same time, and they are in different locations, there are probably much bigger things to worry about. Or you didn't pick very good datacenters in the first place.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Bright and Sunny Skies Today! by MoNsTeR · 2012-10-22 09:08 · Score: 2

If you think the risks of running in the cloud are less than the risks of running in a traditional data center, you're very much mistaken. If one AWS AZ goes down I can bring up servers in a second one. If one AWS region goes down I can bring up servers in a second one. In fact to hedge against these risks I *already have* servers in multiple zones and regions. Sure you can do that with traditional data centers. Just host your stuff across more than one, right? Do you have any concept of what that COSTS? Especially if you, say, want to add servers in multiple data centers, or move servers from one to another. Plus now you have multiple vendors, contacts, SLAs, and so on, and so forth. And heaven help you if you ever want to *decrease* your capacity. Have fun selling those servers on ebay. Reddit and friends are suffering downtime from a single AZ outage because their architectures have single points of failure. Don't build your systems that way! If you have single points of failure it doesn't matter whether you're hosted in the cloud, in a commercial data center, or in your own data center. Conversely if your architecture is good and doesn't have single points of failure, the hosting question comes down to this: what do you specialize in as a business? If that list doesn't include "running a data center", don't run your own data center. If it doesn't include "maintaining a shit load of hardware", then don't host in a commercial data center either, run in the cloud. I think you will find that this latter category includes 99.99% of businesses.
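The single-point-of-failure argument can be put into numbers. A minimal sketch, assuming each zone or region fails independently (an idealisation that other posters in this thread dispute for EBS) and using a made-up 99% figure for a single deployment:

```python
def combined_availability(single, replicas):
    """Availability of N redundant replicas, any one of which can serve.

    Assumes failures are independent, which is an idealisation.
    """
    return 1.0 - (1.0 - single) ** replicas


# Hypothetical figure: each zone/region is up 99% of the time.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

Under those assumptions, two replicas at 99% each give roughly 99.99% together. Correlated failures, such as a shared control plane going down, would make the real number worse.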
Re:Bright and Sunny Skies Today! by gstoddart · 2012-10-22 09:08 · Score: 1

No, it is more of a salesman failure, or a side effect of working in a large company.
People sell outsourcing services, and they use products that other divisions of the company make and support.
If something goes wrong, one division is on the hook for a high service level, and the other division provides the same level of crappy support they provide their customers.
The group on the hook for the service has no clout over the group that makes the product in use.
I have seen software sales where the salesman bundled a bunch of different products in order to check all of the boxes ... With the end result being that the cost of the bundle was half (or less) of what the un-bundled components would have been. But, the sales guy gets his bonus, and everyone else is left holding the bag to deliver on a money losing contract.
Not nearly enough companies fully understand what they have contracted to do, and ultimately can't.

--
Lost at C:>. Found at C.
Re:Bright and Sunny Skies Today! by hawguy · 2012-10-22 09:15 · Score: 2

You're talking like hosting your own servers on premises or being in the cloud are your only choices. You could also rent space in a high quality data center and replicate your data out to another high quality datacenter where you also rent space in a different geographic location. Then, when your primary data center goes down, you switch over to the other one. Or run off both at the same time if your architecture allows you to do that. That basically covers you in most instances. If both your rented datacenters go out at the same time, and they are in different locations, there are probably much bigger things to worry about. Or you didn't pick very good datacenters in the first place.
Isn't that the same as putting your servers into multiple Amazon regions? You're still putting your destiny in the hands of the datacenter.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 09:28 · Score: 0

You laugh, but that's precisely true. These websites that went down will see little monetary consequence: at least it will be less than the extra cost to host things off the cloud.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 09:36 · Score: 0

You are always at the mercy of someone else's beancounters even if you run your own datacenter.
Nah. If you roll your own thing you mostly don't need a datacenter. Most probably just a server.
One that's physically there, and which you can kick into working.
And in case you cannot fix the problem, you can always flip the switch to 'more magic'.
Re:Bright and Sunny Skies Today! by Smauler · 2012-10-22 09:45 · Score: 1

As much as I, like most people here, grin with a certain kind of glee* when something this big goes down, the fact is that doing it yourself is nearly always less reliable.
Also, there's nothing necessarily exclusive about the cloud - you can back up your data too, right?
*Yes, it's evil - but it's because I've had the adrenaline in the past and know what it is like - despite it being one of the worst times, it can also be one of the best for loads of people.
Re:Bright and Sunny Skies Today! by aaarrrgggh · 2012-10-22 09:47 · Score: 1

Fundamentally it is different because modes of common failure are much less severe. If Amazon takes a hit to one facility, it is going to load up other facilities.
Re:Bright and Sunny Skies Today! by Octopus · 2012-10-22 10:02 · Score: 1

Sshhhh! It's the cloooouuud. *throws glitter in the air*
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 10:06 · Score: 1

But the NSA promised it would feel good when they installed their hardware in the back door.....
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 10:19 · Score: 0

because their own people basically said "sorry, we don't do after hours support".
What the hell kind of dodgy business do they run?? Even in the smallest wannabe company we always had somebody on duty for the after hours times. That person earned money for that time, so it was popular. And it saved us many times, so it was damn worth the money.
How can a big company not have an emergency service?? It's not like it's hard to do! We did it from our *phones* (PuTTY on Nokia / Symbian S60) back in 2004, if we had to! (The cases where you actually have to drive over *right then* were extremely rare, because of the simple fact of having *backup servers*.)
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 10:21 · Score: 0

Netflix certainly has bragged that they have redundancy, yet every time Amazon shits the bed Netflix and all their redundancy goes down with it. Your arguments aren't backed by reality.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 10:43 · Score: 0

depends on who is using the service. if your employees are using a cloud service and the wan goes down, you're screwed. if you're hosting it on site, they continue working like nothing happened.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 12:12 · Score: 0

a side effect of working in a large company.
If it was a large company then they should have set up shifts. Getting two college kids to work 12 hours a day each stops being professional once you're no longer a "startup".
In other words, an HR problem.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-22 12:16 · Score: 0

"having a physical server at the office that you know you can fix by buying parts at the local computer store sounds attractive"
That doesn't sound attractive at all. I am sure a few major cities have stores that would have necessary server parts but otherwise good luck.
Re:Bright and Sunny Skies Today! by makomk · 2012-10-23 03:20 · Score: 1

If the management API and web interface go down throughout the entirety of AWS, as they did in this outage according to some users, good luck bringing up another server. Besides, everyone else had the same idea - if one region goes down, we can just bring up a server in another region, so we don't have to pay for servers in multiple data centers - so it turned out there weren't actually nearly enough servers available for them to do this.
Re:Bright and Sunny Skies Today! by Anonymous Coward · 2012-10-23 04:42 · Score: 0

Show me the statistics backing up your claim. I want to see downtime of an Amazon service vs the downtime of the average self-hosted service. Then I want some prices.

That eye surgeon has a failure rate of 1 botched surgery every 5 years! I think I will fix my own eye, thank-you very much.
Re:Bright and Sunny Skies Today! by DMUTPeregrine · 2012-10-26 14:23 · Score: 1

Trusting a single "cloud" provider for all your hosting is silly. Something like a Xen vm image backed up nightly can be hosted on a large number of cloud provider systems, and you can use round-robin DNS to help limit the impact of downtime at any one provider. And if a provider will be down for a significant period (longer than it will take for DNS caching to expire), just remove that IP from the DNS records.
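The "remove that IP from the DNS records" step could look something like the sketch below. It only shows the decision about which addresses to publish; the function name, the addresses (taken from the RFC 5737 documentation ranges) and the stand-in health check are all hypothetical, and actually pushing the record set depends on whatever API your DNS host offers.

```python
def records_to_publish(provider_ips, is_healthy):
    """Return the A-record IPs to publish: only providers that pass the check.

    If every provider fails the check, keep the full list rather than
    publishing an empty record set.
    """
    healthy = [ip for ip in provider_ips if is_healthy(ip)]
    return healthy if healthy else list(provider_ips)


# Hypothetical addresses from the documentation ranges (RFC 5737).
PROVIDERS = ["192.0.2.10", "198.51.100.20", "203.0.113.30"]

# Stand-in health check: pretend the second provider is down.
down = {"198.51.100.20"}
print(records_to_publish(PROVIDERS, lambda ip: ip not in down))
# prints ['192.0.2.10', '203.0.113.30']
```

A real check would probe each provider over HTTP or TCP, and a short TTL on the records keeps the caching window mentioned above small.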

Of course you can also have a local server/datacenter that can run your same VM image, and use it for extra redundancy.

One of the major points of the "cloud" hype was that the underlying provider shouldn't matter. If you can't swap cloud hosts easily then cloud hosting is no different from shared hosting with extra buzzwords.

--
Not a sentence!

multi AZ? by i_hate_robots · 2012-10-22 08:10 · Score: 3, Interesting

An honest question, why don't these large, big-name sites utilize the Multi Availability Zone failover that Amazon offers? It seems these AWS outages make for good headlines, but shouldn't any large site be co-located in multiple physical locations to ensure uptime? If they WERE using Multi AZ, or there is some other technical reason why it wouldn't help, I'm really curious to know why...

Re:multi AZ? by segedunum · 2012-10-22 08:40 · Score: 1

An honest question, why don't these large, big-name sites utilize the Multi Availability Zone failover that Amazon offers?
They do. Plenty of people do. The problem is that these EBS failures always propagate across availability zones no matter what Amazon says.

If they WERE using Multi AZ, or there is some other technical reason why it wouldn't help, I'm really curious to know why...
Because you have no hard experience of what multiple availability zones practically means in Amazon's infrastructure.
Re:multi AZ? by Anonymous Coward · 2012-10-22 08:42 · Score: 0

Money! It costs a lot more to go multi AZ. The only real reason that they use the cloud is the same reason most people use OSS, low cost! Multi AZ negates the low cost attribute.
Re:multi AZ? by hawguy · 2012-10-22 08:53 · Score: 1

An honest question, why don't these large, big-name sites utilize the Multi Availability Zone failover that Amazon offers?
It seems these AWS outages make for good headlines, but shouldn't any large site be co-located in multiple physical locations to ensure uptime?
If they WERE using Multi AZ, or there is some other technical reason why it wouldn't help, I'm really curious to know why...
There are rumors floating around that this affects more than one AZ - I'd never host critical infrastructure entirely in a single region even across multiple AZ's - much better to have it spread across multiple regions, which would eliminate most failure modes that could affect one region (like an East Coast Hurricane).
Re:multi AZ? by Anonymous Coward · 2012-10-22 08:54 · Score: 0

If they WERE using Multi AZ, or there is some other technical reason why it wouldn't help, I'm really curious to know why...
Because you have no hard experience of what multiple availability zones practically means in Amazon's infrastructure.
All this downtime is GP's fault? Was he running AWS or something? What do you know that we don't, segedunum?
Re:multi AZ? by Anonymous Coward · 2012-10-22 08:54 · Score: 0

Because it isn't an easy problem to solve. The biggest issue is always the database backend. Cassandra is pretty much the only DB that can run multi-region without much effort. However, Cassandra presents a ton of problems itself ... it's just not good for production. When it fails, it fails *VERY* badly.
MySQL, MongoDB, Redis, etc all have single points of failure. MongoDB sharded will fail when a single MongoC instance disappears.
Re:multi AZ? by i_hate_robots · 2012-10-22 09:04 · Score: 1

They do. Plenty of people do. The problem is that these EBS failures always propagate across availability zones no matter what Amazon says.
Do you have any evidence of this? Because I haven't seen any. And it sounds tin-foil-hat.

Because you have no hard experience of what multiple availability zones practically means in Amazon's infrastructure.
Actually, I run a load-balanced, redundant site on AWS. I ask the question because Multi-AZ (as defined by AWS) means geographically different, as in US West (in Oregon) vs US East (in Virginia) - NOT just the difference between US-East-1a,b,c,d (which Amazon makes very clear are in the same data center). That's why it's odd that Virginia's issues would affect Oregon (or any of the other AZs). Try being helpful next time and answering the genuine question instead of smarting off because you can't get on reddit.
Re:multi AZ? by Pinhedd · 2012-10-22 09:08 · Score: 1

Multi-AZ is only available for certain services. It's slower and costs twice as much. There's also replication delay issues between multi-AZ instances.
Re:multi AZ? by delirium28 · 2012-10-22 09:28 · Score: 1

That's odd, I was just re-reading their docs today since we were affected and they make it clear that AZ refers to the instances in the same Region (i.e. the us-east-1a,b,c,d you mentioned). See: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html

--
Who is John Galt?
Re:multi AZ? by segedunum · 2012-10-22 09:40 · Score: 4, Interesting

Do you have any evidence of this? Because I haven't seen any. And it sounds tin-foil-hat.
Sites that are deployed across multiple zones are down and the forums are full of customers who complain about EBS slowdowns and problems regardless of the availability zones they personally use. You're an apologist if you haven't grokked this yet.

Actually, I run a load-balanced, redundant site on AWS. I ask the question because Multi-AZ (as defined by AWS) means geographically different...
This is total rubbish. Availability zones are not geographically separate, and don't give me that 'as defined by AWS' crap to give yourself a back door (they don't, anyway). Expanding to multiple regions, which is the only thing you can do, is not the same thing.

as in US West (in Oregon) vs US East (in Virginia) - NOT just the difference between US-East-1a,b,c,d (which Amazon makes very clear are in the same data center). That's why it's odd that Virginia's issues would affect Oregon (or any of the other AZs)
No, Amazon is very, very clear on what an availability zone actually is. Stop trying to make AZs out to be separate regions to get yourself out of this. They are not.

Try being helpful next time and answering the genuine question instead of smarting off because you can't get on reddit.
I'm afraid you don't run any geographically separate system that spans multiple regions because it is prohibitively expensive to do so. You don't maintain AMIs and backups in different regions and you don't pay for the extremely large amount of bandwidth you need to keep those regions mirrored and synchronised.

Sorry, but you aren't doing what you say you're doing and you don't know what the difference between availability zones and regions actually is, which was central to the question you asked. You were called out on it.
Re:multi AZ? by c0lo · 2012-10-22 09:58 · Score: 3, Informative

If they WERE using Multi AZ, or there is some other technical reason why it wouldn't help, I'm really curious to know why...
Here's your answer: cascading failures.
In short, the cascading failures don't happen because one local failure causes the entire capacity of the network to be exceeded... you see, it is not a case of every node connected to every node (O(N^2) connections), thus a failure only needs to overload the capacity of the nodes connected to the failing one...
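A toy model makes the point concrete. This is only a sketch under invented assumptions (nodes arranged in a ring, every node carrying the same load, a dead node's load split evenly among its live neighbours); it is not a description of how EBS actually behaves.

```python
def simulate_cascade(num_nodes, load, capacity, first_failure=0):
    """Count failed nodes on a ring where a dead node's load
    is split among its live neighbours."""
    loads = [load] * num_nodes
    alive = [True] * num_nodes
    alive[first_failure] = False
    pending = [first_failure]
    failed = 1
    while pending:
        node = pending.pop()
        neighbours = {(node - 1) % num_nodes, (node + 1) % num_nodes}
        live = [n for n in neighbours if alive[n]]
        if not live:
            continue  # nobody left to take over this node's load
        share = loads[node] / len(live)
        for n in live:
            loads[n] += share
            if loads[n] > capacity:
                # The neighbour is overloaded and fails in turn.
                alive[n] = False
                failed += 1
                pending.append(n)
    return failed


# Nodes at 50% utilisation absorb a single failure...
print(simulate_cascade(10, load=0.5, capacity=1.0))  # prints 1
# ...but at 70% the same failure takes down the whole ring.
print(simulate_cascade(10, load=0.7, capacity=1.0))  # prints 10
```

The total spare capacity of the ring is never exceeded by the first failure in either run; what matters is whether the immediate neighbours have enough headroom.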

--
Questions raise, answers kill. Raise questions to stay alive.
Re:multi AZ? by Anonymous Coward · 2012-10-23 04:18 · Score: 0

You're obviously incompetent when it comes to this, why don't you let the adults discuss?

But the cloud is so much better to use! by BetaDays · 2012-10-22 08:10 · Score: 2

But the cloud is so much better to use!

--
Paul: Father... father, the sleeper has awakened! - Dune

Re:But the cloud is so much better to use! by Anonymous Coward · 2012-10-22 08:16 · Score: 0

What, you think outages don't occur in non "cloud" hosting situations?
If these sites were properly architected for high availability (i.e. multi AZ setups), they would be up still, though perhaps with degraded performance.
Re:But the cloud is so much better to use! by BetaDays · 2012-10-22 09:16 · Score: 1

The point that I have issue with is that all the sites affected can only sit back and wait and hope that things get back to normal. I have had many times to wait for the cloud provider to fix an issue with no notification back to me that it was fixed or where the status of the fix was at. I like to own the equipment I use and when it breaks my people are on it and I can ask them what is going on and a time frame. Sure sometimes the time frame is off and sometimes they don't know what is going on but at least I wasn't just sitting back and waiting for a blog to get updated to let me know what is happening. That's internal to my location, if you want to talk about external to the satellite locations, that's another story.

--
Paul: Father... father, the sleeper has awakened! - Dune
Re:But the cloud is so much better to use! by DragonWriter · 2012-10-22 11:33 · Score: 1

The point that I have issue with is that all the sites affected can only sit back and wait and hope that things get back to normal.
Well, no. They likely could do lots of other things -- they probably will choose not to do much else if Amazon doesn't take a ludicrously long time to fix the problem because that's easier than the other things they can do, and if they can't do anything else it's not because they used the cloud, but because they didn't do contingency planning that would have enabled them to address Amazon failures other than by waiting for Amazon to fix them. This is no different than any other hosting scenario, including your own data center. There are always external facilities (e.g., your network providers) that can fail, and you can either spend the effort to make a contingency plan to deal with their failures or simply accept the risk of their failure.

I like to own the equipment I use and when it breaks my people are on it and I can ask them what is going on and a time frame.
Owning the equipment seems to be a non-sequitur. Whether you own the equipment or not, or whether the relationship is employee-employer or a non-employment contract relationship, you can have someone you can ask what is going on and a time frame. Of course, depending on who you have working for you (and, again, this is true regardless of whether you own the hardware and regardless of whether your relationship to the people responsible to you for the systems is an employee-employer or a contract relationship) you might not get a good response, but the solution to that is to choose well when you choose who to work with.

As Usual... by broginator · 2012-10-22 08:13 · Score: 3, Funny

There's an oblig xkcd: http://xkcd.com/908/ Guess someone tripped over the wire.

--
s/[stupid comments]/[intelligent discourse]/gi

Re:As Usual... by 21mhz · 2012-10-22 09:05 · Score: 1

Nah, xkcd is down too.

--
My exception safety is -fno-exceptions.

other sites down by Falc0n · 2012-10-22 08:13 · Score: 1

turntable.fm is also down -- I guess the NYC tech startup community is going crazy right now. Time to diversify!

To the dudes working at AWS by ctime · 2012-10-22 08:18 · Score: 1

This too shall pass

No Fancy Uptime Numbers for them by NinjaTekNeeks · 2012-10-22 08:18 · Score: 2

Looks like there won't be any fancy reports about the "cloud" having spectacular up times, with over an hour passed they can no longer claim more than 3 nines uptime.
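For anyone who wants to check the nines arithmetic, here is a small sketch. It assumes the budget is counted over a 365-day year; SLAs that measure per month come out roughly twelve times stricter. The function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, assuming a non-leap year


def downtime_budget_minutes(nines, period_minutes=MINUTES_PER_YEAR):
    """Allowed downtime for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10.0 ** (-nines)
    return period_minutes * unavailability


for nines in (2, 3, 4, 5):
    print(nines, round(downtime_budget_minutes(nines), 1))
```

Four nines leaves about 52.6 minutes a year, so an outage of over an hour rules that out, while three nines (about 8.76 hours a year) is still within reach.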

Re:No Fancy Uptime Numbers for them by Revotron · 2012-10-22 08:32 · Score: 3, Insightful

Well... as it's currently referred to, the "cloud" is a singular entity. So, as long as there's one single server running as part of that infrastructure, you could weasel your way around any downtime and reassure the ignorant masses that "the cloud" is still up, even if the only remaining piece is a Raspberry Pi running over a cable modem in some guy's basement.

Hey, look everybody, the cloud is still up! You can't do near as much as you usually can, but it's up! 100% uptime! Woo!

I don't... by future+assassin · 2012-10-22 08:24 · Score: 1

My life and business don't rely on ANY internet based social service things and I make sure my customers are not dependent on social media to know what's going on with my business. Hell even if the internet would go down I still have a phone book and a land line.

--
by TheSpoom (715771) Uncaring Linux user here. I have nothing to add to this but please continue. *munches popcorn*

Re:I don't... by Antipater · 2012-10-22 08:36 · Score: 2

Hell even if the internet would go down I still have a phone book and a land line.
Hey, me too! It's always good to have things lying around to club people with when civilization ends.

--
Everything is better with chainsaws.

Why does this still happen by Anonymous Coward · 2012-10-22 08:26 · Score: 0

The tech is available to Amazon to migrate running VMs from one cluster to another. Why do we still have these outages?

Amazon has all of the problems of hosting stuff yourself and all of the problems of cloud hosting without any of the advantages. I have single boxes at my house running on cable modems that have better uptime than EC2 right now. (barring a few power outages 'cause I don't have a UPS on my crap :-(, )

Again! by Anonymous Coward · 2012-10-22 08:29 · Score: 0

This keeps happening.

Amazon claims "degraded performance" in EBS. But, then RAID rebuilding and instance migration/failover increases the load so much that everything else around it crashes as well.

This is yet another major outage for hundreds of (some) significant sites and apps. I'd call it a cloud burst.

Amazon is dead by AragornSonOfArathorn · 2012-10-22 08:30 · Score: 1

Netcraft confirms it.

--
sudo eat my shorts

wow, mainframe problems in the cloud by Dan667 · 2012-10-22 08:30 · Score: 4, Insightful

If only there were some lessons learned over decades and decades of mainframe use that could be applied to the cloud.

Re:wow, mainframe problems in the cloud by Anonymous Coward · 2012-10-22 09:06 · Score: 0

Anyone that actually used mainframes in the real world can tell you they went down too. The difference is the people using them had an excuse to slack off and it wasn't publicized around the world every time.
Re:wow, mainframe problems in the cloud by Dan667 · 2012-10-22 09:20 · Score: 1

if it is a critical system then you have a contingency system. The first rule you learn is no one cares about your service / data as much as you do and if you trust only one infrastructure you are waiting for an outage. This is a case in point.
Re:wow, mainframe problems in the cloud by StormReaver · 2012-10-22 09:48 · Score: 1

If only there were some lessons learned over decades and decades of mainframe use that could be applied to the cloud.
Fans of, "I don't want to have to do my job" will never learn the lessons of the past.
Re:wow, mainframe problems in the cloud by Smauler · 2012-10-22 10:00 · Score: 1

Yeah - it's like availability and uptime is getting worse, rather than better.
What do you mean, it's not...

The Cloud makes everything awesome! by 1_brown_mouse · 2012-10-22 08:31 · Score: 1

These sites load so much faster.

Is storage that expensive or is it the bandwidth costs associated with them?

Ahhhhh.... by Anonymous Coward · 2012-10-22 08:36 · Score: 1

I love the smell of cloud failure in the afternoon..

Sick day? by Anonymous Coward · 2012-10-22 08:43 · Score: 0

Should I take a sick day for a loved one?

habit forming by samjam · 2012-10-22 08:54 · Score: 1

This zapping people's data is getting to be habit forming for Amazon I think.

I guess we're just waiting to hear if it was a mistake or on purpose.

--
blog.sam.liddicott.com

Failing at Cloud is Unacceptable by Anonymous Coward · 2012-10-22 09:00 · Score: 0

Why is it that these same companies keep failing at cloud instead of learning from their mistakes and using technology like failover and high availability that has been around for years to ensure that if there is an outage their service remains online and solid? A single point of failure should never cripple a site or service.

http://benjaminkerensa.com/2012/06/30/reflecting-on-netflix-instagram-pinterest-downtime
http://www.brandonholtsclaw.com/blog/2012/how-not-to-fail-at-the-cloud/

The purpose of the Internet... by fufufang · 2012-10-22 09:09 · Score: 1

The Internet was meant to be resilient to nuclear attacks... Now major websites simply go down when you take out major cloud service providers... This whole development is just silly.

Re:The purpose of the Internet... by cjc25 · 2012-10-22 09:40 · Score: 1

Non sequitur
The part of the internet that "was meant to be resilient to nuclear attacks" (if that's even true) would be the routing. If a nuclear strike hits the machine you're trying to talk to, redundancy among communication channels doesn't do anything. The endpoints went down, and so things failed.
Re:The purpose of the Internet... by Smauler · 2012-10-22 10:04 · Score: 1

Some major websites go down.... the internet has stayed rock solid throughout this.
Those major websites will be back up in a few hours.
You expect the internet to be infallible? Not going to happen.

Lol u outsourced! by Anonymous Coward · 2012-10-22 09:10 · Score: 0

You asked for it!

Our office uses this "great" cloud for its dev! by Anonymous Coward · 2012-10-22 09:10 · Score: 0

Funny, and I am ALWAYS the one criticized for my "don't use a cloud" attitude.

My productivity level for today just hit an all-time low.

Why would you ever put a DEV environment in the cloud? Why? Really? time to go surf some more... i bet monster is still up...

Oh yay by Anonymous Coward · 2012-10-22 09:17 · Score: 0

I'm noticing that the amazon web interface is also slow in responding - looks like specific parts of cloudfront are also having issues.

I restored the wrong backup by smitsco · 2012-10-22 09:52 · Score: 1

Amazon must have some wires crossed, Minecraft.net is now rendering the website Room Key.

Cloud computing works well and is awesome... by Anonymous Coward · 2012-10-22 10:17 · Score: 0

When you own the infrastructure, know what you are doing and control your own cloud. Private clouds are the only real future for businesses who need 24x7 uptime (if you have hired the right folks to build and manage it).

the best way to avoid EBS failures by Anonymous Coward · 2012-10-22 10:49 · Score: 0

is to use Zadara Storage instead of EBS.

http://blog.zadarastorage.com/2012/10/comparing-provisioned-iops-ebs-vs.html

Minecraft login is down too! by ukemike · 2012-10-22 13:00 · Score: 1

Minecraft login is down too!

--
-- QED

And you really surprised? by Anonymous Coward · 2012-10-22 13:07 · Score: 0

When you let something out of your hands, you have no control over where it's replicated or across how many servers. So yes, this incident should be no surprise to you. I'm just waiting for a big data breach or a loss of data in a spectacular way.

I sense a new game by Skapare · 2012-10-23 03:24 · Score: 1

In a virtual world, you put on your roller blades, and administer a failing data center. Level 1 is your home LAN. Level 2 is a law office and all the attorneys want the morning's court briefs immediately because court starts in 45 minutes and the file server screen says "RAID array offline". Level 3 is a small ISP. Level 4 is AWS. Level 5 is Google. Good luck!

--
now we need to go OSS in diesel cars

176 comments