Major Outage At the Amazon Web Services

Posted by ryuzaki0 on Thursday April 21, 2011 @03:45AM from the but-the-cloud-fixes-everything dept.

ralphart writes "The Northern Virginia datacenter for Amazon Web Services appears to be having a major outage that affects EC2 services. The Amazon Forums are full of reports of problems. Latest update from the status page: 2:49 AM PDT We are continuing to see connectivity errors impacting EC2 instances, increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region, and increased error rates affecting EBS CreateVolume API calls. We are also experiencing delayed launches for EBS backed EC2 instances in affected availability zones in the US-EAST-1 region. We continue to work towards resolution."

247 comments

Reddit is down because of this by HelioWalton · 2011-04-21 03:48 · Score: 1

How am I supposed to be able to not do work?
1. Re:Reddit is down because of this by cobrausn · 2011-04-21 03:50 · Score: 5, Funny
  
  You're posting on Slashdot, so I believe you already found the answer.
  
  --
  How does it feel to be a liar with pants constantly on fire?
2. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 03:51 · Score: 1
  
  Digg is still up
3. Re:Reddit is down because of this by wiggles · 2011-04-21 03:57 · Score: 1
  
  They took Digg down last year and replaced it with this horrible monstrosity they called 'v4' or something. It's a shame they just took such a popular site offline and haven't provided a decent replacement.
4. Re:Reddit is down because of this by jpmoney · 2011-04-21 03:59 · Score: 2
  
  People still go to digg? Oh, I see what you did there.
  I actually went to Digg this morning since Reddit is down. I haven't been in months since I removed them from my RSS reader. All I have to say is "ouch". Front page stories with a whopping 5 comments? It's pretty sad.
  
  --
  unf.
5. Re:Reddit is down because of this by badran · 2011-04-21 04:03 · Score: 2
  
  Productivity in Offices will reach record levels today.
6. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 04:08 · Score: 0
  
  I actually added it to my site block list to retrain my finger memory after v4 hit and it went to hell. If my fingers magically typed it, Leechblock would tell me no. Reddit, Slashdot, & Mefi eat up enough time.
7. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 04:11 · Score: 0
  
  Upvoted, commented for the same reason...
8. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 04:15 · Score: 0
  
  Popurls has everything!
9. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 04:19 · Score: 0
  
  Better drink my own piss.
10. Re:Reddit is down because of this by jafuser · 2011-04-21 04:29 · Score: 1
  
  This is the first time I've been back here in a while. I decided to try it when I realized reddit's downtime is probably going to be a while. I still feel a reverence for this place. It sort of reminds me of going back and visiting my university.
  Digg can rot in hell.
  
  --
  Please consider making an automatic monthly recurring donation to the EFF
11. Re:Reddit is down because of this by MobileTatsu-NJG · 2011-04-21 04:35 · Score: 1
  
  You're posting on Slashdot, so I believe you already found the answer.
  Yeah but maybe he's hungry for news.
  
  --
  
  "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
12. Re:Reddit is down because of this by lumbercartel.ca · 2011-04-21 04:40 · Score: 1
  
  I mostly come here for humour, and once again I'm not disappointed!
  
  --
  The Lumber Cartel, local 42 (Canadian branch)
  British Columbia, Canada
13. Re:Reddit is down because of this by 91degrees · 2011-04-21 04:43 · Score: 1
  
  It's not the same! I want atheists being smug about not believing in god (and refusing to capitalise), and liberal lefties telling each other that the government needs to be more liberal while Libertarians accuse them of worshipping Obama, photoshopped pictures that have been debunked dozens of times before, people claiming to be things that they aren't and answering questions, and hero worshipping of Ron Paul!
  
  You just don't get enough of that here.
14. Re:Reddit is down because of this by Richard_at_work · 2011-04-21 04:46 · Score: 4, Informative
  
  Don't worry - Slashdot just did something similar. When I try to reply to comments through my account's comment history page, it's horribly, horribly broken. Each attempt to click in the reply box loads a new comment further up in the comment tree and scrolls the page to the newly loaded comment. Scroll back down, click in the box again, and it loads another comment and shunts me back up the page. It can get really fucking annoying when you are trying to reply to a comment that's quite a way down a long tree.
15. Re:Reddit is down because of this by Scorchio · 2011-04-21 04:47 · Score: 1
  
  You'd get an upvote, but I haven't seen mod points in a long time...
16. Re:Reddit is down because of this by mtutty · 2011-04-21 04:51 · Score: 1
  
  I see what you did there.
  Hey, feels just like Reddit!
17. Re:Reddit is down because of this by recoiledsnake · 2011-04-21 05:03 · Score: 1
  
  That even happens if you just click on posts. Not to mention that the comment scores are sometimes hidden randomly and you have to do all the clicking till you see them.
  
  --
  This space for rent.
18. Re:Reddit is down because of this by lgw · 2011-04-21 05:15 · Score: 1
  
  It's not the same! I want atheists being smug about not believing in god (and refusing to capitalise), and liberal lefties telling each other that the government needs to be more liberal while Libertarians accuse them of worshipping Obama, photoshopped pictures that have been debunked dozens of times before, people claiming to be things that they aren't and answering questions, and hero worshipping of Ron Paul!
  You just don't get enough of that here.
  You said it ... we can't post pictures here (for which those of us here in the Goatse spam days were quite thankful).
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
19. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 05:35 · Score: 0
  
  120K deaths a year from physicians? Wow. That number must include every person who dies while "under a physician's care" and assume they died only as a direct result of something the physician did or did not do correctly.
20. Re:Reddit is down because of this by mini+me · 2011-04-21 05:45 · Score: 1
  
  Digg never had much comment activity when compared to similar sites (Slashdot, Reddit, etc.). Which is a shame, because the comments are usually more entertaining than the actual links.
21. Re:Reddit is down because of this by Anonymous Coward · 2011-04-21 05:48 · Score: 0
  
  So you are saying that if I ever get extremely ill, break a bone, or get in a serious car accident, I would have a better chance of surviving if I'm a gun owner. Wow, that's an eye opener. Next time something happens to me, I'm going out hunting instead of to my doctor's office.
22. Re:Reddit is down because of this by Ant+P. · 2011-04-21 06:01 · Score: 1
  
  There's an easy fix for that: block javascript and turn on classic discussion mode. Not only will /. actually work, it'll feel 10 times faster!
23. Re:Reddit is down because of this by deserttrail · 2011-04-21 07:57 · Score: 1
  
  I realize that it's just for amusement, but that's an improper comparison. In this case, the physician is analogous to the gun, not the gun owner. It should compare the number of people who see a physician to gun owners.
  Even if you assume that everyone in the country sees a physician, gun owners still come out ahead, but not nearly as dramatically.
  
  --
  Be civil to all; sociable to many; familiar with few; friend to one; enemy to none. --Benjamin Franklin
24. Re:Reddit is down because of this by Joe+Tie. · 2011-04-21 08:21 · Score: 1
  
  Glad to see I'm not the only one here primarily for that reason. I feel bad too. I noticed the prolonged downtime and my first thought was simply a judgmental "pft, damn it reddit, again?" And it turns out I really shouldn't have been so quick to place blame. That said, it's bad enough that I'm actually going out to be social. I don't like the strange world that lack of reddit is thrusting me into.
  
  --
  Everything will be taken away from you.
25. Re:Reddit is down because of this by Caerdwyn · 2011-04-21 08:31 · Score: 1
  
  * cocks hammer * Go ahead, splint my bone.
  
  --
  Everybody gets what the majority deserves.
26. Re:Reddit is down because of this by 1s44c · 2011-04-21 08:43 · Score: 1
  
  Each attempt to click in the reply box loads a new comment further up in the comment tree and scrolls the page to the newly loaded comment. Scroll back down, click in the box again, and it loads another comment and shunts me back up the page. It can get really fucking annoying when you are trying to reply to a comment that's quite a way down a long tree.
  So it's not just me that happens to.
27. Re:Reddit is down because of this by Walt+Dismal · 2011-04-21 15:01 · Score: 1
  
  In the excitement of the accident, I lost count of the cracks. Was it 5 cracks in the bone, or 6? Go ahead, doc, make my cast.
28. Re:Reddit is down because of this by Anonymous Coward · 2011-04-23 08:37 · Score: 0
  
  Nope. He's correct- These deaths are directly attributable to physician / hospital mistakes. Don't know how many of these overlap, but around 75,000 people per year die of direct medication mistakes (wrong drug, wrong dose, overlooking drug-drug interactions, etc.). Look up the LeapFrog group's report.
29. Re:Reddit is down because of this by JesseDegenerate · 2011-04-23 11:59 · Score: 1
  
  nice subtle insertion of your own views there 25degrees.
30. Re:Reddit is down because of this by Anonymous Coward · 2011-04-23 22:33 · Score: 0
  
  Honestly, it was just a rant. I'm not as smart as you're suggesting.
No Way! by Frosty+Piss · 2011-04-21 03:48 · Score: 5, Funny

But how can this be possible? It's The Cloud . This sort of this simply doesn't happen.

--
If you want news from today, you have to come back tomorrow.
1. Re:No Way! by alphatel · 2011-04-21 03:51 · Score: 2
  
  It didn't happen. The cloud can erase history in a planck!
  
  --
  When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
2. Re:No Way! by Anonymous Coward · 2011-04-21 03:53 · Score: 0
  
  Seems to happen really rarely, since it now seems to be huge news and slashdot is reporting on it right away.
3. Re:No Way! by jellomizer · 2011-04-21 04:00 · Score: 0
  
  The cloud isn't immune to problems. But it is normally more tolerant of problems than your (or your business's) internal systems, unless you spend a great deal on a full infrastructure, in which case you probably get just as good. A major outage on most professional cloud setups means it is down for a few hours. A major outage at work means the full day. It is like saying driving my car is so much safer than flying because I never got into an accident.
  
  --
  If something is so important that you feel the need to post it on the internet... It probably isn't that important.
4. Re:No Way! by pdbaby · 2011-04-21 04:03 · Score: 1
  
  Jokes aside, if people use The Cloud (I'm using this tongue in cheek...) rather than a cloud this thing doesn't happen.
  We use a number of providers which means that even if Amazon fell over completely our systems would be fine -- it looks like a lot of sites (reddit, for instance) don't bother to do this.
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
5. Re:No Way! by Anonymous Coward · 2011-04-21 04:05 · Score: 2
  
  But it's not supposed to happen, because "if" (when!) it does, the impact is HUMONGOUS. "You're welcome to store all your data in our fast, easy and safe cloud storage. Downtime? Don't worry, it'll only experience hour long outages intermittently." Yeah, that's how they sold it in the first place, isn't it?
  This will become quite the event in data warehouse circles I bet, because the cost of 'being in the cloud' just doubled; it's not enough to buy storage from one provider. The "always there" quality that's supposedly the benefit of cloud storage is a facade.
6. Re:No Way! by cduffy · 2011-04-21 04:08 · Score: 2, Insightful
  
  This will become quite the event in data warehouse circles I bet, because the cost of 'being in the cloud' just doubled; it's not enough to buy storage from one provider. The "always there" quality that's supposedly the benefit of cloud storage is a facade.
  You can buy from one provider -- every major cloud provider has multiple availability zones. But yes, lots of people buy in only one zone because it's cheaper, and then suffer for that mistake -- in situations just like this.
7. Re:No Way! by 0123456 · 2011-04-21 04:10 · Score: 4, Informative
  
  A major outage on most professional cloud setups means it is down for a few hours. A major outage at work means the full day. It is like saying driving my car is so much safer than flying because I never got into an accident.
  Last time I remember a day-long outage at work was 1994, and that was because the license server failed so we couldn't run our own software (we couldn't recompile it to remove the DRM because the compiler also needed a license to run).
  I seem to remember that the Mac guys at the company also had a long outage when they couldn't connect to one of their Mac servers, but eventually someone actually went to the server room and discovered that it had been stolen.
  Back on topic, I just don't see all these day-long outages that apparently seem to happen all the time in companies that haven't moved their servers to The Cloud(tm).
8. Re:No Way! by Anonymous Coward · 2011-04-21 04:18 · Score: 0
  
  Oh. MY. GOD. A website is DOWN!!!!!
  The end times are upon us! A website went down today! A website! This was foretold in scripture! RUN AND PANIC IN THE STREETS NOW. THAT IS ALL THERE IS LEFT. Nothing else makes any sense! Because a website is down!! Don't you understand?!?
9. Re:No Way! by ron_ivi · 2011-04-21 04:23 · Score: 1
  
  But how can this be possible? It's The Cloud . This sort of this simply doesn't happen.
  To be fair to Amazon - on a good cloud (incl. Amazon's) you can launch instances in completely different data centers, so your most critical services have somewhere to fail over to.
  Though, personally I'd feel even better if my nodes were distributed across two different clouds; to avoid the single-point-of-failure of the Amazon account itself. For example, despite running in both their East and West data centers, I'm still vulnerable to a sales/billing miscommunication that freezes my whole account.
10. Re:No Way! by TooMuchToDo · 2011-04-21 04:33 · Score: 2
  
  But when it's your gear, you have some control over the situation. When it's "in the cloud", you sit and get yelled at by the CXO and sweat if you'll still have a job while cloud provider X works to fix the problem (and their liability? whatever you paid for the service).
11. Re:No Way! by dkleinsc · 2011-04-21 04:35 · Score: 1
  
  This sort of this simply doesn't happen.
  Now we know: All it takes is one admin screwing up and replacing an "ng" with an "s".
  
  --
  I am officially gone from /. Long live http://www.soylentnews.com/
12. Re:No Way! by 91degrees · 2011-04-21 04:37 · Score: 1
  
  I'm not a databases guy, so sorry if this is a silly question, but reddit does have a lot of stuff being written to the database all the time.
  
  So if you spread over multiple sites, is this manageable without dramatically increasing server load?
13. Re:No Way! by Synn · 2011-04-21 04:44 · Score: 2, Insightful
  
  "Back on topic, I just don't see all these day-long outages that apparently seem to happen all the time in companies that haven't moved their servers to The Cloud(tm)."
  You must not get out much. With Atlantic.net I had an 11-hour outage due to the staff not understanding how to update a Cisco router. Then a 4-hour outage when they screwed up billing and shut down our service with no warning. Then there was that time they didn't like our DNS traffic and shut down DNS with no warning or notice. That was a fun hour or so of me trying to figure out why our applications were having issues.
  So.. we go to replace them and at one of the places we visit (that was highly recommended), the first words outta their mouth were "So I'm sure you heard about that 6 hour outage we had xxx back. Here's what we've done to make sure it doesn't happen again..."
  Long outages happen. I'm pretty much in the firm belief that if your app can't scale out across large geography automatically the only thing giving you solid uptime is pure luck. Any time someone wants 5 9's with a single data center I just laugh. But setting up your app and data to work that way takes Work.
  Which, btw, would've also prevented outages for companies here. Only 1 zone was affected, EC2 also has zones on the west coast, Ireland and Asia pacific. If you built your app to use those and balanced via ELB, you likely wouldn't be impacted with this outage. But again, that takes $$$. Most companies don't want to spend that and frankly most companies probably don't need to.
14. Re:No Way! by Anonymous Coward · 2011-04-21 04:50 · Score: 0
  
  Hey, would you quit clouding the issue?
15. Re:No Way! by lumbercartel.ca · 2011-04-21 04:57 · Score: 0
  
  Al Gore can probably explain it best since he invented the internet and warned us all of global warming!
  
  --
  The Lumber Cartel, local 42 (Canadian branch)
  British Columbia, Canada
16. Re:No Way! by pdbaby · 2011-04-21 04:59 · Score: 1
  
  I believe they generate a new HTML document each time a comment is added or up/downvoted - they could replicate the comment and vote data to another site.
  It'd be an increase in traffic but not necessarily a huge increase in load (since they wouldn't be generating HTML at the second site unless they're in failover mode).
  I don't know whether the increased reliability would be worth the extra load in their case, however, since I doubt they lose that much money from downtime (given how frequently they're down)
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
17. Re:No Way! by recoiledsnake · 2011-04-21 05:04 · Score: 1
  
  But how can this be possible? It's The Cloud . This sort of this simply doesn't happen.
  
  Yay, cloud!
  
  --
  This space for rent.
18. Re:No Way! by Blakey+Rat · 2011-04-21 05:07 · Score: 1
  
  Each data center also has independent zones.
  It looks like in this case, only one zone in one data center was affected-- that's bad, but that's not "end-of-the-world" bad. If sites are going down, they should have been more careful to distribute redundant servers in different zones.
  (Where this is a problem is if you're a small shop with a single DB server, and the zone holding your DB server goes down-- in that case you're kind of SOL.)
  
  --
  Comment of the year
19. Re:No Way! by lgw · 2011-04-21 05:19 · Score: 4, Insightful
  
  This will become quite the event in data warehouse circles I bet, because the cost of 'being in the cloud' just doubled; it's not enough to buy storage from one provider. The "always there" quality that's supposedly the benefit of cloud storage is a facade.
  The cloud doesn't have to be perfect - it just has to be as good in the eyes of VPs as the contractors they'd otherwise hire to run their internal datacenter. What's the value of an IT guy in the eyes of an MBA? Yeah, this sort of reality check won't faze them at all.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
20. Re:No Way! by j_l_larson · 2011-04-21 05:23 · Score: 1
  
  there will no doubt be a good number of heads rolling down halls there within a day or so
21. Re:No Way! by indeterminator · 2011-04-21 05:51 · Score: 1
  
  But when it's your gear, you have some control over the situation. When it's "in the cloud", you sit and get yelled at by the CXO and sweat if you'll still have a job while cloud provider X works to fix the problem.
  Try to focus on fixing something yourself while being yelled at. Now how annoying is that?
22. Re:No Way! by watanabe · 2011-04-21 06:00 · Score: 1
  
  As an example, we run our production servers on EC2 East; they have load balancers failing them between zones. The Database and webservers are fine, and have been fine today.
  The dev servers do not have load balancers running on them, and they have been choking in a miserable hell all morning.
23. Re:No Way! by indeterminator · 2011-04-21 06:00 · Score: 1
  
  Where this is a problem is if you're a small shop with a single DB server, and the zone holding your DB server goes down-- in that case you're kind of SOL.
  Even using traditional methods (i.e. non-cloud), a small shop would be unlikely to have enough redundancy when there's a datacenter-wide issue.
24. Re:No Way! by im_thatoneguy · 2011-04-21 06:01 · Score: 2
  
  We were out for a good portion of the day Monday after a bird flew into the telephone pole outside our office and then caused a critical server to go wonky after the UPS battery ran out and we didn't have the auto-shutdown settings correct.
25. Re:No Way! by ron_ivi · 2011-04-21 06:02 · Score: 1
  
  (Where this is a problem is if you're a small shop with a single DB server, and the zone holding your DB server goes down-- in that case you're kind of SOL.)
  IMHO the main beauty of a cloud is that you're NOT SOL.
  For one of the sites I manage, I am a small shop.
  The beauty of a cloud is that with Amazon's $0.02/hr micro instances, and $0.007 spot-priced micro instances I can *still* do things right (failover to remote data center, backups in different data center), even for clients that can only afford under $50/month in hosting.
26. Re:No Way! by TooMuchToDo · 2011-04-21 06:20 · Score: 1
  
  Annoying > Shitcanned.
27. Re:No Way! by The+End+Of+Days · 2011-04-21 06:51 · Score: 1
  
  Hey, what's that logical fallacy called when you make up the argument for the other side so you can shoot it down? I figured you would know, since you just did it right there.
28. Re:No Way! by El_Isma · 2011-04-21 07:04 · Score: 1
  
  The Amazon cloud not working? Already has happened at least once: http://blog.reddit.com/2011/03/why-reddit-was-down-for-6-of-last-24.html
29. Re:No Way! by alvieboy · 2011-04-21 07:16 · Score: 1
  
  It's raining.
  You know clouds often do that, don't you ?
  Prepare for thunderstorm one of these days. Your bits will be electrified to death, your bytes will bite you and apocalypse will finally arrive.
  Digital zombies. Hurry for canned tuna.
30. Re:No Way! by Anonymous Coward · 2011-04-21 07:35 · Score: 0
  
  Oh. MY. GOD. A website is DOWN!!!!!
  The end times are upon us! A website went down today! A website! This was foretold in scripture! RUN AND PANIC IN THE STREETS NOW. THAT IS ALL THERE IS LEFT. Nothing else makes any sense! Because a website is down!! Don't you understand?!?
  Unfortunately, using a HOSTS file would not work at all in this situation. It is just a bad solution.
31. Re:No Way! by sulfur · 2011-04-21 14:08 · Score: 1
  
  But... but I thought nobody ever got fired for choosing IBM^H^H^H Amazon!
32. Re:No Way! by Anonymous Coward · 2011-04-21 19:00 · Score: 0
  
  Fucking moron. What's wrong with you?
33. Re:No Way! by JesseDegenerate · 2011-04-23 12:02 · Score: 1
  
  I was an IT guy whose company got eaten by a larger marketing firm. We used to have a bunch of bonded T1s. Now we have a DS3. Speed increase! However: number of days down in the 7 years pre-buyout: 1, and that's combining 3 or so 3-4 hour "oh shit, the mail server" incidents. It's been since Jan 1, and we've been down for 8 solid days. They do everything in the cloud. In other news, anecdotal evidence is total bullshit.
Severe weather in Virginia likely the culprit by stopacop · 2011-04-21 03:50 · Score: 3, Informative

Severe weather hit the area. They shut down Surry Power Station in Surry County, Virginia after a tornado took the power out that powers the power station.

--
http://www.stopacop.so -- You have rights. How about standing up for them before they go away?
1. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 03:54 · Score: 0
  
  They don't have backup generators?!
2. Re:Severe weather in Virginia likely the culprit by OverlordQ · 2011-04-21 03:55 · Score: 0
  
  after a tornado took the power out that powers the power station.
  Does not compute. Once it's running, why can't a power station use its own power?
  
  --
  Your hair look like poop, Bob! - Wanker.
3. Re:Severe weather in Virginia likely the culprit by MintyGreenMedia · 2011-04-21 03:56 · Score: 1
  
  I find it slightly more concerning that the power plant didn't. They're not designed to be self-sufficient?
4. Re:Severe weather in Virginia likely the culprit by jtdennis · 2011-04-21 03:59 · Score: 1
  
  it was probably a distribution station, not a power generation facility.
  
  --
  -- "Freedom is the right of all sentient beings" -Optimus Prime
5. Re:Severe weather in Virginia likely the culprit by getagrip · 2011-04-21 03:59 · Score: 4, Informative
  
  I am in Northern Virginia. There is no power outage or severe weather here.
6. Re:Severe weather in Virginia likely the culprit by MmmmAqua · 2011-04-21 03:59 · Score: 1
  
  If it's a substation, it doesn't have its own power.
  
  --
  Arr! The laws of physics be a harsh mistress!
7. Re:Severe weather in Virginia likely the culprit by stopacop · 2011-04-21 04:00 · Score: 1
  
  Going by a news report I saw!
  
  --
  http://www.stopacop.so -- You have rights. How about standing up for them before they go away?
8. Re:Severe weather in Virginia likely the culprit by pdbaby · 2011-04-21 04:00 · Score: 2
  
  Amazon's Availability Zones are designed to have separate power, cooling and network so I don't think this is the issue. It was (is) a problem with their disk subsystem in multiple availability zones so I suspect they were in the process of pushing out some new storage controller code and some bug didn't appear until the later stages of their rollout. From their status log it looks like they're manually correcting the issue with each disk.
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
9. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:01 · Score: 0
  
  so Clouds take down Cloud
10. Re:Severe weather in Virginia likely the culprit by Burdell · 2011-04-21 04:02 · Score: 1
  
  No, they're not (see Fukushima, Japan). Basically, you don't just flip a switch and have a power plant go dark; you have to follow a shutdown procedure that takes both time and power. I don't know the requirements for coal or natural gas plants, but US nuclear plants are required to have multiple backup power sources (IIRC at least two independent diesel generator systems as well as off-site power). If the plant loses one backup power source for more than a certain period, it is required to shut down. IIRC if it loses two, it must shut down immediately (before potentially losing the remaining backups).
11. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:02 · Score: 0
  
  Original poster should have included "Nuclear" in the Surry Power Station explanation.
12. Re:Severe weather in Virginia likely the culprit by metrometro · 2011-04-21 04:02 · Score: 2
  
  Amazon's comments on the outage do not mention weather as a cause: http://status.aws.amazon.com/
  "8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them."
13. Re:Severe weather in Virginia likely the culprit by Gothic_Walrus · 2011-04-21 04:03 · Score: 0
  
  I am in Northern Virginia. There is no power outage or severe weather here.
  I'm gonna believe the multiple news stories that say you're wrong.
  
  --
  Goo goo g'joob.
14. Re:Severe weather in Virginia likely the culprit by alphatel · 2011-04-21 04:03 · Score: 1
  
  So they can't fail over like a normal ESX instance? So my cloud computer is actually just a rack in Virginia?
  
  --
  When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
15. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:03 · Score: 0
  
  Probably contractors then.
16. Re:Severe weather in Virginia likely the culprit by kevinNCSU · 2011-04-21 04:06 · Score: 1
  
  after a tornado took the power out that powers the power station.
  Does not compute. Once it's running, why can't a power station use its own power?
  Because you tend to want to have power available to cool nuclear fuel even if you decide to stop producing power for whatever reason (maintenance, mechanical failure, tornado, earthquake, tsunami, Nazi zombie attack).
17. Re:Severe weather in Virginia likely the culprit by Anne_Nonymous · 2011-04-21 04:08 · Score: 1
  
  >> Severe weather hit the area.
  So you're saying clouds took out the cloud?
18. Re:Severe weather in Virginia likely the culprit by xnpu · 2011-04-21 04:10 · Score: 2
  
  Those news reports do not rule out the possibility that he's in a place in Northern Virginia without severe weather or a power outage. How do you conclude that he is wrong?
19. Re:Severe weather in Virginia likely the culprit by Hal_Porter · 2011-04-21 04:14 · Score: 1, Funny
  
  Kill all permies
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
20. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:15 · Score: 2, Informative
  
  First: Please look at a map. Surry County is east of Richmond on the way to VA Beach. An outage at Surry Power Station would not affect a data center over in Dulles, VA. That power station does not serve this area at all.
  Second: Read the news. Every comment above is wrong in one way or another. Here is a local news article about what happened down there, if you are curious:
  http://www.examiner.com/progressive-in-richmond/surry-power-station-under-repair-the-aftermath-of-tornado
  You people know nothing, and you post crap without doing any research at all.
21. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:20 · Score: 0
  
  We don't normally count Southeastern Virginia as part of Northern Virginia.
22. Re:Severe weather in Virginia likely the culprit by hawguy · 2011-04-21 04:23 · Score: 1
  
  News reports are spotty, but I imagine that the plant tripped the turbines offline after the tornado damaged the power distribution equipment.
  When it's generating 1GW of power and suddenly the load goes down to 0GW, the turbines have to trip offline automatically and immediately to prevent damage.
  This may have also triggered a shutdown of the nuclear reactor, and it may take days or longer to bring it online after an emergency shutdown.
23. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:27 · Score: 0
  
  The tornado was several days ago, on the 18th.
24. Re:Severe weather in Virginia likely the culprit by MintyGreenMedia · 2011-04-21 04:27 · Score: 1
  
  I think you're ignoring the fact that in the case of Fukushima, they were set up to be self-sufficient -- it's just that the tsunami knocked out their backup generators.
25. Re:Severe weather in Virginia likely the culprit by MintyGreenMedia · 2011-04-21 04:29 · Score: 1
  
  Surry Power Station
26. Re:Severe weather in Virginia likely the culprit by MobileTatsu-NJG · 2011-04-21 04:29 · Score: 1
  
  Severe weather hit the area. They shutdown Surry Power Station in Surry County, Virginia after a tornado took the power out that powers the power station.
  Of course we all know that the not-cloud would have been impervious to that.
  
  --
  
  "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
27. Re:Severe weather in Virginia likely the culprit by MintyGreenMedia · 2011-04-21 04:34 · Score: 1
  
  I'd agree, except a) it's called "Surry Power Station," and b) a quick Google on that name gives you all the gory details.
28. Re:Severe weather in Virginia likely the culprit by TooMuchToDo · 2011-04-21 04:35 · Score: 1
  
  Your cloud computer is a Xen instance in Virginia, and your "EBS block storage" is an iSCSI target. Magic it ain't.
29. Re:Severe weather in Virginia likely the culprit by tlhIngan · 2011-04-21 04:35 · Score: 1
  
  I think you're ignoring the fact that in the case of Fukushima, they were set up to be self-sufficient -- it's just that the tsunami knocked out their backup generators.
  Only due to cost savings. The tsunami wall they built was half the required height (6 m instead of 12 m). Naturally, a 10 m tsunami hit. And no placement of the generators would've helped (they were in the basement, which flooded, but if they had been outside, they could've been washed away).
30. Re:Severe weather in Virginia likely the culprit by SecurityGuy · 2011-04-21 04:37 · Score: 1
  
  In which case being unable to use a secondary source (self-generated power) would be a bad thing, no?
31. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:39 · Score: 0
  
  He's wrong because he assumes his place is indicative of the entire Northern Virginia?
32. Re:Severe weather in Virginia likely the culprit by alphatel · 2011-04-21 04:40 · Score: 1
  
  Essentially half-cloudassed clouding.
  
  --
  When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
33. Re:Severe weather in Virginia likely the culprit by coastal984 · 2011-04-21 04:40 · Score: 1
  
  ...That was on Saturday.
34. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:40 · Score: 0
  
  Being that this power station is in southern Virginia, I think that is highly possible.
35. Re:Severe weather in Virginia likely the culprit by coastal984 · 2011-04-21 04:42 · Score: 1
  
  We in southeastern Virginia are normally offended when coupled with Northern Virginia :)
36. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:43 · Score: 0
  
  Sort of like a Tsunami knocking out the diesel generators that pump water for a Nuclear Power Plant?
37. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 04:43 · Score: 0
  
  Ok, so what happened is the severe weather from Saturday caused them to shut down the reactor today. Stopacop made it seem like there was severe weather last night.
38. Re:Severe weather in Virginia likely the culprit by Kamiza+Ikioi · 2011-04-21 04:45 · Score: 1
  
  But the scanner says their power level is Over 9000!
  
  --
  I8-D
39. Re:Severe weather in Virginia likely the culprit by TooMuchToDo · 2011-04-21 04:50 · Score: 1
  
  Not really half-assed from an implementation perspective, but from a marketing perspective. Amazon likes people to think it's magic, which is fine if it worked flawlessly all the time. But it doesn't, because it's just a technical solution for a specific problem. Unless you run instances in multiple zones, use redundant EBS volumes, and your entire app is built to handle global redundancy, it's not just going to be 100% uptime out of the box. I fault Amazon for lying to technical-enough people.
40. Re:Severe weather in Virginia likely the culprit by Wornstrom · 2011-04-21 04:53 · Score: 1
  
  it's true: http://www.examiner.com/progressive-in-richmond/surry-power-station-under-repair-the-aftermath-of-tornado
  Tornado was Saturday. I live on the other side of the James River from Surry.
41. Re:Severe weather in Virginia likely the culprit by Synn · 2011-04-21 04:54 · Score: 1
  
  "Essentially half-cloudassed clouding."
  EC2 is just tools. It's as cloudassed as you make of it.
  I can take ESX and use a Netapp for data storage and if my Netapp cluster takes a dive, you can't fail over to anything since your data is down.
  On the other hand I can take EC2 and run apps and clustered DBs across the east and west coast and put ELB in front of it. If the east coast takes a nuke, everything will keep on running.
42. Re:Severe weather in Virginia likely the culprit by kevinNCSU · 2011-04-21 05:01 · Score: 1
  
  A secondary source would be the backup generators or off-site power from the grid. If you lose one of your secondary sources it becomes unacceptably risky to keep your reactor running at full steam because you no longer have the safety net of as many backup sources. The safe play is then to shut down the plant so it begins to cool immediately before something can go wrong and you're left with no backup sources to provide cooling power.
43. Re:Severe weather in Virginia likely the culprit by Jawnn · 2011-04-21 05:02 · Score: 1
  
  Wow, then it's understandable. Good thing they weren't running a nuclear power plant or something.
44. Re:Severe weather in Virginia likely the culprit by Overzeetop · 2011-04-21 05:11 · Score: 1
  
  Well, that just about sums up the attitude of Northern Virginia towards the rest of the state.
  
  --
  Is it just my observation, or are there way too many stupid people in the world?
45. Re:Severe weather in Virginia likely the culprit by recoiledsnake · 2011-04-21 05:11 · Score: 1
  
  N. Va is not really that big. All the articles cited talk about VA, not NVA.
  
  --
  This space for rent.
46. Re:Severe weather in Virginia likely the culprit by Drathos · 2011-04-21 05:14 · Score: 1
  
  Yeah, that may be true, but it has nothing to do with anything going on in Northern Virginia. Surry is in Southeastern Virginia, over 150 miles away.
  
  --
  End of line..
47. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 05:16 · Score: 1
  
  Oh yea, we're gonna trust papers that print scare-mongering stuff like this: "This incident highlights, again, the precariousness of nuclear power in the face of natural disasters. Had the tornado hit one or both of the nuclear reactors, the damage could have been immense." Absolute nonsense. Like a tornado has anything like the energy in a tsunami. You sir, are a gullible idiot.
48. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 05:21 · Score: 0
  
  Regardless of whether or not it is causing the issue.. this made me laugh.
  "Fortunately, the tornado did not directly hit the two nuclear reactors" from the Examiner link.
  Seems like that would have been the better option.. I'm going to wager the nuclear reactors could easily withstand it.
49. Re:Severe weather in Virginia likely the culprit by tunapez · 2011-04-21 05:35 · Score: 1
  
  Your cloud computer is a Xen instance in Virginia, and your "EBS block storage" is an iSCSI target. Magic it ain't.
  There is no room for accurate or useful specifications in the flamboyant misrepresentation of marketing. Please enjoy the cuddly puppies and warm fuzzies.
  
  --
  Imagination drew in bold strokes, instantly serving hopes and fears, while knowledge advanced by slow increments...
50. Re:Severe weather in Virginia likely the culprit by krnpimpsta · 2011-04-21 05:39 · Score: 1
  
  Well, that just about sums up the attitude of Northern Virginia towards the rest of the state.
  There's a "rest of the state?" :)
  
  (Also in NoVA, no outages or severe weather here)
  
  --
  New webcomic updated on Sundays: HERE
51. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 06:00 · Score: 0
  
  Amen Bro. In other related news, the Ashburn Chipotle had the shortest lunch line ever recorded when I picked up lunch today, and I'd wager that the same could be said for Reston and Sterling. LOL.
52. Re:Severe weather in Virginia likely the culprit by inject_hotmail.com · 2011-04-21 06:03 · Score: 2
  
  Why not put them on the roof? I think any datacenter designer would say that, first thing...I mean, they stored their precious depleted uranium and plutonium on the roof...why not the generators too?
  
  The real problem everywhere...and I do see it everywhere...is that the people paid to be the people that 'know' simply don't know, or have no sense of creativity or foresight. I mean come on, they built a tsunami wall because they have a high probability of tsunamis, and then they go and put the most mission-critical, life-saving, life-altering power generators in the path of a tsunami-we-are-protected-from+1. For crying out loud, when I moved into my house I very easily decided that I won't put anything in my basement that I -really- care about...and I'm not even -near- a flood plain...let alone on a coastal fissure infested area known as the frickin' ring-of-fire!
  
  I should be in charge of everything...that way crap would get done, it wouldn't be obsolescently planned, no one would die from corporate gree^H^H^H^H"mistakes", and life would be easy for everyone.
53. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 06:23 · Score: 0
  
  So my cloud computer is actually just a rack in Virginia?
  Most of the internet is actually just a bunch of racks along the Dulles Toll Road in Ashburn, Sterling, Dulles, Herndon, Reston, and Tyson's Corner (Vienna + McLean). Between all the commercial and government hosting sites plus MAE-EAST, this has to be the most densely wired and computationally powerful place on the planet.
54. Re:Severe weather in Virginia likely the culprit by The+End+Of+Days · 2011-04-21 07:02 · Score: 1
  
  None of those places are Northern Virginia
55. Re:Severe weather in Virginia likely the culprit by kcitren · 2011-04-21 07:03 · Score: 1
  
  I'm outside a Starbucks very close to the data center; power is fine, weather is fine [maybe a little too sunny to be working on a laptop outside]
56. Re:Severe weather in Virginia likely the culprit by The+End+Of+Days · 2011-04-21 07:03 · Score: 1
  
  You should be, you basically rely on us to power the economic engine that drives the whole state and we make you all look like inbred retards who spend most of your time worshipping sky fairies and trying to figure out how to bring back the confederacy.
  Well technically, you people make yourselves look that way. We just make you look bad in contrast.
57. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 07:51 · Score: 0
  
  So what you are saying is the cloud got hit by another cloud?
58. Re:Severe weather in Virginia likely the culprit by Fulcrum+of+Evil · 2011-04-21 08:13 · Score: 1
  
  Which is why you design nuke plants that don't actually need power to avoid disasters. If only we could build some of those...
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
59. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 08:18 · Score: 0
  
  Doubtful. There are at least two more nuclear plants in Virginia going up towards D.C. One near Richmond and one near D.C. I think there are even a few more north of D.C.
60. Re:Severe weather in Virginia likely the culprit by guruevi · 2011-04-21 08:51 · Score: 1
  
  Yup, if you have large enough needs for it, it's better to roll your own. "Clouds" (hosted, virtual services for the rest of us) are great for small companies and single-man businesses. You put an instance in the cloud and only pay for what you use. As long as it's cheaper than 1 or 2 servers and a part-time sysadmin you're good to go.
  As soon as you NEED your system to be up and RELY on it you need something more expensive and even in the cloud the price goes up quickly. Pull a couple of TB's out of Amazon or require more than 2 full time processors and you'll see how quickly it adds up. After all, those people need to invest in the same tech you should've invested in AND they need to make a healthy profit.
  Funny you mention EBS block storage. I tried an instance using ZFS and it seems you may not get a cleanly separated iSCSI target. More than once in a 3-month period they sent an e-mail that "oops - one of our storage modules wiped out and we don't have a backup" and ZFS reported errors across multiple targets even though they were 'promised' by the sales person (but not in writing) that each instance is on a different physical device.
  
  --
  Custom electronics and digital signage for your business: www.evcircuits.com
61. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-21 09:35 · Score: 0
  
  WOW. Where in the fuck did that come from? What the fuck is wrong with you? Would you like someone to call you Canadian? No? Well it's the same fucking thing. Please don't post again, you clearly have absolutely no positive contributions to make...
62. Re:Severe weather in Virginia likely the culprit by DCFusor · 2011-04-21 10:00 · Score: 1
  
  Yeah, like my non-cloud network with 3-5 days of battery backup, generators, and solar PV power. Is that what you mean? And I'm just a small time guy (only about 10 machines 24/7). They can't provision for their pro network servers? What a friggin joke.
  
  --
  Why guess when you can know? Measure!
63. Re:Severe weather in Virginia likely the culprit by DCFusor · 2011-04-21 10:12 · Score: 1
  
  Yeah, it's where the intelligent people all live, but you'd not understand that, or why. Who wants to live jam packed like sardines with a bunch of politicians, lawyers, beltway bandits, and criminals? Evidently a lot of people in NOVA. You know, the ones MY tax dollars subsidize because you can't control your development, your road costs, and the general assholery of your local governments.
  
  But shhhh - don't tell anyone. We like it here without y'all. West Virginia is only "almost heaven" after all.
  
  --
  Why guess when you can know? Measure!
64. Re:Severe weather in Virginia likely the culprit by Kyusaku+Natsume · 2011-04-21 21:18 · Score: 1
  
  Because the roof of the reactor building is not strong enough to bear the load of a 6.5 MW diesel power plant. I doubt that the roof of any building is able to bear the load and vibration of that kind of equipment. The source of the troubles in Fukushima is that they effectively, like GP said, built a sea barrier too low for the tsunamis that have hit the area in the past. Additionally, they ignored internal reports that suggested improved tsunami countermeasures like walls and waterproof doors. The tsunami ripped open the door of the turbine building on the unit 4 side, and the sea went inside the building with full force. The necessary walls and doors that would have prevented the current disaster would have come at a price of less than 100 million USD, far cheaper than any of the units that have been scuttled in the current disaster.
  
  --
  Mexico: 100% conservative's America now!
65. Re:Severe weather in Virginia likely the culprit by kevinNCSU · 2011-04-22 01:12 · Score: 1
  
  The fine print on those designs is generally around the order of 72 hours of passive cooling, at which point you're going to need power to pump water back up to the gravity fed reservoirs. It's a great safety feature but it still doesn't eliminate the risk to the point where you wouldn't want to shut down should one of your redundant backups be taken offline.
66. Re:Severe weather in Virginia likely the culprit by kevinNCSU · 2011-04-22 01:13 · Score: 1
  
  sorry, gravity drain* not gravity fed. Otherwise you wouldn't have to pump water up there ;)
67. Re:Severe weather in Virginia likely the culprit by Anonymous Coward · 2011-04-23 17:03 · Score: 0
  
  after a tornado took the power out that powers the power station.
  I am sorry but it did what?
68. Re:Severe weather in Virginia likely the culprit by inject_hotmail.com · 2011-04-24 03:32 · Score: 1
  
  6.5MW is a lot...over 10,000 fuel rods were stored there (well, maybe not right on the roof)...what can I say, I'm an arm-chair expert in the field.
  
  Ok, I'll shut up.
69. Re:Severe weather in Virginia likely the culprit by Kyusaku+Natsume · 2011-04-24 11:29 · Score: 1
  
  6.5 MW are enough to power 3 Shinkansen N700 train sets fully loaded at 270 km/h. You need 2 generators per unit. The static load is not much of a problem, since the containment building is designed, as you point out, to bear the load of a pool full of heavy metal, but the huge vibrations from the machines are the main problem. At our datacenter, our puny 350 kW generator is strong enough to shake the windows of the management building 20 m away despite the generator being encased in its own building, and you can feel the vibration in the datacenter, which uses the superstructure that previously housed three 50 MW power generators that used oil as fuel. Maybe you can guess now that I work for a utility company.
  Best Regards.
  
  --
  Mexico: 100% conservative's America now!
70. Re:Severe weather in Virginia likely the culprit by inject_hotmail.com · 2011-04-26 02:04 · Score: 1
  
  Hmmm, now you've got me thinking. I think the engine in my car puts out 340kW (460HP). A 6.5MW generator would have to be something close to 20 times that. I've heard of some vehicle engines being 1,000HP, which is roughly double, so, maybe 10 of those high-output engines would do it...but that's peak output..not a happy place to sit for any motor or engine. I can only imagine feeding something of that size...and then cooling...
  
  Large-scale engineering amazes me.
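
  The back-of-the-envelope numbers above can be checked with a short Python sketch. It assumes mechanical horsepower (about 0.7457 kW per HP) and ignores generator efficiency and the gap between peak and continuous ratings, so treat the counts as rough lower bounds rather than a sizing method. The function names are made up for illustration.

```python
import math

KW_PER_HP = 0.7457  # assumption: mechanical horsepower, roughly 0.7457 kW each


def hp_to_kw(hp):
    """Convert mechanical horsepower to kilowatts."""
    return hp * KW_PER_HP


def engines_needed(target_kw, engine_hp):
    """Smallest whole number of engines whose combined peak output reaches target_kw."""
    return math.ceil(target_kw / hp_to_kw(engine_hp))


if __name__ == "__main__":
    print(f"460 HP is about {hp_to_kw(460):.0f} kW")
    print("460 HP engines for 6.5 MW:", engines_needed(6500, 460))
    print("1,000 HP engines for 6.5 MW:", engines_needed(6500, 1000))
```

  Under those assumptions the "close to 20" and "maybe 10" guesses are in the right range; derating each engine for continuous duty would push both counts higher.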
71. Re:Severe weather in Virginia likely the culprit by Kyusaku+Natsume · 2011-04-26 17:11 · Score: 1
  
  I was wrong about the power output of our emergency power plant: it is 650 kW, not 350 kW. It serves 100 kW of computing and comms equipment and 8 HVAC units. In one of the previous discussions over Fukushima there was a very good post about a conference at Caltech by an expert in nuclear safety that had a picture of the knocked out generators. Also, you can see in this link the pictures and videos of Fukushima directly from TEPCO:
  http://www.tepco.co.jp/en/news/110311/index-e.html
  Best Regards
  
  --
  Mexico: 100% conservative's America now!
Oh boy by MintyGreenMedia · 2011-04-21 03:50 · Score: 1

I'm glad everyone's moving to the cloud for reliability and scalability purposes!
1. Re:Oh boy by codepunk · 2011-04-21 04:04 · Score: 1
  
  In about the time it took you to write that message I spun up a standby deployment in another data center smart guy.
  
  --
  
  Got Code?
2. Re:Oh boy by cduffy · 2011-04-21 04:05 · Score: 1
  
  Amazon has "availability zones" for a reason, as do other cloud vendors.
  If your infrastructure isn't resilient against everything in a zone suddenly disappearing, you're Doing It Wrong.
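
  As a rough illustration of that rule, here is a minimal Python sketch that spreads instances round-robin over zones and checks whether enough of them survive the loss of any one zone. The zone names and helper functions are hypothetical; nothing here calls the AWS API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin; returns one zone name per instance."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def survives_zone_loss(placement, required):
    """True if at least `required` instances remain after losing any single zone."""
    per_zone = Counter(placement)
    # Worst case is losing the zone that holds the most instances.
    worst_case_remaining = len(placement) - max(per_zone.values())
    return worst_case_remaining >= required


if __name__ == "__main__":
    placement = spread_across_zones(6, ["zone-1", "zone-2", "zone-3"])
    print(Counter(placement))
    print("Can still serve with 4 instances:", survives_zone_loss(placement, 4))
```

  The check only covers compute capacity. Data stores pinned to a single zone need their own replication plan.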
3. Re:Oh boy by sbrown123 · 2011-04-21 04:06 · Score: 1
  
  Scalability: yes.
  Cheap: yes.
  Reliability: they don't say they are 100% fail safe. I think the figure is still in the 90's though which is pretty good.
  If anyone tries to sell you 100% they are liars.
4. Re:Oh boy by characterZer0 · 2011-04-21 04:08 · Score: 1
  
  How long does it take you to have the IP addresses rerouted?
  
  --
  Go green: turn off your refrigerator.
5. Re:Oh boy by petteyg359 · 2011-04-21 04:15 · Score: 0
  
  The Christians try to sell me 100% coverage...
6. Re:Oh boy by TooMuchToDo · 2011-04-21 04:36 · Score: 1
  
  Really? Wow. Perhaps you should let major sites like Reddit know. They've been down for *hours*.
  The cloud works if you don't care about having control over when your business is down.
7. Re:Oh boy by moj0e · 2011-04-21 04:38 · Score: 0
  
  How long does it take you to have the IP addresses rerouted?
  With Amazon's Elastic IPs, it takes seconds to reroute an IP address to another machine. Very handy in situations like these.
8. Re:Oh boy by cduffy · 2011-04-21 04:49 · Score: 1
  
  Really? Wow. Perhaps you should let major sites like Reddit know. They've been down for *hours*.
  The cloud works if you don't care about having control over when your business is down.
  Last time I had a physical DC go down it was a cooling failure. Didn't have much control over that either.
  Moreover, with a cloud vendor I can have servers in multiple sites with different power, connectivity, and geographic location without massive investment in each.
9. Re:Oh boy by mini+me · 2011-04-21 05:57 · Score: 1
  
  I understand the need for physical availability zones, but the whole idea behind the cloud is that you, the end user, need not care about those details. It is up to the cloud provider to figure it out. The cloud represents a black box, of sorts. If they are having trouble in one zone, everything should automatically migrate to another without anyone outside of the operation knowing it.
  I'm not saying Amazon's solution is bad, but I'm not sure it is in the spirit of what I would consider real cloud hosting. Really, they are providing the tools so that you can build a cloud service.
10. Re:Oh boy by codepunk · 2011-04-21 07:49 · Score: 1
  
  EIPs move in seconds, but in my use case I do not need EIPs, since a front end is handing off the requests to the cloud systems.
  
  --
  
  Got Code?
11. Re:Oh boy by cduffy · 2011-04-22 12:10 · Score: 1
  
  I'm not saying Amazon's solution is bad, but I'm not sure it is in the spirit of what I would consider real cloud hosting. Really, they are providing the tools so that you can build a cloud service.
  What you would consider "real cloud hosting" doesn't exist. Moreover, it *can't* exist in a general-purpose[1] way that doesn't require applications to be rewritten to make CAP-theorem tradeoffs explicit -- if you're writing an electronic medical record system with massive financial penalties if you lose someone's prescription, you simply can't afford to lose a few seconds of data when one of your DCs goes offline, whereas you can't afford not to defer synchronization of less critical data in cases where 200ms of extra latency (at barest minimum) makes the difference between your application being considered usable and trash.
  So -- here in the Real World, we have a system of regions and availability zones, first courtesy Amazon and more recently picked up by other providers, which gives you the tools to build your own solution. Sure, it's not what pie-in-the-sky dreams are made of... but it's a heckuvalot cheaper and easier to run a "knife provision" command and spin up a fleet of VMs in a different region or zone than it is to get on the phone, rent rack space, buy servers, pay someone to install them, and so on and so forth for every geographic location in which you want to have a presence.
  Let's not let the perfect be the enemy of the good, eh?
  [1] - "General purpose" meaning that you could certainly build something like this if you were willing to force a particular set of tradeoffs -- say, if your customers were required to agree that they could lose up to a certain amount of committed data without warning (with availability failures in cases where the amount of potential dataloss would go beyond that amount), or that they would use a datastore (like CouchDB) with explicit conflict resolution support for split-brain cases when multiple DCs thought they held the "live" master copies of the data -- but these solutions are incompatible with giving people the conventional "I have a server, I can install whatever I want on it, my data is persistent" view of the world they expect.
12. Re:Oh boy by codepunk · 2011-04-23 06:45 · Score: 1
  
  The fact that Reddit was down for hours is as much their fault as it was Amazon's. Just because you run servers in the cloud does not mean you don't have to worry about disaster recovery.
  
  --
  
  Got Code?
13. Re:Oh boy by mini+me · 2011-04-23 06:48 · Score: 1
  
  I understand your technical points, but it is not "the cloud" if they cannot provide those features. Amazon is just another generic hosting service, not a cloud service. You can build a cloud application that can withstand errors like this on top of AWS, but AWS itself is not the cloud.
  I don't fault Amazon for implementing the services the way they do. They don't need to be in the cloud because their market isn't catering to people who need cloud-based services. Their market is the people who are building the cloud.
14. Re:Oh boy by cduffy · 2011-04-23 09:31 · Score: 1
  
  I understand your technical points, but it is not "the cloud" if they cannot provide those features. Amazon is just another generic hosting service, not a cloud service.
  If you close the definition of a cloud service to the point that nobody can provide a generic cloud service... then what good is the definition anyhow?
  An excessively closed definition is similarly useless to an excessively open one -- neither is optimal for purposes of expressing meaning to others. As such, I'll stick with the conventional definition rather than accept your proposed amendment.
Increased Latencies by JamesonLewis3rd · 2011-04-21 03:54 · Score: 1

Bummer.

--
Hebrews 11:8
Jeremiah 33:3
Lucky by denshao2 · 2011-04-21 03:57 · Score: 1

My instance is on us-east-1d which is still up.
1. Re:Lucky by pdbaby · 2011-04-21 04:06 · Score: 1
  
  Their API gives different names for the availability zones for each user (so your us-east-1d could be my us-east-1a) which complicates talking about issues (since all you can say is "two availability zones are experiencing problems"), especially when your system uses multiple accounts
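
  A toy Python model of that behaviour, purely for illustration: each account gets its own stable shuffle of the zone names. How Amazon actually assigns names is not stated here, so the seeding scheme, the physical zone labels and the function names are all assumptions.

```python
import random

PHYSICAL_ZONES = ["zone-A", "zone-B", "zone-C", "zone-D"]  # hypothetical physical zone IDs
LOGICAL_NAMES = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]


def zone_map_for(account_id):
    """Return a stable, account-specific mapping from logical name to physical zone."""
    shuffled = PHYSICAL_ZONES[:]
    # Seeding with the account ID means the same account always sees the same mapping.
    random.Random(account_id).shuffle(shuffled)
    return dict(zip(LOGICAL_NAMES, shuffled))


def to_physical(account_id, logical_name):
    """Translate one account's logical zone name into the shared physical zone."""
    return zone_map_for(account_id)[logical_name]


if __name__ == "__main__":
    for account in ("account-1", "account-2"):
        print(account, zone_map_for(account))
```

  In a model like this, two users can only compare notes on an outage after translating their own zone names to the physical zone, which is exactly the step the public names do not allow.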
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
2. Re:Lucky by Anonymous Coward · 2011-04-21 04:09 · Score: 0
  
  Same here - no issues in us-east-1d with instances or EBS. Their status page at http://status.aws.amazon.com/ gives no indication of which availability zones in east-1 are ok and which are having problems.
3. Re:Lucky by Blakey+Rat · 2011-04-21 05:13 · Score: 1
  
  Really? What's the purpose of that? Some kind of half-assed based-on-human-psychology load balancing?
  My servers are in us-east-1d as well, and they didn't go down, but maybe that's just dumb luck as my 1d is your 1b.
  I can't do a really redundant setup, though, because I need a MS SQL instance and we don't have the budget for a second one to mirror to, so ... if the zone with our MS SQL instance goes down, our app is sunk regardless of how distributed the web servers are.
  
  --
  Comment of the year
4. Re:Lucky by pdbaby · 2011-04-21 06:11 · Score: 1
  
  Yeah, I think that's what they're trying to do. I suppose it makes sense in a way, they want to make sure load is evenly distributed across their availability zones. But it seems to me they could have prevented that through better API design (e.g. users expressing a constraint that 2 resources should be in the same zone where that's meaningful but otherwise not permitting the selection of a specific zone)
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
It was the anonymous! by Anonymous Coward · 2011-04-21 03:58 · Score: 0

The DDoS didn't work so they tried something else.
I know I'm a coward member.
Slashdot 'em while they are down by phorwich · 2011-04-21 03:59 · Score: 1

Well... I am sure the additional server load from curious slashdotters like myself can only be helping.

--
Wait. Stop scrolling for a sec. O.K. Thanks. - P
The dark side of outsourcing by HangingChad · 2011-04-21 04:03 · Score: 2

Slashdot and Digg have one day traffic surges because Reddit is down. I'm getting way too much done today not being distracted by the GoneWild girls. This productivity must cease at once!
Does go to show what can happen when your business depends on an outsource provider. Everyone has to depend on service providers to some extent, but sometimes it's a good exercise to see how many of your company eggs are in one basket. Redundancy is expensive, but so is losing business. Even Google has had Gmail interruptions, lost some customer data and experienced slow downs.

--
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
tested by nickb64 · 2011-04-21 04:07 · Score: 1

so this is why tested.com is down...
Give me my Reddit back! by Frederic54 · 2011-04-21 04:09 · Score: 1

Else I don't know what to do? I almost went to Digg! so please amazon guys, work on your stuff!

--
"Science will win because it works." - Stephen Hawking
1. Re:Give me my Reddit back! by Anonymous Coward · 2011-04-23 08:39 · Score: 0
  
  Man you must be desperate for something to do. I didn't even consider going back to DIGG.
Emergency Plan by sycorob · 2011-04-21 04:11 · Score: 4, Interesting

I didn't even realize that one of our partners was using AWS until suddenly they were down all day. Amazon is really stable historically, but it's frustrating when you're out of business and all you can do is wait and see if Amazon will fix it soon.
In the "old school" thinking, smart companies have a redundant data center somewhere, humming along and waiting to be switched on if the main data center ever goes down. "The cloud" was supposed to solve that - massive redundancy within Amazon's services was supposed to protect you from outages. Not the case, apparently, since it looks like Amazon is going to fall below their promised 99.95% uptime (4.38 hours per year downtime).
I think the answer is to have redundant cloud services online, so you could switch from Amazon to Google or DevGrid if you had issues. The problem is, there's nothing quite like Amazon right now, it's not easy to switch from Amazon to some random service. This might be the biggest argument against virtual services - lack of standardization makes it hard to move from one to another, and hard to set up backup services in case of emergency.
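
The downtime figure quoted above follows directly from the uptime percentage. A small Python sketch of the arithmetic, assuming a 365-day year (8,760 hours); the function names are made up for illustration and the 24-hour outage is only an example input, not a measured figure:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Downtime budget, in hours, implied by an uptime percentage."""
    return period_hours * (100.0 - uptime_percent) / 100.0


def achieved_uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Uptime percentage actually achieved after a given amount of downtime."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


if __name__ == "__main__":
    print(f"99.95% uptime allows {allowed_downtime_hours(99.95):.2f} hours/year of downtime")
    print(f"A 24-hour outage leaves {achieved_uptime_percent(24):.3f}% uptime for the year")
```

So a single outage lasting longer than about four and a half hours uses up the whole yearly budget at 99.95%.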
1. Re:Emergency Plan by MariusBoo · 2011-04-21 04:26 · Score: 3, Insightful
  
  Actually in the case of EC2 the smart thing would have been to have your instances spread over different availability zones...
2. Re:Emergency Plan by ron_ivi · 2011-04-21 04:27 · Score: 2
  
  Just using Amazon West as well as Amazon East would have saved customers from this outage.
  I think Amazon actually does great at covering all the technological single-points-of-failure.
  The only reason I'd want a second cloud vendor is for the sales/account related single-point-of-failure of the Amazon Account being frozen due to a sales miscommunication or a MPAA/RIAA takedown notice, etc.
3. Re:Emergency Plan by Synn · 2011-04-21 04:28 · Score: 1
  
  "In the "old school" thinking, smart companies have a redundant data center somewhere, humming along and waiting to be switched on if the main data center ever goes down. "
  The problem is that gets really really expensive and it's actually quite hard to do properly.
  You can do this with EC2 though, just have your application cross various geographical zones. Things like ELB even make this somewhat easier. But you still have to solve all the application problems that exist when your data stores exist across large distances.
4. Re:Emergency Plan by Anonymous Coward · 2011-04-21 04:28 · Score: 1
  
  Amazon does have multiple datacenters -- it's your partners that didn't take advantage of that. The west coast datacenter costs a little more, but nothing is preventing them from starting instances there, except that maybe they don't have their database mirrored.
5. Re:Emergency Plan by Alarash · 2011-04-21 04:31 · Score: 3
  
  Even by using only AWS you can set up redundancy across multiple North American regions. Even across continents, with one data center in Ireland and one in Singapore. But obviously it costs extra as they bill you the bandwidth between the regions. That's how you use The Cloud (c) (tm) (R). Using a single data center to set up redundancy is dumb because it's not redundancy. You need high availability for your VMs, but also for your data center.
  
  This is why banks or large businesses, for instance, have two or more data centers they always keep synchronized and have at least 50 kilometers between them. Thinking "well it's in one AWS data center so it's safe" is wrong, and this incident is a fine example of that.
6. Re:Emergency Plan by Anonymous Coward · 2011-04-21 04:35 · Score: 0
  
  Amazon is really stable historically...
  I think the users over at Reddit would beg to differ.
7. Re:Emergency Plan by pdbaby · 2011-04-21 04:38 · Score: 1
  
  Amazon have complete isolation between Regions and good isolation between Availability Zones.
  At work we'd recommend people use 2 cloud providers for their important services (which could be 2 Amazon regions or it could be Amazon and Rackspace) to prevent this sort of failure taking your business offline. You can't rely on any particular cloud provider to be reliable but it's a reasonably safe bet that a selection of cloud providers won't have significant overlapping downtime.
  It's also worth pointing out that all cloud SLAs are basically useless: if Amazon falls below their advertised uptime they'll refund you some of your charges - but they'll never refund more than what you've paid them: they don't compensate you for all the money you're losing (and the AWS charges are likely pocket change compared to this)
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
8. Re:Emergency Plan by Anonymous Coward · 2011-04-21 05:01 · Score: 0
  
  In the "old school" thinking, smart companies have a redundant data center somewhere, humming along and waiting to be switched on if the main data center ever goes down...
  A certain credit card company (for whom I worked for 25+ years) maintains multiple data centers, any of which can carry the full load and all running simultaneously in synch. Instead of a "Fail and recover" mode, it's a "Never fail" mode. Pricey, though.
9. Re:Emergency Plan by Anonymous Coward · 2011-04-21 05:01 · Score: 2, Informative
  
  50km is not a far enough distance. I witnessed this first hand for the employer I worked for on the Gulf Coast during Katrina. That storm jacked up about 120 miles, took down our primary AND failover sites.
10. Re:Emergency Plan by mikeytag · 2011-04-21 05:02 · Score: 2
  
  Nail on the head here. We were affected today and while I have full offsite backups of everything we don't have a second datacenter to switch on because of cost and complexity. It's not too difficult to have webservers span different parts of the globe, but DB servers like MySQL are a whole different story and usually very crucial.
11. Re:Emergency Plan by hey! · 2011-04-21 05:09 · Score: 2
  
  Actually, I'm more concerned about the *organization* as a single point of failure. If you rely on, say, Oracle (ugh), and Oracle goes bankrupt or a court orders them to stop selling their database or they simply decide to stop supporting some feature, you're still in business, and have a pretty good shot at moving to some similar database management system.
  If you built a mission critical system on Amazon's cloud services, a single court order not aimed at you could put you out of business. If Amazon was forced or decided to get out of the cloud hosting business, you'd have a heck of a time transitioning over to another cloud service because Amazon's services are so architecturally unique.
  
  --
  Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
12. Re:Emergency Plan by Anonymous Coward · 2011-04-21 05:24 · Score: 0
  
  Redundancy in AWS comes in three forms:
  * Multiple availability zones per region ('North America' has US-EAST, US-WEST)
  * Multiple sub zones per availability (US-EAST has US-EAST-1, US-EAST-2 etc.)
  * Multiple regions - North America / Europe / Asia Pacific
  The failure appears to lie in US-EAST-1 - one chunk of one availability zone.
  Depending on how critical your service is, you would most likely want to spread your services across multiple availability zones, or perhaps even multiple regions.
  The Amazon EC2 Service Level Agreement commitment is 99.95% (4.38 hours per year) availability for each Amazon EC2 Region (http://aws.amazon.com/ec2-sla/), which is what really means anything - according to the SLA, customers would receive a credit; but that's of course no consolation in some cases. I'm sure these customers will review their redundancy and start ploughing capacity into US-EAST (or indeed a different cloud service provider).
13. Re:Emergency Plan by Anonymous Coward · 2011-04-21 06:38 · Score: 0
  
  Our master database was in the availability zone that was hit the hardest and the RDS (hosted MySQL basically) fail-over happened as promised. AWS gives you the tools to provide for your own availability and a great many of their customers took advantage of it. We're load-balanced using ELB between two different availability zones and the only effect on our operations has been intermittent network connectivity problems to our database. That's not an uncommon setup and people who took advantage of that are doing ok.
14. Re:Emergency Plan by hey · 2011-04-21 07:50 · Score: 1
  
  Doesn't RackSpace offer the same thing as AWS?
15. Re:Emergency Plan by Thundersnatch · 2011-04-21 09:59 · Score: 1
  
  The problem is, there's nothing quite like Amazon right now
  Rackspace's Cloud is about as close as it gets. They are the clear #2 player in the IaaS market. DCs in Chicago, Dallas, VA, and Cali as I recall. Not quite as mature as AWS from a features standpoint, but they seem to have made better design choices in many ways. No transient instances that disappear all your data for example. They just introduced a feature comparable to ELB as well.
16. Re:Emergency Plan by Slashdot+Parent · 2011-04-22 03:22 · Score: 1
  
  Actually in the case of EC2 the smart thing would have been to have your instances spread over different availability zones...
  This is exactly what AWS recommends, and this would not have saved you yesterday.
  The reason why yesterday was such a Big Deal(TM) is that a software failure in one AZ took out an entire region. That is absolutely not how EC2 is supposed to work. Each AZ in a Region is supposed to function like a separate data center: independent power supply, uplink, etc. But in yesterday's outage (they're still having issues today, by the way), an entire Region failed, and it failed for reasons other than a huge natural disaster.
  I've always been a big fan of AWS, and have used them for a long time. I will continue to do so, but make no mistake about it, yesterday's event is a colossal egg in the face of EC2's EBS team. Many AWS users are seriously bent out of shape over this, and I tend to agree with them. They architected their applications to failover to another AZ, just as AWS recommended. They did the right thing, yet they still got burned.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
17. Re:Emergency Plan by Slashdot+Parent · 2011-04-22 03:31 · Score: 1
  
  Just using Amazon West as well as Amazon East would have saved customers from this outage.
  This is hindsight talking.
  AWS has always maintained that deploying an app across multiple Availability Zones was sufficient for High Availability. They introduced Regions for geographical reasons (reduced latency, compliance with EU data laws, etc.), not for HA reasons.
  What happened yesterday should not have happened. It is one big, giant egg in the faces of AWS's EBS team.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
18. Re:Emergency Plan by Slashdot+Parent · 2011-04-22 03:36 · Score: 2
  
  It's also worth pointing out that all cloud SLAs are basically useless: if Amazon falls below their advertised uptime they'll refund you some of your charges - but they'll never refund more than what you've paid them: they don't compensate you for all the money you're losing (and the AWS charges are likely pocket change compared to this)
  FYI, I don't think this outage even falls under EC2's SLA. The Region was still technically on line. Only EBS was down.
  Granted, many customers depend heavily on EBS, but the SLA doesn't cover an outage in just one specific EC2 feature. That being said, I wonder if AWS will honor SLA claims anyway, as a PR move. This outage is just so clearly Amazon's fault: a network hiccup causes EBS to overload in one Availability Zone, which cascades into all Availability Zones in the Region.
  Personally, I think that they should honor SLA claims. But you're right, any money recovered would be chump change compared to the cost of the downtime.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
19. Re:Emergency Plan by ron_ivi · 2011-04-22 03:56 · Score: 1
  
  That's a good point.
  I'm surprised that when they launched different Regions they didn't advocate them for High Availability -- if only to protect from tsunamis, earthquakes, terrorists, etc that can wipe out a whole state's infrastructure.
  If they should be most embarrassed of one thing, I'd say that not spinning Regions into their High Availability PR would be the biggest one.
20. Re:Emergency Plan by Slashdot+Parent · 2011-04-22 04:54 · Score: 1
  
  If they should be most embarrassed of one thing, I'd say that not spinning Regions into their High Availability PR would be the biggest one.
  I suppose that's true, but in the end, what's the difference between deploying your app across Regions and deploying your app across AWS's competitors? One of the biggest value-adds that AWS provides, as far as I'm concerned, is the ability to do scaling, HA, and DR with roughly zero effort.
  Need more capacity? You can clone a running server from a consistent snapshot.
  Need HA? Spin up the clone in a different Availability Zone.
  Need DR? Take automated consistent snapshots of your running server at whatever interval is appropriate for your application.
  So, sure. They could just update their best practices and say, "Yeah, you should really be multi-Region if you want to be HA". But that would leave a lot of implementation work to the customers. And if I have to implement it myself, do you really think my HA site will be hosted with Amazon? Hell, no. If I'm going to go to the trouble of custom-developing a solution, the HA site will be with a different provider since that even further reduces my risk.
  This outage exposed a huge wart in EC2's AZ isolation. The root cause of this was a network fault that caused EBS to fail in 1 AZ. As far as I'm concerned, I'm totally cool with that happening. Stuff happens in the datacenter, and I get that. But when a network fault in one AZ takes out an entire Region, that shows that there is insufficient isolation between AZs, and I am definitely not cool with that.
  As far as I can see, a useful option for AWS (wholly aside from fixing their isolation architecture) would be to introduce the ability to copy an EBS snapshot from one Region to another. That would have helped out a lot of their customers who got caught by this, because while we were fully ready to launch our applications in us-west-1, we couldn't get our data out of us-east-1.
  I was fairly close to restoring an older backup to us-west-1 anyway when AWS got an AZ online for EBS-backed instances. At that point, I was able to get back online without too much fuss.
  Oh well, those are my thoughts. As a result of this, I may do more frequent off-site backups, but I still maintain that I shouldn't have to. I was using AWS infrastructure in the manner that they recommend, and yet I still suffered a 5-hour downtime.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
21. Re:Emergency Plan by xelah · 2011-04-23 01:21 · Score: 1
  
  RackSpace's cloud is not PCI DSS compliant, whereas Amazon's is. If you process credit cards then you can't do it in RackSpace's cloud (although you can on their dedicated servers). IIRC, Verizon's is, though. Of course, depending on what you do, it's not necessarily very straightforward to replicate your systems across the two and there's a lot of extra cost. I agree with the organization-as-a-point-of-failure argument, though. Never mind court orders, you could also suffer from simple administrative mess-ups (like someone not paying the bill or some sort of dispute over money owed or terms of service) or the compromise of your account with them.
Disaster Recovery by trainwrek · 2011-04-21 04:11 · Score: 1

This is why, regardless of whether you're in the cloud or not, you need to have the ability to fail over to multiple datacenters in different geographical locations. Availability Zones are good but don't cut it. Unfortunately, Amazon doesn't make transferring backups between regions easy or cheap.
Not so bad.. by kevinNCSU · 2011-04-21 04:12 · Score: 1

I was wondering why it took longer to start up my hadoop cluster this morning on EC2, but it still beats the living hell out of buying and configuring large numbers of machines for short term testing.
a downed cloud is fog by Anonymous Coward · 2011-04-21 04:17 · Score: 0

so we wait for the fog to lift...
cloudfail by imp7 · 2011-04-21 04:17 · Score: 1

We're now approaching our final destination, a datacenter of the future where nothing can possi-blye go wrong. Er, possi_bly_ go wrong. Heh, that's the first thing that's ever gone wrong.
Judgement Day by treerex · 2011-04-21 04:22 · Score: 1

Hmmmm... today *is* Judgement Day... perhaps Skynet's first target is AWS's East-Coast data center. Coincidence? I think not.
1. Re:Judgement Day by futuramarama · 2011-04-21 04:54 · Score: 1
  
  Well, the Cloud is a SKY-themed interNET system
  
  --
  "And that solves the mystery of the missing ring" - Bender
2. Re:Judgement Day by Anonymous Coward · 2011-04-21 08:35 · Score: 0
  
  It's only Judgement Day for the tv series timeline. The original movie had Judgement Day on August 29th, 1997.
  (source: http://en.wikipedia.org/wiki/Skynet_%28Terminator%29)
6 weeks before the AWS summit 2011 by grapeape · 2011-04-21 04:26 · Score: 3, Interesting

Gotta wonder what kind of flack Amazon is going to take for this one. I've had a couple clients looking into cloud services including moving to AWS and have already had one of them call me and cancel a meeting about it. While I understand stuff happens, the entire sales pitch for AWS was redundancy and build as you grow. Redundancy has obviously not worked in this case, while I usually support cloud services, this is definitely going to be a hard example to counter when trying to sell it to potential customers.
1. Re:6 weeks before the AWS summit 2011 by darjen · 2011-04-21 04:29 · Score: 0
  
  Even with this, Amazon still probably has more uptime than somebody managing their own servers, especially the larger you get. It's pretty short sighted to simply dismiss them out of hand because of one incident.
2. Re:6 weeks before the AWS summit 2011 by Synn · 2011-04-21 04:32 · Score: 1
  
  "Redundancy has obviously not worked in this case"
  Only 1 region is effective. If your app was set to work with multiple zones then it likely wouldn't be impacted by this outage.
  The thing with EC2 is it gives you the tools to build complex clusters. It doesn't do it for you.
3. Re:6 weeks before the AWS summit 2011 by TooMuchToDo · 2011-04-21 04:39 · Score: 4, Informative
  
  It's not short sighted at all. When someone else runs your gear, all you can do is sweat until they get things back online, and they can take their time under what's known as "commercially reasonable SLAs". When you own your own gear, your own colo, etc., how much effort you use to get back up and running is up to you.
  "The Cloud" for mission critical businesses is a joke.
4. Re:6 weeks before the AWS summit 2011 by grapeape · 2011-04-21 04:55 · Score: 1
  
  I understand that but it still makes it a hard sell in the short-term until this all blows over.
5. Re:6 weeks before the AWS summit 2011 by darjen · 2011-04-21 04:57 · Score: 1
  
  For a small or medium size business, it could very well take massive amounts of effort and cost to keep your servers going full time. For many people it probably makes sense to outsource that function to dedicated engineers, rather than having to hire and manage your own.
6. Re:6 weeks before the AWS summit 2011 by Anonymous Coward · 2011-04-21 04:59 · Score: 0
  
  It's half of one region of AWS that is out. The redundancy is there, but too many companies punt on actually taking advantage of it. Those are the ones that are down.
  Those of us with our resources distributed across service regions had an almost unnoticeable hiccup.
7. Re:6 weeks before the AWS summit 2011 by davidbrit2 · 2011-04-21 05:10 · Score: 1
  
  It's even nicer working at a place that sells used/refurb IT gear. Main file server is down? No sweat, I'll just stroll out to the warehouse, grab a new RAID controller, and be up and running again in ten minutes. (Yes, we've had that sort of thing happen - hardware failure is just about the least of our worries around here when we have a spare for nearly everything in every one of our servers.)
8. Re:6 weeks before the AWS summit 2011 by Anonymous Coward · 2011-04-21 05:31 · Score: 0
  
  It just goes to show the serious problems resulting from hiring unqualified minorities under the guise of "affirmative action".
  If they had hired its workforce base on talent and merit, this embarrassing episode would never had occurred.
9. Re:6 weeks before the AWS summit 2011 by Anonymous Coward · 2011-04-21 06:59 · Score: 0
  
  Sorry to be a grammar Nazi, but it's "flak" as opposed to "flack". "Catching flak" originates from from ground anti-aircraft fire in WW2. Flack is not a word.
10. Re:6 weeks before the AWS summit 2011 by The+End+Of+Days · 2011-04-21 07:18 · Score: 1
  
  You've made it pretty clear you have no actual understanding of what's going on here.
  What surprises me is that you got modded up. I miss when you could count on Slashdot people at least understanding technical issues. Now it's a crapshoot.
11. Re:6 weeks before the AWS summit 2011 by TooMuchToDo · 2011-04-21 07:39 · Score: 1
  
  Dude, I used to help run a Tier-1 CMS data facility for the LHC. I've done IT for the better part of 14 years. I know exactly what the fuck is going on here. Amazon sells people on the fact that you "put everything in the cloud" and you won't have any problems. Then problems occur and it's all *shrugs, shit happens*.
  Fark. off.
12. Re:6 weeks before the AWS summit 2011 by BitZtream · 2011-04-21 07:52 · Score: 1
  
  My company has about 10 servers that have had less visible downtime in 10 years than Amazon has today alone.
  It's not really hard, it just requires skill, know-how, and vigilance.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
13. Re:6 weeks before the AWS summit 2011 by Anonymous Coward · 2011-04-21 07:59 · Score: 0
  
  Only 1 region is effective. If your app was set to work with multiple zones then it likely wouldn't be impacted by this outage.
  The thing with EC2 is it gives you the tools to build complex clusters. It doesn't do it for you.
  Quite right - if you were clustered behind load balancers in both US-WEST and US-EAST regions (for instance), you'd still be up and running. Your EAST instances would be offline but your WEST instances would be handling the load.
14. Re:6 weeks before the AWS summit 2011 by codepunk · 2011-04-21 08:03 · Score: 1
  
  I have machines in the effected zone, not a problem with them. If I had a problem it would have had no impact since I spun up a full standby deployment in the west coast data center.
  
  --
  
  Got Code?
15. Re:6 weeks before the AWS summit 2011 by grapeape · 2011-04-21 08:53 · Score: 1
  
  I should have caught that considering I worked on bombers in the AF, but thank you. BTW, you only need one "from" between originates and ground.
16. Re:6 weeks before the AWS summit 2011 by Slashdot+Parent · 2011-04-22 03:54 · Score: 2
  
  Only 1 region is effective. If your app was set to work with multiple zones then it likely wouldn't be impacted by this outage.
  Not true. My application works just fine in multiple Availability Zones, yet it was knocked out yesterday due to an entire Region getting knocked offline.
  And before you tell me that the application should have been multi-Region, I'm not buying it. AWS has always maintained that deploying an app across multiple AZs is HA. AZs are supposed to be considered as separate datacenters: separate power, separate uplink, etc. And yes, separate EBS infrastructure (you can't attach an EBS volume to an instance that was launched in a different AZ). Multi-Region is for geographic reasons (reduced latency, compliance with EU data laws, etc.) or Disaster Recovery.
  In yesterday's case, a network hiccup triggered EBS to eat itself in one AZ. Fine, I'm totally cool with that. I understand that stuff happens. But for that EBS failure to bring EBS down in all Availability Zones, I am absolutely not cool with. That that happened reveals a serious architectural flaw in the supposed isolation between AZs. Make no mistake about it, it is a huge egg to the face of AWS's EBS team.
  Would making my app multi-Region have saved my bacon? Sure. And so would have deploying across multiple providers, etc. But the point is, I shouldn't have to do that. AWS told their customers that we don't have to do that. So as far as I'm concerned, nobody gets to say, "Well, you should have been multi-Region." That's just hindsight's 20/20 vision talking.
  Personally, it didn't take much effort to get my app back online. Most of the effort was me trying to decide whether to wait it out or go into DR mode. Around the time I decided to go ahead and restore in us-west-1, AWS got EBS-backed instances working in an AZ, so I just relaunched in us-east-1. In all, I didn't lose much. But some people really got hosed by this, and I can't say I blame them for being upset. They did the Right Thing, and they still got hosed.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
17. Re:6 weeks before the AWS summit 2011 by Slashdot+Parent · 2011-04-22 03:55 · Score: 1
  
  I have machines in the effected zone, not a problem with them. If I had a problem it would have had no impact since I spun up a full standby deployment in the west coast data center.
  Only EBS volumes and EBS-backed instances were affected.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
18. Re:6 weeks before the AWS summit 2011 by Anonymous Coward · 2011-04-22 08:41 · Score: 0
  
  I'm starting to get driven a little nuts about this error; you mean "affected" and you do not mean "effected" here. Try this it may help.
19. Re:6 weeks before the AWS summit 2011 by xelah · 2011-04-23 01:33 · Score: 1
  
  You missed 'luck'. What happens when your data centre catches fire, has an Internet outage or has its generators fail when the power goes out? You can use multiple data centres, but you can do that with AWS, too, and you wouldn't have been affected. I've known two of those three things happen to our data centre in London.
Soo... by Syberz · 2011-04-21 04:33 · Score: 1

Is anybody else suffering from Reddit withdrawal?

--
~Syberz
1. Re:Soo... by Anonymous Coward · 2011-04-21 04:40 · Score: 0
  
  Only reason I am here...
2. Re:Soo... by Anonymous Coward · 2011-04-21 04:45 · Score: 0
  
  Is anybody else suffering from Reddit withdrawal?
  What is that sound? Crickets?
3. Re:Soo... by Anonymous Coward · 2011-04-21 05:46 · Score: 0
  
  Nope.
4. Re:Soo... by Anonymous Coward · 2011-04-21 06:14 · Score: 0
  
  Yes.
5. Re:Soo... by HelioWalton · 2011-04-21 07:08 · Score: 1
  
  See first post.
Inappropriate metaphor - the cloud by NicknamesAreStupid · 2011-04-21 04:38 · Score: 1

It means inclement weather; it rains; it pours; it delays air traffic; it's gloomy. You can look up at it and see whatever you can imagine, but it is not real. It goes away when you most need it. It is all wet.
1. Re:Inappropriate metaphor - the cloud by Bloodwine77 · 2011-04-21 04:43 · Score: 1
  
  My guess is "cloud" is used because networking diagrams have historically used a cloud icon for the internet to mean it was nebulous and alien to the network.
player by callmebill · 2011-04-21 04:39 · Score: 1

Uh-oh... This might be my fault. I've been loading music into my Amazon Cloud Player. Sorry guys.
*sigh* by Grey+Dragon · 2011-04-21 04:49 · Score: 1

damn. Now I feel uninformed and useful to society.

--
If at first you don't feel good.... suffer like the rest of us.
No problem if you're prepared. by Anonymous Coward · 2011-04-21 04:53 · Score: 0

It affected me, so I just brought down the instances and databases on the east and launched them in the west coast datacenter. We are always in a position to redeploy elsewhere. Some good scripting and bootstrapping and events like this are not an issue. To believe that just 'cause it's in the cloud means it will never fail is to not understand what the cloud means. For me it means that when one area goes down, I can come up somewhere else at-will.
Also, it seems from what I can tell only zone 'a' is getting the worst of this down-time.
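To give an idea of the "good scripting" part, the decision logic can be as small as the sketch below. It is only an illustration: the health map is a stand-in I made up (in practice it would come from your own monitoring), and it makes no AWS API calls at all. The region names are just examples.

```python
from typing import Mapping, Optional, Sequence


def pick_region(preferred: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first region in preference order that is reported healthy.

    Regions missing from the health map are treated as unhealthy, so an
    unknown state never gets chosen by accident.
    """
    for region in preferred:
        if healthy.get(region, False):
            return region
    return None  # nothing usable: time to page a human


if __name__ == "__main__":
    order = ["us-east-1", "us-west-1", "eu-west-1"]
    status = {"us-east-1": False, "us-west-1": True, "eu-west-1": True}
    print(pick_region(order, status))  # us-west-1
```

The hard part is not this function, it is having the images, data and bootstrap scripts already sitting in the second region so the answer can actually be acted on.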
bean counts screw us again! by Thud457 · 2011-04-21 04:57 · Score: 1

Cheesus Xist! Backup generators taken out again?!! After Chernobyl and Fukashima, I'm starting to thing these " nuclear engineers " aren't rocket surgeons .

--
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
1. Re:bean counts screw us again! by rajeevrk · 2011-04-21 16:41 · Score: 1
  
  Cheesus Xist! Backup generators taken out again?!! After Chernobyl and Fukashima, I'm starting to thing these " nuclear engineers " aren't rocket surgeons .
  Rocket Surgeons!!! Now they are real men... Nothing compares with having to run up to the launch platform at T-2mins with a blowtorch and wire cutters to *nurse* a sputtering fuel nozzle back to white-hot health. Be a real man, operate on your first rocket today, at the NASA Rocket Surgeons Training Institute, The only institute in the world with a 50% passing rate and only a 35% fatality rate! Contact your nearest Space cowboy recruiter TODAY!!!!
  Seriously even :D
Factor of safety by Anonymous Coward · 2011-04-21 05:03 · Score: 0

Is less than 2 (that is, over 50% capacity)? Or bad code?
"A networking event early this morning triggered a large amount of re-mirroring of EBS [Elastic Block Storage] volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes."
They couldn't handle recovering from one Zone going down? Or, it tried to recover from a blip so fast that it re-mirrored volumes more than once?
Either way, it's pretty bad for Amazon. I hope they learn a lot from this.
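To put the factor-of-safety question in numbers, here is a toy model. It is my own simplification and says nothing about how EBS is really built: assume a storage pool with some fraction of its capacity in use, and assume a given share of the stored data suddenly needs a fresh replica written into the free space.

```python
def remirror_fits(utilization: float, fraction_remirroring: float) -> bool:
    """True if free space can absorb new replicas for the affected data.

    utilization: share of total capacity already in use (0..1).
    fraction_remirroring: share of stored data needing a fresh replica (0..1).
    """
    needed = utilization * fraction_remirroring  # extra space the new copies take
    free = 1.0 - utilization                     # space left before the event
    return needed <= free


def max_safe_utilization(fraction_remirroring: float) -> float:
    """Highest utilization at which the re-mirroring wave still fits.

    From u * f <= 1 - u it follows that u <= 1 / (1 + f).
    """
    return 1.0 / (1.0 + fraction_remirroring)


if __name__ == "__main__":
    print(max_safe_utilization(1.0))  # 0.5
    print(remirror_fits(0.6, 1.0))    # False
```

In this model, if everything re-mirrors at once you need to be at or below 50% utilization, i.e. a factor of safety of 2. Run any hotter and a big enough re-mirroring wave runs out of room, which at least matches the "shortage of capacity" wording in the status update.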
TMZ is one site. (Co worker's main news site) by Bobzibub · 2011-04-21 05:30 · Score: 1

Their error page is rejected by firefox. So I wgetted it to see why.
At the bottom is a script from RUSSIA (in my best Max Headroom voice) (the src is addonrock.ru/Templatel.js)
So perhaps AWS is hacked?
I'm selling a new product by countertrolling · 2011-04-21 05:42 · Score: 1

It's a way of backing up your 'cloud' data locally
the cloud... What kind of bullshit is this?

--
For justice, we must go to Don Corleone
to the cloud! by MECC · 2011-04-21 05:43 · Score: 1

just like the ad...

--
"We are all geniuses when we dream"
- E.M. Cioran
So... by Anonymous Coward · 2011-04-21 05:44 · Score: 0

Does this mean that...it's foggy on the intarwebz?
Anonymous!! by Anonymous Coward · 2011-04-21 05:48 · Score: 0

It was 4chan, finally exacting their revenge.
IT vendors always get the blame by gtirloni · 2011-04-21 05:56 · Score: 1

Everybody knows that datacenters WILL go down sometimes. Amazon offers availability zones not so you can cherry pick the one you feel good about (that too)... but because you have the option to spread your operation across different zones and be less impacted when shit happens. Of course those hip&cool app developers didn't think of that, right? So Reddit and a bunch of well known companies deployed everything in just one availability zone and hoped for the best. I guess they didn't even think about availability at all ("Amazon will take care of it"). If the engineers working at these companies had spent a single minute thinking about it they'd have figured it out. Amazon can't do this work for them... but perhaps they should add a checkbox to the contract "Have you developed your app so it can be deployed over at least 2 different availability zones?" and make that mandatory.
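Roughly what I mean by spreading, as a sketch. The instance and zone names are made up, and it only does the bookkeeping of who lands where; it does not launch anything.

```python
from typing import Dict, List, Sequence


def spread(instances: Sequence[str], zones: Sequence[str]) -> Dict[str, str]:
    """Assign instances to zones round-robin so no single zone holds them all."""
    if not zones:
        raise ValueError("need at least one zone")
    return {name: zones[i % len(zones)] for i, name in enumerate(instances)}


def survivors(placement: Dict[str, str], failed_zone: str) -> List[str]:
    """Instances that are still running if one zone goes dark."""
    return [name for name, zone in placement.items() if zone != failed_zone]


if __name__ == "__main__":
    placement = spread(["web1", "web2", "web3", "web4"],
                       ["us-east-1a", "us-east-1b"])
    print(survivors(placement, "us-east-1a"))  # ['web2', 'web4']
```

This only covers the loss of a single zone. If a failure spreads across zones in the same region, as some people in this thread report, you need the same idea one level up, across regions or providers.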

--
none
1. Re:IT vendors always get the blame by HomelessInLaJolla · 2011-04-21 07:43 · Score: 1
  
  but because you have the option to spread your operation across different zones and be less impacted when shit happens
  Amazon is charging customers to be the motivating force behind an enormous experiment designed to approximate financial markets.
  The lottery allows people to pay to assist the lottery owners in designing a randomized system which is also designed to approximate financial markets.
  The average consumer appears to be paying for the privilege to assist the financial institutions in gathering the data required to reliably predict financial markets. Brilliant.
  
  --
  the NPG electrode was replaced with carbon blac
Next on slashdot by ideaz · 2011-04-21 05:57 · Score: 1

One slashdotter writes: "The damage due to the Amazon services downtime has been estimated to be $, as reported by companies affected."
Change.org sez it's China by K8TIY · 2011-04-21 06:14 · Score: 1

Change.org (which Amazon supposedly hosts) claims it's due to a Chinese attack (this from an email I received):
Here's how we know it's really gotten Beijing’s attention: For the past four days, the Change.org website has been repeatedly targeted by cyber attacks coming from China that aim to bring our site down, which would keep people from signing the petition.
1. Re:Change.org sez it's China by Mephistophocles · 2011-04-22 00:58 · Score: 1
  
  Unsubstantiated gossip. I host a fairly large network at AWS and the damn Chinese try to brute force it continuously. Fortunately, they're morons; all they do is run dictionary attacks on the login "admin" (which doesn't exist for any protocol on any server I host). If that's the best China can do, we don't have much to worry about.
  
  --
  Deja Moo: The distinct feeling that you've heard this bull before.
Coincidence, PSN? by grikdog · 2011-04-21 06:48 · Score: 1

Does Sony's PSN sublet capacity on Amazon's cloud? PSN is down for "a day or two" according to stuff on Google.

--
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
Heroku down too by Anonymous Coward · 2011-04-21 06:59 · Score: 0

Someone might have mentioned this, but this has taken Heroku out of service too.
David
a mere two days late Amazon goes self aware by dyshexic · 2011-04-21 07:02 · Score: 1

expect massive sale on Roombas and net connected Kindles in the next few days
Did Skynet destroy zone us-east-1c? by turbogizzmo · 2011-04-21 07:48 · Score: 1

Muuu hahahahaha! https://forums.aws.amazon.com/message.jspa?messageID=238872#238872
Skynet? by Anonymous Coward · 2011-04-21 10:59 · Score: 0

Was it skynet?
Are you sure?
The ICON dude.... by Anonymous Coward · 2011-04-21 16:51 · Score: 0

Great Slashdot icon dude...raining bits from the cloud. Sorry it could not have happened to a nicer bunch of creeps.
So much for the cloud...
And on a related note:
"Asked whether the official policy was wrong, Vice Principal Yoshiki Sugawara said, “No. The problem was, the tsunami was too high.”"
How to avoid Amazon type outages by Anonymous Coward · 2011-04-22 03:11 · Score: 0

Can't rely on just one cloud vendor. Check out this simple animation that shows how to avoid these types of problems: You want to look at the "Complete in the cloud IT Organization" at the link below. http://www.batblue.com/usecases.php?first=499