More Uptime Problems For Amazon Cloud
1sockchuck writes "An Amazon Web Services data center in northern Virginia lost power Friday night during an electrical storm, causing downtime for numerous customers — including Netflix, which uses an architecture designed to route around problems at a single availability zone. The same data center suffered a power outage two weeks ago and had connectivity problems earlier on Friday."
Nuf said
For me, it is far better to grasp the Universe as it really is than to persist in delusion
I live in the affected area and that's what they're saying. May take 7 days for the last person to have their power restored.
We need to invest trillions in roads, water, and electrical infrastructure to keep this country going.
If you let the basic building blocks of civilization rot, don't be surprised when everything else follows suit.
[Fuck Beta]
o0t!
It seems that recently, anything can take down the cloud, or at least cause a serious disruption for any of the major cloud providers. I wonder how many more of these it takes before the cloud-skeptics start winning the debates with management a lot more often.
You can only argue that the extra costs and admin involved with cloud hosting outweigh the extra costs of self-hosting and paying competent IT staff for so long. If you read the various forums after an event like this, the mantra from cloud evangelists already seems to have changed from a general "cloud=reliable, and Google's/Amazon's/whoever's people are smarter than your in-house people" to a much more weasel-worded "cloud is reliable as long as you've figured out exactly how to set it all up with proper redundancy etc." If you're going to pay people smart enough to figure that out, and you're not one of the few businesses whose model really does benefit disproportionately from the scalability at a certain stage in its development, why not save a fortune and host everything in-house?
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Cloud computing is nothing more than 1960s timesharing services with modern operating systems. Unless you design for resilience, you're not resilient to problems.
So this is the second time this month Amazon's cloud has gone down. Serious questions should be asked about the sustainability of this service, given its extremely poor uptime record and extremely large customer base.
They would have spent millions of dollars installing diesel or gas generators and/or battery banks and who knows how much money maintaining and testing it, but when it comes time to actually use it in an emergency, the entire system fails.
You would think having redundant power would be a fundamental crucial thing to get right in owning and operating a data centre, yet Amazon seems unable to handle this relatively easy task.
Now before people say "well this was a major storm system that killed 10 people, what do you expect", my response is that cloud computing is expected to do work for customers hundreds and thousands of kilometres/miles from the actual data centre so this is a somewhat crucial thing that we're talking about - millions of people literally depend on these services; that's my first point.
My second point is it's not like anything happened to the data centre, it simply lost mains energy. It's not like there was a fire, or flood, or the roof blew off the building, or anything like that; they simply lost power and failed to bring all their millions of dollars in equipment up to the task of picking up the load.
If I were a corporate customer, or even a regular consumer, I would be seriously questioning the sustainability of at least Amazon's cloud computing. Google and Facebook seem to be able to handle it, but not Amazon. Granted, they don't offer identical products, but their data centres overall seem to stay up 100 or 99.9999999% of the time, unlike Amazon's.
However, "Netflix, which uses an architecture designed to route around problems at a single availability zone," seems to have efficiently spread the pain of a North Eastern outage to the rest of the country. Sometimes I think redundancy in solutions is better left turned off.
Nullius in verba
http://www.pepco.com/home/emergency/maps/stormcenter/
-- IANAL, this isn't legal advice, and definitely isn't legal advice for you. Also, Squee!
Instagram's servers in that data center were also affected, and more people griped about that on my Facebook feed than about Netflix.
As for "an electrical storm", that's a bit of an understatement. The issue was actually more the 80 mph wind gusts, as well as the lightning continuing for 2 hours after the wind and rain had passed (meaning crews couldn't get out there overnight).
The result is some 2 million people without power, 1 million around DC alone. Dominion Power (which services the area where the data center resides, about 5 miles from my house) lost power for more than half of its Northern Virginia customers, and even now has only restored power to about 60,000* out of the 461,000 that lost it. On the Maryland/DC side of the Potomac, half a million people may be without power for days, through a heat wave with 100-degree days (and more storms like last night's coming...).
* fortunately that would include me...though I'm writing this via my Sprint phone as a wifi hotspot 'cause our cable modem is still down ;-)
"But remember, most lynch mobs aren't this nice." (H.Simpson)
-- Joe
It seems like the switching system failed and/or the backup power generators did not kick on.
Maybe natural gas ones are better. The firehouses have them. I also see them at a big power substation.
I was in it - it was not a particularly bad storm. Heavy winds, lots of cloud-to-cloud lightning, but very little rain or cloud-to-ground lightning. I lost power repeatedly, but it was always back up within seconds. And I'm located way out in a rural area, where the power supply is much more vulnerable (every time a major hurricane hits, I'm usually without power for about a week - bad enough that I bought a small generator).
According to TFA, they were only without power for half an hour, and the ongoing problems were related to recovery, not actual power-lossage. So their problems are more "bad disaster planning" than "bad disaster".
Still, you'd think a major data center would have the usual UPS and generator setup most major data centers have - half an hour without power is something they should have been able to handle. Or at least have enough UPS capacity to cleanly shut down all the machines or migrate the virtual instances to a different datacenter.
Which is the problem. Not the power outage itself.
If the power outage happened, and the servers were back in, let's say, 30 minutes or 1 hour... alright, but 9 freakin' hrs?
In my specific case I didn't suffer as much because I have another instance in a different zone with db replication and all that, serving as a backup server, and my project there, although very critical (20 people are getting wages out of it), is very low on resource usage... I can imagine there were quite a lot of people who lost quite a lot of money because of this. It's really unacceptable for a DC to have 9 hrs of downtime, whatever the reason is... because... that's just the standard people are used to.
I never experienced anything like this at any other company in the 10 years I've been working as a Linux admin... although at all those companies, I used real servers.
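The setup described above (a second instance in a different zone acting as a backup) boils down to trying the primary first and falling back to the replica. A minimal sketch, assuming hypothetical endpoint names and caller-supplied health checks; it is not the poster's actual system and uses no real AWS API:

```python
from typing import Callable, Optional, Sequence, Tuple

# An endpoint is a (name, health_check) pair. Both the names and the
# health checks are hypothetical placeholders for this illustration.
Endpoint = Tuple[str, Callable[[], bool]]


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[str]:
    """Return the name of the first healthy endpoint, in priority order.

    A health check that raises is treated the same as one that
    returns False. Returns None if every endpoint is down.
    """
    for name, is_healthy in endpoints:
        try:
            if is_healthy():
                return name
        except Exception:
            # An unreachable zone often shows up as an error, not a clean "False".
            continue
    return None


if __name__ == "__main__":
    # Simulate the primary zone being down and the replica zone being up.
    endpoints = [
        ("primary-zone-a", lambda: False),
        ("replica-zone-b", lambda: True),
    ]
    print(pick_endpoint(endpoints))
```

The ordering of the list is the whole policy here: the replica only takes traffic when the primary's check fails. A real deployment would also have to deal with replication lag and with promoting the replica, which this sketch ignores.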
Because that cable operator also provides phone service.
The revolution will be mocked
Didn't you get the memo? Netflix barely runs now and this is working as planned. Time Warner had four internet outages in Raleigh THIS WEEK.
Everything everywhere is slowly grinding to a halt. So let's send more work to China and India. Who cares anymore.
"No, if you are a professional stuff doesn't 'happen'"
No, if you are a professional you evaluate risks and adjust your behaviour to an acceptable level and you don't expend a bazillion to protect half a bazillion.
For example, Google designed their applications in a way that tolerates a failing server: what's the benefit in their case of going with RAID10, doubled PSUs and hot-swappable RAM and CPUs? What does that bring to the table but lost money?
Amazon offers people outside the Fortune 100 the ability to do the same, only at the datacenter level. But then, if you can withstand a whole datacenter failure by properly using the services they offer, what's the advantage of spending the money to make their datacenters five nines instead of four?
"They are still amateurs"
They are there for the money and they are making a lot of money: that's what makes them professionals.
I'll tell you who's being unprofessional: all those who think that their critical services are properly protected within a single datacenter just because they read it was "the cloud" in a colourful brochure.
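The four-nines versus five-nines trade-off above can be put into rough numbers. The sketch below is only an illustration: it assumes sites fail independently (an assumption a regional storm can easily violate), and the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(per_site: float, sites: int) -> float:
    """Availability of a service that is up if at least one site is up.

    ASSUMPTION: site failures are independent, so the chance that all
    sites are down at once is the product of the individual chances.
    """
    return 1.0 - (1.0 - per_site) ** sites


if __name__ == "__main__":
    four_nines = 0.9999
    five_nines = 0.99999
    print("one four-nines site, hours down/year:",
          downtime_hours_per_year(four_nines))
    print("one five-nines site, hours down/year:",
          downtime_hours_per_year(five_nines))
    two_sites = combined_availability(four_nines, 2)
    print("two independent four-nines sites, hours down/year:",
          downtime_hours_per_year(two_sites))
```

Under the independence assumption, two four-nines sites together beat a single five-nines site, which is the point being made: the customer can buy the extra nine by spreading across datacenters instead of the provider gold-plating each one. Correlated failures (same storm, same control plane) eat into that margin.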
What are you, 14? Democracies don't like War, because they don't like their sons, fathers, brothers, and husbands getting killed. It generally takes quite a lot to motivate Democracies into war, because of the hatred of casualties. Even when it is the best option. Example: going to war against Hitler in 1934, or 1936, or in 1938.
Out here in the real world, the sum total of human experience suggests a strong military is like insurance or a seat belt. You hope you never have to use it, but it's a godsend if you need it. Indeed, having a strong military deters attacks. Nobody goes down to Venice Beach to pick fights with body builders, or down to the Gracie's gym to start fights.
Like insurance, working out, eating right, avoiding bad areas, a strong military is a pain in the ass. It costs a lot. It is a pain and non-productive to maintain. And sure, you could save a lot by going without auto or health insurance. You could eat more cheaply at McDonalds than cooking healthy meals at home. It's cheaper to live in the ghetto than a nice area.
As far as market value of defense stocks, the market capitalization of Lockheed Martin is 28.27 Billion, of Apple Computer 546.08 Billion. The market value of L'Oreal at 54.83 billion is about twice that of Lockheed Martin, suggesting lipstick pays a lot more than military avionics. Defense firms since their inception have been very cyclical, made relatively little money, and are merging like crazy as war spending winds down. But unless you're going to change human nature with Harry Potter's magic wand, carrying otherwise unprofitable defense firms is worth it because making drones, airplanes, missiles, tanks, ships, and helicopters to kill well-armed enemies is a very narrow engineering niche with knowledge quickly lost.
As soon as your computer runs on unicorn farts and rainbows, we can all forget about dominance in the Persian Gulf and other oil areas. Until then, I'd prefer to drive to work and run the AC not live like a dirty smelly hippie. That AC making life bearable in 118F Kansas? Runs on oil not tree-hugging and drum circles.