Lightning Strike KOs Amazon, Microsoft EuroClouds
1sockchuck writes "A lightning strike has caused power outages at the major cloud computing data hubs for Amazon and Microsoft in Dublin, Ireland. The incident has caused downtime for many sites using Amazon's EC2 cloud computing platform and Microsoft's BPOS (Business Productivity Online Suite)."
...nature wins?
Shocking
Did anyone vet that abbreviation before they launched it?
I see how it is. Verizon workers go on strike, MSFT and Amazon gotta call in for something strike-related that's bigger and flashier. Show-offs.
There's a spot in User Info for World of Warcraft account names? Really?
Considering that my radio stations have been getting hammered for weeks now by this horrible weather in the Southern United States, my sympathies are with them.
I don't care how much protection you put on your system (and when you have giant lightning rods that are hundreds of feet tall, like we do, you DO try to protect things), an occasional strike is going to slip through. When it does it can get ... messy. :)
Cogito, igitur comedam pizza. (I think, therefore I shall eat pizza.)
I don't know a lot about EC2, but don't instances lose all their data when they're powered down, unless special provisions are made?
This could result in another series of fuckups like this, where a bitcoin exchange lost its wallet.dat due to a misconfigured EC2 instance.
worldmobilenet.com -- World Prepaid Wireless Internet plans
Sounds about like
http://xkcd.com/908/
In my capacity as a Certified Solution Architect(tm), I often warned that The Cloud was suitable only for dynamic workloads. But did you listen? Oh, no, you just went and let your static workloads build up in the Cloud, increasing TCO and, now, bringing down Disaster on your heads!
For BPOS users, the change is probably not noticeable anyway.
Am I first??
My understanding of the point of cloud computing was that it would be distributed, i.e. the failure of any one data or computing center would mean the data was still available. Hence the term "cloud": nebulous, non-localized. Apparently, someone forgot to tell Microsoft and Amazon what the buzzwords they were using actually mean. I more or less expected that of M$, but the fact that Amazon failed too, well, that's a little surprising. I guess it's kinda the norm for all large corporations.
Glancing at the article, it looks like this outage affected only one area, but still, cloud should mean other data centers would take over. I particularly love the quote "Dublin has become a key cloud computing gateway." If one city serves as a "gateway", it's not a cloud system. I understand using it as one data center, but others should take over automatically for that area in case of a failure. If you don't have a failover system, you don't have a real cloud computing platform. You have a wannabe cloud computing platform. Or maybe they are just taking a buzzword and redefining it to suit their purposes. That's... exactly what we should expect, I suppose.
Or am I completely misunderstanding the meaning of this latest buzzword? It's quite possible; I never quite figured out what "Web 2.0" was supposed to mean either, beyond lots and lots of Flash.
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
Such as Virtustream
Well, when there are clouds it often rains, and occasionally they produce thunder and lightning. I guess Amazon thought that they could have clouds without worrying about their byproducts.
We have all read about the difficulty and expense of providing reliability in the cloud (the so-called five nines), as well as the fact that as more popular web services move onto cloud platforms, more people come to rely on those platforms. As such, I can't help but wonder what kind of fallout this latest event will have, but I do get the feeling that this lightning strike may erode the vCloud marketing of five-nines uptime just a wee bit more.
At that level the safeties tripped, forcing a manual cutover to backup. Also, surge protectors can't really take a direct lightning strike.
But anyway, for the cloud to work you need data centers all over the place, both for low latency and for backup. Dublin is good for not needing lots of cooling, but you still don't want to put all the servers in one area, so that other problems, like overseas data line cuts, can't take down systems far away. With on-site servers, at least you can get some work done when the backup link to the outside is slow or missing entirely.
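For a sense of scale on "five nines": an availability target converts directly into a yearly downtime budget. This minimal sketch assumes a 365.25-day year and simply multiplies the unavailable fraction by the minutes in a year; it says nothing about how any particular provider measures or reports uptime.

```python
# Convert an availability target ("number of nines") into the downtime it
# allows per year. Assumes a 365.25-day year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    minutes = allowed_downtime_minutes(availability)
    print(f"{nines} nines: {minutes:.1f} min/year")
```

Five nines works out to roughly five minutes a year, so an outage measured in hours uses up many years' worth of that budget at once.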
... And if we aren't 1000%, absolutely, positively reliable may God Strike Us... BLAM!!!!!!
Have you ever seen a surge protector after a direct strike? The MOVs don't help much once they vaporize.
A surge protector is mostly useful against the more common near misses.
Office 364
For justice, we must go to Don Corleone
So you're saying there wasn't a cloud in the sky, huh?
Wuddooeyeno? IITYWYBMAD? Like nuts? eclecticallyincorrect.com
Read the article. Please?
Those massive data centers only existed because Microsoft and Amazon channeled profits through Irish subsidiaries to avoid US taxes. They serve some legitimate functions for customers in the UK as a matter of convenience (why build two data centers?), but they're primarily money laundering centers.
I'd call a few lightning strikes the least of the punishments those data centers - and the entire infrastructures to which they're attached - really deserve.
While working at the Chevron Oil refinery in Pascagoula, Mississippi, I noticed Chevron had the same problem. Loss of electrical power to the refinery would be catastrophic. No one wants to be around tons of petrochemical products undergoing serious chemical reactions when one loses control.
To mitigate this threat, Chevron worked with Mississippi Power to operate a power generation facility at the refinery.
I would think that anywhere there is a substantial "data processing farm" with critical power requirements, business arrangements should be made with the power generation utilities to run a natural gas power plant in the immediate area.
The utilities often run these plants as "topping" plants, as they are needed anyway to even out short-term load variances on the line.
But, in the event of a serious loss of grid power, it can be awful handy to have a few megawatts of power coming from down the street.
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Have you ever seen a surge protector after a direct strike? The MOVs don't help much once they vaporize.
It's amazing how many people who call themselves engineers don't get that surge suppressors are non-resettable.
"What do you mean the power supplies got destroyed? We've had surge suppressors in there for years..."
---
"I can't complain, but sometimes still do..." Joe Walsh
When you buy a surge protector, what you're really buying is that little insurance policy that you're supposed to fill out. That's it.
Life is not for the lazy.
Cloud on cloud violence like this cannot be tolerated!
I just love how Mother Nature gives Mankind a spanking now and again, just to remind us who is boss.
...where life is grand and all your apps are immune to...BzzzzzZZZZZT!
Blue Pane of Sin?
This is not the first time -- http://www.datacenterknowledge.com/archives/2009/06/11/lightning-strike-triggers-amazon-ec2-outage/ .
Distribution means your virtual machine can be on a number of machines inside a cloud. There's nothing in the definition of a cloud that says it has to be in different locations, or running mirrored copies of your instance. Sure, it's possible, just like it's possible with single machines. When will people stop assuming that "cloud" means "indestructible"? This is exactly what happened before with EC2 and lots were hurt then by the same assumption.
I was promised a flying car. Where is my flying car?
Also, surge protectors can't really take a direct lightning strike.
But lightning arrestors can. A serious lightning arrestor is a spark gap (sometimes open air, sometimes in an inert gas) to ground, with a very heavy cable or busbar to multiple ground rods, and no sharp turns in the path to ground. This is followed up by an inductor which is a few turns of busbar. This gear is usually placed where power lines or antenna feeds enter a building. MOV-type protection is further downstream.
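Why a few turns of busbar help: an inductor's reactance grows with frequency (X_L = 2πfL), so it is nearly invisible to 50 Hz mains but presents a much higher impedance to the fast-rising front of a strike, which encourages the surge to take the spark gap to ground instead. The sketch below compares the two, assuming a hypothetical 5 µH for the busbar turns and using 1 MHz as an illustrative stand-in for the strike's high-frequency content. It treats the inductor as ideal, and a real surge is not a single sine wave, so read the numbers as orders of magnitude only.

```python
import math


def inductive_reactance(inductance_h: float, frequency_hz: float) -> float:
    """Reactance of an ideal inductor, X_L = 2 * pi * f * L, in ohms."""
    return 2 * math.pi * frequency_hz * inductance_h


L_BUSBAR = 5e-6  # hypothetical: 5 microhenries for "a few turns of busbar"

mains = inductive_reactance(L_BUSBAR, 50)   # mains frequency in Ireland
surge = inductive_reactance(L_BUSBAR, 1e6)  # illustrative surge content

print(f"50 Hz: {mains * 1000:.2f} milliohms, 1 MHz: {surge:.1f} ohms")
```

The same piece of copper that costs next to nothing at mains frequency looks about 20,000 times stiffer at the illustrative surge frequency, which is the whole trick.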
Antenna towers are struck by lightning frequently, and the associated radio gear routinely continues to operate. This isn't rocket science. It's big hunks of copper.
The Hartford Steam Boiler Inspection and Insurance Company, in their publication "The Locomotive" (they've been at this since 1867), has a good article on lightning protection. Hartford Steam Boiler insures not only against boiler explosions, but also against things like downtime due to lightning strikes. But only after their inspectors (they have 1200) have visited the plant and are satisfied with the equipment.
A question to ask your "cloud" provider: who handles your business interruption insurance, and do they inspect your facilities?
Murphy was an Irishman
Every Silver Cloud has a leather lining.
I can see where that possibility went over their heads.
Despite ALL the market hype and brouhaha going on, the simple fact remains:
If ALL your computing power is in ONE SINGLE DATACENTRE then what you have is a DAMP SPOT not a CLOUD.
Visit CryptoGnome in his home.
All of the cloud computing hype has businesses everywhere, once again, buying what they don't understand, just as they did during the dot-com bubble. That particular bandwagon caused all sorts of damage to the industry, including a flood of people unsuited to the line of work and a suppression of wages that never seems to have reversed. Now businesses continue to crave cheap, Walmart-ized IT services and are seeking them any way they can: outsourcing to third-world nations and, most recently, entrusting their work to cloud computing, where they put not just all of their eggs, but the eggs of hundreds if not thousands of other businesses, in one basket, waiting for moments like this.
And it's not like these lessons were never made available in smaller doses. There has been more than one Blackberry service outage to highlight the fact that all Blackberry traffic passes through Blackberry servers and will halt when Blackberry fails. How much worse when you entrust whole parts of critical business functionality to "the cloud"?
Am I the only one that feels TFA isn't correct? From TFA:
Amazon said the power outage began at 10:41 a.m. Pacific time, with instances beginning to recover about two hours later at 1:47 p.m.
To me, that seems like 3 hours, not 2...
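The arithmetic is easy to check with Python's standard `datetime` module, using the two times quoted from TFA (both Pacific time, same day):

```python
from datetime import datetime

# Times quoted in TFA, both Pacific time on the same day.
start = datetime.strptime("10:41 AM", "%I:%M %p")
recovery = datetime.strptime("1:47 PM", "%I:%M %p")

elapsed = recovery - start
print(elapsed)  # 3:06:00
```

So the quoted times give just over three hours, not "about two".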
(see below) Why does synchronization have to be manual? Or is this a safety feature, where the automatic synchronization takes itself offline if things get weird?
From the Article:
"Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators," Amazon said in an update on its status dashboard. "The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them."
"Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We've now restored power to the Availability Zone and are bringing EC2 instances up."
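What "phase-synchronized" means in practice: before a generator's breaker closes onto a live bus (for example, one where another generator is already online), its voltage, frequency, and phase angle all have to match the bus closely, or the machines fight each other. Below is a minimal sketch of that kind of permissive check. The article does not say what Amazon's control system actually checks; the three criteria are the usual ones, but the tolerance values, the `SourceState` type, and the 400 V example readings are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    voltage_v: float     # RMS voltage
    frequency_hz: float
    phase_deg: float     # phase angle at the measurement instant


# Illustrative tolerances only; real limits depend on the plant and its standards.
MAX_VOLTAGE_DIFF_PCT = 5.0
MAX_FREQ_DIFF_HZ = 0.2
MAX_PHASE_DIFF_DEG = 10.0


def phase_difference_deg(a: float, b: float) -> float:
    """Smallest angle between two phase readings, in the range 0..180."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def ok_to_close(bus: SourceState, gen: SourceState) -> bool:
    """Permit closing the generator breaker only when both sources match closely."""
    dv_pct = abs(gen.voltage_v - bus.voltage_v) / bus.voltage_v * 100.0
    df = abs(gen.frequency_hz - bus.frequency_hz)
    dphi = phase_difference_deg(gen.phase_deg, bus.phase_deg)
    return (dv_pct <= MAX_VOLTAGE_DIFF_PCT
            and df <= MAX_FREQ_DIFF_HZ
            and dphi <= MAX_PHASE_DIFF_DEG)


bus = SourceState(400.0, 50.0, 0.0)
print(ok_to_close(bus, SourceState(402.0, 50.1, 358.0)))  # True: 0.5%, 0.1 Hz, 2 degrees apart
print(ok_to_close(bus, SourceState(400.0, 50.0, 120.0)))  # False: 120 degrees out of phase
```

If the equipment feeding these measurements is itself damaged, as the statement suggests happened to part of the phase control system, an automatic check like this can no longer be trusted, which is one plausible reason for falling back to manual synchronization.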
And there is the marketing bullshit revealed. All the promises of the cloud - down by one lightning strike.
Because, let's face it, the whole "cloud" thing as they sell it is just advanced virtual hosting with a different name. The only real cloud capabilities are those the big companies build for themselves, and they did things like that 10 years ago already, when nobody had ever heard the term "cloud" used in computing contexts.
In the end, it's about selling something to people who already have the older version and convincing them to buy the new one. So you give it a different name because a "new" product sells easier than the upgraded version of an "old" product.
Anyone remember when "Web 2.0" was all the hype? It really wasn't a 2.0, as we all know. There was nothing new in it; all the components had been around for a long time. It was a conceptual bundle, not a new version like the name suggested. But "we're doing more JavaScript now" doesn't sell nearly as well as "we're moving to Web 2.0 now".
Assorted stuff I do sometimes: Lemuria.org
BPOS in Europe runs from Dublin as primary, Amsterdam as secondary.
Business Productivity Online Suite?
I always thought it stood for "Big Piece of Sh..... never mind
They should have had three generators!
McCloud? What does the Star Fox team have to do with any of this?
So... Zeus accomplished what Anonymous could not.
If Microsoft really wanted to cut costs, it would allow for up to five or six days a year of downtime and port its client software to a thin client comparable to a game console. Then it could be called Office 360.
Nature bats last.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Regarding Amazon, there was no cloud fail. If you have your instances in more than one availability zone (which usually means a separate datacentre), then you wouldn't have an outage. A cloud never implied redundancy like this; it's an optional feature.
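The capacity side of that argument can be sketched in a few lines: spread instances round-robin across zones, then count what is left when one zone goes dark. The zone names below are assumed examples for the Ireland (eu-west-1) region, and the helper functions are hypothetical; this only counts running instances and ignores replicating data and state between zones, which is the hard part in practice.

```python
from collections import Counter

# Assumed example Availability Zone names in the Ireland (eu-west-1) region.
ZONES = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]


def spread(instance_count: int, zones: list[str]) -> dict[str, int]:
    """Round-robin placement: instance i goes to zone i mod len(zones)."""
    return dict(Counter(zones[i % len(zones)] for i in range(instance_count)))


def surviving(placement: dict[str, int], failed_zone: str) -> int:
    """Instances still running if one zone loses power."""
    return sum(n for zone, n in placement.items() if zone != failed_zone)


one_zone = spread(6, ZONES[:1])
multi_zone = spread(6, ZONES)
print(surviving(one_zone, "eu-west-1a"))    # 0
print(surviving(multi_zone, "eu-west-1a"))  # 4
```

All six instances in one zone means a total outage; the same six spread over three zones leaves two thirds of the capacity running.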
You don't need to synchronize the phases if the system is designed right. Keep each generator separate (do not even try to parallel them). Have every machine covered by UPS/battery so they ride through the switching. In high redundancy cases, use an extra transfer switch for a pair of generators for each section. Only do open-transition power transfers (the lights blink out briefly during the change but the UPS/battery system keeps things running). For machines with multiple power input, split them across multiple UPSes for extra redundancy.
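The payoff from splitting dual-corded machines across two UPSes can be estimated with the standard parallel-redundancy formula: the machine loses power only if both paths are down at once. This is a minimal sketch assuming the two paths fail independently, with a hypothetical 99.9% availability per path. That independence assumption is exactly what a shared event breaks; per the quoted status update, one transient disabled several generators at the same time.

```python
def parallel_availability(*component_availabilities: float) -> float:
    """Availability of redundant paths where any one path is enough.

    Assumes failures are independent: the system is down only when every
    path is down at the same time.
    """
    all_down = 1.0
    for a in component_availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


single = 0.999  # hypothetical availability of one UPS path
dual = parallel_availability(single, single)
print(f"one UPS: {single}, two UPSes: {dual:.6f}")  # two UPSes: 0.999999
```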
now we need to go OSS in diesel cars
It is important to have a backup plan no matter where you run your apps. Even owned and operated datacenters fail from time to time. http://networkingexchangeblog.att.com/technology/outages-happen-disaster-recovery-the-cloud-and-a-lesson-from-cycling/trackback/
Is this how Anonymous is going to attack Facebook?