Seattle Data Center Outage Disrupts E-Commerce

Posted by ScuttleMonkey on Friday July 3, 2009 @05:11AM from the no-sigmas-for-you dept.

1sockchuck writes "A major power outage at Seattle telecom hub Fisher Plaza has knocked payment processing provider Authorize.net offline for hours, leaving thousands of web sites unable to take credit cards for online sales. The Authorize site is still down, but its Twitter account attributes the outage to a fire, while AdHost calls it a 'significant power event.' Authorize.net is said to be trying to resume processing from a backup data center, but there's no clear ETA on when Fisher Plaza will have power again."

15 of 118 comments

Heh by MightyMartian · 2009-07-03 05:15 · Score: 5, Insightful

Redundancy ain't just a river in Egypt.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
1. Re:Heh by Anonymous Coward · 2009-07-03 05:41 · Score: 3, Informative
  
  It's interesting how many companies assume they have redundancy in place but never take the time to do proper testing. They figure that once a disaster happens, everything will automatically work because their vendor or staff said so. To achieve true redundancy a company needs to do semi-frequent testing to ensure that everything is working properly. Authorize.net might have had what was assumed to be a redundant system in place, but once the disaster happened they soon realized their system wasn't designed or configured properly. It is expensive and time consuming to test redundancy, let alone actually pay for the redundant equipment/staff/etc, but times like this show how one gets their money's worth in doing so.
Failover Planning (and this broke FiOS too) by Cysgod · 2009-07-03 05:29 · Score: 4, Informative
Apparently Verizon also has a single point of failure in this building for much of its FiOS service in the metro areas of Western Washington state, so FiOS customers are offline right now too.
- Clownshoes: Have no failover plan and be singly homed.
- Meh: Have a failover plan.
- Good: Have a failover plan that requires humans and exercise it regularly.
- Better: Have a failover plan that is automated and exercise it regularly.
- Best: Eliminate single points of failure so failover is turning off the flake or fail and going back to drinking a beer.
Hot/Hot is always a better solution than Hot/Warm or Hot/Cold for disaster recovery (and for increasing equipment utilization/ROI), and this event demonstrates why.
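
To make the "Better" tier above a bit more concrete, here is a minimal Python sketch. The site names and the caller-supplied health check are assumptions for illustration only; nothing here reflects how Authorize.net or Verizon actually run failover. `pick_endpoint` is the automated selection, and `failover_drill` is the "exercise it regularly" part: it pretends each site is down in turn and records who would take over.

```python
from typing import Callable, Dict, Optional, Sequence


def pick_endpoint(endpoints: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint that passes its health check, else None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


def failover_drill(endpoints: Sequence[str],
                   is_healthy: Callable[[str], bool]) -> Dict[str, Optional[str]]:
    """Pretend each endpoint is down in turn and record which one takes over.

    A value of None means that losing that endpoint leaves nothing serving.
    """
    results = {}
    for victim in endpoints:
        # Treat the victim as failed, regardless of its real health.
        results[victim] = pick_endpoint(
            endpoints, lambda e: e != victim and is_healthy(e))
    return results


# Hypothetical site names, used only for illustration.
SITES = ["primary-dc", "backup-dc"]
```

If the drill ever maps a site to `None`, the plan has a hole, and it is much cheaper to learn that from a drill than from a fire.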
Fisher Plaza is a disaster response center by Anonymous Coward · 2009-07-03 05:41 · Score: 4, Informative

Fisher Plaza is supposed to be a regional telecom / communications / medical care hub for the Seattle area. It was designed and built to *not* crash, even in a magnitude 9.5 quake. Sounds like they've got work to do ...
System failure by ErkDemon · 2009-07-03 05:44 · Score: 5, Informative

There are four main factors that can take a part of a society's key infrastructure offline.
1: ACTS OF GOD
Meteor strike, lightning strike, extreme weather ...
2: ACTS OF MALICE
War, terrorism, extortion, employee sabotage, criminal attacks ...
3: WEAK INFRASTRUCTURE
Underpowered networks, inadequate UPS backups, skeleton staffing, the shaving of safety margins as an efficiency exercise, inadequate rate of replacing old hardware ...
4: MANAGEMENT ARSINESS
This is when a problem starts, and the people in charge either don't know how to react, don't care, or prioritise face-saving over actual problem-solving. This happens when you get an outage and, instead of promptly calling all their critical clients to warn them that the routers have maybe twenty minutes of UPS capacity left if the system isn't fixed by then, system management cross their fingers, hope that things'll work out, and worry about what to tell the clients afterwards.
Fisher Plaza seems to have suffered from a case of #4 recently, so it's not surprising that they've gone down again. The first time should have been the wakeup call to show them that their human systems were in need of an overhaul. Without that overhaul, you're setting up a dynamic in which the second time it happens, things are even worse (because now people are locked into defensive mode).
No matter how advanced your technological systems, if the people running them have the wrong mindset, you're gonna go down. And when you go down, you're gonna go down far far harder than necessary.

--
Eric Baird
Authorize.Net did have a backup by johnncyber · 2009-07-03 05:46 · Score: 3, Informative

...except it failed as well. From their twitter:

"@gotwww The backup data center was impacted too. Don't have info as to why. The team is solely focused on getting us back up for now."
1. Re:Authorize.Net did have a backup by ZorinLynx · 2009-07-03 07:44 · Score: 3, Interesting
  
  Sometimes folks set up a redundant system and forget to make one key piece redundant.
  Example: A server rack with two UPS systems. Each server has two power cords, one going to each UPS, but the switch everything is plugged into only has one power input, so it's connected to UPS A.
  Power blinks and UPS A decides to shit itself. Rack goes down, even though all the machines are up, because the network switch loses power.
  Solution? An auto switching power Y-cable with two inputs, and one output. But 80% of people will be lazy and not bother. Oops.
  Happens all the time; I see it everywhere.
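
The rack example above is easy to check mechanically. The following Python sketch assumes a hand-written map of which power source each device is plugged into (the device and UPS names are made up for illustration) and reports every power source whose loss, on its own, leaves some required device with no power at all.

```python
from typing import Dict, Iterable, List, Set


def single_points_of_failure(feeds: Dict[str, Set[str]],
                             required: Iterable[str]) -> List[str]:
    """Return power sources whose failure alone darkens a required device.

    feeds maps each device to the set of power sources it is plugged into.
    required lists the devices that must all stay up for the rack to be
    reachable.
    """
    required = list(required)
    sources = set().union(*feeds.values())
    spofs = []
    for failed in sorted(sources):
        # A device is dark if every one of its feeds is the failed source.
        if any(feeds[device] <= {failed} for device in required):
            spofs.append(failed)
    return spofs


# Hypothetical rack: dual-corded servers, single-corded switch.
RACK = {
    "server-1": {"UPS-A", "UPS-B"},
    "server-2": {"UPS-A", "UPS-B"},
    "switch": {"UPS-A"},
}

# Same rack after putting the switch on an auto-switching dual feed.
FIXED_RACK = dict(RACK, switch={"UPS-A", "UPS-B"})
```

Here `single_points_of_failure(RACK, RACK)` flags `UPS-A` because of the switch, while the same call on `FIXED_RACK` comes back empty.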
Geocaching.com too by dickens · 2009-07-03 05:56 · Score: 4, Informative

And on a holiday. Bummer. :(
The best line from the SANS ISC by Zocalo · 2009-07-03 06:00 · Score: 3, Interesting

"The media are also following the story, KOMO a local station was knocked offline but are broadcasting from a backup site."

Way to go guys! At least two national, and maybe even international, ICT companies on whom numerous affiliates depend fail to provide for an adequate backup facility and continuity plan, yet the local AM radio station manages to pull it off. I'm guessing that some heads are gonna roll after the holiday weekend...

--
UNIX? They're not even circumcised! Savages!
Re:No Backup?? by Nutria · 2009-07-03 06:09 · Score: 4, Insightful

When this happens in this day and age the CIO should be fired!
And if the CIO recommended a redundant D.C. but the CEO, CFO or Board rejected it as "too expensive"????

--
"I don't know, therefore Aliens" Wafflebox1
Re:No Backup?? by sopssa · 2009-07-03 06:14 · Score: 3, Interesting

I know redundancy and such is better on business stuff, but this kind of reminds me that customer lines have lots of single points of failure as well. There was a day when DHCP at TeliaSonera, a large Nordic ISP, stopped working, leaving 1/3 of the whole country's residents without internet access. Turns out there was a hardware failure on the DHCP server, leading me to believe that they actually depend on just one server to handle all the DHCP requests coming from customers. They did fix it in a few hours, but it was still unavailable for the rest of the day because hundreds of thousands of computers were trying to get an IP address from it. That being said, I remember it happening only once, but it still seems stupid.
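
The "fixed but still unavailable" part is the classic stampede: every client retries at once and buries the server that just came back. One common mitigation is for clients to retry with exponential backoff plus random jitter. The Python sketch below is a generic illustration of that idea, not a description of what TeliaSonera's customers' machines actually did; the 4 second base and 64 second cap are assumed defaults chosen for the example.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int,
                   base: float = 4.0,
                   cap: float = 64.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return one randomized wait (in seconds) per retry attempt.

    The upper bound doubles on each attempt until it reaches cap, and the
    actual wait is drawn uniformly between 0 and that bound, so a crowd of
    clients that failed together does not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays
```

Passing a seeded `random.Random` as `rng` makes the schedule repeatable for testing; in real use each client would keep its own unseeded generator so the waits stay spread out.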
Re:Oh, the humanity! by ErkDemon · 2009-07-03 06:56 · Score: 4, Insightful

Actually -- in a totally unconnected incident -- my grocery shopping was disrupted today because (according to the note pinned to the closed store's shutters) the store's till server was down, and they'd shut up the shop while they waited for an engineer.
I'm guessing that the server was probably local, possibly above the store, and might have gone fritzy in the heat.
So, real-world implications of computer failure. A server goes down, and suddenly Eric Cannot Buy Cheese ("Aaaaiiiieeee!"). Eric has hard cash, store (presumably) has cheese, but store can no longer sell cheese to Eric. Or anything else.
The shop "crashed".
Okay, so I trudged off and did my grocery shopping elsewhere, but it was a little disturbing to think that we've already gotten to the point where a server problem can stop you buying food, in a "real" shop, with "real" money.

--
Eric Baird
Re:sloppy engineering by eln · 2009-07-03 07:05 · Score: 3, Funny

Focusing on something that 99% of us screw up at one point or another, particularly when our primary focus at the time is probably getting the service back online rather than checking the calendar to see if it's Daylight Saving Time or not, for me is always a red flag that you're an insufferable pedant.
Re:sloppy engineering by Achromatic1978 · 2009-07-03 08:36 · Score: 3, Informative

Come on, the guy's sig is a link to some comic rant about "its versus it's" which, whilst it annoys me no end, is most definitely a good indicator that he is, no doubt, an insufferable pedant.
Re:Oh, the humanity! by Spike15 · 2009-07-03 09:55 · Score: 3, Insightful

Or (gasp!) make change without a computron! I wonder if they even train that in grocery stores anymore...scary, indeed.
I think the bigger issue in this case would be manually looking up the price for every single item. We tend to simplify selling things manually in this way (manually processing credit card transactions, making change manually, etc.), when really the biggest problem is being without the UPC system.