Amazon Outage Cost S&P 500 Companies $150M (axios.com)

Posted by msmash on Friday March 3, 2017 @03:25AM from the aftermath dept.

From a report on Axios: Cyence, an economic modeling platform, shared some data with Axios that show the ramifications: Losses of $150 million for S&P 500 companies. Losses of $160 million for U.S. financial services companies using the infrastructure.

Maybe you should own your hardware by Chronus1326 · 2017-03-03 03:27 · Score: 1, Insightful

If you took responsibility for your own hardware resources, this wouldn't have been an issue for you.
1. Re:Maybe you should own your hardware by fabriciom · 2017-03-03 03:31 · Score: 3, Insightful
  
  If you ever get to management and you have to answer for errors of your subordinates your opinion will change.
2. Re:Maybe you should own your hardware by James+Carnley · 2017-03-03 03:41 · Score: 5, Insightful
  
  Yeah because self hosted hardware never goes down. Totally rock solid. I don't know why everyone doesn't host their own stuff so that nothing can go wrong.
3. Re:Maybe you should own your hardware by bws111 · 2017-03-03 04:31 · Score: 3, Insightful
  
  Yeah. Do you also
  * Run your own communications system with 2-way radios, or do you trust telcos for that?
  * Run your own wires to every customer, or do you trust ISPs for that?
  * Run your own fleet of trucks to deliver product, or do you trust shipping cos for that?
  * Have all your customers pay you directly in cash that you keep in your own vault, or do you trust credit card companies and banks for that?
  * Perform all your own accounting, or do you trust outside accountants for that?
  The list goes on and on. Every one of those is at least as important as servers (and in some cases they are far more important)
4. Re:Maybe you should own your hardware by JaredOfEuropa · 2017-03-03 04:51 · Score: 3, Insightful
  
  That's just how it works. If your underlings fuck up, it's poor management on your part. If your cloud hosting partner fucks up, it's breach of contract and not your fault. Especially if you went with a well known vendor with all the right ISO stuff. You probably won't even be challenged much on the decision to go to the cloud in the first place, since that's now standard business practice. And to be honest, running your own data centre only makes sense if you know how to do that; I've had a few clients that saw a vast improvement in reliability and delivery after they moved to the cloud.
  
  --
  If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
Meh by argStyopa · 2017-03-03 03:43 · Score: 5, Insightful

We hear this sort of statistic a lot but I have to ask, did they REALLY?
Anyone with experience with this sort of thing understands how fluffy these numbers are, based on statistics, some WAG, etc.
For example:
We processed $1 million in orders per hour.
We were down for 3 hours.
Ergo we "lost" $3 million.
In fact, no such thing is true. At least, not like someone poured $3 million in cash into a furnace and actually LOST the money.
First, there's the missed opportunity sales. What you're talking about in fact is purchases that didn't take place because the seller wasn't available. This has everything to do with flexibility of supply and time-sensitivity of delivery. If in fact John Smith wanted to order shoes from Amazon, and Amazon was down, so he went to company XYZ and bought those shoes or decided not to buy at all, then in fact it is reasonably a "lost sale" for Amazon. HOWEVER, if John couldn't reach XYZ (not unlikely with the broad infrastructure hit that the outage caused), or they didn't have his brand, or he just said "ok, I'll just buy them tomorrow" it WASN'T a lost sale at all. And it's HIGHLY unlikely that the consultants throwing together these figures rationalized any later excess demand back into the 'missing' hours.
Secondly, even if there are actual lost sales, that is NOT the same as lost money. Lost sales are lost margin. If Amazon is selling a shoe for $100, they have to BUY it somewhere, say for $70. So if John didn't buy that shoe, Amazon didn't have to buy that shoe either. Therefore Amazon wasn't out $100, they were out only their margin, or $30. In the interest of fluffing numbers and getting the result quickly (and because the actual result would take hard work as well as involving some proprietary info like margins that you might never get), I've almost never seen "loss" statistics like this reported as anything but gross numbers. Depending on the margins of sale involved, this can easily be 10x what the actual lost margin was. (Plus, the point of course is to show how impactful something is in the first place....)
Combining the two? I'd guess that the actual financial impact is barely 1% of the number stated.
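
To make the arithmetic above concrete, here is a minimal sketch in Python. The $1 million per hour, the 3 hours of downtime, and the 30% margin (a $100 shoe bought for $70) come from the comment; the share of orders that are truly lost rather than retried later (`lost_fraction`) is a made-up illustrative value, and both function names are hypothetical.

```python
def naive_loss(revenue_per_hour, hours_down):
    """Headline figure: every dollar of normal throughput counted as lost."""
    return revenue_per_hour * hours_down


def adjusted_loss(revenue_per_hour, hours_down, lost_fraction, margin):
    """Margin forgone on the orders that never come back.

    lost_fraction: share of orders abandoned for good (assumption, not from the source)
    margin: gross margin per sale, e.g. 0.30 for a $100 shoe bought for $70
    """
    return revenue_per_hour * hours_down * lost_fraction * margin


naive = naive_loss(1_000_000, 3)
# lost_fraction=0.25 is purely illustrative; the real value is unknown.
adjusted = adjusted_loss(1_000_000, 3, lost_fraction=0.25, margin=0.30)

print(f"naive:    ${naive:,.0f}")
print(f"adjusted: ${adjusted:,.0f}")
print(f"ratio:    {adjusted / naive:.1%}")
```

How far the adjusted figure falls below the headline number depends entirely on the two inputs the outside estimator usually cannot see: the retry rate and the margin.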

--
-Styopa
Re:Really? by ThomasBHardy · 2017-03-03 03:44 · Score: 4, Insightful

I think it's even more overstated than that.
Without having any indicator other than that link to an article a couple of lines long, we have no info.
Is the $150 million value the "normal" throughput of transactions during regular operation over the same time frame as the outage? Because if so, I highly doubt they lost that much. I tried to place an order somewhere during that outage. There was an error. So I tried again later and placed my order. The company lost nothing in regards to my order. I'm sure mine is not the only transaction that was re-tried later on.
Bold statements about what an outage costs are not helpful unless the methodology for calculating that cost is both divulged and reasonably calculated.

--
Warning: Teh poster of this messaeg is lysdexic
Cloud != Magic by ErichTheRed · 2017-03-03 04:30 · Score: 3, Insightful

I'm working on a huge migration of an on-site system to Azure right now, and it's hard to convince people paying the bills of what's actually needed to guarantee high availability. The S3 outage is a perfect example of this...we have the same problem with Azure Storage Accounts being treated as a magic box by the developers. For example, Azure storage has locally redundant and geo-redundant levels. People hear "redundant" and assume that there will never be any issues accessing things you store in a storage account. If there was a disaster of some kind, it only protects the _data_ against the failure of a rack (locally redundant) or a datacenter (geo-redundant.) If a problem like what happened with S3 occurred, and access to the actual storage through the software-defined magic is disrupted, you're still going to have a bad day. You just (probably) won't lose the data. Obviously the cloud providers do everything they can to make sure things stay running, but not adding in some sort of failover above the cloud service level is just asking for trouble if you're doing anything critical.
I'm a "classic IT" guy who totally has an open mind about the cloud, but I do think there's lots of hype and misinformation. Designing for high availability is at least as hard as it was. Doing this in the cloud is quite expensive...maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill. I just wish the hype bubble would die down so people could have rational conversations about public cloud. It's just like on-premises stuff - don't pay for HA and risk downtime, or pay up and get the SLAs you pay for. I just hate that people are going around saying the cloud is bulletproof and immune to failures....it's technology at the end of the day and people make mistakes (especially overworked AWS engineers working 100 hour weeks or Microsoft guys who forgot to renew certificates, etc.)
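
As a rough illustration of "failover above the cloud service level", the sketch below tries a list of storage backends in order and returns the first successful read. It is not tied to any real SDK: `read_with_failover`, `primary`, and `secondary` are hypothetical names, and the two backends are stand-ins that simulate an unreachable endpoint and a working replica.

```python
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when every configured backend fails to return the object."""


def read_with_failover(key: str, backends: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each backend in order and return the first successful read."""
    errors = []
    for fetch in backends:
        try:
            return fetch(key)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise AllBackendsFailed(f"{len(errors)} backend(s) failed for {key!r}")


# Hypothetical stand-ins for two independent storage services or regions.
def primary(key: str) -> bytes:
    raise ConnectionError("primary storage endpoint unreachable")


def secondary(key: str) -> bytes:
    return b"payload for " + key.encode()


data = read_with_failover("report.csv", [primary, secondary])
print(data)
```

This only covers reads. Keeping the second copy current, and deciding what happens to writes during an outage, is where most of the cost and design effort mentioned above actually goes.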