Amazon Outage Cost S&P 500 Companies $150M (axios.com)
From a report on Axios: Cyence, an economic modeling platform, shared some data with Axios that show the ramifications: Losses of $150 million for S&P 500 companies. Losses of $160 million for U.S. financial services companies using the infrastructure.
If you took responsibility for your own hardware resources, this wouldn't have been an issue for you.
We hear this sort of statistic a lot, but I have to ask: did they REALLY?
Anyone with experience with this sort of thing understands how fluffy these numbers are: based on statistics, some WAG, etc.
For example:
We processed $1 million in orders per hour.
We were down for 3 hours.
Ergo we "lost" $3 million.
In fact, no such thing is true. At least, not like someone poured $3 million in cash into a furnace and actually LOST the money.
First, there's the question of missed-opportunity sales. What you're really talking about is purchases that didn't take place because the seller wasn't available. This has everything to do with flexibility of supply and time-sensitivity of delivery. If John Smith wanted to order shoes from Amazon, Amazon was down, and so he went to company XYZ and bought those shoes (or decided not to buy at all), then it is reasonably a "lost sale" for Amazon. HOWEVER, if John couldn't reach XYZ (not unlikely, given how broadly the outage hit infrastructure), or XYZ didn't have his brand, or he just said "ok, I'll buy them tomorrow," it WASN'T a lost sale at all. And it's HIGHLY unlikely that the consultants throwing together these figures netted any later excess demand back against the 'missing' hours.
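To make the "later excess demand" point concrete, here's a minimal Python sketch with made-up hourly order counts (the numbers and the helper names are hypothetical, not anything Cyence or Amazon published). The quick estimate only counts the shortfall during the outage hours; netting against a baseline across the recovery hours too lets orders that were merely delayed cancel out.

```python
def naive_lost_orders(baseline, actual, outage_hours):
    """What a quick estimate counts: only the shortfall during the outage hours."""
    return sum(baseline[h] - actual[h] for h in outage_hours)


def net_lost_orders(baseline, actual):
    """Net orders lost over the whole window: baseline minus actual, summed.

    Hours after the outage where actual > baseline (customers coming back)
    contribute negative terms that offset the shortfall during the outage.
    """
    if len(baseline) != len(actual):
        raise ValueError("baseline and actual must cover the same hours")
    return sum(b - a for b, a in zip(baseline, actual))


# Hypothetical hourly orders: a 3-hour outage (hours 1-3), then a catch-up surge.
baseline = [1000, 1000, 1000, 1000, 1000, 1000, 1000]
actual = [1000, 100, 100, 100, 1900, 1800, 1400]

print(naive_lost_orders(baseline, actual, outage_hours=[1, 2, 3]))  # 2700
print(net_lost_orders(baseline, actual))  # 600
```

In this invented scenario the headline method reports 2,700 lost orders while only 600 actually never came back. The real split is exactly the data an outside consultant doesn't have.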
Secondly, even if there are actual lost sales, that is NOT the same as lost money. Lost sales are lost margin. If Amazon is selling a shoe for $100, they have to BUY it somewhere, say for $70. So if John didn't buy that shoe, Amazon didn't have to buy that shoe either. Therefore Amazon wasn't out $100, they were out only their margin, or $30. In the interest of fluffing numbers and getting the result quickly (and because the actual result would take hard work as well as involving some proprietary info like margins that you might never get), I've almost never seen "loss" statistics like this reported as anything but gross numbers. Depending on the margins of sale involved, this can easily be 10x what the actual lost margin was. (Plus, the point of course is to show how impactful something is in the first place....)
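The margin arithmetic above, as a tiny Python sketch (the function name is made up for illustration; the $100/$70 shoe is from the example). It also shows the "easily 10x" case: at a 10% margin, the gross figure is ten times the margin actually forgone.

```python
def lost_margin(lost_revenue, price, unit_cost):
    """Margin actually forgone on sales that truly did not happen.

    If the seller never sold the item, it also never had to buy it at
    unit_cost, so only (price - unit_cost) / price of the revenue is lost.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return lost_revenue * (price - unit_cost) / price


# The shoe example: sells for $100, costs $70, so 30% margin.
print(lost_margin(3_000_000, price=100, unit_cost=70))  # 900000.0

# A thin 10% margin item: the $3M "loss" is 10x the real $300k.
print(lost_margin(3_000_000, price=100, unit_cost=90))  # 300000.0
```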
Combining the two? I'd guess that the actual financial impact is barely 1% of the number stated.
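Putting both effects together is just two multipliers on the headline number. Here's a rough Python sketch; the 5% "truly lost" share is a pure guess for illustration (nobody outside the seller knows the real figure), and the 30% margin comes from the shoe example.

```python
def estimated_real_loss(gross_figure, truly_lost_fraction, margin_rate):
    """Scale a headline 'lost revenue' number down to forgone margin.

    truly_lost_fraction: share of missed orders never placed later or elsewhere
                         (hypothetical; only the seller could measure it).
    margin_rate: share of each sale's price the seller actually keeps.
    """
    for name, value in (("truly_lost_fraction", truly_lost_fraction),
                        ("margin_rate", margin_rate)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    return gross_figure * truly_lost_fraction * margin_rate


headline = 3_000_000
real = estimated_real_loss(headline, truly_lost_fraction=0.05, margin_rate=0.30)
print(f"${real:,.0f} ({real / headline:.1%} of the headline)")
```

With those assumed inputs it prints `$45,000 (1.5% of the headline)`, which is in the same ballpark as the guess above. Change either assumption and the result moves proportionally.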
-Styopa
I think it's even more overstated than that.
Without having any indicator other than that link to an article a couple of lines long, we have no info.
Is the $150 million figure just the normal transaction throughput for the same time frame in which the outage occurred? Because if so, I highly doubt they lost that much. I tried to place an order somewhere during that outage and got an error, so I tried again later and placed my order. The company lost nothing on my order, and I'm sure mine was not the only transaction that was retried later on.
Bold statements about what an outage costs are not helpful unless the methodology for calculating that cost is both divulged and reasonably calculated.
Warning: Teh poster of this messaeg is lysdexic
I'm working on a huge migration of an on-site system to Azure right now, and it's hard to convince the people paying the bills of what's actually needed to guarantee high availability. The S3 outage is a perfect example of this... we have the same problem with Azure Storage Accounts being treated as a magic box by the developers. For example, Azure storage has locally redundant and geo-redundant levels. People hear "redundant" and assume there will never be any issues accessing the things they store in a storage account. In a disaster, that redundancy only protects the _data_ against the failure of a rack (locally redundant) or a datacenter (geo-redundant). If a problem like what happened with S3 occurred, and access to the actual storage through the software-defined magic is disrupted, you're still going to have a bad day. You just (probably) won't lose the data. Obviously the cloud providers do everything they can to keep things running, but not adding some sort of failover above the cloud service level is just asking for trouble if you're doing anything critical.
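A minimal Python sketch of what "failover above the cloud service level" can mean for reads. The backends here are plain stand-in functions, not Azure or AWS SDK calls; in a real system each would hypothetically wrap a client for a separate region or provider holding a copy of the data. The application, not the storage service, decides where to go when the primary is unreachable.

```python
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when every configured storage backend failed to serve a read."""


def read_with_failover(key: str, backends: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each storage backend in order and return the first successful read.

    Any exception from a backend is treated as "unavailable" and the next
    backend is tried; the collected errors are reported if all of them fail.
    """
    errors = []
    for fetch in backends:
        try:
            return fetch(key)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(exc)
    raise AllBackendsFailed(f"no backend could serve {key!r}: {errors}")


# Hypothetical stand-in backends: the primary is "down", the secondary answers.
def primary(key: str) -> bytes:
    raise ConnectionError("primary storage endpoint unreachable")


def secondary(key: str) -> bytes:
    return b"contents of " + key.encode()


print(read_with_failover("invoices/2017-02.csv", [primary, secondary]))
```

This only covers reads of data that already exists in both places. Keeping the second copy current, and deciding what to do with writes during an outage, is the expensive part the bill-payers need to hear about.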
I'm a "classic IT" guy who totally has an open mind about the cloud, but I do think there's lots of hype and misinformation. Designing for high availability is at least as hard as it ever was. Doing it in the cloud is quite expensive... maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill. I just wish the hype bubble would die down so people could have rational conversations about public cloud. It's just like on-premises stuff: don't pay for HA and risk downtime, or pay up and get the SLAs you pay for. I just hate that people are going around saying the cloud is bulletproof and immune to failures... it's technology at the end of the day, and people make mistakes (especially overworked AWS engineers working 100-hour weeks, or Microsoft guys who forgot to renew certificates, etc.)
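Some back-of-the-envelope numbers for the "pay up and get the SLAs you pay for" trade-off, as a Python sketch. The SLA levels are just common round numbers, not any provider's actual terms. The parallel calculation assumes the redundant copies fail independently, which is exactly the assumption an outage of a shared dependency breaks.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability):
    """Downtime per year implied by an availability figure such as 0.999."""
    return (1 - availability) * MINUTES_PER_YEAR


def parallel_availability(*availabilities):
    """Availability of redundant copies where any single one being up is enough.

    Assumes failures are independent; a shared control plane or dependency
    going down takes all copies out together and voids this math.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


for sla in (0.99, 0.999, 0.9999):
    print(f"{sla:.2%} -> {downtime_minutes_per_year(sla):,.0f} min/year")
print(f"two independent 99.9% copies -> {parallel_availability(0.999, 0.999):.4%}")
```

99.9% still allows roughly 8.8 hours of downtime a year. Stacking two independent copies looks great on paper, but you pay for both, and the independence assumption is the part that deserves the hard questions.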