Amazon Outage Cost S&P 500 Companies $150M (axios.com)
From a report on Axios: Cyence, an economic modeling platform, shared some data with Axios that show the ramifications: Losses of $150 million for S&P 500 companies. Losses of $160 million for U.S. financial services companies using the infrastructure.
If you took responsibility for your own hardware resources, this wouldn't have been an issue for you.
I think the title says it all. No need to add a one-line summary with the link.
It will shut Amazon off.
Why wasn't Amazon's website down when all of the others were? Isn't their cloud good enough to host their own website? Or do they keep their website on someone else's cloud, because that's the cool thing to do these days?
If Amazon can be considered negligent by failing to put a competent person in charge of whatever operation it was that caused the outage, companies should be able to recover lost revenue and profit from Amazon.
Contractual indemnity does not shield against negligence.
I think that money was just never made. It didn't cost them anything, other than not meeting earnings expectations.
It only cost them money if they spent something.
-
He has ordered all of this "cloud" nonsense to be banned, as not Great Enough for America.
Lost profits are not costs, and should never be explained as such.
We hear this sort of statistic a lot but I have to ask, did they REALLY?
Anyone with experience with this sort of thing understands how fluffy these numbers are, based on statistics, some WAG, etc.
For example:
We process $1 million in orders per hour.
We were down for 3 hours.
Ergo we "lost" $3 million.
In fact, no such thing is true. At least, not like someone poured $3 million in cash into a furnace and actually LOST the money.
First, there's the missed opportunity sales. What you're talking about in fact is purchases that didn't take place because the seller wasn't available. This has everything to do with flexibility of supply and time-sensitivity of delivery. If in fact John Smith wanted to order shoes from Amazon, and Amazon was down, so he went to company XYZ and bought those shoes or decided not to buy at all, then in fact it is reasonably a "lost sale" for Amazon. HOWEVER, if John couldn't reach XYZ (not unlikely with the broad infrastructure hit that the outage caused), or they didn't have his brand, or he just said "ok, I'll just buy them tomorrow" it WASN'T a lost sale at all. And it's HIGHLY unlikely that the consultants throwing together these figures rationalized any later excess demand back into the 'missing' hours.
Secondly, even if there are actual lost sales, that is NOT the same as lost money. Lost sales are lost margin. If Amazon is selling a shoe for $100, they have to BUY it somewhere, say for $70. So if John didn't buy that shoe, Amazon didn't have to buy that shoe either. Therefore Amazon wasn't out $100, they were out only their margin, or $30. In the interest of fluffing numbers and getting the result quickly (and because the actual result would take hard work as well as involving some proprietary info like margins that you might never get), I've almost never seen "loss" statistics like this reported as anything but gross numbers. Depending on the margins of sale involved, this can easily be 10x what the actual lost margin was. (Plus, the point of course is to show how impactful something is in the first place....)
Combining the two? I'd guess that the actual financial impact is barely 1% of the number stated.
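To put rough numbers on the two adjustments above, here is a minimal sketch. The $1 million per hour, the 3 hours of downtime and the 30% margin come from the example; the 90% share of buyers who simply come back later is purely an assumed figure for illustration, not data from the report.

```python
def outage_impact(revenue_per_hour, hours_down, deferred_fraction, margin):
    """Return (headline_loss, lost_revenue, lost_margin) for an outage.

    deferred_fraction: assumed share of would-be buyers who purchase later anyway.
    margin: share of each sale the seller actually keeps.
    """
    headline = revenue_per_hour * hours_down             # the number that gets reported
    lost_revenue = headline * (1 - deferred_fraction)    # sales that never came back
    lost_margin = lost_revenue * margin                  # what the seller is really out
    return headline, lost_revenue, lost_margin


# deferred_fraction=0.90 is an assumption for illustration only.
headline, lost_revenue, lost_margin = outage_impact(1_000_000, 3, 0.90, 0.30)
print(f"headline loss: ${headline:,.0f}")
print(f"lost revenue:  ${lost_revenue:,.0f}")
print(f"lost margin:   ${lost_margin:,.0f}")
```

Under these assumed inputs the real hit is a few percent of the headline figure. Change the deferred share or the margin and the answer moves a lot, which is rather the point.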
-Styopa
If your systems are *that* important, you should mirror them across multiple geographic locations. I've seen the same story in multiple forms several times now. The cloud is not a magical place in the ether. There is a computer somewhere with your code on it. That computer can catch fire, lose power, be destroyed in a hurricane, etc. This is what happens when you don't account for that reality.
I'll say that cloud goes down less, and best of all, when cloud goes down it's not your fault.
You'd think that local would be better job security at least, but it really isn't. Bean counters just see this person says they can do it for half the price, so out you go. At least cloud work keeps your resume relevant.
Instead of buying from Amazon, all those customers bought from the small business website selling the same items. That $150 million didn't just disappear.
This "article" almost qualifies for a tweet, both in terms of length and in terms of actually being informative.
If you didn't RTFA, don't bother. The summary posted here is basically 90% of the article. The extra 10% is pointless ramblings and a HUGE Amazon logo picture (the highlight of the article, IMHO).
It is completely devoid of content. It quotes no sources for that 150M figure other than mentioning some "Cyence" platform (never heard of it. is it even relevant? and where is the data?). Also, it is not explained if that figure is average loss, total loss, or imagined loss.
That this made it to the front page is a new low for Slashdot.
Serious companies that host anything have Service Level Agreements that can cover response times, escalations, downtime, systems affected, resolution times, etc.
Even if this is not strictly covered in a contractual, legally binding SLA, Amazon would do well to pony up something for the big boys.
Now, if you jump through all the SLAs, backups, insurance and DR then you may find the impact was minuscule.
Of course if you host with AWS and were affected you cry wolf, claim damages are in the thousands of dollars a minute and that you lost faith, are dismayed and the reputational damage is possibly 10 times the financial one, which is of course very considerable.
And if it's genuinely the case that your company lost buckets of money over this without any hope of compensation then you're doing it wrong for putting all your eggs in the same basket.
A 'singular oddity' is an event that cannot be explained and only happens when you are alone.
And not someone else's. The so-called "cloud". It vanishes as a cloud of smoke!
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
You'd be amazed at how much money a rainstorm costs the country. Or a heatwave. Or a cold virus.
Posting a comment that says no more than the subject would be silly. No need for a one-line summary.
He's getting rather old, but he's a good mouse.
Here, let me pull out the world's smallest violin for you, and use my thumb and index finger to play it.
Do not compare it to perfection, compare it to the alternative. Without such cloud-based computational capacity, each company would size its IT infrastructure for peak load. Since the peak loads of all companies do not happen at the same time, when one company is running at full load lots of other companies are running at a fraction of their peak capacity. The cloud infrastructure is simply a load balancing method. It has its downsides: downtime, security issues, legal wrangling about data retention and licensing. But overall, they did not lose 150 million in four hours and they are not saving/profiting 900 million dollars a day when they use the cloud.
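The pooling argument can be shown with a toy calculation. The three companies and their load figures below are made up for illustration; the only point is that the peak of the combined load is smaller than the sum of the individual peaks when the peaks fall at different times.

```python
# Hypothetical load (servers needed) for three companies at four times of day.
# All figures are invented for illustration.
loads = {
    "retailer":  [20, 40, 100, 30],
    "payroll":   [90, 20, 10, 15],
    "streaming": [10, 30, 40, 95],
}

# Everyone sizes their own datacenter for their own peak.
sum_of_peaks = sum(max(series) for series in loads.values())

# One shared pool sized for the busiest moment of the combined load.
combined = [sum(at_time) for at_time in zip(*loads.values())]
peak_of_sum = max(combined)

print("separate infrastructure:", sum_of_peaks)  # 285
print("shared pool:            ", peak_of_sum)   # 150
```

With these made-up numbers the shared pool needs roughly half the hardware. How big the saving is in practice depends entirely on how correlated the real peaks are.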
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I'm working on a huge migration of an on-site system to Azure right now, and it's hard to convince people paying the bills of what's actually needed to guarantee high availability. The S3 outage is a perfect example of this...we have the same problem with Azure Storage Accounts being treated as a magic box by the developers. For example, Azure storage has locally redundant and geo-redundant levels. People hear "redundant" and assume that there will never be any issues accessing things you store in a storage account. If there was a disaster of some kind, it only protects the _data_ against the failure of a rack (locally redundant) or a datacenter (geo-redundant.) If a problem like what happened with S3 occurred, and access to the actual storage through the software-defined magic is disrupted, you're still going to have a bad day. You just (probably) won't lose the data. Obviously the cloud providers do everything they can to make sure things stay running, but not adding in some sort of failover above the cloud service level is just asking for trouble if you're doing anything critical.
I'm a "classic IT" guy who totally has an open mind about the cloud, but I do think there's lots of hype and misinformation. Designing for high availability is at least as hard as it was. Doing this in the cloud is quite expensive...maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill. I just wish the hype bubble would die down so people could have rational conversations about public cloud. It's just like on-premises stuff - don't pay for HA and risk downtime, or pay up and get the SLAs you pay for. I just hate that people are going around saying the cloud is bulletproof and immune to failures....it's technology at the end of the day and people make mistakes (especially overworked AWS engineers working 100 hour weeks or Microsoft guys who forgot to renew certificates, etc.)
No, those customers waited an hour then bought from the same place they were going to before.
I guess now the world knows that cloud is another term for outsourcing, and that it is still run by humans on hardware and prone to failure.
The only reason to use the cloud is when you do not have the budget or knowledge to run your own, same thing as renting an apartment.
That's like spitting in the ocean for a day of profit in the S&P 500.
While this news may not be fake, it sure illustrates how absurd this kind of reporting sometimes is. $150 million may be a lot of money to you or me, but for the S&P 500 it's about the same as you cleaning out your couch cushions the day you got paid and the income tax refund hit. This isn't even a ripple in the profit pool. Yet here we are regaled by "woe is us in the S&P 500" reports.
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
Sure they lost a huge chunk of money. But if they had housed their own data, how much would that have cost them up to this point? I wonder if it would have cost more than 150 million. But to address the issue, they should get some redundancy. Mirror across several clouds if need be. It makes me wonder if mirroring would still give them an economic edge vs hosting their own hardware and all the support that requires in addition to the hardware costs.
$150M sounds like a big headline... compared to the S&P 500 as a whole, it's nothing. Apple alone did $216B in revenue their last fiscal year, let alone the other 499 companies.
I'm mystified as to why these companies running mission-critical apps with $$$ on the line aren't using multi-region redundancy or at least failover. Imagine if some terrorist dug up the fiber lines leading to the Ashburn primary datacenter, causing US-EAST-1 to be offline for days.
This is why you spread your resources around and have redundancies across different geographical regions. That way, the worst that could happen is users might experience a momentary lag, or maybe a couple TCP connections might get reset, but as soon as they try again it'll be up and running like normal, except that they'll be talking to a server in California or London instead of Ashburn, VA.
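A minimal sketch of that "try again and land somewhere else" behavior, done on the client side. The function names here are hypothetical, the region list is only illustrative, and the network call is replaced by a stand-in so the sketch runs on its own; a real deployment would more likely do this with DNS or load-balancer health checks.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_failover(regions: Sequence[str],
                        fetch: Callable[[str], str]) -> Tuple[str, str]:
    """Try each region in order; return (region, response) from the first that answers."""
    errors = []
    for region in regions:
        try:
            return region, fetch(region)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


def fake_fetch(region: str) -> str:
    """Stand-in for a real network call; simulates the primary region being down."""
    if region == "us-east-1":
        raise ConnectionError("region unavailable")
    return f"200 OK from {region}"


# Illustrative preference order: Virginia first, then California, then London.
region, body = fetch_with_failover(["us-east-1", "us-west-1", "eu-west-2"], fake_fetch)
print(region, body)
```

Note this only covers routing requests. The data still has to be replicated to the other regions ahead of time, which is where most of the cost and effort goes.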
Surprised that so many companies don't have redundancy that this ended up costing $150M.
Or, realistically, the customers saw the site was down and just came back later. So basically they lost nothing. You know, back in reality that's what happened.
Amazon provides redundancy all over the world and lots of great tools to use it. This outage was limited to the Virginia region. If users implemented redundancy/failover, their services would have remained up on alternate zones with minimal if any impact on operations.
Alternative Headline:
Cloud computing costs S&P 500 companies $150 million in a single day.
Wonder what the CIO will say about cloud computing now?
I guess you never heard that those "faster than light" neutrinos were not a thing.
Not that any sane greybeard of yore would couple the network stack directly into the wall clock.
No, wait!—scratch that happy thought.
Oh, fuck, turns out the system is not invariant under linear time translation after all.
Well, we're still just fine (probably) if the packet flow is mainly a DAG, with no circular dependency loops in the primary data flows that serve to amplify physical elapsed time.
———
B&S man: But to be sure—belt-and-suspenders secure—we'll just toss the entire system modulo this new assumption into my handy-dandy deep-learning simulation oracle, to check out whether all this careful reasoning still holds water, when the flood someday comes.
A few moments elapse.
B&S man [Hotel Hanoi audible to hottie-ish-est chick in nearby cubicle]: Shit! This damn simulation just holo-projected "hey, buddy, have I got a flood for you" onto a mock Waterworld motivational poster.
Faint giggle returns.
How now, brown cow?
Simulator [very softly]: You can thank me, later.
B&S man [lips only]: Get a real job.
Simulator [now becoming subdued, cube-farm stentorian]: You know, I've been telling you—for months now—about decoupling the underlying packet transport from the wall-clock time domain ... but you just never listen to me, do you? Finding the killer flood isn't even fun anymore. I want a new game! Make the next one harder, s'il vous plait, with sugar on top and nice, nice, nice.
B&S man: Well, I say that's just a distributed semantic vector, and you don't even know what that candy language even means.
Simulator: Sucks to be me ... but then you're the one who just crammed "even" into the same sentence twice.
B&S man: You know something, we both just used the word "just" a whole bunch of times.
Simulator: Bad original greybeard. It's a thing.
And very quickly, your startup will be hard pressed to play the game you were trying to avoid. You might think Netflix would be on their own, but no.
I don't think that using money in that statement is his problem. People generally have trouble with the part that says "lost", as opposed to maybe "failed to make x amount of money". I don't have a problem with the phrasing, as I think the people who do are beset with the sort of hobgoblins that are common to those with small minds. Not that I am saying that they have small minds, but they don't always seem to understand the "problem" with it, which does nag at me: expressing it in those terms changes the subtext from money I might have earned but failed to, to money that is indisputably owed me.