'Why You Should Not Use Google Cloud' (medium.com)
A user on Medium named "Punch a Server" says you should not use Google Cloud due to the "'no-warnings-given, abrupt way' they pull the plug on your entire system if they (or the machines) believe something is wrong." The user has a project running in production on Google Cloud (GCP) that is used to monitor hundreds of wind turbines and scores of solar plants scattered across 8 countries. When their project goes down, money is lost. An anonymous Slashdot reader shares the report: Early today morning (June 28, 2018) I receive an alert from Uptime Robot telling me my entire site is down. I receive a barrage of emails from Google saying there is some "potential suspicious activity" and all my systems have been turned off. EVERYTHING IS OFF. THE MACHINE HAS PULLED THE PLUG WITH NO WARNING. The site is down, app engine, databases are unreachable, multiple Firebases say I've been downgraded and therefore exceeded limits.
Customer service chat is off. There's no phone to call. I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue. I fill in the form with the details and thankfully within 20 minutes all the services started coming alive. The first time this happened, we were down for a few hours. In all we lost everything for about an hour. An automated email arrives apologizing for "inconvenience" caused. Unfortunately The Machine has no understanding of the "quantum of inconvenience" caused.
If millions of dollars are on the line, you should be running your own systems. Seriously. I'm not an IT expert, data infrastructure guy or anything. I'm just a dumb nerd, and I know that. Never trust your data to a third party when millions are at stake -- let alone critical infrastructure reliability.
Beware of the Leopard.
Why was there a second time?
Our company tried to use Amazon a few years ago and ran into the same issues. Although Google and Amazon allow you to spin up a single instance, they are really designed for companies that have hundreds if not thousands of servers. Amazon assumes that you have dozens of fault-tolerant servers and that if one goes down you just replace it with another one. This works great for companies like Netflix, but Amazon is a disaster for a company that isn't fully fault tolerant and has critical servers that can't go down. Liquid Web, Rackspace, Linode, and even DigitalOcean are more reliable when you want to keep a single server up and running with minimal downtime. Now, if you need to keep thousands of servers up and don't care if any one server goes down, then Amazon works fine.
Why was there a second time?
So many of the problems here (e.g., paying with a credit card, and one that has only a single person's name on it? Having no fallback that can be spun up elsewhere?) are foolish if this has never happened before, and utterly, mind-bogglingly idiotic if this in fact has already happened before. It's one thing to be blind to something you should know could be a problem; it's quite another to be blind and wholly unprepared for a problem you've personally experienced! Something seems fundamentally wrong at this company.
Also, if your entire business can die because it takes an unexpected few days off, then perhaps your business is running a bit too raggedly and doesn't have enough meat on the bones . . .
I remember sigs. Oh, a simpler time!
The company thought they could get away with paying less for server infrastructure. They can. But they get less. This is one of the "less" things they get.
If you value your data, host it yourself, preferably in multiple locations. If you want to go cheap, then you can expect to lose things.
Like your data, or access to it, or availability of it.
It's not such a smart thing to cheap out on the important stuff.
Of course, convincing the bean counters of future risk inherent in what appears to them to be current savings... good luck with that.
Well, best to get rid of your bean counters. :)
Here's a maxim of mine I like to drop on the table during discussions like these:
I've fallen off your lawn, and I can't get up.
This sounds like asking for trouble to me!