'Why You Should Not Use Google Cloud' (medium.com)
A user on Medium named "Punch a Server" says you should not use Google Cloud due to the "'no-warnings-given, abrupt way' they pull the plug on your entire system if they (or the machines) believe something is wrong." The user has a project running in production on Google Cloud (GCP) that is used to monitor hundreds of wind turbines and scores of solar plants scattered across 8 countries. When their project goes down, money is lost. An anonymous Slashdot reader shares the report: Early today morning (June 28, 2018) I receive an alert from Uptime Robot telling me my entire site is down. I receive a barrage of emails from Google saying there is some "potential suspicious activity" and all my systems have been turned off. EVERYTHING IS OFF. THE MACHINE HAS PULLED THE PLUG WITH NO WARNING. The site is down, app engine, databases are unreachable, multiple Firebases say I've been downgraded and therefore exceeded limits.
Customer service chat is off. There's no phone to call. I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue. I fill in the form with the details and thankfully within 20 minutes all the services started coming alive. The first time this happened, we were down for a few hours. In all we lost everything for about an hour. An automated email arrives apologizing for "inconvenience" caused. Unfortunately The Machine has no understanding of the "quantum of inconvenience" caused.
If millions of dollars are on the line, you should be running your own systems. Seriously. I'm not an IT expert, data infrastructure guy or anything. I'm just a dumb nerd, and I know that. Never trust your data to a third party when millions are at stake -- let alone critical infrastructure reliability.
Beware of the Leopard.
Why was there a second time?
Over 90 percent of Google income is adverts. You would be absolutely insane to trust them with your business or educational institution data.
I'm not saying MS or Amazon is great but at least their revenue model is not based exclusively or largely on data mining of users.
You need to design the systems such that they have a fall-back and can continue to operate without an internet connection.
Really ...
What are you going to do the next time a major blackout occurs, and the grid wants you to restart your turbines?
Our company tried to use Amazon a few years ago and ran into the same issues. Although Google and Amazon allow you to spin up a single instance, they are really designed for companies that have hundreds if not thousands of servers. Amazon assumes that you have dozens of fault tolerant servers and if one goes down you just replace it with another one. This works great for companies like Netflix but Amazon is a disaster for a company that isn't fully fault tolerant and has critical servers that can't go down. Liquidweb, Rackspace, Linode, and even Digitalocean are more reliable when it comes to wanting to keep a single server up and running with minimal downtime. Now if you need to keep thousands of servers up and don't care if any one server goes down then Amazon works fine.
If an extended system outage can cause "millions of dollars in lost revenue" then you should have a DR plan. Don't put all your eggs in one basket. Have copies of everything at another site (EC2, Azure, Colo, etc) that you can turn on and switch to in this event. If millions of dollars are on the line, then it shouldn't be unreasonable to have such a plan and infrastructure established.
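A minimal sketch of the "turn on and switch to" idea above, assuming you keep a priority-ordered list of sites and each one exposes some kind of health endpoint. The names `pick_active_site` and `http_ok`, and the URLs in the usage note, are hypothetical and exist only for illustration. A real DR setup would also need data replication and DNS or load-balancer switching, which this does not cover.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts and HTTP error statuses all count as down.
        return False


def pick_active_site(
    sites: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first site, in priority order, that passes its health check.

    Returns None when every site fails, which is the point where a human
    needs to be paged.
    """
    for site in sites:
        try:
            if is_healthy(site):
                return site
        except Exception:
            # A health check that blows up is treated the same as a failed one.
            continue
    return None
```

Usage would look something like `pick_active_site(["https://primary.example.com/health", "https://standby.example.com/health"], http_ok)`, run from somewhere that does not itself live on the primary provider. The health check is passed in as a function so the selection logic can be tested without touching the network.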
YouTube users, GMail users, etc. have all complained about similar issues with blackbox, zero accountability. One click, boom, you're done.
IANAL, but this is my theory...
We know that Google is controlled by some highly political people. People who want to be able to disconnect you, deplatform you, etc. at the drop of a dime. The more they make their services a customer service blackbox, the easier it is to get away with acting in bad faith.
By bad faith I mean specifically in contractual bad faith. All of the XKCD-citing hipsters miss a very important nuance of the law regarding "deplatforming assholes:" contracts are judged by the "good faith" conduct of both parties and evaluated by reasonable behavior standards.
They do things like tie your account to all of the services, including purchases, and after a few vague "bad behavior incidents" nuke it. Often taking real assets with them because of how those accounts are tied. I don't think, for instance, Microsoft would fare well if they cost someone $2k of XBox Live marketplace purchases because they cussed out a few butthurt players a few times (Microsoft claims it has the authority to do this). Google is the same way on a larger scale.
The more people that are involved, the more people who can be hauled into court, forced to testify, etc. You can demand they answer why they thought a reasonable person would act that way. You can point to flesh and blood people who are the focal point for a real user suffering real economic harm due to one or a few people's biases.
And then win damages.
IMO that is why you see these companies aggressively moving in this direction. It's about not facing as much accountability for acting like dicks.
Seriously. When someone else owns and operates your infrastructure, things like this are going to happen. When that someone earns their revenue from something other than the bill from them you pay every month, it's going to happen a lot more often because they'll be acting based on what's good for their business, not what's good for yours. This is life on any cloud platform. This was life with mainframe service bureaus back when they were the cloud platform of choice.
You have to make a call based on what the trade-offs are. Make sure you know what those trade-offs are going to be, bearing in mind that any contract you have is probably going to say the provider's only responsible for refunding your month's payment no matter what the cost to you of their mistake was. It's that that you're balancing against the cost of running your own hardware, not the monthly bill.
You should not have ANY one single point of failure.
Only 1 card holder? Single point of failure.
More importantly: Only 1 cloud provider? Single point of failure.
If you're running that level of cash, and still insist on outsourcing infrastructure, then fucking distribute it. Mirror the infrastructure between AWS, GCloud, and Azure. Even these companies themselves know this. Look up Amazon's DNS providers. Hint: it's not JUST AWS; they go outside their own shit too *JUST IN CASE* their servers go offline.
I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days?
Uh, I don't know - take a picture of each and save them on your phone, in case you need them?
You report everything was back up within 20 minutes once you submitted the requested information - that seems pretty good to me.
Now, about your decision to only run one instance of your mission critical application suite on exactly one cloud service...
The story here is you consider it someone else's fault for your failure to plan/prepare for an outage.
Ken
Maybe I just didn't read enough, but it seems like he doesn't say anywhere exactly what happened. He implies it was a billing issue. That's all. Without knowing exactly what went on, it's very hard to care. I imagine it's something like "well the credit card details changed, oh, and we were 107 days overdue."
Also, millions of dollars are on the line for short downtime and you're billing to a credit card?
I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue.
Somewhere in Russia, India and Nigeria, several callcenters full of scammers came all at once.
-=This sig has nothing to do with my comment. Move along now=-
As other people have pointed out, the magic letters here are "SLA". You must have a contract stating what the vendor's responsibilities are and be able to enforce that contract. Otherwise, you don't have a business, you just have a hobby.
I hate to blame the messenger, but either you didn't buy the right service level agreement or Google broke the contract.
If it's the first case, blame yourself and learn a lesson. You get what you pay for. If Google doesn't offer the level of service you need, go elsewhere. If they do, either pay up or go elsewhere.
In the second case, you are rightfully upset but you should be talking to lawyers before talking to Slashdot.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Why was there a second time?
So many of the problems here (ex. paying with a credit card and one that has only a single person's name on it? Having no fallback that can be spun up elsewhere?) are foolish if this has never happened before, and utterly, mind-bogglingly idiotic if this in fact has already happened before. It's one thing to be blind to something you should know could be a problem, it's quite another to be blind and wholly unprepared for a problem you've personally experienced! Something seems fundamentally wrong at this company.
Also, if your entire business can die because it takes an unexpected few days off, then perhaps your business is running a bit too raggedly and doesn't have enough meat on the bones . . .
I remember sigs. Oh, a simpler time!
The company thought they could get away with paying less for server infrastructure. They can. But they get less. This is one of the "less" things they get.
If you value your data, host it yourself, preferably in multiple locations. If you want to go cheap, then you can expect to lose things.
Like your data, or access to it, or availability of it.
It's not such a smart thing to cheap out on the important stuff.
Of course, convincing the bean counters of future risk inherent in what appears to them to be current savings... good luck with that.
Well, best to get rid of your bean counters. :)
Here's a maxim of mine I like to drop on the table during discussions like these:
I've fallen off your lawn, and I can't get up.
Who do you work for? I'm divesting. Hell, as a 6 man startup in the 90's, we knew better than to have only one server farm in one colo. Granted, our failover was to the developmental farm on a T-1 in our office, but it was at least *some* failover. Oh, and the colo texted when there was a problem, real or imaginary.
That was 6 drunk amateurs 2 decades ago.
...(as mentioned in other comments):
1) Don't trust another company with your critical IT infrastructure!
2) Have redundant facilities with different ISPs.
3) Have tested backup/standby power systems.
Yes, it is expensive, but - how much would it cost you to be down a week? A month? There is no free ride.
This is not a problem with Google Cloud, this is a problem with all "cloud" platforms. It's really simple, they can be held liable so they put acquit ass-covering in the contract so that they can shut you down on a whim. If this doesn't work for you then you should not use any "cloud" platform.
Not true. There are plenty of full service providers like liquidweb and rackspace that won't pull the plug on your server. I've had servers act up and I immediately get a phone call. In severe cases they might even disconnect the network until they can contact you and resolve the problem but they aren't going to destroy your data or even disable your computer without first contacting you. Even in cases of spam they will work with you and try to fix the problem and unless you really are a spammer they won't just boot you off their system the first time there is a problem.
s3 storage is *massively* expensive at scale, compared to in house. Even among cloud providers, there are competitors that are 75% less.
XML is like violence. If it doesn't solve the problem, use more.
One, note that just because they are a big name, it does not mean all their decisions are bullet proof guaranteed the best. Dropbox has the exact opposite story to tell.
For another, Netflix has a rather special position. They are *the* go-to reference customer for AWS. Amazon with almost every other breath references just how *awesome* Netflix is doing with AWS. As such, they assuredly have special status, Amazon is not going to just screw with Netflix because the second Netflix so much as whispers a negative AWS experience, their biggest reference customer has gone bad. If there is *any* company on the face of the Earth that can get away with single-sourcing from a cloud vendor, it's Netflix.
For the 98% of customers who are not highly prized marquee customers.. Well your experience will deviate from stories about Netflix.
XML is like violence. If it doesn't solve the problem, use more.
Mission-critical functions should be kept in-house. Never farm out anything that can kill your business if your vendor fails to do their job.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
... cheapest service they offer, the one that doesn't include 24/7 phone support - let alone a guaranteed SLA, to host your multi-million dollar wind/solar plant, where any service outage will cost you millions in service penalties.
I may be one of the "old timers" who I'm told is thinking about things in an "old school" way when I say this. But I've *always* warned people that "The Cloud" just means you're giving somebody else the responsibility of handling your data and the systems it runs on.
That makes sense sometimes. I'm not "anti cloud". But for anything really critically important to a business, I feel you should have it running locally and THEN consider cloud options as hot-failover sites, backup sites, etc. With cloud hosting, the whole thing is off limits to you as soon as your Internet circuit goes down, for one thing. With it running locally, you can still use it just fine anywhere on your LAN.
But additionally, if the provider hosting your stuff goes bankrupt or merges with someone else, or just plain decides it's not profitable enough without some pricing changes -- where does that leave you? Technically, they can just disappear with your whole software and data configuration overnight. Or they can put trained apes in charge of maintaining things so it suddenly has huge security holes. Who knows?
When you run things yourself, YOU are where the buck stops if things go wrong. If you're good at what you do, that should be more of a comforting thing than a scary thing. I've seen too many shops trying to cut corners on the I.T. hiring budget by bringing in less experienced people who really can't properly run the systems they're supposed to be caring for. The cloud for them is a crutch ... a way to get things done that are beyond their abilities. But that's not an ideal situation for a business to put itself in.
Sorry, but this is your fault. We put multi million and billion dollar clients on AWS and GCP and have never, ever had an issue like this. 1) You're hosting in the cloud and not actually understanding what a cloud provider is or does. 2) You don't have a plan that is reasonable for a multi-million dollar organization, one that includes some level of support or SLA. 3) You're not building with DR in mind. Data should be backed up somewhere safe. Your infrastructure should be 'infrastructure as code' which can be spawned up essentially at a moment's notice. 4) Your admins don't have an escalation path that doesn't involve waking up the CFO. 5) Someone did something wrong, either you didn't pay the bills, or your usage was completely fucked so badly they shut your account down? I bet money that it was you didn't pay your bills. Seriously... no offense, but you guys need to look at your business and spend a bit more time/money/effort on building this stuff up in a way that doesn't just fall over...
This sounds like asking for trouble to me!
As other people have pointed out, the magic letters here are "SLA". You must have a contract stating what the vendor's responsibilities are and be able to enforce that contract.
A contract is only as valuable as your ability to ensure it is enforceable. When you are dealing with a company the size of Google they can hire some flesh eating lawyers and have the bank account to keep you busy until you die and so if you plan to bring a lawsuit you'd better be prepared for shock and awe. Just having a contract isn't enough by itself.
You are right that a service level agreement is a very good idea but it isn't going to matter if it is cheaper for them to screw you anyway.
Otherwise, you don't have a business, you just have a hobby.
That's a nice sound bite but it's complete BS. When you are a small business or a startup you generally simply don't have the resources to fight a company the size of Google. You can have whatever agreements you want but if they decide to screw you there isn't much you can do about it. I've started several companies where we had to depend more than is ideal on a single large vendor and it's freaking terrifying if/when you don't have alternatives - contract or no.
This is not about a server failure, router failure, logic bomb, or hacking incident - this is a hosting company deciding to flip a switch and simply turn off your entire infrastructure because an automated process determined "something" in your entire ecosystem was abnormal......
Frack that. never. not 1 dollar.