'Why You Should Not Use Google Cloud' (medium.com)
A user on Medium named "Punch a Server" says you should not use Google Cloud due to the "'no-warnings-given, abrupt way' they pull the plug on your entire system if they (or the machines) believe something is wrong." The user has a project running in production on Google Cloud (GCP) that is used to monitor hundreds of wind turbines and scores of solar plants scattered across 8 countries. When their project goes down, money is lost. An anonymous Slashdot reader shares the report: Early today morning (June 28, 2018) I receive an alert from Uptime Robot telling me my entire site is down. I receive a barrage of emails from Google saying there is some "potential suspicious activity" and all my systems have been turned off. EVERYTHING IS OFF. THE MACHINE HAS PULLED THE PLUG WITH NO WARNING. The site is down, app engine, databases are unreachable, multiple Firebases say I've been downgraded and therefore exceeded limits.
Customer service chat is off. There's no phone to call. I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue. I fill in the form with the details and thankfully within 20 minutes all the services started coming alive. The first time this happened, we were down for a few hours. In all we lost everything for about an hour. An automated email arrives apologizing for "inconvenience" caused. Unfortunately The Machine has no understanding of the "quantum of inconvenience" caused.
If millions of dollars are on the line, you should be running your own systems. Seriously. I'm not an IT expert, data infrastructure guy or anything. I'm just a dumb nerd, and I know that. Never trust your data to a third party when millions are at stake -- let alone critical infrastructure reliability.
Beware of the Leopard.
Why was there a second time?
Are you using Google Cloud in the first place? Is it because you run wind turbines and solar plants?
Why you shouldn't use the cloud period
Over 90 percent of Google income is adverts. You would be absolutely insane to trust them with your business or educational institution data.
I'm not saying MS or Amazon is great but at least their revenue model is not based exclusively or largely on data mining of users.
This happened to me. They had some kind of p2p malware going on at the data center, they saw that one of my servers uses a p2p service (cryptocurrency) and they literally banned my entire project causing all servers in all regions to go offline. It took them DAYS to get everything back online with only a "sorry for the inconvenience" email. They cost me money and trust with my users. I had lots of redundancy, just never expected my project to get shut down.
I still use them, but now I spread my services across other cloud providers as well.
If someone else owns your infrastructure, then they but need to flip the switch and your infrastructure vanishes in a puff of, well, cloud.
This is the essence of "cloud". This is the future, everyone tells us.
That's impossible, AI wouldn't let this happen. /s
You need to design the systems such that they have a fall-back and can continue to operate without an internet connection.
Really ...
What are you going to do the next time a major blackout occurs, and the grid wants you to restart your turbines?
Our company tried to use Amazon a few years ago and ran into the same issues. Although Google and Amazon allow you to spin up a single instance, they are really designed for companies that have hundreds if not thousands of servers. Amazon assumes that you have dozens of fault tolerant servers and if one goes down you just replace it with another one. This works great for companies like Netflix but Amazon is a disaster for a company that isn't fully fault tolerant and has critical servers that can't go down. Liquidweb, Rackspace, Linode, and even Digitalocean are more reliable when it comes to wanting to keep a single server up and running with minimal downtime. Now if you need to keep thousands of servers up and don't care if any one server goes down then Amazon works fine.
The wind turbines have acquired sentience and are tired of being spied upon.
Just wait until they grow some legs!
If an extended system outage can cause "millions of dollars in lost revenue" then you should have a DR plan. Don't put all your eggs in one basket. Have copies of everything at another site (EC2, Azure, Colo, etc) that you can turn on and switch to in this event. If millions of dollars are on the line, then it shouldn't be unreasonable to have such a plan and infrastructure established.
YouTube users, GMail users, etc. have all complained about similar issues with blackbox, zero accountability. One click, boom, you're done.
IANAL, but this is my theory...
We know that Google is controlled by some highly political people. People who want to be able to disconnect you, deplatform you, etc. at the drop of a dime. The more they make their services a customer service blackbox, the easier it is to get away with acting in bad faith.
By bad faith I mean specifically in contractual bad faith. All of the XKCD-citing hipsters miss a very important nuance of the law regarding "deplatforming assholes:" contracts are judged by the "good faith" conduct of both parties and evaluated by reasonable behavior standards.
They do things like tie your account to all of the services, including purchases, and after a few vague "bad behavior incidents" nuke it. Often taking real assets with them because of how those accounts are tied. I don't think, for instance, Microsoft would fare well if they cost someone $2k of XBox Live marketplace purchases because they cussed out a few butthurt players a few times (Microsoft claims it has the authority to do this). Google is the same way on a larger scale.
The more people that are involved, the more people who can be hauled into court, forced to testify, etc. You can demand they answer why they thought a reasonable person would act that way. You can point to flesh and blood people who are the focal point for a real user suffering real economic harm due to one or a few people's biases.
And then win damages.
IMO that is why you see these companies aggressively moving in this direction. It's about not facing as much accountability for acting like dicks.
Seriously. When someone else owns and operates your infrastructure, things like this are going to happen. When that someone earns their revenue from something other than the bill from them you pay every month, it's going to happen a lot more often because they'll be acting based on what's good for their business, not what's good for yours. This is life on any cloud platform. This was life with mainframe service bureaus back when they were the cloud platform of choice.
You have to make a call based on what the trade-offs are. Make sure you know what those trade-offs are going to be, bearing in mind that any contract you have is probably going to say the provider's only responsible for refunding your month's payment no matter what the cost to you of their mistake was. It's that that you're balancing against the cost of running your own hardware, not the monthly bill.
Don't put it all on Google's cloud, don't put it all on anyone else's cloud, don't put it all in one colo facility, don't put it all in your office building. The key to high availability for mission critical stuff is redundancy, not reliability of an individual provider. If you absolutely cannot be without internet, don't pay through the nose for an uplink with a five nines SLA: Get multiple uplinks. Spread your services over multiple clouds, if you want to use the cloud. This doesn't just protect you from single outages or "machines pulling the plug on you", it also makes your system more flexible, so that you can switch clouds if one tries to raise prices, for example. Do not put all eggs in one basket.
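A rough worked example of the "redundancy, not reliability" point above. This is only a sketch: it assumes the providers fail independently of each other (which is exactly what breaks if everything hangs off one account or one credit card), and the availability figures are made-up illustrations, not anyone's actual SLA.

```python
# Illustrative arithmetic only: assumes provider failures are independent,
# and the availability numbers below are hypothetical, not real SLA figures.

MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(availabilities):
    """Probability that at least one of several independent providers is up."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)  # all of them must be down at the same time
    return 1.0 - p_all_down


def downtime_minutes_per_year(availability):
    """Expected minutes per year during which nothing is reachable."""
    return (1.0 - availability) * MINUTES_PER_YEAR


single_five_nines = 0.99999                              # one expensive uplink
two_three_nines = combined_availability([0.999, 0.999])  # two cheaper ones

if __name__ == "__main__":
    for label, a in [("one five-nines link", single_five_nines),
                     ("two three-nines links", two_three_nines)]:
        print(f"{label}: {a:.6f} available, "
              f"{downtime_minutes_per_year(a):.2f} min/year down")
```

Under those assumptions, two independent "three nines" links come out ahead of a single "five nines" link: about half a minute of expected downtime per year versus about five. The catch is the independence assumption, which a shared billing account or shared upstream quietly violates.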
When you entrust your business to an outside cloud service you are entrusting people, organizations, policies, and procedures that you don't and usually can't know with the keys to the success of your business. They can be very useful and cost effective in situations but I would never trust an outside organization for mission critical services.
Turned off their automated systems, and who ever caused the flag to be raised with your Google Payments Account gets in and takes over your entire system, maxes out your CFO's credit card?
Put all your infrastructure under the physical control of some other entity well beyond your reach and then discover they can summarily turn it off and refuse to respond - Duh!
The Cloud Strikes Back!
They used a personal account, not a business plan, it's their own damn fault.
Never trust your data to a third party when millions are at stake -- let alone critical infrastructure reliability.
While that is reasonable advice, sometimes that isn't an option. Sometimes the only reasonable way to do things is through a third party. Furthermore sometimes the third parties can do a better job than I could do myself, even accounting for their flaws.
You should not have ANY one single point of failure.
Only 1 card holder? Single point of failure.
More importantly: Only 1 cloud provider? Single point of failure.
If you're running that level of cash, and still insist on outsourcing infrastructure, then fucking distribute it. Mirror the infrastructure between AWS, GCloud, and Azure. Even these companies themselves know this. Look up Amazon's DNS providers. Hint: it's not JUST AWS, they go outside their own shit too *JUST IN CASE* their servers go offline.
Sounds like normal incompetence to me, as an American, where 911 services are unreliable and the POTS telephone system has 5x9s reliability, with no reliability for internet connectivity and no anti-trust litigation for Amazon or Google.
I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days?
Uh, I don't know - take a picture of each and save them on your phone, in case you need them?
You report everything was back up within 20 minutes once you submitted the requested information - that seems pretty good to me.
Now, about your decision to only run one instance of your mission critical application suite on exactly one cloud service...
The story here is you consider it someone else's fault for your failure to plan/prepare for an outage.
Ken
Embarrassing!
If millions of dollars are on the line, you should be running your own systems. Seriously. I'm not an IT expert, data infrastructure guy or anything. I'm just a dumb nerd, and I know that. Never trust your data to a third party when millions are at stake -- let alone critical infrastructure reliability.
http://www.datacenterknowledge.com/archives/2016/02/11/netflix-shuts-down-final-bits-of-own-data-center-infrastructure
Netflix Shuts Down Final Bits of Own Data Center Infrastructure
Completes seven-year process of transition to a 100-percent AWS infrastructure
You were saying?
> Note: This post is not about the quality of Google Cloud products. They are excellent, on par with AWS. This is about the “no-warnings-given, abrupt way” they pull the plug on your entire systems if they (or the machines) believe something is wrong.
Is it just me, or shouldn't "randomly shuts you off for arbitrary reasons" be something that disqualifies something from being excellent? Especially since it wasn't the first time it'd happened?
If you hit your finger with a hammer and then go see a doctor, the doctor's going to tell you to stop hitting your finger with a hammer. OP needs to stop hitting their finger with a hammer.
This is not a problem with Google Cloud, this is a problem with all "cloud" platforms. It's really simple, they can be held liable so they put acquit ass-covering in the contract so that they can shut you down on a whim. If this doesn't work for you then you should not use any "cloud" platform.
This is just an example of reality catching up to all the idiots who said "put it in the cloud!" while ignoring all the risks. Play with fire and you'll eventually get burned.
Anons need not reply. Questions end with a question mark.
If you run adsense on your not-super-big-but-somewhat-popular company website they will merrily cut you off without telling you exactly why, with no way to contact them other than a stupid web form limited to 500 characters that you are supposed to use to explain what you did to fix the problem (and I believe that may now even be a simple checkbox saying "we fixed whatever you think we did wrong"). They did this twice to us. The first time I had no clue what it is we did wrong so I personally made a change, basically speaking to myself "what could it be that caused this? geez, maybe... this is their issue with us? could it? lemme try" and informed them. Two weeks(!!) later they reactivated the account, with a not-so-apologetic e-mail message basically saying we should be really glad and thank our deity of preference that they turned things back on at all, because, you know, they are way past big enough to not care and leave you hanging with impunity. Funny thing is, that thing I guessed (correctly, I thought at the time) we did wrong and caused our account to be suspended is now advocated by their own documentation as a marketing strategy. If it was that, what the hell? If it was not, why the hell did we get suspended? Oh and then they disabled our account a second time for something that was in no way whatsoever applicable to us. After explaining their mistake and basically saying we changed nothing to "fix our transgression" we got reinstated, somewhat quicker this time. I expected an apology. I should have known better. We were greeted with the same message that tells us we should be eternally grateful they flipped the switch. Because, you know, they are under no obligation to do so.
Forgetting things like privacy etc. for the moment, most Google things are excellent when it all works. When your use case and situation fall within the confines of their automated processes and/or rigorous guidelines enforced by likely overworked or stressed out personnel that you never get to interact with. As soon as something unexpected happens, you are (potentially) screwed and there was never a way for you to prevent it. Yeah, we have since moved most of our things off Google. Not that they care.
Maybe I just didn't read enough, but it seems like he doesn't say anywhere exactly what happened. He implies it was a billing issue. That's all. Without knowing exactly what went on, it's very hard to care. I imagine it's something like "well the credit card details changed, oh, and we were 107 days overdue."
Also, millions of dollars are on the line for short downtime and you're billing to a credit card?
I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue.
Somewhere in Russia, India and Nigeria, several callcenters full of scammers came all at once.
-=This sig has nothing to do with my comment. Move along now=-
This is why, when I design a cloud service, I design for redundancy at the business level too, not just at the host level. I deploy to and run on multiple different cloud vendors at the same time. This way, all my eggs aren't in one basket if one has a billing hiccup or one goes out of service.
I hate to blame the messenger, but either you didn't buy the right service level agreement or Google broke the contract.
If it's the first case, blame yourself and learn a lesson. You get what you pay for. If Google doesn't offer the level of service you need, go elsewhere. If they do, either pay up or go elsewhere.
In the second case, you are rightfully upset but you should be talking to lawyers before talking to Slashdot.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Why was there a second time?
So many of the problems here (ex. paying with a credit card and one that has only a single person's name on it? Having no fallback that can be spun up elsewhere?) are foolish if this has never happened before, and utterly, mind-bogglingly idiotic if this in fact has already happened before. It's one thing to be blind of something you should know could be a problem, it's quite another to be blind and wholly unprepared for a problem you've personally experienced! Something seems fundamentally wrong at this company.
Also, if your entire business can die because it takes an unexpected few days off, then perhaps your business is running a bit too raggedly and doesn't have enough meat on the bones . . .
I remember sigs. Oh, a simpler time!
The company thought they could get away with paying less for server infrastructure. They can. But they get less. This is one of the "less" things they get.
If you value your data, host it yourself, preferably in multiple locations. If you want to go cheap, then you can expect to lose things.
Like your data, or access to it, or availability of it.
It's not such a smart thing to cheap out on the important stuff.
Of course, convincing the bean counters of future risk inherent in what appears to them to be current savings... good luck with that.
Well, best to get rid of your bean counters. :)
Here's a maxim of mine I like to drop on the table during discussions like these:
I've fallen off your lawn, and I can't get up.
Sharecroppers, company stores, vassals, etc... all have digital counterparts these days. Instead of a single entity though, it is spread out among several corporate entities and perpetuated by all levels of government. It's only going to get worse unless/until the people revolt. Problem is most of them don't even realize it. Quite clever way of creating highly productive slaves who think they are free.
As a consultant, a typical war I get into with customers is picking the cloud setup: email, drive, etc. I always recommend Microsoft instead of Google products, and I always have to remind people that, despite the strong brand name, Google is an advertising company, not an enterprise partner. I have countless stories of Google pulling the plug on services because of "reasons". They also have no respect for the customer when they drop a product: they just send an email with a month's notice (and that is a good one), and then they pull the plug. Never ever use Google products for enterprise, ever. True story!
Businesses should not purchase important equipment and services with typical consumer level contracts and conditions. One would think this is obvious.
Who do you work for? I'm divesting. Hell, as a 6 man startup in the 90's, we knew better than to have only one server farm in one colo. Granted, our failover was to the developmental farm on a T-1 in our office, but it was at least *some* failover. Oh, and the colo texted when there was a problem, real or imaginary.
That was 6 drunk amateurs 2 decades ago.
This is Google's standard operating principle. They operate in an environment where they do what they want and always assume they know best. The vast majority of the operation does not have paying customers telling them what to or not to do as any normal business would.
This is why everything Google is "not invented here" and why they shut things down on a dime without consideration or responsibility. It isn't that they don't care about their customers. It's more fundamental than that. They don't see customers as customers and don't understand the concept of caring in the first place.
Relying on Google for anything is a career limiting bet you are guaranteed to lose sooner or later. Don't go there.
...(as mentioned in other comments):
1) Don't trust another company with your critical IT infrastructure!
2) Have redundant facilities with different ISPs.
3) Have tested backup/standby power systems.
Yes, it is expensive, but - how much would it cost you to be down a week? A month? There is no free ride.
"In the cloud" = "On someone else's computer".
If your system is reliant on someone else, especially when that someone else has no ready means of being contacted, like Google or Amazon, you have no one to blame but yourself if they randomly disconnect you, only explaining later that it was "because reasons".
If it's mission-critical, run your own servers. Servers fail, but if you host it yourself you can decide on what type and degree of redundancy you'll run. If something fails, you can deal with it directly. And you're unlikely to randomly shut yourself down for no good reason.
Running a large project on the cloud makes a lot of sense, for a number of reasons. However, every cloud provider is vulnerable to down time... just for starters. Add to that, he had already been pulled offline once by GCP. Why is this not spread across multiple providers? This is 2018, the headline "Single Point of Failure Can Lead to Downtime" is not news.
This company sucks at running a business.
cloud services as they are marketed by cloud providers. Suicidal for a business if you ask me.
Basically you are paying another party to have control of your business, being responsible to keep your business up and running. You are putting your business, the source of your livelihood and livelihood of all the company's employees on systems you have NO CONTROL OVER. Basically putting your business in the hands of other people, who quite frankly do not give a damn about your business other than it pays its bill every month. Have any kind of problem, and you are out of business until the problem is rectified. As a software engineer, I advise clients against relying upon the cloud for critical business functions. It makes no sense to put applications, or code bases on devices you have NO CONTROL OVER!!!!
That being said, I do see cloud services being used as part of a business's disaster preparedness plan...as a backup system...but nothing more. But giving a cloud provider full control over your business? Plain stupid if you ask me.
You should think about divesting from just-google. The cloud is costing you more already, get yourself a number of real servers with real hosting providers dispersed geographically. Running something solely on Google or Amazon clouds is technically identical to hosting everything on a single server.
Custom electronics and digital signage for your business: www.evcircuits.com
Bring up the SLA and at LEAST request outage credits....
The cloud is a computer owned by someone else.
Someone else owns the cpu, and can decide what to do with it.
Someone else owns the network infrastructure, and can decide what to do with it.
Someone else owns the disks, and can decide what to do with them.
Someone else owns/maintains the software and can decide what to do with it.
Someone else has control over the main breaker.
But don't worry, you have some control! You can decide how much to pay, when to make a payment, what payment method to use, the billing frequency, and even chose to receive a paper bill [ for a nominal fee].
Throughout all of Google's platforms you can expect this kind of treatment. JUST SAY NO TO GOOGLE.
> Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is
> unreachable for three days?
Unreachable for three days? What if the CFO is dead? Oops!
In some circles, this is called "the campus bus problem"... The guy who knows everything walks out in front of the campus bus. Now what.
This also used to be seen on University computer systems... Some grad student has written processes in use all over the place, running from the home directory... The student graduates/moves on. The account is deleted; chaos ensues.
Universities learned and moved on.
If you're anything but a whiny kid, you own your mistake, learn from it and move on. Don't blame someone else. The fact they're whining in the quasi pay walled "The Medium" says even more (I for one am fairly tired of getting nagged by them).
Remember when we complained about foreign tech. support? Well welcome to 2018, where there is none, even if you're paying. This "disconnect" seems to be how life goes in this so-called Information Age. Everything is SO optimized for profit today, that getting assistance for anything is getting closer and closer to being non-existent. We were sucked into it when we were given free web browsers, etc., that came with no user's manual. I understood, since the stuff was (beer) free. Fine. But these days, I run into trouble and get no help, even if I'm paying. So the digital wall of fine print that the OP ran into doesn't surprise me at all.
F*$k Google. They intercept your email so they can hook it up to their "Selfish Ledger". Company needs to bottom out.
With that much at stake, I think I would find something much better than Google Cloud for a storage solution. I tend to think of any Google product as a more basic solution than anything else. But some of these cloud complaints are universal throughout any cloud solution.
Had an identity I used on MS for support and forums for over 15 years. Tried to log in about a month ago and was told it had been temporarily suspended due to violations of their TOS. I tried to find out what happened but they refused to tell me anything. They told me the only way I could get my account back was to have them send a text code to my phone. Only phone I have that accepts text is my google number which MS WON'T ACCEPT. They don't have a system like amazon or google where they can call you @ a home number to have a robot announce a code over the phone. They demand your text#. If you don't have one, you don't get your account back.
Tried following up with their support -- twice -- both times was told I violated something in their TOS, and could I review to see what it might have been (WAY too vague), and to use their text-msg system to recover my access (which I'd told both of the service reps didn't work -- and MS wouldn't take my voice number).
Completely lame.
As a cloud migration consultant, I see a lot of companies going dual cloud with a sort of DR model in a second cloud provider to avoid that kind of scenario. Yes, it creates quite an overhead but it could be worth it.
I keep stuff on my laptop, pc, cloud, secure HDD's, safe deposit box. On a business scale, you should be keeping stuff backed up across multiple platforms.
Mission-critical functions should be kept in-house. Never farm out anything that can kill your business if your vendor fails to do their job.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
We tried it in our company. Got some call from a Microsoft contractor when we signed up. For reasons they didn't explain, they disabled our account a few days later. My boss got them to re-enable it but was so unimpressed we stuck with AWS. Amazon are evil, but at least they are competent evil.
The real problem is that Google customer service is unreachable.
Yes but you missed a caveat with the courts: Whoever has the most money wins. Sue Google and unless you're as big as say Samsung, they can tie you up in red tape and pretrial motions and depositions for years. Those court fees pile up and you will be made to hire a lawyer who will bill and bill and bill.
The trick is to avoid them in the first place. If you're a small fry and you think you can "do business" with Google as their "valued customer" or GUFFAW! "valued partner" then you're an idiot who deserves to be screwed over.
So don't deal with them, okay? Same goes for Facebook and Amazon. Microsoft on the other hand do remotely care what business and developers think of them.
"We apologize for the inconvenience." bullcrap. I had the same exchange with the morons at Sears appliance service. They kept transferring me to god-awful call centers in Bangalore or some other place where their accents are so thick that talking to someone in the Deep South would be easier. After talking to six different people, I realized that they are all working off the same set of rules: Ask for the same damn information over and over, Try to sympathize, Pretend your computer is frozen, Blow off the customer.
These bastards had the balls to try to sell me a whole-home warranty. Why the eff would I buy that when I can't get you bastards to come fix my dishwasher? No wonder your company is going down in flames and good effing riddance.
Has anyone ever talked to anyone @ Google? In all the years since I first heard "Google", I have never been able to chat with a live person on anything ever.
;)
To be fair, I have chatted with the
Azure/365 folks (took time, many calls (2 months), but did get "their" Information Protection/Crypto issues worked out, once they stopped pointing the finger at me),
Amazon (not so much, selling mostly, interface & whole experience sucks), AWS, never pulled the trigger, but did get through (pricing is mind numbing and complex),
other smaller data centers, GoDaddy (Good/Bad), (their interface just keeps getting worse), others, etc pretty good.
Just my 2 cents
Google pulls the plug on shit all the time.
Not an Amazon fanboy, but I have shit running on old amazon databases they have not promoted for 10 or 15 years... they just keep shit running...
Or is it just a question of how large your account is?
We're just a mid-sized MSP, but there's always a way to get someone on the phone, 24x7. The on-call number customers call is the security-service that does our physical security, they forward the call to the on-call engineer (after the customer is verified using a "password"). The intermediate step is to ensure people don't call the on-call engineer for sysadmin-tasks that could be done during business hours.
Every customer can get the on-call number, provided they cough-up the money. For most, it's not worth it because servers are quite stable these days.
The problem is of course that Google is so big and has so many customers that it's not possible to "know" every single customer anymore.
Windows 2000 - from the guys who brought us edlin
Apparently cheap computing* be it enterprise or consumer has blinded people to what reliable hardware really is.
Both Tandem and Stratus demonstrated that failure is survivable.
*I blame Google for fostering the throwing more and more cheap hardware at the reliability problem, instead of making more reliable hardware.
They're busy chasing down and rooting out wrongthink. They can't be bothered with your capitalist business, which after all only wants to make a PROFIT.
ABOLISH BORDERS
ABOLISH PRISON
ABOLISH PROFIT
I tried to buy a Google smartphone a few years ago. I had bought previous phones from Google in the past without any problems. But this time, Google rejected my order. I spent hours on the phone with Google's customer "service?". They would keep telling me that they fixed the problem so I should try to place the order again. When I placed the order again, it would fail again. Then after several times of Google people telling me to try again, and it would fail again, Google's system finally blocked my whole account because I placed too many orders (which all failed)!!!
At one point, I received an email from Google telling me that my account was blocked because I had placed too many orders (which was what Google humans told me to do) and because I was ordering from a location where Google does not sell phones (California, the same state Google is located in!)
I finally gave up and bought a different phone not from Google.
Remember folks:
The simple definition of " Cloud " is infrastructure you neither own nor control.
By offloading this responsibility to a third party ( Google in this case ) you simply add an additional point
of failure in the chain.
With any substantial amount of money on the line, the better way to do things is to have your own servers
( preferably two locations, one primary and one backup ) so if one site goes down, it's more of an annoyance
than a Class A Catastrophe.
Most companies, however, have to get burned before they understand that there is a limit to the number of
corners you can cut.
In a word: arrogance. We see this way too much with Google, the new evil.
When all you have is a hammer, every problem starts to look like a thumb.
First rule. "The Cloud" is great for scaling things up to handle massive unexpected bursts of load. It's great for being able to shunt things around to different locations to handle short term outages. All of that.
But.
If a given application is business critical - if it's going to cost millions of dollars per hour if it goes down - you need to have a plan. That plan needs to cover shooting the not-really-working-well instance (if it fails in a particularly nasty way), as well as bringing up another instance somewhere else. It needs to cover all the possible failure scenarios, and one failure scenario that too few people consider is, "What happens if the cloud provider goes down?"
Sure, Amazon, Google, etc., aren't likely to go down. But did they go down, or did they pull the pin on your account for some unknown reason? Who knows? Who cares, more to the point? Thing is, the service is dead, and you need to have somewhere you can bring online until it comes back (if it comes back.) If you don't have a plan to do this, if you don't have backups that you can access without needing an account at a given cloud provider, then you're setting up your business with a massive single point of failure. The odds of that failure might be low, but it's there, it's a risk, and it needs to be either mitigated or accepted by the high level execs in the business.
If you don't do this, if you don't understand this, you damn well shouldn't be in the job. You might be hamstrung by the higher ups - but you should at least be making sure they know they're hamstringing you.
... cheapest service they offer, the one that doesn't include 24/7 phone support - let alone a guaranteed SLA, to host your multi-million dollar wind/solar plant, where any service outage will cost you millions in service penalties.
I may be one of the "old timers" who I'm told is thinking about things in an "old school" way when I say this. But I've *always* warned people that "The Cloud" just means you're giving somebody else the responsibility of handling your data and the systems it runs on.
That makes sense sometimes. I'm not "anti cloud". But for anything really critically important to a business, I feel you should have it running locally and THEN consider cloud options as hot-failover sites, backup sites, etc. With cloud hosting, the whole thing is off limits to you as soon as your Internet circuit goes down, for one thing. With it running locally, you can still use it just fine anywhere on your LAN.
But additionally, if the provider hosting your stuff goes bankrupt or merges with someone else, or just plain decides it's not profitable enough without some pricing changes -- where does that leave you? Technically, they can just disappear with your whole software and data configuration overnight. Or they can put trained apes in charge of maintaining things so it suddenly has huge security holes. Who knows?
When you run things yourself, YOU are where the buck stops if things go wrong. If you're good at what you do, that should be more of a comforting thing than a scary thing. I've seen too many shops trying to cut corners on the I.T. hiring budget by bringing in less experienced people who really can't properly run the systems they're supposed to be caring for. The cloud for them is a crutch ... a way to get things done that are beyond their abilities. But that's not an ideal situation for a business to put itself in.
I'm sorry, but that is BS and you know it. It doesn't take a tremendous amount of planning or capital. It simply takes using proper, industry standard methods. It is industry standard for mission critical infrastructure to have local redundancy: either automatic fail-over or, at worst, manual fail-over, by using a cluster of hypervisors running the critical infrastructure on a virtual machine that can switch between multiple servers, or simply clustered services through Ricci/Luci cluster software (or any of a number of free and open source software stacks).
For REALLY important and critical items, a second datacenter should be located on another power grid, preferably 50 or more miles away, using a completely different ISP which uses a completely different backbone carrier for connection to the internet. You can do that on the cheap by renting a rack at a COLO site, and sticking 5 systems and a couple of storage arrays in there (3 of which are your cluster, with the other two having connectivity to the out-of-band remote management of all systems and storage). The entire site can then simply be kept in warm backup mode (with people monitoring it remotely from your main office).
You won't be running an Amazon or Facebook with that kind of remote site, but you can keep a mid sized company's email going, along with financial/business applications such as the one described by the original poster. If you really want to skimp, you can have your bean counters trade the remote location for a "cloud computing" setup, but you still need your local setup with local redundancy for hardware fault tolerance (so you can handle a burnt out power supply, failed hard drive, faulty RAM/CPU, or failed patch/software deployment by simply failing over to the other system).
I agree.
Get this: At the initial meeting, I asked if there were any speed issues and the vendor said, "No, you'll operate much faster than you do now."
The fucking latency was shit.
And get this: The firm logged in using RDP.
It was actually just one big duplication of our production servers (I had a dual system) loaded up to the "cloud."
It little behooves the best of us to comment on the rest of us.
All of the posters here saying it's your fault for not having two clouds or not configuring the account in some bizarre way are basically wrong. Google has a reputation for this kind of behaviour, and I'm not surprised they're still at it. Instead of "focus on the user," Google is "the user is the spammer." They are all about false positives and talk-to-the-hand. They have such contempt for anyone who does not work for Google, and when it comes to being self-critical they think "I had good intent" is good enough. I would not do any serious business with them unless I had an SVP's cell number. They are currently useful only for really big accounts, and for toy projects. We need to teach them to regret their arrogance, because they have some excellent infrastructure, and they treat their employees much better than Amazon. If not for this attitude problem, I think they could be the best cloud provider.
It's not an if but a when: your own servers will go down, and you need a redundant backup plan... I would only count on the Google cloud as a plan C, and yes, we have had Internet problems where plan D went into effect! Plan for it! Only if you don't want to lose money!
child's-play. Try Billions.
Last project involved cloud-computers where we processed a BILLION USD every year (credit card payments)
And you're ALWAYS trusting a 3rd party. Your own IT guys or cloud. Doesn't matter.
What DOES matter is backups and fail-overs. (These guys NEEDED a proper failover)
For the past 15+ years, I have worked with systems that must not go down. No - really, 1 second of down time in a year is intolerable for some applications. (Not always every system I work with, but at least some fall into this category.)
If the original poster of this story failed to keep in mind when designing the infrastructure that a computer that won't ever go down isn't available, then he/she failed the first principle of fault flexible systems. That principle is to come to terms with the level of acceptable fault vs. the available budget. 9's cost money. How many do you want to buy? An uptime SLA of 99% is still more than three and a half days down over 1 year, and even 99.99% allows about 53 minutes. A 99.999% SLA will cost more. A lot more.
From the OP: "millions of dollars in lost revenue." This is perhaps the first time I've heard that statement and actually believed it. That still doesn't mean "you have chosen wisely".
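To put rough numbers on the nines, here is a minimal sketch of the arithmetic. It assumes a 365-day year and an SLA measured over the whole year; the function name is made up for illustration and real SLAs often measure per month and exclude planned maintenance.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Seconds per year a service may be down and still meet the SLA."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for percent in (99.0, 99.9, 99.99, 99.999):
        seconds = allowed_downtime_seconds(percent)
        # Print the budget in hours and minutes for easier comparison.
        print(f"{percent}% uptime allows {seconds / 3600:.2f} hours "
              f"({seconds / 60:.1f} minutes) of downtime per year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is a large part of why each one costs so much more than the last.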
So, let us do a thought experiment in the absence of any data. I'm going to get things wrong here for this particular situation because I don't know details and will be making assumptions.
Data Storage: Sounds like a "We don't need backups because we have RAID" issue. Sorry, study availability zones in S3 on Amazon. You can stripe your RAIDs across a local zone, stripe it to other data centers geographically separated. On Linode, you can rsync between data centers. On Rackspace, you can use object storage in multiple data centers (on Linode and Rackspace this will require software to accomplish, as they do not offer an API to fire off copies the way S3 does.) This ensures that the totality of data is not stored in one basket. That still doesn't back it up - and you need to do that too. Data may need to be restored due to human as well as machine error. One situation I remember was engineering a database system with 4, 8, 16, 24, 36, and 48 hour delayed slaves for super fast point in time recovery - with hot transaction backups that could be selectively applied. Just to add to the server count, it was also Master|Slave^6/Master|Slave^6 - 14 DB servers and 7 ingress data clusters of 5 each. (I'd say if you are in the US, there is a 100% chance you've done something that was processed by this system in the last year if you drive a car.)
Data Acquisition: This should use something like Atom Hopper/RabbitMQ/ServiceMix (oh god, please not ServiceMix) in a clustered environment across data centers. Use DNS round robin as a cheap way to find your ingress servers, but that requires some consideration of DNS timeouts. Datapoints are posted as messages, and servers claim the work entry from the Atom Hopper queue. This prevents the situation where clients cannot report their data, and also protects the processing servers from having a regional failure or failing to process a data message.
Also, systems should be configured so that Ansible, Salt, Git, or Puppet can automatically build systems in the cloud when stressed or even after a total data center failure. Strict deployment and versioning should be enforced so that any server can be replaced at the drop of a hat somewhere else, with no "one off" changes standing in the way.
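The delayed-slave idea is easy to reason about with a small sketch. This assumes each replica simply applies changes a fixed number of hours late; the function name and the "hours since the mistake" input are hypothetical and not taken from the system described.

```python
from typing import Optional, Sequence

# The replica delays mentioned above, in hours.
DELAYS_HOURS = (4, 8, 16, 24, 36, 48)


def pick_replica(delays_hours: Sequence[float],
                 hours_since_mistake: float) -> Optional[float]:
    """Return the smallest replica delay that has not yet replayed the mistake.

    A replica delayed by D hours has only applied changes older than D hours,
    so it is still clean as long as D > hours_since_mistake.
    Returns None when every replica has already applied the bad change.
    """
    clean = [d for d in delays_hours if d > hours_since_mistake]
    return min(clean) if clean else None
```

If someone dropped a table 5 hours ago, the 4-hour replica has already replayed the mistake, so the 8-hour one is the freshest clean copy. Once 48 hours have passed, none of the replicas help and you are back to restoring from real backups, which is why the delayed replicas complement backups rather than replace them.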
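For what "servers claim the work entry" might look like, here is a toy in-memory sketch. It is not the Atom Hopper or RabbitMQ API; the class, its method names and the lease timeout are all assumptions made for illustration. The point is only that a message stays in the queue until a worker acknowledges it, so a worker that dies mid-job does not lose the datapoint.

```python
import time
from typing import Dict, Optional, Tuple


class ClaimQueue:
    """Toy work queue: a message is kept until a worker acks it."""

    def __init__(self, lease_seconds: float = 30.0) -> None:
        self.lease_seconds = lease_seconds
        self._messages: Dict[int, str] = {}   # message id -> payload
        self._leases: Dict[int, float] = {}   # message id -> lease expiry
        self._next_id = 0

    def post(self, payload: str) -> int:
        """Client side: enqueue a datapoint and return its id."""
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = payload
        return msg_id

    def claim(self, now: Optional[float] = None) -> Optional[Tuple[int, str]]:
        """Hand out the oldest message that is unclaimed or whose lease expired."""
        now = time.monotonic() if now is None else now
        for msg_id, payload in self._messages.items():
            if self._leases.get(msg_id, float("-inf")) <= now:
                self._leases[msg_id] = now + self.lease_seconds
                return msg_id, payload
        return None

    def ack(self, msg_id: int) -> None:
        """Worker finished processing: remove the message for good."""
        self._messages.pop(msg_id, None)
        self._leases.pop(msg_id, None)
```

A worker that claims a message and then crashes never calls `ack`, so once the lease runs out another worker in another data center can claim the same message. A real broker adds persistence and clustering on top of this, but the claim/ack contract is the part that protects against lost datapoints.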
Treat servers like cattle, not pets.
Last - KNOW YOUR VENDOR. This is absolutely critical. Here is something I suggest you try for yourself - call the support line and time how long it takes to get a human on the phone. Remember that some vendors cost more than others, and there's a good reason for that. Getting, keeping, and paying really smart people to help you out with zero notice costs money. A lot of money. And a vendor that will be willing to let you skip tier I - III support (after you've proven you're not a complete idiot) is even more rare. I know of only one company that will do that, and it's the same one that will win the call-and-get-a-human test. But they cost two to three times what others do - because they give you what the others will not.
What I hear in the OP's post is that they suffered a catastrophic failure and rate limits and fraud protection kicked in. I do understa
Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
Sorry, but this is your fault. We put multi million and billion dollar clients on AWS and GCP and have never, ever had an issue like this. 1) You're hosting in the cloud and not actually understanding what a cloud provider is or does. 2) You don't have a plan that is reasonable for a multi-million dollar organization, one that includes some level of support or SLA. 3) You're not building with DR in mind. Data should be backed up somewhere safe. Your infrastructure should be 'infrastructure as code' which can be spawned up essentially at a moment's notice. 4) Your admins don't have an escalation path that doesn't involve waking up the CFO. 5) Someone did something wrong: either you didn't pay the bills, or your usage was so completely fucked that they shut your account down? I bet money that it was that you didn't pay your bills. Seriously... no offense, but you guys need to look at your business and spend a bit more time/money/effort on building this stuff up in a way that doesn't just fall over...
Sorry, you must be really bad at AWS.
Amazon's hardware isn't HA, your solution is.
I've run small to huge workloads on amazon, and saying that it's designed for companies that have hundreds/thousands of servers is totally wrong. In fact, it's not really that well designed for tons of servers because they don't really have a lot of built-in automation to handle hundreds, if not thousands, of servers.
Just because you use a cloud provider doesn't mean that it's turnkey. You still have to know everything. The difference is it costs less and generally it's easier.
The answer to "why you should not use google cloud" is simply "because it's run by google".
Cloud is awesome. Scale indefinitely, work out the kinks, test your product. All on a shoestring/pay-as-you-go budget. Very nice.
Yet cloud-only is shite in production. Provider X (in this case Google) has you by the balls. You do *not* want that. Tried and true fallback and failover with your own Docker setup and a stack of rented blades in a rack, including nightly to-my-desk backups, are an absolute must for any critical infrastructure. No matter how cool Kubernetes and Spanner are handling your stuff today, you at least want to be able to save your data tomorrow when things turn south.
BTW, all this is Captain Obvious speaking.
This isn't news, this is basic web stuff 1-oh'-1 that every third-grade webshop running critical WordPresses knows. It's really the exact opposite of rocket science.
The kid who set up this disaster deserves a smacking.
Lesson learned I guess.
We suffer more in our imagination than in reality. - Seneca
This sounds like asking for trouble to me!
SLAs are what ensure companies can't do this to you. If you don't have an SLA with the cloud provider then you should probably run across multiple clouds and/or in-house infra.
If you have millions of dollars then you can maintain your own personal server.
They never get it. "uh oh something is suspicious, let's power off!" They don't understand that you need to be up. Up. Not down. Taking a system down is always wrong. If there is "suspicious" activity then analyze where it comes from and shut the source off and NOT THE GOOD GUYS!
It's a given that Google is quite pedantic in many ways and not very customer-friendly in the traditional meaning of "business customer", especially compared to others.
However, dude: "hundreds of wind turbines and scores of solar plants"? Thousands of dollars lost for every minute the control systems are not running?? And all this charged on that one single credit card of the CFO??? WTF??!! And I definitely wouldn't dare be the CFO in there!
Ever heard of "Mission Critical" systems? Disaster Recovery? High Availability? Redundancy? Multi cloud or, at worst, your own datacenter in which to keep cold critical emergency resources?
And perhaps only operating with providers through proper Account Managers (where available) and with a proper business relation like advance invoices, backup payment methods and such?
One day someone will come and say all their country's power plants were managed from a crashed large instance in a public cloud and that's why the entire country went dark. Blame it on the cloud provider? I don't think so.
( rant ) I once saw a pathetic 200 people business running with 10 years of its mail and drive storage (containing each and every document the company ever had, including legal stuff) on one of the first "all free" Google Apps accounts. Their incompetent VP of engineering was the only one administering the Google account, to prevent anyone clicking some button that would change it into a proper paid business account. When this is the reasoning, frankly, better stick to opening a flower shop rather than running a business. With all respect to flower shops! ( /rant )
Trouble is they can't sign an SLA worth considering.
I worked at a Hydroelectric project and used GCM to alert people when the values of PLCs hit certain thresholds (low voltage, high temps etc). There is no way I would just use Google for this though. These alarms are important; you should never just depend on one service. We also had a directly connected Pro-face screen (with alarms) in the control room as well as various live graphs, Android apps and widgets and email alerts. So if Google Cloud Messaging went down there were still other ways to detect a problem.
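The "never depend on one service" point can be sketched as a simple fan-out. This is an illustration under assumptions, not how that project was built: the channel names are hypothetical, and each channel is just a callable that either delivers the message or raises.

```python
from typing import Callable, Dict, List

# A channel is anything that takes the alert text and raises on failure.
Channel = Callable[[str], None]


def fan_out_alert(message: str, channels: Dict[str, Channel]) -> List[str]:
    """Send the alert on every channel; return the names that delivered."""
    delivered: List[str] = []
    for name, send in channels.items():
        try:
            send(message)
            delivered.append(name)
        except Exception:
            # One dead channel (say, the push service) must not stop the rest.
            continue
    return delivered
```

If the returned list comes back empty, that is itself an alarm condition worth surfacing locally, for example on the control room screen.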
I never use any online cloud. Stupid idea IMHO.
It's a huge mistake to compare the service support of someone who has a budget of $50-$100/month vs someone that spends thousands per month.
The former case, even if they'd been running their own servers, it's very unlikely they'd have the ability to turn around repairs immediately or fend off attacks from intruders in the first place. That's the kind of operation where someone's 21 year old nephew sets up the "server". That kind of organization may be vulnerable to a lack of support from Google, AWS and Azure but the reality is that they are orders of magnitude better off than they would be on their own if something went wrong.
If you're an org that already has full time 24/7 staff and multiple redundancies in your onsite data center, then you spend the kind of money on your cloud services that is analogous and you get an analogous level of support.
As other people have pointed out, the magic letters here are "SLA". You must have a contract stating what the vendor's responsibilities are and be able to enforce that contract.
A contract is only as valuable as your ability to ensure it is enforceable. When you are dealing with a company the size of Google they can hire some flesh eating lawyers and have the bank account to keep you busy until you die and so if you plan to bring a lawsuit you'd better be prepared for shock and awe. Just having a contract isn't enough by itself.
You are right that a service level agreement is a very good idea but it isn't going to matter if it is cheaper for them to screw you anyway.
Otherwise, you don't have a business, you just have a hobby.
That's a nice sound bite but it's complete BS. When you are a small business or a startup you generally simply don't have the resources to fight a company the size of Google. You can have whatever agreements you want but if they decide to screw you there isn't much you can do about it. I've started several companies where we had to depend more than is ideal on a single large vendor and it's freaking terrifying if/when you don't have alternatives - contract or no.
We should all stop using Google services. Any little tick locks you out of your account; even adding a new account on a new phone is locked after 60 seconds. F- that.
Google has done many other things including REQUIRING LOCATION SERVICES ON ANDROID to be turned on for some Bluetooth and WiFi functions without explanation. WTF? Google, and Microsoft with the Windows 10 data theft, are untrustworthy.
Not to defend Google, I don't hate them, but of course (as seen in the FB case) we're still evaluating how much this "user preferences"-based business can do and what it represents to our rights (privacy, Freedom, transparency etc.)
Now, what would you do?
You monitor wind generators and you think someone is hacking into them. Aside from danger for nearby lives, if someone renders any plant unusable by a specific tweaking of operational settings -- yes, I mean with catastrophic consequences -- are you going to keep things working, to keep the uptime? I'm talking about what it means for people if your wind, solar, hydroelectric or nuclear installation is put out of order and people cannot move because power is down.
Actually, Google suspending access to your account may be also dangerous, so perhaps you could devise a redundant way to do it (even if you keep using Google and/or any other cloud service).
As much as I appreciate Google's services as an individual, I do not use them for any business-critical need for the exact reasons listed in your article. The service may be generally reliable, but what is not acceptable is the casual way they handle exceptions. It clearly defines the pecking order. They are the masters, you are the peon, the paying peon but still a peon.
The question is why was the infrastructure shut down. Not if it can be built to stay on for ever. Of course it can. The issue is what is a reliable way to ensure that your infrastructure does not fail.
In this case there was something wrong with the system and google shut it down. Should they shut it down or allow it to continue working in "LIMP" mode is a good and not easy question.
I myself have shut down several clients. When a client has something on their system (e.g. a break-in that could cause harm to other clients) we do not even phone the client; we shut their servers down and sort it out with them afterward. This is probably what happened at the Google cloud system. There are other ways to do this. One other solution is to allow the system to continue working but limit its capabilities severely, but let's be honest: most ISPs, including us, will simply shut down the system if there is malicious data coming from it, and this is not only for the cloud.
It could happen in your own data infrastructure as well (we have had this happen to clients). Simply changing IPs resolves the issue, but it does happen if a system is not properly secured.
Not just Google Cloud, all cloud services can pull their plug without warning.
All data centers should be autonomous, independent, and redundant, with adequate offsite backup storage.
In the early stages of the digital age, companies spent 1% of gross revenue developing and maintaining information technology. Now, companies try to shave every penny, consolidate, and centralize their data services.
If millions are on the line and you don't wake the CFO, he/she/it is going to be very angry. Chances are, if millions are on the line, the CFO will be waking you up even if the problem is technical.
I came to the datacenter drunk with a fake ID, don't you want to be just like me?
As every anti cloud sysadmin that has been burned or replaced by AWS breaks out their quotation marks and waves their canes in the air.
This is not about a server failure, router failure, logic bomb, or hacking incident - this is a hosting company deciding to flip a switch and simply turn off your entire infrastructure because an automated process determined "something" in your entire ecosystem was abnormal......
Frack that. never. not 1 dollar.
Except you don't seem to understand the "quantum of inconvenience" the system helped you avoid by taking down your systems like that. Just because you perceive the decision of the machine as wrong does not mean that it was in fact wrong. Perhaps based on the information it had it saw that you would incur significantly more downtime if it hadn't temporarily shut off your services? Just because the machine makes your life inconvenient does not mean what it did was wrong, that is unless you don't care about all the potential things that could happen to your system running on other people's equipment? Maybe you should learn
your place in this world, human.
Dumb AI vs dumb human. Who wins?
It's not like in Hollywood movies, where AI is going to literally kill you, but it will destroy you eventually.
That sounds like a classic failure when migrating to a distributed network: just cloning your workflow from the existing shitpile. That basically always fails and no one is happy. Part of the migration should be retooling your stack to take advantage of the provider's services.
Absolutely agree. The benefits of "the cloud" which is one of a string of terms which initially had zero meaning and people kept trying to figure it out until they'd invented something... can be had in a private cloud infrastructure, anything else is just leveraged hosting which should definitely be a no-no for enterprise.
1) Don't trust another company with your critical infrastructure!
We trust utilities, how come all this "I don't like the cloud, do it locally" doesn't apply to anything else critical?
And yet no one will see, "You should not use public utilities" or "public roads", or even POTS, etc. Build your own. Critical infrastructure that SOMEONE ELSE CONTROLS. Oh, the horror. You all sound like the waning days of the buggy-whip makers. You should use the cloud because progress is built upon the shoulders of everyone else.
If the site was so important, why didn't they plan for an outage? Sounds like they just assumed it would always be up. Why didn't they ask Google about the conditions and possible sources of outages, and plan accordingly? My employer has done multiple cloud deployments, and this is always part of the planning. What do we do if the cloud provider goes down or experiences an error? What are their operational norms around infrastructure changes and the like? What is our downtime tolerance? Is the cost worth the risks (i.e. not that important) or do we invest in multiple sites? Perhaps not the whole story is here, but it sure sounds like they didn't do much due diligence before throwing all in on "cloud".
This posting is provided 'AS IS' without warranty of any kind, implied or otherwise.
Here is my bit of wisdom. Get everything running locally in a cloud container framework like Heroku or whichever one you prefer. When and if you get to the point where you need disaster recovery or more scalability, you can shift everything to the cloud until you recover, or load balance to the cloud at peak periods.
The cloud is not magic. It just means that other systems administrators are running your servers and they do screw up quite often.
If you are really serious about running in the cloud and having high reliability, let me introduce chaos monkey
https://github.com/Netflix/cha...
This is what Netflix uses to make sure that they can keep running when Amazon availability zones, instances, or whatever they call them disappear. I believe that on the day Amazon accidentally took all of its storage offline and killed half their cloud, Netflix survived and was able to keep going.
Read this for amusement: https://aws.amazon.com/message...
Ooops, we deleted AWS....
The benefits of "the cloud" which is one of a string of terms which initially had zero meaning and people kept trying to figure it out until they'd invented something
False. "Cloud computing" originally meant that you were leasing instances on someone else's servers, and you could spin up more instances any time you were willing to pay for them. That's still a useful meaning, unfortunately people now have "private clouds" which we just used to call "clusters", so the phrase actually means less than it did originally.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
You put such mission critical stuff on "Google Cloud"?
Why??
Part of the migration should be not.
It little behooves the best of us to comment on the rest of us.
There is a three-part mantra that project managers learn at the enterprise level:
This project fell to the third part.
You work in an industry where an excuse is good enough. Which is nice for you.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
The issue here, is about plain arrogance on Google's part.
They have no respect or care for their customers. This happens time and time again, blaming it on a bot or system is rubbish. You can't soften the blow on Google by saying its the system, or the search algorithm. The system was created/written by Google.
All the fools who relied on Google should step back and think for a second. Google has nothing to worry about: all the fools are plugged into their system, whether they like it or not.
Did the CFO or CTO get a bonus before this for making the decision to save money? Did one of the executives get a bonus for 'solving the problem'?
That is how it works where I am....bonus for exec in what later turns out to be a bad decision & then bonus to the exec when the problem happens and others solve the problem.
No more banging head against the wall....
If a vendor is selling to a market that has an on-prem requirement, it's a non-starter to not offer such a feature.
Many vendors, after doing their market research, have concluded that on-premises requirements such as yours are a rounding error. The benefit of satisfying them does not exceed the opportunity cost of monopoly rents that can be extracted by not satisfying them.
Sooner or later folks are going to have to realize that "Cloud" is just market-speak for another box that you don't have access to (until you pay the $9.95 access fee...). I have run my "own cloud" since 1999. Sits right across from me, and anytime it goes down I have direct and full immediate access. And this is on stuff only I consider important (apparently..)
You keep going until you die..."Me".
I sent that email. So when I get back from shopping, I'll forward that info along with a description of your problem to google.
In other words, one must also consider malice from the hosting provider when shopping for a hosting provider.
The cloud rained on him. If it's a bad enough storm, you lose everything.
"Didn't pay enough for Gogol to care."
AWESOME!
CAPTCHA: courting
If you're not going to be hybrid (which gives you other opportunities) then you simply DNS load balance between regions AND cloud providers. Easiest way? Containerize.
The complexity comes into play around your databases, but there are a myriad of well known solutions to all of these problems.
There's no other rational way to provide solid service than to spread your risk.
Set up DNS failover (or load balancing if you prefer)
Set up in multiple AWS regions.
Set up in multiple GCE regions.
Optionally set up for Azure
If you're really paranoid, have an on-premises instance somewhere local to you (or a metapod in house or something similar.)
Containerization makes all of this vastly simpler than in the past.
As many others have mentioned - don't trust anyone to be up all the time, trust that at least one of them will be up all the time.
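The "trust that at least one of them will be up" logic can be sketched in a few lines. This is only an illustration: the hostnames are made up, and the health check is passed in as a function because how you probe (HTTP status, TCP connect, a synthetic transaction) depends on your stack. In a real setup this decision usually lives in your DNS provider's failover feature rather than in your own code.

```python
from typing import Callable, Iterable, Optional

# Hypothetical endpoints in priority order: two clouds plus an on-prem box.
ENDPOINTS = [
    "aws-us-east.example.com",
    "gce-europe-west.example.com",
    "onprem.example.com",
]


def first_healthy(endpoints: Iterable[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint whose health check passes, else None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A check that blows up counts the same as an unhealthy endpoint.
            continue
    return None
```

Keep DNS TTLs short on the record you fail over, otherwise clients keep hitting the dead provider long after the check has moved on.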
There seems to be a lot of comments saying why would you ever use someone else's infrastructure. The last few companies I worked for have needed to and they were big companies. You pay services to be reliable. They'd lose business if they weren't.
Building infrastructure is not cheap. There's a reason only a few companies build things like electrical grids and telecommunications networks. Businesses do business with other businesses. Unless your server needs are small and simple or you have a lot of money to burn, there is value of large-scale distributed server centers. Even if your server needs are small, you should have a guaranteed better uptime than doing it locally-- they have redundant machines with redundant storage and redundant power supplies with generator backups. Also, having your server and data offsite and backing them up means you don't lose everything if your brick and mortar business catches on fire.
Check your provider out if you have concerns and make sure you have a disaster plan. Also, redundant cloud services are always an option.
SMBs run their businesses on SAAS services. So, essentially anyone serving SMBs is in the same boat.