Slashdot Mirror


'Why You Should Not Use Google Cloud' (medium.com)

A user on Medium named "Punch a Server" says you should not use Google Cloud due to the "'no-warnings-given, abrupt way' they pull the plug on your entire system if they (or the machines) believe something is wrong." The user has a project running in production on Google Cloud (GCP) that is used to monitor hundreds of wind turbines and scores of solar plants scattered across 8 countries. When their project goes down, money is lost. An anonymous Slashdot reader shares the report: Early today morning (June 28, 2018) I receive an alert from Uptime Robot telling me my entire site is down. I receive a barrage of emails from Google saying there is some "potential suspicious activity" and all my systems have been turned off. EVERYTHING IS OFF. THE MACHINE HAS PULLED THE PLUG WITH NO WARNING. The site is down, app engine, databases are unreachable, multiple Firebases say I've been downgraded and therefore exceeded limits.

Customer service chat is off. There's no phone to call. I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue. I fill in the form with the details and thankfully within 20 minutes all the services started coming alive. The first time this happened, we were down for a few hours. In all we lost everything for about an hour. An automated email arrives apologizing for "inconvenience" caused. Unfortunately The Machine has no understanding of the "quantum of inconvenience" caused.

30 of 508 comments

  1. Sorry, but... by Known+Nutter · · Score: 5, Insightful

    If millions of dollars are on the line, you should be running your own systems. Seriously. I'm not an IT expert, data infrastructure guy or anything. I'm just a dumb nerd, and I know that. Never trust your data to a third party when millions are at stake -- let alone critical infrastructure reliability.

    --
    Beware of the Leopard.
    1. Re:Sorry, but... by Nutria · · Score: 5, Insightful

      As if servers going down can't happen if you host it yourself.

      But then you're in control, instead of having to rely on some amorphous, anonymous monster that only allows communication via automated email.

      --
      "I don't know, therefore Aliens" Wafflebox1
    2. Re:Sorry, but... by Known+Nutter · · Score: 4, Insightful

      Of course servers go down. But if they are your own, you have many more levels of control than if they're in the "cloud" -- redundancy, as an obvious example. Or at least you have keys to the goddamn data center and don't have to provide the CFO's p-card to resolve the problem.

      In the case of TFA, it sounds like a consumer-grade solution was applied to an enterprise-level problem. Maybe that's Google's fault, maybe that's the customer's fault -- sounds like there's plenty of stupid to go around.

      --
      Beware of the Leopard.
    3. Re:Sorry, but... by Anonymous Coward · · Score: 5, Insightful

      That's why you also configure a failover at a secondary location. This server punching guy sounds like a damn idiot. With millions on the line why is there no disaster recovery plan outside of "call the CFO for his credit card?"

    4. Re:Sorry, but... by Junta · · Score: 4, Insightful

      Ah, but SLAs cost money, and part of the cloud movement has been "see, these vendors can make hosting cheaper than it used to be", precisely because all they have to do when they screw you over is say "whoops, sorry", and as such don't need to invest in *too* much resiliency and don't have to worry too much about liability.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    5. Re:Sorry, but... by Joce640k · · Score: 4, Insightful

      Google cloud isn't supposed to be enterprise grade? I bet that's news to google.

      If you read the fucking summary it doesn't say the servers went down, it says that some robot shut them down when it detected "suspicious activity". No review was done, nobody at google called the customer, nothing.

      This is clearly 100% Google's fault.

      --
      No sig today...
    6. Re:Sorry, but... by El+Cubano · · Score: 1, Insightful

      Google cloud isn't supposed to be enterprise grade? I bet that's news to google.

      ... it says that some robot shut them down when it detected "suspicious activity". No review was done, nobody at google called the customer, nothing.

      Those two statements seem, at least to me, clearly contradictory. I can understand summarily shutting down a customer on a residential Internet connection, or a small business shared web hosting provider. However when providing an "enterprise grade" service, you should be prepared to give your customers the benefit of the doubt. About the only instances I can think of for an enterprise service to shut down a customer is if they are greatly exceeding their allocated resources and/or the activity associated with the customer is actively in the process of harming other customers. In both of those instances, though, the right thing to do is attempt to contact the customer first. Of course, if Google attempted to contact the customer and could not get a hold of him (perhaps because the contact person was listed as someone who has changed employ, or because their phone was off, etc.) we would not know that from this individual.

      This is clearly 100% Google's fault.

      Based on what we know, that seems like an accurate statement.

    7. Re:Sorry, but... by AmiMoJo · · Score: 5, Insightful

      Google's cloud services are enterprise grade if you pay enterprise prices for them.

      If you pay for it on a credit card assigned to the CFO then you are not an enterprise and you are not paying enterprise prices.

      They chose a cheap, no-SLA no-support service, probably because it was cheap. Then they get upset that they aren't receiving the support they didn't pay for.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    8. Re:Sorry, but... by Anonymous Coward · · Score: 5, Insightful

      As if servers going down can't happen if you host it yourself.

      But then you're in control, instead of having to rely on some amorphous, anonymous monster that only allows communication via automated email.

      Exactly right. If your servers go down, YOU can fix them immediately and you don't have the problem of trying to get in touch with some support person who may or may not respond quickly, or may not respond at all if you can't even figure out how to get in touch with them.

      That said, this was a case of a company trying to be cheap, and instead of an enterprise account they tried to run critical applications on what is supposed to be a consumer-level account, which is why they started getting e-mails from Google about "suspicious activity".

    9. Re:Sorry, but... by rudy_wayne · · Score: 4, Insightful

      If you read the fucking summary it doesn't say the servers went down, it says that some robot shut them down when it detected "suspicious activity". No review was done, nobody at google called the customer, nothing.

      This is clearly 100% Google's fault.

      Yes, Google was a little too quick to pull the trigger. They should have sent out a couple of e-mails first before shutting things down.

      But

      The "suspicious activity" was due to the cheapfuck customer running heavy, 24/7, critical applications on what is supposed to be a consumer-level account.

      You get what you pay for.

    10. Re:Sorry, but... by Zenin · · Score: 4, Insightful

      That Amazon message is very different. It's basically telling you (days or weeks in advance!) that there's a serious hardware failure and the underlying hardware needs to be pulled from service.

      It's literally as simple as a reboot to move to new hardware. You can even catch the notification easily with a CloudWatch Alarm and trigger the purpose-built auto-recovery action to do it for you the moment the instance goes into that state. Or use an AutoScale group and it'll just cycle the hardware out for you w/o any downtime or manual action. TMTOWTDI

      If you aren't living by the motto, "Everything Fails, All The Time", then you're simply doing Cloud wrong. To be fair, even if you're entirely on your own physical hardware in a datacenter...you're still Doing It Wrong if you aren't counting on your hardware to always be failing all the time.

      --
      My /. uid is better then your /. uid
    11. Re:Sorry, but... by ras · · Score: 2, Insightful

      If millions of dollars are on the line, you should be running your own systems.

      Apparently Netflix and friends (who use these services) don't agree. Neither do I.

      Running your own system doesn't fix anything. Yes, you won't go down because someone forgot to renew the credit card, but you will go down because of a faulty RAM chip, or an air-conditioner going out, or an ISP getting a route wrong, or a backhoe going through a cable. None of those failures are likely to take out Google, as they will just move you to new working hardware in a working location - both of which they have in abundance, along with well tested procedures to do the move. In fact it will probably happen automatically without you having to raise a finger.

      Comments like this betray something that appears common in the industry - people have absolutely no idea what engineering tradeoffs using the cloud involves. For example, rented cloud computing resources are far more expensive per CPU cycle (perhaps an order of magnitude so) than a machine lying on the bedroom floor at home. Yet you will hear people say they use the cloud because it's cheaper. Then later you will hear them complain bitterly about how their cloud bills are breaking them.

      I don't know why they are surprised. It's not like the cloud provider has some magic cookie jar they can pull unlimited CPU cycles from. They, like you, had to buy the servers, pay for the power, pay for the building it's in, pay for the highly redundant internet connection they provide, and pay for the people who keep the things running, the accountants, tax, and all those other business overheads. They also have to make a profit. Anybody who thinks they could do this and still sell you those CPU cycles for cheaper than you can make them yourself must be smoking something. The flip side of course is they are really good at doing it: you don't have to concern yourself with replacing dead disk drives or swapping out power supplies, and you have (hopefully) top notch staff at your disposal that you didn't have to train and don't have to pay.

      Regardless, the cloud is cheaper in the sense that it's cheaper to stay in a $500/night hotel room for a night than it is to buy an apartment for the overnight stay, but those CPU cycles aren't "cheap". They are cheap in the sense that you can't buy 1/2 an apartment, nor can you buy 1/2 a server, but you can rent a room and buy a VM that shares a machine with 1000 other VMs.

      The moral of this story isn't "he should be running his own systems". It's that if you want really high uptimes you have to design a system that wouldn't die because of a single point of failure, be that CPU, internet, power, or administrative control (i.e., what happened here - someone has the power to flip the switch). But doing that is currently hard. As in you need the top software engineers on the planet type hard. And the sad bit is you can't just "buy it from Google (or some other cloud provider)" as he apparently thinks he should be able to, because that, as he has discovered, opens you up to a single point of failure when that provider flips the switch. Which makes the problem so hard an MBA can't solve it by waving a money wand; he has to build and maintain a team of top flight software engineers. So we aren't talking just hard, we are talking really, really, really hard.
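
To make the "no single point of administrative control" idea concrete, here is a minimal sketch in Python. It assumes the service is deployed under at least two independently billed and administered providers, each with some health check. The names `Deployment`, `pick_active`, `provider-a` and `provider-b` are hypothetical and do not come from the article or from any cloud SDK. The sketch covers only the selection step, which is the easy part; keeping data in sync between providers is the hard part described above and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Deployment:
    """One independently administered copy of the service (hypothetical)."""
    name: str
    is_healthy: Callable[[], bool]  # assumed health probe, e.g. an HTTP check


def pick_active(deployments: Sequence[Deployment]) -> Optional[Deployment]:
    """Return the first deployment, in priority order, whose health check passes."""
    for deployment in deployments:
        try:
            if deployment.is_healthy():
                return deployment
        except Exception:
            # A probe that raises (timeout, suspended account, DNS failure)
            # is treated the same as an unhealthy deployment.
            continue
    return None


# Example: the primary account has been suspended, so traffic goes to the secondary.
primary = Deployment("provider-a", lambda: False)
secondary = Deployment("provider-b", lambda: True)
active = pick_active([primary, secondary])
print(active.name if active else "no healthy deployment")
```

The ordering of the list is the failover priority, so an account suspension at the first provider looks no different from a hardware failure there.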

  2. Wait by Anonymous Coward · · Score: 5, Insightful

    The first time this happened,...

    Why was there a second time?

  3. Don't trust an ad company. by Anonymous Coward · · Score: 3, Insightful

    Over 90 percent of Google income is adverts. You would be absolutely insane to trust them with your business or educational institution data.
    I'm not saying MS or Amazon is great but at least their revenue model is not based exclusively or largely on data mining of users.

  4. Re:Amazon's cloud is no better by fahrbot-bot · · Score: 5, Insightful

    ... but Amazon is a disaster for a company that isn't fully fault tolerant and has critical servers that can't go down.

    If your company has "critical servers that can't go down" (wherever they are) and you're not fully fault-tolerant, you're the disaster, not Amazon.

    Not here to pick a fight, just sayin'. (one finger pointing at someone else is also three fingers pointing at yourself)

    --
    It must have been something you assimilated. . . .
  5. Your failure to plan... by kenh · · Score: 4, Insightful

    I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days?

    Uh, I don't know - take a picture of each and save them on your phone, in case you need them?

    You report everything was back up within 20 minutes once you submitted the requested information - that seems pretty good to me.

    Now, about your decision to only run one instance of your mission critical application suite on exactly one cloud service...

    The story here is you consider it someone else's fault for your failure to plan/prepare for an outage.

    --
    Ken
  6. New scam incoming! by Calydor · · Score: 4, Insightful

    I have an email asking me to fill in a form and upload a picture of the credit card and a government issued photo id of the card holder. Great, let's wake up the CFO who happens to be the card holder. What if the card holder is on leave and is unreachable for three days? We would have lost everything -- years of work -- millions of dollars in lost revenue.

    Somewhere in Russia, India and Nigeria, several callcenters full of scammers came all at once.

    --
    -=This sig has nothing to do with my comment. Move along now=-
  7. Re:It makes sense why Google is like this by Anonymous Coward · · Score: 2, Insightful

    All of the XKCD-citing hipsters

    This is a good time to remind people that XKCD wrote that comic to justify the forceful expulsion from the entire Internet of ordinary people who disapproved of the corruption of a gang of liars who happened to be connected to multiple foreign intelligence agencies, including state sponsors of al-Qaeda. And it turns out they own all of the press too, given how nobody has told this story correctly.

    All of that "Gamergate harassment?" There was one tweet. ONE. It was sent by Slade Villena to Randi Harper. The rest was either sent by their own people or random jokers. None of it was ever traced back to the Gamergaters despite having the Home Office, QCRI, DHS, FBI, Google, and Microsoft on the case trying to find somebody who was guilty so that they could justify their funding. When they couldn't find anyone, they simply concluded that Gamergate was a harassment campaign because everyone said it was and they pocketed their paychecks.

    When Trump says that the MSM is fake news, he is not lying. When he calls them the "enemy media," remember that I mentioned state sponsors of al-Qaeda. He is not lying.

  8. Yeah, big warning signs on the user side here by Phil+Urich · · Score: 5, Insightful

    The first time this happened,...

    Why was there a second time?

    So many of the problems here (e.g. paying with a credit card, and one that has only a single person's name on it? Having no fallback that can be spun up elsewhere?) are foolish if this has never happened before, and utterly, mind-bogglingly idiotic if this in fact has already happened before. It's one thing to be blind to something you should know could be a problem; it's quite another to be blind to and wholly unprepared for a problem you've personally experienced! Something seems fundamentally wrong at this company.

    Also, if your entire business can die because it takes an unexpected few days off, then perhaps your business is running a bit too raggedly and doesn't have enough meat on the bones . . .

    --
    I remember sigs. Oh, a simpler time!
    1. Re:Yeah, big warning signs on the user side here by gweihir · · Score: 4, Insightful

      There probably is just one problem here: Utterly incompetent and greedy management that made this demented decision.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  9. Cheap service, cheap results by fyngyrz · · Score: 5, Insightful

    The company thought they could get away with paying less for server infrastructure. They can. But they get less. This is one of the "less" things they get.

    If you value your data, host it yourself, preferably in multiple locations. If you want to go cheap, then you can expect to lose things.

    Like your data, or access to it, or availability of it.

    It's not such a smart thing to cheap out on the important stuff.

    Of course, convincing the bean counters of future risk inherent in what appears to them to be current savings... good luck with that.

    Well, best to get rid of your bean counters. :)

    Here's a maxim of mine I like to drop on the table during discussions like these:

    If you can't afford to do it well, you almost certainly shouldn't be doing it at all.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:Cheap service, cheap results by Anonymous Coward · · Score: 1, Insightful

      Sorry, all of that is horseshit. Google is the one making the mistakes here, not the people who took them at their word that they offered robust professional services.

    2. Re: Cheap service, cheap results by Anonymous Coward · · Score: 4, Insightful

      Then take the contract you have with them to court and get your effing money back. What? No contract? You deserve what you get.

    3. Re:Cheap service, cheap results by gweihir · · Score: 4, Insightful

      You get what you pay for. They thought they could get much more than they paid for. 100% their fault. At the very least, they failed to evaluate what they actually got. And, since this is apparently the second time this has happened, they also seem to be unable to learn.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:Cheap service, cheap results by pnutjam · · Score: 4, Insightful

      That's what I tell people about the cloud. It's great when it works, but if you have problems you are almost certainly just a small fish. The best you can hope for is your money back, but your services will still be down and you're left scrambling to build elsewhere.

  10. Re:Disaster Recovery by gweihir · · Score: 3, Insightful

    They did not have a DR plan _after_ this happened before. They did apparently not even buy enterprise grade cloud services, you know, those with an SLA. The problem is not the cloud here, but the utter morons that decided to use it in that way.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  11. Re:Disaster Recovery by tk77 · · Score: 3, Insightful

    No.

    Relying on a single company, no matter what their internal redundancy is, is not having a good DR plan. Especially when the amount of revenue involved is so high.

    As others have pointed out, the other cloud providers do the same thing. You can have your stuff spread all across the country/world with a single provider, and if a glitch in their system says you shouldn't have service, it will all be turned off (as is presented in TFA and by others suggesting that at least Amazon has done the same thing).

  12. Don't use Google Cloud's... by viperidaenz · · Score: 2, Insightful

    ... cheapest service they offer, the one that doesn't include 24/7 phone support - let alone a guaranteed SLA, to host your multi-million dollar wind/solar plant, where any service outage will cost you millions in service penalties.

  13. Exactly!! Ding, Ding, Ding! by King_TJ · · Score: 4, Insightful

    I may be one of the "old timers" who I'm told is thinking about things in an "old school" way when I say this. But I've *always* warned people that "The Cloud" just means you're giving somebody else the responsibility of handling your data and the systems it runs on.

    That makes sense sometimes. I'm not "anti cloud". But for anything really critically important to a business, I feel you should have it running locally and THEN consider cloud options as hot-failover sites, backup sites, etc. With cloud hosting, the whole thing is off limits to you as soon as your Internet circuit goes down, for one thing. With it running locally, you can still use it just fine anywhere on your LAN.

    But additionally, if the provider hosting your stuff goes bankrupt or merges with someone else, or just plain decides it's not profitable enough without some pricing changes -- where does that leave you? Technically, they can just disappear with your whole software and data configuration overnight. Or they can put trained apes in charge of maintaining things so it suddenly has huge security holes. Who knows?

    When you run things yourself, YOU are where the buck stops if things go wrong. If you're good at what you do, that should be more of a comforting thing than a scary thing. I've seen too many shops trying to cut corners on the I.T. hiring budget by bringing in less experienced people who really can't properly run the systems they're supposed to be caring for. The cloud for them is a crutch ... a way to get things done that are beyond their abilities. But that's not an ideal situation for a business to put itself in.
     

  14. Re:It makes sense why Google is like this by serviscope_minor · · Score: 4, Insightful

    This is a good time to remind people that XKCD wrote that comic to justify the forceful expulsion from the entire Internet of ordinary people

    No it's a good time to remind people you've gone way off the deep end, mate.

    It does however prove that just about anything in favour of gamergate no matter how batshit insane will get modded up here. Like this bit:

    of al-Qaeda.

    Aside: isn't that sort of old news even for the crazies? Aren't ISIS responsible for chemtrails now or are they merely a false flag perpetrated by the deep state to stop us knowing the truth about how a chemtrail spraying plane actually did 9/11?

    All of that "Gamergate harassment?" There was one tweet.

    Gamergate was one tweet: +3 Insightful. I think that might be a new low for slashdot moderation.

    --
    SJW n. One who posts facts.