Slashdot Mirror


Microsoft Azure Outage Across the Globe

hawkinspeter writes: The BBC reports that overnight an outage of Microsoft's Azure cloud computing platform took down many third-party sites that rely on it, in addition to disrupting Microsoft's own products. Office 365 and Xbox Live services were affected.

This happened at a particularly inopportune time, as Microsoft has recently been pushing its Azure services in an effort to catch up with other providers such as Amazon, IBM, and Google. Just a couple of hours previously, Microsoft had screened an Azure advert in the UK during the Scotland v. England soccer match.
(Most services are back online. As of this writing, Application Insights is still struggling, and Europe is having problems with hosted VMs.)

19 of 167 comments

  1. Azure is blue, ain't it? by Anonymous Coward · · Score: 3, Funny

    Global BSOD!

  2. Yawn ... by gstoddart · · Score: 5, Insightful

    Cloud fail, like nobody saw that coming.

    If you don't own and operate your own infrastructure, you're at the mercy of someone else.

    And clearly that someone else can't guarantee you robustness with this magic cloud.

    All of these people who say "awesome, because, cloud" -- well, I have yet to be convinced that any of these vendors can provide as much uptime and reliability as a decent IT department.

    I suggest we start calling it Clown Computing -- you cram a lot of Clowns into a tiny little car, and hope it keeps going.

    When something goes wrong, hilarity ensues.

    --
    Lost at C:>. Found at C.
    1. Re:Yawn ... by i+kan+reed · · Score: 5, Interesting

      Yeah, but it's never really been about the reliability. It's always been the "not paying your own IT maintenance staff" thing that's the big draw.

    2. Re:Yawn ... by dontbemad · · Score: 4, Insightful

      Once again, missing the point. In my (small) shop, by using azure (which has worked well for us), we avoid having to use money to hire admins to maintain any sort of in house servers we might have. We can then put that money towards more developers (or better salaries for us current devs), as well as paying for training, nicer dev machines, etc. At the same time, if we do have a problem with any sort of hosted service through azure, support is literally a phone call away, and I can't remember the last time a resolution didn't happen within a couple hours.

      Sure, cloud computing has its short-comings. But it has also allowed a litany of small companies who simply can't afford to own their own infrastructure to do business.

    3. Re:Yawn ... by Kobun · · Score: 3, Informative

      I'd like to take your question "Is it really a good trade off?" and toss out an example set of numbers to summarize it. Let's say there are 250 business days in a year. Operations run from 8am until 6pm, not counting after-hours processing and maintenance. Revenue is $100 million. Gross profit percentage is 20%. This gives per hour revenue of $40,000, per hour profit is $8,000. A day of lost revenue is $400,000, or a loss of $80,000 of profit opportunity (assuming that opportunity costs are not recoverable). My own calculations for my department are somewhat similar, except I've also included the additional benefit my employees bring in for the work they do when they aren't working on maintaining/improving uptime. Avoiding the cloud is almost a no-brainer in our circumstances, except for very specific & limited services.
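      A minimal sketch of the arithmetic above, using only the figures quoted in the comment (250 business days, a 10-hour operating day, $100 million revenue, 20% gross profit). The function name `outage_cost` is made up for illustration.

```python
# Figures quoted in the comment above; nothing here is measured data.
BUSINESS_DAYS_PER_YEAR = 250
HOURS_PER_DAY = 10            # 8am until 6pm
ANNUAL_REVENUE = 100_000_000  # dollars
GROSS_MARGIN = 0.20


def outage_cost(hours_down: float) -> tuple[float, float]:
    """Return (lost revenue, lost gross profit) for an outage of the given length."""
    operating_hours = BUSINESS_DAYS_PER_YEAR * HOURS_PER_DAY
    revenue_per_hour = ANNUAL_REVENUE / operating_hours
    lost_revenue = revenue_per_hour * hours_down
    return lost_revenue, lost_revenue * GROSS_MARGIN


if __name__ == "__main__":
    # One hour down, then one full operating day down.
    for hours in (1, HOURS_PER_DAY):
        revenue, profit = outage_cost(hours)
        print(f"{hours:>2} h down: ${revenue:,.0f} revenue, ${profit:,.0f} gross profit")
```

      Like the comment, this treats every lost hour as unrecoverable, which is the pessimistic end of the estimate.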

    4. Re:Yawn ... by Crudely_Indecent · · Score: 4, Interesting

      There is something you can do about all of those conditions.

      With cloud, you just wait for the rain (outage). You can pray (call an outsourced tech support department) for it to stop raining (services restored), but until god (cloud provider) decides the rain is done (fixes the problem), you're getting wet (offline).

      That gives me a new "Cloud" tagline:

      Cloud - We will definitely rain on your parade.

      --
      "Lame" - Galaxar
    5. Re:Yawn ... by serviscope_minor · · Score: 4, Insightful

      Sure, but when you have outages and stability issues which impact your business, is it really a good trade off?

      Of course it is. Outsource to the cloud and cut the quarterly costs massively by laying off staff. Get a big bonus. Possibly share options go up due to better profits and blathering to the shareholders about the cloud. Sure 3 years down the line it might tank for a few days and in one fell swoop wipe out all the savings and then some.

      Not my problem, I'll be long gone.

      So is it worth it? Hell yes!

      --
      SJW n. One who posts facts.
    6. Re:Yawn ... by Bengie · · Score: 4, Insightful

      There are many reasons to use the cloud.

      1) You're too small to afford enough full time IT
      2) You can't afford the capital investment into your own servers
      3) You need a low-latency, global, CDN-like service, but you can't afford dedicated servers running everywhere
      4) You only temporarily need to scale up your servers to handle burst load
      5) I'm sure there are other reasons.

    7. Re:Yawn ... by Trailer+Trash · · Score: 5, Interesting

      Let me explain it from my point of view. I own and operate a one- or two-man software company that also hosts web sites. I work in the film & TV music industry, meaning I have a shit load of music (literally terabytes) that has to be available for download.

      8 years ago I owned a rack of servers downtown here that I managed myself. Honestly, it wasn't that bad. I bought reliable used 1U servers (mainly IBM and Dell) off ebay and stocked them with disks. I ran FreeBSD and Linux, used RAID, etc. But I always had two issues to deal with. The main one was "I have to always be available to handle hardware issues".

      My company isn't big enough to hire someone to do it, but I managed for nearly 10 years with no disasters. In that time I had a motherboard crap out (when I was starting out with one server - ouch) and a few disks fail. In all of those times I had to go in - sometimes in the middle of the night - and fix/replace whatever was wrong.

      Then I found Amazon AWS. Here's the kicker - it was actually cheaper for me to simply "rent" storage from them than to rent rack space for my own servers. I moved my servers to linode.com - again it was cheaper although they're nowhere near as fast as my former dedicated servers were, but they're fast enough for my applications and I can always move to larger instances where needed. And that eliminated my maintenance issues for hardware while costing less per month and maintaining the same 3-4 nines level of availability that I've always had. Oh, one other thing - S3 makes it just as easy to secure my audio files but the delivery speed can easily saturate any pipe that the files are being delivered to.
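      For a sense of scale on "3-4 nines", here is a small sketch that converts a count of nines into allowed downtime per year. It assumes a 365-day year of round-the-clock service, and `downtime_hours_per_year` is an illustrative name, not anything from a provider's tooling.

```python
HOURS_PER_YEAR = 365 * 24  # assumes 24x7 service and a non-leap year


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** -nines  # e.g. 3 nines leaves 0.001 of the year
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (3, 4):
        hours = downtime_hours_per_year(nines)
        print(f"{nines} nines: about {hours:.2f} hours ({hours * 60:.0f} minutes) down per year")
```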

      So the cloud might not be "magical" and solve all the world's problems, but for small IT shops it's great. Everything I do is on the internet so the whole "what if your connection goes down?" issue doesn't exist for me. I do not recommend such a solution for everybody. I have clients in the industrial wholesale space and their inventory & sales system definitely should be on-site with off-site backups. But their web site can be hosted elsewhere.

      Anyway, yes, the "cloud" is very useful for many businesses.

    8. Re:Yawn ... by nine-times · · Score: 4, Insightful

      Yes, the "cloud" servers sometimes have outages. So do managed hosting providers. So do internal servers. And frankly, although every business thinks that what they're doing is super-important and they can't afford even the briefest outage, the fact is that most businesses can.

      If Azure or AWS go down for an hour, it makes news and everyone freaks out because a lot of people are using them. If your business's server goes down for an hour, it does not make news, and people don't freak out. But for the business experiencing that 1 hour of downtime, what difference does it make whether they own the hardware or it's in "the cloud"?

    9. Re:Yawn ... by Jaime2 · · Score: 4, Insightful

      The calculations are simple when you assume the cloud will fail and your infrastructure will not. A real tradeoff calculation has to include estimates of the reliability of both scenarios. The answer to "Is it really a good trade off?" will be entirely based on estimates and opinions. I'm not saying you're wrong, I'm just saying that the math does not spit out "no-brainer".

      Some cloud providers will even give you SLAs with real money behind them. So, they could conceivably come up with a no-brainer deal where the cloud provider guarantees your $80,000 every day, whether it's from having your business up and running or writing you a check.
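      One way to sketch the tradeoff described above is to put a reliability estimate and an SLA payout on both options and compare expected yearly totals. The $8,000 per hour and 2,500 operating hours come from the example up-thread; the `Option` class, the annual costs, the availability figures, and the payout rate are all hypothetical placeholders to be replaced with your own estimates.

```python
from dataclasses import dataclass

OPERATING_HOURS_PER_YEAR = 250 * 10  # from the example up-thread
PROFIT_PER_HOUR = 8_000              # from the example up-thread


@dataclass
class Option:
    name: str
    annual_cost: float                # staff, hardware, or hosting fees
    availability: float               # estimated fraction of operating hours up
    sla_payout_per_hour: float = 0.0  # money paid back per hour of downtime

    def expected_annual_total(self) -> float:
        """Running cost plus expected profit lost to downtime, net of SLA payouts."""
        hours_down = OPERATING_HOURS_PER_YEAR * (1 - self.availability)
        net_loss_per_hour = PROFIT_PER_HOUR - self.sla_payout_per_hour
        return self.annual_cost + hours_down * net_loss_per_hour


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the mechanics.
    options = [
        Option("in-house", annual_cost=150_000, availability=0.999),
        Option("cloud", annual_cost=60_000, availability=0.995, sla_payout_per_hour=2_000),
    ]
    for option in options:
        print(f"{option.name}: ${option.expected_annual_total():,.0f} expected per year")
```

      The result swings entirely on the availability estimates fed in, which is the point being made: the math only looks like a no-brainer once one side is assumed never to fail.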

    10. Re:Yawn ... by Dutch+Gun · · Score: 3, Interesting

      I don't think anyone is disputing that hosted online services are both useful and, in some cases, absolutely essential, especially for smaller businesses. Well, maybe some people are, but they're pretty much Luddites, so we can ignore them. It's just that in the rush to push everything to the cloud since that's seen as some sort of panacea, people tend to forget that there are serious consequences to outages, and the more you push services to the cloud, the greater the impact of those outages will be. It's essentially putting all your technological eggs in one basket.

      As much as people complain about proprietary file formats, those really don't hold a candle to proprietary services as far as vendor lock-in. If the service you chose, for instance, starts to go south on a regular basis, and you've built your entire ecosystem inside a specific vendor's cloud, you could be in a world of hurt.

      That being said, my feeling is that these sorts of system-wide outages are simply part of these services' growing pains. Even now, keep in mind that these sorts of large-scale failures are rare enough that they make international headlines. In another five to ten years, they're going to be rarer still. Otherwise, fewer large players will trust them for critical infrastructure over the long haul. For smallish businesses, even with occasional outages, it's still probably a net win.

      --
      Irony: Agile development has too much inertia to be abandoned now.
    11. Re:Yawn ... by segedunum · · Score: 3, Insightful

      Once again, missing the point. In my (small) shop, by using azure (which has worked well for us), we avoid having to use money to hire admins to maintain any sort of in house servers we might have.

      Who maintains your Azure infrastructure (I hope you built in all that lovely redundancy for these problems) and how often do you really need to maintain internal servers? If these are on 24x7 you're going to be paying through the nose and if you miss a monthly fee, off you go. Not to mention that cloud servers are horrifically under resourced compared to hardware you can buy, so you generally need many more of them, and none of the bandwidth, I/O or CPU resources are guaranteed to be yours no matter what your meaningless agreement says.

      We can then put that money towards more developers (or better salaries for us current devs), as well as paying for training, nicer dev machines, etc.

      Ahhh, yes. Developers who believe deployment can be bypassed as a cost and running applications in production (which is kind of important to any company running web applications and who relies on them for income) simply doesn't matter.

      At the same time, if we do have a problem with any sort of hosted service through azure, support is literally a phone call away, and I can't remember the last time a resolution didn't happen within a couple hours.

      You've been exceptionally lucky, or you're being economical with the truth ;-).

      Sure, cloud computing has its short-comings. But it has also allowed a litany of small companies who simply can't afford to own their own infrastructure to do business.

      I've also seen a litany of small companies go out of business with cashflow issues who thought like that. Funny that. Yes, the infrastructure is cheaper if you don't run it all the time. I think I once calculated that if you have a server on for more than eight hours a day then you're simply being milked for a monthly fee.

    12. Re:Yawn ... by uncqual · · Score: 4, Insightful

      However, in a widespread outage like this, I'll bet the big cloud providers have a better record of rapid recovery than their customers had in-house. By necessity, MS, Amazon et al have very competent engineers who know the product well available to pull off what they are doing (including sleeping) and jump into any really serious problem. There simply are not enough such engineers to go around all the mid-sized IT organizations in the world nor interesting enough work to keep these engineers interested and sharp at most of these IT organizations (to say nothing of the cost of keeping such engineers around).

      For a car analogy... When your high end car has a nagging problem that your local mechanic can't figure out, the dealer often can figure it out quickly, possibly with the help of a factory specialist who deals with (say) ECUs on only this make all day, every day. Rarely can an independent mechanic specialize enough to come close to the factory specialists in diagnosis. Now, if your car just has a dead battery, your local mechanic may give you faster, better, and cheaper service than the dealer.

      --
      Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
  3. Re:Out of band patch.. by afidel · · Score: 5, Informative

    I installed it last night on all domain controllers after testing it in my isolated testing network. It's not really optional since it allows any domain user to become domain admin and the only resolution to that is a domain rebuild or authoritative restore. It's also already been seen in attacks in the wild so you can assume the next client to get driveby malware will be going for domain admin.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  4. You Still Need Geographic Diversity by digsbo · · Score: 3, Interesting

    As the Amazon AWS failure that took down Netflix showed, architecting your cloud infrastructure for geographic diversity can significantly reduce the likelihood of these kinds of outages.

  5. Wow, I'd be pretty angry by ErichTheRed · · Score: 4, Interesting

    Everyone forgets that Azure is a way-beyond-massive Hyper-V implementation, and that AWS is a way-beyond-massive Xen-like-thing implementation. Even though both cloud providers let you be smart in designing your infrastructure (multi-site, redundancy, etc,...the tools are there) nothing will save you from an outage of the core guts of the system. Wasn't Azure's last failure due to a certificate expiration? There's no way an end customer can plan around that.

    I'm a big fan of the private or hybrid cloud version of this fad. You get all the good stuff that Azure and AWS customers get like dynamic provisioning and software defined networking, without having to rely on a third party. Unfortunately, CIOs and other execs just see the numbers on a spreadsheet and don't take the costs of outages that you can't control into account. Power fails, networks drop, and people do stupid things in on-site implementations also. But you can at least have your staff working on it with the incentive being "you get to keep your job." With a public cloud provider or even a hoster, the responsibility ends with "oops, here's 7 hours of free service" and you have to wait in line with everyone else.

  6. Re: Out of band patch.. by Eosi · · Score: 4, Insightful

    Interesting... What about all the OpenSSL or SSH issues that happened this year, which in many cases were present by default on Linux servers?
    Regardless of OS, poor testing of third-party apps / services, or poor security as part of your deployment, can cause you to be violated. I have seen many Linux servers still using Telnet or VNC for management, and allowing root to log in directly to them....
    Secure your environment regardless of what you run......

  7. Awesome! wait on experts so we can run again by tacokill · · Score: 3, Insightful

    So let me get this straight.....your cloud is down and your only recourse is to depend on the cloud provider's highly skilled technicians to diagnose and fix the problem? Sign me up! There's nothing I like more than only one path forward which is completely dependent on specialists. /s

    Are you kidding or do you not understand how large companies, in particular cloud companies, operate? Have you ever had to call one about an unknown issue? Try it sometime....you'll learn a lot.