Slashdot Mirror


Microsoft Azure Outage Across the Globe

hawkinspeter writes: The BBC reports that overnight an outage of Microsoft's Azure cloud computing platform took down many third-party sites that rely on it, in addition to disrupting Microsoft's own products. Office 365 and Xbox Live services were affected.

This happened at a particularly inopportune time, as Microsoft has recently been pushing its Azure services in an effort to catch up with other providers such as Amazon, IBM, and Google. Just a couple of hours previously, Microsoft had screened an Azure advert in the UK during the Scotland v. England soccer match.
(Most services are back online. As of this writing, Application Insights is still struggling, and Europe is having problems with hosted VMs.)

39 of 167 comments

  1. Azure is blue, ain't it? by Anonymous Coward · · Score: 3, Funny

    Global BSOD!

  2. Yawn ... by gstoddart · · Score: 5, Insightful

    Cloud fail, like nobody saw that coming.

    If you don't own and operate your own infrastructure, you're at the mercy of someone else.

    And clearly that someone else can't guarantee you robustness with this magic cloud.

    All of these people who say "awesome, because, cloud" -- well, I have yet to be convinced that any of these vendors can provide as much uptime and reliability as a decent IT department.

    I suggest we start calling it Clown Computing -- you cram a lot of Clowns into a tiny little car, and hope it keeps going.

    When something goes wrong, hilarity ensues.

    --
    Lost at C:>. Found at C.
    1. Re:Yawn ... by i+kan+reed · · Score: 5, Interesting

      Yeah, but it's never really been about the reliability. It's always been the "not paying your own IT maintenance staff" thing that's the big draw.

    2. Re:Yawn ... by gstoddart · · Score: 2

      Sure, but when you have outages and stability issues which impact your business, is it really a good trade off?

      I mostly see this as a management fail -- penny wise and pound foolish.

      I will be curious to see what percent of companies who went to the cloud will transition back to doing stuff in-house, and just how much that will really cost them in the long run.

    3. Re:Yawn ... by dontbemad · · Score: 4, Insightful

      Once again, missing the point. In my (small) shop, by using Azure (which has worked well for us), we avoid having to use money to hire admins to maintain any sort of in-house servers we might have. We can then put that money towards more developers (or better salaries for us current devs), as well as paying for training, nicer dev machines, etc. At the same time, if we do have a problem with any sort of hosted service through Azure, support is literally a phone call away, and I can't remember the last time a resolution didn't happen within a couple hours.

      Sure, cloud computing has its shortcomings. But it has also allowed a litany of small companies who simply can't afford to own their own infrastructure to do business.

    4. Re:Yawn ... by Anonymous Coward · · Score: 2, Insightful

      Right. Because in-house infrastructure never fails.

      Power outages never happen.

      Lines are never cut.

      Patches never fail and rollbacks always work. ... can I come live in make-believe-land with you?

    5. Re:Yawn ... by Chas · · Score: 2

      Cloud fail, like nobody saw that coming.

      If you don't own and operate your own infrastructure, you're at the mercy of someone else.

      Pretty much anyone with a brain saw it coming. That doesn't stop a lot of idiots who bought the shit sandwich from feeling burned.

      And clearly that someone else can't guarantee you robustness with this magic cloud.

      Nope. Because most of the time, unlike when you control your infrastructure, you have exactly ZERO way to verify claims regarding robustness of service.

      All of these people who say "awesome, because, cloud" -- well, I have yet to be convinced that any of these vendors can provide as much uptime and reliability as a decent IT department.

      And keep waiting. Because they can't. Flat out.

      I suggest we start calling it Clown Computing -- you cram a lot of Clowns into a tiny little car, and hope it keeps going.

      And my mind immediately flashed to Michelle Duggar.

      "Oh! We'll take whatever uptimes God sees fit to grant us!"

      When something goes wrong, hilarity ensues.

      Unless you're the poor sonofabitch it's happening to. Then it ain't quite so funny. It's on par with having to take a computer in for servicing, and then getting it back to find out that the technicians reformatted the system and just destroyed your business apps and 20+ years of data.

      --
      Chas - The one, the only.
      THANK GOD!!!
    6. Re:Yawn ... by hawkinspeter · · Score: 2

      I'm disappointed that they edited out my original comment: "Office 365 (maybe an optimistic name)".

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    7. Re:Yawn ... by Kobun · · Score: 3, Informative

      I'd like to take your question "Is it really a good trade off?" and toss out an example set of numbers to summarize it. Let's say there are 250 business days in a year, and operations run from 8am until 6pm (10 hours a day, or 2,500 operating hours a year), not counting after-hours processing and maintenance. Revenue is $100 million and the gross profit percentage is 20%. That gives hourly revenue of $40,000 and hourly profit of $8,000. A day of lost revenue is $400,000, or a loss of $80,000 of profit opportunity (assuming that opportunity costs are not recoverable). My own calculations for my department are somewhat similar, except I've also included the additional benefit my employees bring in for the work they do when they aren't working on maintaining/improving uptime. Avoiding the cloud is almost a no-brainer in our circumstances, except for very specific & limited services.
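
      The arithmetic in the comment above can be checked with a few lines of Python. This sketch simply restates Kobun's example figures (250 days, 10 operating hours a day, $100 million revenue, 20% gross margin); `outage_cost` is a hypothetical helper name, not anything from the thread.

```python
# Figures restated from the example in the comment above.
BUSINESS_DAYS_PER_YEAR = 250
HOURS_PER_DAY = 10             # 8am until 6pm
ANNUAL_REVENUE = 100_000_000   # dollars
GROSS_MARGIN_PERCENT = 20


def outage_cost(outage_hours: float):
    """Return (lost revenue, lost gross profit) for an outage during operating hours."""
    operating_hours = BUSINESS_DAYS_PER_YEAR * HOURS_PER_DAY          # 2,500 hours a year
    revenue_per_hour = ANNUAL_REVENUE / operating_hours               # $40,000
    profit_per_hour = revenue_per_hour * GROSS_MARGIN_PERCENT / 100   # $8,000
    return revenue_per_hour * outage_hours, profit_per_hour * outage_hours


if __name__ == "__main__":
    revenue, profit = outage_cost(HOURS_PER_DAY)  # one full business day
    print(f"Lost revenue: ${revenue:,.0f}; lost profit: ${profit:,.0f}")
```

      Running it prints `Lost revenue: $400,000; lost profit: $80,000`, matching the figures in the comment.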

    8. Re:Yawn ... by Anonymous Coward · · Score: 2, Insightful

      I spent 6 months earlier this year on behalf of our IT Director (who wanted us to go to cloud really badly, because, well, cloud) studying the costs and efforts of doing so. My conclusion was that over a 5 year period, cloud hosting would cost us TEN TIMES the cost of hosting internally. I expected this report to end this discussion, but it didn't.

      My director pointed out I hadn't taken into account the fewer people we would need to manage things (which I pointed out was horseshit, since we do colo hosting now and visit our datacenter sites maybe 2-3 times a year).

      I think, aside from people having ideas that cloud is magic, it comes down to the accounting lure of a low monthly fee (operational costs) vs high one-time costs when you buy equipment (generally capital costs). It's very short-term thinking.

      Point one: Unless you're talking about short-term projects or workloads, cloud is probably far more expensive than in-house for most companies over time.

      Point two: Cloud hosting is just another form of Colo. Instead of you paying for racks, or portions of racks, you're paying for VM space (portions of a server). It's just colo.

    9. Re:Yawn ... by mrspoonsi · · Score: 2

      Well, I have a rack in a datacenter, and I have an Azure VM which basically pings the rack servers/services to notify me of an outage. In the last year, when I've come to install patches on the Azure VM, about 3 times I have found an 'unexpected shutdown, enter reason' message waiting when I logged on. The number of times this has happened on my own rack in the last year? Zero, and you can go back 4 years and it's still zero.
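
      A watchdog like the one described above can be sketched in Python. This is a minimal sketch, assuming a TCP connection check is an acceptable stand-in for a real ICMP ping (which usually needs raw-socket privileges); the function names and the example hostname below are hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures (all OSError subclasses).
        return False


def check_services(services):
    """Return the (host, port) pairs that did not answer."""
    return [(host, port) for host, port in services if not is_reachable(host, port)]
```

      For example, `check_services([("rack1.example.com", 22), ("rack1.example.com", 443)])` returns the pairs that failed to answer; running it from a scheduler such as cron on the cloud VM and e-mailing any non-empty result gives the basic notify-on-outage behaviour.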

    10. Re:Yawn ... by Crudely_Indecent · · Score: 4, Interesting

      There is something you can do about all of those conditions.

      With cloud, you just wait for the rain (outage). You can pray (call an outsourced tech support department) for it to stop raining (services restored), but until god (cloud provider) decides the rain is done (fixes the problem), you're getting wet (offline).

      That gives me a new "Cloud" tagline:

      Cloud - We will definitely rain on your parade.

      --
      "Lame" - Galaxar
    11. Re:Yawn ... by serviscope_minor · · Score: 4, Insightful

      Sure, but when you have outages and stability issues which impact your business, is it really a good trade off?

      Of course it is. Outsource to the cloud and cut the quarterly costs massively by laying off staff. Get a big bonus. Possibly share options go up due to better profits and blathering to the shareholders about the cloud. Sure 3 years down the line it might tank for a few days and in one fell swoop wipe out all the savings and then some.

      Not my problem, I'll be long gone.

      So is it worth it? Hell yes!

      --
      SJW n. One who posts facts.
    12. Re:Yawn ... by Bengie · · Score: 4, Insightful

      There are many reasons to use the cloud.

      1) You're too small to afford enough full time IT
      2) You can't afford the capital investment into your own servers
      3) You need a low-latency, global, CDN-like service, but you can't afford dedicated servers running everywhere
      4) You only temporarily need to scale up your servers to handle burst load
      5) I'm sure there are other reasons.
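
      Reason 4 amounts to paying for extra capacity only while the burst lasts. A minimal sketch of the sizing rule, assuming load is measured in requests per second and each instance handles a fixed, known rate; `instances_needed` and the numbers in the usage note are hypothetical.

```python
import math


def instances_needed(requests_per_second: float, capacity_per_instance: float,
                     min_instances: int = 1) -> int:
    """Smallest instance count whose combined capacity covers the current load."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(min_instances, math.ceil(requests_per_second / capacity_per_instance))
```

      With a hypothetical 200 requests/second per instance, a baseline of 150 requests/second needs 1 instance and a burst of 1,900 requests/second needs 10. In the cloud the extra 9 can be released once the burst ends; bought outright, they would sit idle the rest of the time.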

    13. Re:Yawn ... by Trailer+Trash · · Score: 5, Interesting

      Let me explain it from my point of view. I own and operate a one- or two-man software company that also hosts web sites. I work in the film & TV music industry, meaning I have a shit load of music (literally terabytes) that has to be available for download.

      8 years ago I owned a rack of servers downtown here that I managed myself. Honestly, it wasn't that bad. I bought reliable used 1U servers (mainly IBM and Dell) off eBay and stocked them with disks. I ran FreeBSD and Linux, used RAID, etc. But I always had two issues to deal with. The main one was "I have to always be available to handle hardware issues".

      My company isn't big enough to hire someone to do it, but I managed for nearly 10 years with no disasters. In that time I had a motherboard crap out (when I was starting out with one server - ouch) and a few disks fail. Each time I had to go in - sometimes in the middle of the night - and fix or replace whatever was wrong.

      Then I found Amazon AWS. Here's the kicker - it was actually cheaper for me to simply "rent" storage from them than to rent rack space for my own servers. I moved my servers to linode.com - again it was cheaper. They're nowhere near as fast as my former dedicated servers were, but they're fast enough for my applications and I can always move to larger instances where needed. That eliminated my hardware maintenance issues while costing less per month and maintaining the same 3-4 nines of availability that I've always had. Oh, one other thing - S3 makes it just as easy to secure my audio files, and the delivery speed can easily saturate any pipe that the files are being delivered to.
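
      To put "3-4 nines" in concrete terms, the yearly downtime budget follows directly from the availability figure. A short sketch (ignoring leap years); `downtime_budget_minutes` is a hypothetical helper name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines  # e.g. 0.001 for three nines
    return MINUTES_PER_YEAR * unavailability
```

      Three nines (99.9%) allows about 525.6 minutes, roughly 8.8 hours, of downtime a year; four nines (99.99%) allows about 52.6 minutes.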

      So the cloud might not be "magical" and solve all the world's problems, but for small IT shops it's great. Everything I do is on the internet so the whole "what if your connection goes down?" issue doesn't exist for me. I do not recommend such a solution for everybody. I have clients in the industrial wholesale space and their inventory & sales system definitely should be on-site with off-site backups. But their web site can be hosted elsewhere.

      Anyway, yes, the "cloud" is very useful for many businesses.

    14. Re:Yawn ... by nine-times · · Score: 4, Insightful

      Yes, the "cloud" servers sometimes have outages. So do managed hosting providers. So do internal servers. And frankly, although every business thinks that what they're doing is super-important and they can't afford even the briefest outage, the fact is that most businesses can.

      If Azure or AWS go down for an hour, it makes news and everyone freaks out because a lot of people are using them. If your business's server goes down for an hour, it does not make news, and people don't freak out. But for the business experiencing that 1 hour of downtime, what difference does it make whether they own the hardware or it's in "the cloud"?

    15. Re:Yawn ... by Geeky · · Score: 2

      What it boils down to is whether the cloud service is more reliable than doing it in-house - which has more downtime? Can you do it better than Azure? The cost then comes into it - can you do it better for less money? The only no-brainer is the service that is both more reliable and cheaper, otherwise you're looking at tradeoffs.

      For some small businesses, cloud solutions may be both cheaper and more reliable than doing it in-house, especially if the core business is not IT related.

      Of course, that assumes that customers of cloud services have done a proper analysis and aren't just jumping on a bandwagon.

      --
      Sigs are so 1990s. No way would I be seen dead with one.
    16. Re:Yawn ... by Jaime2 · · Score: 4, Insightful

      The calculations are simple when you assume the cloud will fail and your infrastructure will not. A real tradeoff calculation has to include estimates of the reliability of both scenarios. The answer to "Is it really a good trade off?" will be entirely based on estimates and opinions. I'm not saying you're wrong, I'm just saying that the math does not spit out "no-brainer".
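
      The point that a fair comparison must include reliability estimates for both options can be written as an expected-cost formula. This is a sketch, not an established model: it assumes downtime only matters during operating hours and that lost profit scales linearly with downtime. The 2,500 operating hours and $8,000/hour come from the example earlier in the thread; the fixed costs and availability figures in the usage note are hypothetical.

```python
def expected_annual_cost(fixed_cost: float, availability: float,
                         operating_hours: float, cost_per_down_hour: float) -> float:
    """Fixed running cost plus the expected cost of downtime during operating hours."""
    expected_down_hours = operating_hours * (1.0 - availability)
    return fixed_cost + expected_down_hours * cost_per_down_hour
```

      For instance, a hypothetical in-house option costing $300,000 a year at 99.9% availability comes to about $320,000 once expected downtime is included, while a hypothetical cloud option costing $200,000 at 99.5% comes to about $300,000. Nudge either availability estimate and the ranking can flip, which is why the math alone does not spit out a no-brainer.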

      Some cloud providers will even give you SLAs with real money behind them. So, they could conceivably come up with a no-brainer deal where the cloud provider guarantees your $80,000 every day, whether it's from having your business up and running or writing you a check.

    17. Re:Yawn ... by Dutch+Gun · · Score: 3, Interesting

      I don't think anyone is disputing that hosted online services are both useful and, in some cases, absolutely essential, especially for smaller businesses. Well, maybe some people are, but they're pretty much Luddites, so we can ignore them. It's just that in the rush to push everything to the cloud since that's seen as some sort of panacea, people tend to forget that there are serious consequences to outages, and the more you push services to the cloud, the greater the impact of those outages will be. It's essentially putting all your technological eggs in one basket.

      As much as people complain about proprietary file formats, those really don't hold a candle to proprietary services as far as vendor lock-in. If the service you chose, for instance, starts to go south on a regular basis, and you've built your entire ecosystem inside a specific vendor's cloud, you could be in a world of hurt.

      That being said, my feeling is that these sorts of system-wide outages are simply part of these services' growing pains. Even now, keep in mind that these sorts of large-scale failures are rare enough that they make international headlines. In another five to ten years, they're going to be rarer still. Otherwise, fewer large players will trust them for critical infrastructure over the long haul. For smallish businesses, even with occasional outages, it's still probably a net win.

      --
      Irony: Agile development has too much inertia to be abandoned now.
    18. Re:Yawn ... by segedunum · · Score: 3, Insightful

      Once again, missing the point. In my (small) shop, by using Azure (which has worked well for us), we avoid having to use money to hire admins to maintain any sort of in-house servers we might have.

      Who maintains your Azure infrastructure (I hope you built in all that lovely redundancy for these problems) and how often do you really need to maintain internal servers? If these are on 24x7 you're going to be paying through the nose and if you miss a monthly fee, off you go. Not to mention that cloud servers are horrifically under-resourced compared to hardware you can buy, so you generally need many more of them, and none of the bandwidth, I/O or CPU resources are guaranteed to be yours no matter what your meaningless agreement says.

      We can then put that money towards more developers (or better salaries for us current devs), as well as paying for training, nicer dev machines, etc.

      Ahhh, yes. Developers who believe deployment can be bypassed as a cost, and that running applications in production (which is kind of important to any company running web applications and relying on them for income) simply doesn't matter.

      At the same time, if we do have a problem with any sort of hosted service through Azure, support is literally a phone call away, and I can't remember the last time a resolution didn't happen within a couple hours.

      You've been exceptionally lucky, or you're being economical with the truth ;-).

      Sure, cloud computing has its shortcomings. But it has also allowed a litany of small companies who simply can't afford to own their own infrastructure to do business.

      I've also seen a litany of small companies go out of business with cashflow issues who thought like that. Funny that. Yes, the infrastructure is cheaper if you don't run it all the time. I think I once calculated that if you have a server on for more than eight hours a day then you're simply being milked for a monthly fee.
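
      The eight-hours-a-day figure is a break-even calculation between a flat monthly cost and an hourly rate. A minimal sketch with hypothetical prices (the poster's actual inputs are not given), assuming the owned server's cost, including hardware amortization, power and hosting, can be expressed as one monthly figure; `break_even_hours_per_day` is a hypothetical name.

```python
def break_even_hours_per_day(owned_monthly_cost: float, cloud_hourly_rate: float,
                             days_per_month: float = 30.0) -> float:
    """Daily runtime above which owning the server is cheaper than paying by the hour."""
    # Owned cost is flat; cloud cost grows with hours run per day.
    return owned_monthly_cost / (cloud_hourly_rate * days_per_month)
```

      With a hypothetical owned cost of $120 a month and a hypothetical on-demand rate of $0.50 an hour (chosen here so the result lands on the same eight hours), the break-even point is 8 hours a day; under those assumptions a server that runs longer than that each day is cheaper to own.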

    19. Re:Yawn ... by swb · · Score: 2

      We can then put that money towards more developers (or better salaries for us current devs), as well as paying for training, nicer dev machines, etc. At the same time, if we do have a problem with any sort of hosted service through Azure, support is literally a phone call away, and I can't remember the last time a resolution didn't happen within a couple hours.

      Who's this we? Are you some kind of dev-only shop, self-managed?

      I would bet that in most instances, the "savings" from moving to cloud never becomes more budget for the IT department, especially if it's money for salaries. If anything it just cuts your budget or feeds some bonus pool for executives.

    20. Re:Yawn ... by Kobun · · Score: 2

      Surprisingly enough, I've got those failure calculations (estimated using prior internal failure rates) included in my own measurements. The SLA thing is where things really turned south - for the cost of cloud providers who would guarantee uptime the way you describe AND reimburse for outages outside of the SLA, I could have doubled my staff size.

    21. Re:Yawn ... by Anonymous Coward · · Score: 2, Insightful

      This is why I drill for my own natural gas. You never know when the gas company will have an outage.

    22. Re:Yawn ... by uncqual · · Score: 4, Insightful

      However, in a widespread outage like this, I'll bet the big cloud providers have a better record of rapid recovery than their customers had in-house. By necessity, MS, Amazon et al have very competent engineers who know the product well available to pull off what they are doing (including sleeping) and jump into any really serious problem. There simply are not enough such engineers to go around all the mid-sized IT organizations in the world nor interesting enough work to keep these engineers interested and sharp at most of these IT organizations (to say nothing of the cost of keeping such engineers around).

      For a car analogy... When your high end car has a nagging problem that your local mechanic can't figure out, the dealer often can figure it out quickly, possibly with the help of a factory specialist who deals with (say) ECUs on only this make all day, every day. Rarely can an independent mechanic specialize enough to come close to the factory specialists in diagnosis. Now, if your car just has a dead battery, your local mechanic may give you faster, better, and cheaper service than the dealer.

      --
      Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
    23. Re:Yawn ... by Bob+the+Super+Hamste · · Score: 2

      If your business's server goes down for an hour, it does not make news

      Not everyone is in that boat. If there is a system outage with the systems I deal with it will make the news, sometimes even the national and international news. That problem wasn't with one of the systems I deal with or was provided by the company I work for but was a wake-up call to the industry.

      --
      Time to offend someone
    24. Re:Yawn ... by labnet · · Score: 2

      Yeah, but it's never really been about the reliability. It's always been the "not paying your own IT maintenance staff" thing that's the big draw.

      I priced ten 2-core VMs. It was $24k per annum. We do that internally on an R720 that cost $10k and needs about 3 hours a month of maintenance. So for networks that are mainly for internal use, where is the value?
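
      labnet's comparison can be turned into a payback period: how many months of cloud fees it takes to pay for the server outright. A sketch, assuming the $24k/annum quote is spread evenly over 12 months and a hypothetical $100/hour labor rate for the 3 hours of monthly maintenance; power, colo space and licensing are left out, and `payback_months` is a hypothetical name.

```python
def payback_months(hardware_cost: float, monthly_cloud_cost: float,
                   maintenance_hours_per_month: float, hourly_labor_rate: float) -> float:
    """Months until the avoided cloud fees cover the server purchase plus its upkeep."""
    monthly_saving = monthly_cloud_cost - maintenance_hours_per_month * hourly_labor_rate
    if monthly_saving <= 0:
        return float("inf")  # upkeep costs as much as the cloud bill: owning never pays back
    return hardware_cost / monthly_saving
```

      With those inputs the monthly cloud bill is $2,000, maintenance costs $300, and the $10,000 server pays for itself in just under 6 months (about 5.9).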

      --
      46137
  3. Re:Out of band patch.. by afidel · · Score: 5, Informative

    I installed it last night on all domain controllers after testing it in my isolated testing network. It's not really optional, since the vulnerability it fixes allows any domain user to become domain admin, and the only resolution to that is a domain rebuild or authoritative restore. It's also already been seen in attacks in the wild, so you can assume the next client to get drive-by malware will be going for domain admin.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  4. You Still Need Geographic Diversity by digsbo · · Score: 3, Interesting

    Just as with the Amazon AWS failure that took down Netflix, architecting your cloud infrastructure for geographic diversity can significantly reduce your exposure to these kinds of outages.
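
    At the application level, geographic diversity often comes down to trying regions in priority order and sending traffic to the first healthy one. A minimal sketch, assuming a caller-supplied health check; the function name and region URLs are hypothetical. This only helps with a regional failure: if the provider's shared infrastructure fails in every region at once, no ordering of endpoints will save you.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(endpoints: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None if every region is down."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None
```

    For example, `pick_endpoint(["https://westeurope.example.com", "https://eastus.example.com"], check)` returns the US endpoint when `check` reports the European one as down, and `None` when both are down.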

  5. Wow, I'd be pretty angry by ErichTheRed · · Score: 4, Interesting

    Everyone forgets that Azure is a way-beyond-massive Hyper-V implementation, and that AWS is a way-beyond-massive Xen-like-thing implementation. Even though both cloud providers let you be smart in designing your infrastructure (multi-site, redundancy, etc,...the tools are there) nothing will save you from an outage of the core guts of the system. Wasn't Azure's last failure due to a certificate expiration? There's no way an end customer can plan around that.

    I'm a big fan of the private or hybrid cloud version of this fad. You get all the good stuff that Azure and AWS customers get like dynamic provisioning and software defined networking, without having to rely on a third party. Unfortunately, CIOs and other execs just see the numbers on a spreadsheet and don't take the costs of outages that you can't control into account. Power fails, networks drop, and people do stupid things in on-site implementations also. But you can at least have your staff working on it with the incentive being "you get to keep your job." With a public cloud provider or even a hoster, the responsibility ends with "oops, here's 7 hours of free service" and you have to wait in line with everyone else.

    1. Re:Wow, I'd be pretty angry by serviscope_minor · · Score: 2

      I'm a big fan of the private or hybrid cloud version of this fad. You get all the good stuff that Azure and AWS customers get like dynamic provisioning and

      Not really. I mean sure you get dynamic provisioning right up until you completely run out of capacity. The advantage of Amazon is that they are much, much, much, MUCH bigger than you. So, if there's a big peak in usage for some reason, you can keep on scaling up to match the demand.

    2. Re:Wow, I'd be pretty angry by Dr.+Evil · · Score: 2

      A good devops team means that amazon vs. local is a flip of a command line parameter.
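
      What "a flip of a command line parameter" might look like: all provider-specific settings live in one table, and the rest of the tooling is identical whichever target is chosen. A sketch using Python's standard `argparse` module; the target names and settings are hypothetical placeholders, not real endpoints.

```python
import argparse

# Hypothetical per-target settings; only this table differs between providers.
TARGETS = {
    "aws": {"provider": "aws", "api_endpoint": "https://aws.example.com"},
    "local": {"provider": "local", "api_endpoint": "https://local.example.com"},
}


def parse_target(argv=None) -> dict:
    """Parse --target from argv (sys.argv when argv is None) and return that target's settings."""
    parser = argparse.ArgumentParser(description="Deploy the same stack to a chosen target.")
    parser.add_argument("--target", choices=sorted(TARGETS), default="local",
                        help="where to deploy (default: local)")
    args = parser.parse_args(argv)
    return TARGETS[args.target]
```

      `parse_target(["--target", "aws"])` returns the AWS settings and `parse_target([])` falls back to the local ones; the hard part in practice is keeping everything downstream of that table genuinely provider-neutral.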

    3. Re:Wow, I'd be pretty angry by ErichTheRed · · Score: 2

      "Most people who run large companies aren't stupid, and I'm sure that many of them do take into consideration the costs of outages."

      Not stupid, but MBAs in my experience never actually dig into the spreadsheets and figure out the meaning behind the number. They just see what the vendors promise them over multiple free lunches, golf trips, etc. It doesn't help that most CIOs aren't really technology people, or are so divorced from the day to day operations that they don't know what impact a decision like that has.

      It's the short sighted MBA disease -- if cost of onsite service is greater than shiny rosy cloud picture the vendor is painting, get rid of the onsite service regardless of operations impact. The other problem is that most of the decision makers will just bail when the first failure happens, after having collected the bonus for getting rid of the IT team.

    4. Re:Wow, I'd be pretty angry by 0123456 · · Score: 2

      The other problem is that most of the decision makers will just bail when the first failure happens, after having collected the bonus for getting rid of the IT team.

      And the smart ones bail right after collecting their bonus, before the first failure happens...

    5. Re:Wow, I'd be pretty angry by afidel · · Score: 2

      Would governments be the only clients that private clouds truly make sense for?

      Nah, we're on the small side of the S&P 500 and our "private cloud" has enough spare capacity to bring entire new projects online, spin up testing instances, provide an entire parallel Citrix farm (we're upgrading and want to have the old farm available for fallback in case we hit a critical bug), and still provide for the failure of up to two hosts without any overprovisioning. Infrastructure hardware and operating costs are less than 5% of our annual IT budget. For most companies that aren't doing massive public websites people and software costs will dominate over the cost of infrastructure.
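
      Providing "for the failure of up to two hosts" is usually called N+2 capacity planning: the workload must still fit on the hosts that remain after two fail. A minimal sketch, assuming identical hosts and a single resource (for example RAM) as the constraint; `survives_host_failures` and the figures in the usage note are hypothetical.

```python
def survives_host_failures(host_count: int, capacity_per_host: float,
                           current_load: float, failures_tolerated: int = 2) -> bool:
    """True if the load still fits after `failures_tolerated` hosts are lost (N+k planning)."""
    surviving_hosts = host_count - failures_tolerated
    return surviving_hosts > 0 and surviving_hosts * capacity_per_host >= current_load
```

      With 8 hosts of 256 GB each, a 1,400 GB load survives two failures (6 x 256 = 1,536 GB), but a 1,600 GB load does not.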

  6. Does MS offer any guarantees? by Anonymous Coward · · Score: 2, Interesting

    I mean, what happens now? If I use Azure in my business, and because of this outage I have lost x dollars in business transactions that I could not carry out, is MS going to compensate me in any way? Or is Azure one of those services that comes without any guarantees?
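
    Public cloud SLAs are typically settled with service credits, a percentage of the monthly fee for the affected service, rather than compensation for lost business. A sketch of how such a credit is computed; the tier table below is hypothetical and is not Microsoft's actual Azure SLA, which should be checked directly.

```python
# Hypothetical credit tiers, lowest threshold first:
# (monthly uptime below this percentage, credit as a percentage of the monthly fee).
CREDIT_TIERS = [(99.0, 25), (99.95, 10)]


def monthly_uptime_percent(minutes_in_month: int, downtime_minutes: int) -> float:
    """Share of the month the service was available, as a percentage."""
    return 100.0 * (minutes_in_month - downtime_minutes) / minutes_in_month


def service_credit(monthly_fee: float, uptime_percent: float) -> float:
    """Credit owed under CREDIT_TIERS; the first (lowest) threshold that applies wins."""
    for threshold, credit_percent in CREDIT_TIERS:
        if uptime_percent < threshold:
            return monthly_fee * credit_percent / 100
    return 0.0
```

    A hypothetical 7-hour outage in a 30-day month (43,200 minutes) gives about 99.03% uptime, which under these tiers earns a 10% credit: $100 on a $1,000 monthly fee, however much business was lost during those hours.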

  7. Re: Out of band patch.. by Eosi · · Score: 4, Insightful

    Interesting... What about all the OpenSSL and SSH issues that happened this year, which in many cases affected packages installed by default on Linux servers?
    Regardless of OS, poor testing of third-party apps/services or poor security in your deployment can leave you compromised. I have seen many Linux servers still using Telnet or VNC for management, and allowing root to log in directly...
    Secure your environment regardless of what you run.
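
    Checking for the direct root logins mentioned above is easy to automate. A simplified sketch that scans the text of an OpenSSH `sshd_config` file for the `PermitRootLogin` keyword; it assumes whitespace-separated keyword/value pairs, stops at the first `Match` block, ignores `Include` directives, and returns the first value it finds (recent OpenSSH versions use the first value obtained for a keyword). The function names are hypothetical.

```python
def root_login_setting(sshd_config_text: str):
    """Return the global PermitRootLogin value (lowercased), or None if it is not set."""
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # settings after a Match line are conditional; this sketch ignores them
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip().lower()  # first value found wins
    return None


def root_login_warning(sshd_config_text: str):
    """Return a warning for risky or unknown settings, or None if root login is restricted."""
    value = root_login_setting(sshd_config_text)
    if value is None:
        return "PermitRootLogin is not set; the default depends on the OpenSSH version"
    if value == "yes":
        return "root can log in directly over SSH"
    return None
```

    Run it against the contents of `/etc/ssh/sshd_config` (the usual location on Linux distributions); `sshd -T` on the server prints the effective configuration and is the more authoritative check.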

  8. What am I not getting? by hyades1 · · Score: 2

    I just can't get my head around the idea that somebody would take information vital to their needs and put it beyond reach, under the control of other people whose priorities probably don't match theirs.

    What advantages are so overwhelming that they make this a sensible thing to do?

    --
    I've calculated my velocity with such exquisite precision that I have no idea where I am.
    1. Re:What am I not getting? by labnet · · Score: 2

      Cheaper?
      10 x 2-core VMs is $20k/annum. We do that on an R720 that cost $10k and a couple of hours a month of maintenance.

  9. Awesome! wait on experts so we can run again by tacokill · · Score: 3, Insightful

    So let me get this straight.....your cloud is down and your only recourse is to depend on the cloud provider's highly skilled technicians to diagnose and fix the problem? Sign me up! There's nothing I like more than only one path forward which is completely dependent on specialists. /s

    Are you kidding or do you not understand how large companies, in particular cloud companies, operate? Have you ever had to call one about an unknown issue? Try it sometime....you'll learn a lot.