Slashdot Mirror


Microsoft's Azure Cloud Suffers Major Downtime

New submitter dcraid writes with a quote from El Reg: "Microsoft's cloudy platform, Windows Azure, is experiencing a major outage: at the time of writing, its service management system had been down for about seven hours worldwide. A customer described the problem to The Register as an 'admin nightmare' and said they couldn't understand how such an important system could go down. 'This should never happen,' said our source. 'The system should be redundant and outages should be confined to some data centres only.'" The Azure service dashboard has regular updates on the situation. According to their update feed the situation should have been resolved a few hours ago but has instead gotten worse: "We continue to work through the issues that are blocking the restoration of service management for some customers in North Central US, South Central US and North Europe sub-regions. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers." To be fair, other cloud providers have had similar issues before.

210 comments

  1. But Remember - by Ralph+Spoilsport · · Score: 5, Insightful
    Your data's safe in the Cloud.

    Until it isn't.

    --
    Shoes for Industry. Shoes for the Dead.
    1. Re:But Remember - by Anonymous Coward · · Score: 5, Funny

      It's very safe though - just so safe no one can get access to it! :)

    2. Re:But Remember - by tnk1 · · Score: 1

      Oh their data is safe. They just can't get to it or use it in any way. :)

    3. Re:But Remember - by tripleevenfall · · Score: 4, Funny

      Nonsense, Microsoft is the name you can trust for security.

    4. Re:But Remember - by masternerdguy · · Score: 5, Insightful

      Also remember the cloud is just the 21st century spin of the dummy terminal-mainframe model.

      --
      To offset political mods, replace Flamebait with Insightful.
    5. Re:But Remember - by masternerdguy · · Score: 2

      ActiveX best X

      --
      To offset political mods, replace Flamebait with Insightful.
    6. Re:But Remember - by V!NCENT · · Score: 2

      Hey! After rain comes sunshine. Now they'll just have to wait for cloud formations again...

      --
      Here be signatures
    7. Re:But Remember - by Barsteward · · Score: 4, Insightful

      Stop talking sense, it's no use here on /.

      --
      "The hands that help are better far than lips that pray." - Robert Ingersoll (1833-1899)
    8. Re:But Remember - by poetmatt · · Score: 3, Interesting

      When you rely on a 3rd party for cloud storage and that 3rd party has a basically nonexistent SLA for an under 30 day outage, it becomes your own fault for making a horrible business decision.

      When you take a 3rd party cloud storage solution and implement it yourself for your enterprise, guess what? It works. And if there are issues, you know who's to blame.
      https://spideroak.com/diy/ - this is but one example of many.

    9. Re:But Remember - by geekoid · · Score: 0

      Nope.
      But you keep your overly simplistic view of the world,..to yourself.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    10. Re:But Remember - by geekoid · · Score: 2

      Yes, they can. It's service management that's down, not data.
      Users can still access data.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    11. Re:But Remember - by icebraining · · Score: 4, Insightful

      Except those dumb terminals were, well, dumb, while nowadays the "terminals" are essentially the same as the "mainframe" but slower. So you can have hybrid configurations where a dedicated machine handles the base load and spins up remote resources on demand to handle peaks. If those resources are unavailable, the dedicated machine can still do the job, just with some performance degradation.

      A good example would be a script on your laptop that started an EC2 instance running distcc to reduce your compilation time from hours to minutes. If the instance can't be loaded, you could still compile, it just takes more time.
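
The fallback idea above can be sketched roughly as follows. This is a minimal illustration, not the poster's script: `compile_with_fallback`, `fake_remote` and `fake_local` are hypothetical names, and the remote step is a stand-in that simply raises an error instead of talking to EC2 or distcc.

```python
from typing import Callable, Sequence


def compile_with_fallback(
    sources: Sequence[str],
    remote_build: Callable[[Sequence[str]], str],
    local_build: Callable[[Sequence[str]], str],
) -> str:
    """Try the remote (cloud) build first; fall back to the local machine."""
    try:
        return remote_build(sources)
    except (ConnectionError, TimeoutError) as exc:
        # Remote capacity is unavailable: degrade to the slower local path
        # instead of failing outright.
        print(f"remote build unavailable ({exc}); compiling locally")
        return local_build(sources)


# Hypothetical stand-ins for "start an instance and run distcc" and
# "compile on the laptop". Real implementations would shell out to tools.
def fake_remote(sources: Sequence[str]) -> str:
    raise ConnectionError("instance failed to start")


def fake_local(sources: Sequence[str]) -> str:
    return f"built {len(sources)} files locally"


result = compile_with_fallback(["a.c", "b.c"], fake_remote, fake_local)
print(result)
```

The point of passing the two build steps in as callables is that the local path never depends on the remote one being reachable.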

    12. Re:But Remember - by dave420 · · Score: 4, Insightful

      Except this time you can add as many mainframes as you want, dynamically. And access them over the internet. And serve content to millions of people over said internet. That wasn't possible with this clichéd "mainframes!!!!!1" nonsense. Yes, you are using a remote computer. That's the only similarity. The current terminals are far from dumb, and the server being connected to is vastly different to the mainframes of old.

    13. Re:But Remember - by Surt · · Score: 3, Insightful

      If only even a single cloud service were actually built this way, it'd be great!

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    14. Re:But Remember - by dave420 · · Score: 1

      Then that's a problem with your phone, and not the cloud. My Android works just fine when not connected to the internet. It's got a 1.2GHz dual-core processor, so it's not exactly dumb.

    15. Re:But Remember - by Anonymous Coward · · Score: 0

      Wow that is a terrible SLA:

      This is intended as a cost effective long term bulk data archival service, so the SLA is geared with that sort of use in mind.

      DIY includes a guarantee of 99.7 % uptime. This allows us to be offline for a couple hours each month.

      Additionally, we don't consider SpiderOak DIY to be offline at all when it is merely read-only for less than 2 hours.

      So they can have virtually unlimited "read only" downtime as long as they turn write back on every

    16. Re:But Remember - by gstrickler · · Score: 1

      Or, as I like to phrase it.

      If all your data is in the Cloud, what happens when it rains?

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    17. Re:But Remember - by icebraining · · Score: 1

      There are. Search for "hybrid hosting", "hybrid cloud" and similar.

    18. Re:But Remember - by Solandri · · Score: 1

      Cloud services are great for hundreds of thousands of small businesses which are big enough to need centralized computing and file-storage services, but not big enough to have full-time IT staff to support it. You basically outsource your email server support to Google, your file server maintenance to Amazon, etc. and pay by the account or GB. The analogy to the mainframe/dummy terminal doesn't work for these companies because back in the day they never could've afforded a mainframe much less an IT staff to maintain it. That's what the cloud does - it takes something which used to only be accessible to larger businesses, and puts it within reach of smaller businesses by eliminating the need for equipment and IT staff to be physically located at the small business.

      A better analogy is back when PBXs (phone switching equipment) used to cost tens of thousands of dollars making it a huge expense for a company with just 3-5 employees. Then some phone company got the bright idea of selling the company multiple phone lines with a virtualized PBX. The PBX equipment would stay at the phone company, but the small company could get PBX-like features for about $20/mo extra per line.

    19. Re:But Remember - by hawguy · · Score: 4, Interesting

      Except this time you can add as many mainframes as you want, dynamically. And access them over the internet. And serve content to millions of people over said internet. That wasn't possible with this clichéd "mainframes!!!!!1" nonsense. Yes, you are using a remote computer. That's the only similarity. The current terminals are far from dumb, and the server being connected to is vastly different to the mainframes of old.

      I wonder how old you are? The current "Web 2.0" paradigm reminds me very much of the old 3270 style mainframe environment.

      The 3270 terminal (well, the controller) was not exactly "dumb" - it had some base level of intelligence, it knew how to display forms, it could do input validation, etc but it didn't really do much with the data beyond sending it up to the mainframe. The mainframe on the backend took the data and actually did something with it. This is pretty much exactly how "Web 2.0" works, except instead of a 3270 terminal communicating to the mainframe over SNA, you have web browsers calling back to the web server over HTTP using Javascript.

      Yes, both the endpoints and servers have become more capable, but there are still many similarities to the old style model.

    20. Re:But Remember - by Anonymous Coward · · Score: 0

      Excellent! It is safe, AND secure. Maybe forever.

    21. Re:But Remember - by Dog-Cow · · Score: 4, Funny

      So they can have virtually unlimited "read only" downtime as long as they turn write back on every

      Let me guess. Switched you to read-only right in the middle.

    22. Re:But Remember - by sapgau · · Score: 1

      mod parent up.
      Many techniques and advances were developed on mainframes decades ago; we just have a hard time accepting it.

    23. Re:But Remember - by lgw · · Score: 2

      Everything old is new again. The only real advance over the mainframe model here is AJAX - and until HTML 5 really matures it's still a half-assed solution, but better than a dumb terminal.

      Of course mainframes could add compute power to particular customers dynamically. Of course they could serve content to millions of customers (Visa used mainframes for transaction processing until quite recently). "Normal" HTML pages and forms are *very* similar to terminal-mainframe interaction. The older web server stacks were quite similar in architecture to the old mainframe solutions (but about 10% as efficient in user-handling/compute as mainframe stacks - just shockingly bad).

      The "server" being connected to in terms of the actual functionality provided to the customer is really quite similar, as "the cloud" has finally caught up with the level of virtualization and redundancy that are old hat on mainframes. The only thing that really seems new in the plumbing is not having to post a form in order to validate/update a field (which is damn nice).

      --
      Socialism: a lie told by totalitarians and believed by fools.
    24. Re:But Remember - by StuartHankins · · Score: 1

      But does it have Web Scale? Sharding can help you get Web Scale for your business. The Cloud helps you get Web Scale. You did not have Web Scale with those older systems. MongoDB helps you get Web Scale.

      Start sharting today with MongoDB!<sarcasm>

    25. Re:But Remember - by lgw · · Score: 1

      Back in the day small businesses would use a mainframe exactly like they use the cloud today: they'd lease time and space (and a line). The only equipment at the customer site was cheap terminals and a router-equivalent, usually also leased.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    26. Re:But Remember - by s.petry · · Score: 2

      The same problems with scaling now existed back in the main frame days, and the same solutions were present. No matter how many CPUs you throw to a developer, the applications must be developed to scale with them.

      Do some digging on DMP (Dynamic Multi-Processor) and SMP (Symmetric Multi-Processor) architectures and you will probably be amazed at how long ago these methods were being used.

      This to me is the hilarity of the people that push "Cloud". They say things like Microsoft did in their "Yeah Cloud" commercials geared toward home users. Running "Word" in the cloud is simply idiotic since there is no DMP built in to the application. Microsoft Photo does not have it either. Outside of having redundant nodes hosting your data there is absolutely 0 benefit. Most cloud services have no SLA on data you store there either, so the real use of a "Cloud" is very very obscure outside of the sales pitch.

      I'm not saying "Cloud" has no use, but the real value is in the same application stacks we saw on Grid computing and its predecessors ("Main frame" and very large SMP servers).

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    27. Re:But Remember - by cmdrbuzz · · Score: 1

      VISA still use mainframes, all the transaction forwarding and processing uses TPF. Most of the authorization by the various member banks and issuers are using Base24 though.

    28. Re:But Remember - by Anonymous Coward · · Score: 0

      Ex-seal team six operative sent a terse 'Mission accomplished'. Told him to buy Apple but he wouldn't listen to anyone unless it came from a 'Higher rank'.

    29. Re:But Remember - by dave420 · · Score: 0

      There are similarities, yes, but the differences are far greater. I doubt you could send a few bytes across a network and suddenly have another mainframe at your disposal, priced at cents per hour to run. Not to mention have millions of people being able to use said mainframe at the cost of cents per person/hour. It reminds me as well, but then I laugh as I remember the vast differences in both ability and cost.

      And for the record, I am familiar with mainframe operation, especially due to my parents being mainframe developers in the 60s and 70s and introducing me to computing in the first place.

    30. Re:But Remember - by dave420 · · Score: 1

      Wow. You've not really thought your argument through. With the cloud you are not restricted to the number of mainframes you own. You couldn't flip a switch in the good old days and suddenly a new mainframe would pop out of thin air.

      You don't really seem to know what you're talking about. It's almost as if you've never actually used the cloud for anything decent. Your summation as simply being asynchronous form submissions speaks volumes.

    31. Re:But Remember - by dave420 · · Score: 1

      Not all cloud applications are using MP. The vast majority are websites or application back-ends. It's not about software using more cores in a single computer, it's software running on the same number of cores, but on many computers. And with the cloud, you don't have to own your own mainframes and allocate them as you see fit - you can just request more computers without having to pay for them when you're not using them. You also don't seem to understand what the cloud actually is. I'm sure you're not doing it on purpose, but there are clear and fantastic advantages of cloud computing over mainframes which clearly separate them in terms of their *shudder* "paradigms".

    32. Re:But Remember - by ocdscouter · · Score: 1

      Nope. But you keep your overly simplistic view of the world,..to yourself.

      To be fair, the 'dummy terminal' has been supplanted by the more modern Fat Client.

      Take that as you will.

    33. Re:But Remember - by Martz · · Score: 1

      Perhaps there was nothing wrong with the old style model of main frames, it's just that not everybody could afford a mainframe for the home.

      We just need the personal internet connections/infrastructure for the masses to catch up.

      Then it'll be worth uploading CPU intensive tasks to the cloud, as generally the home user's most limiting factor at the moment is bandwidth, not CPU.

    34. Re:But Remember - by Surt · · Score: 1

      I think the fact that I would have to search for it makes my point. ;-)
      I've yet to see an app that really supports this.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    35. Re:But Remember - by hawguy · · Score: 1

      There are similarities, yes, but the differences are far greater. I doubt you could send a few bytes across a network and suddenly have another mainframe at your disposal, priced at cents per hour to run. Not to mention have millions of people being able to use said mainframe at the cost of cents per person/hour. It reminds me as well, but then I laugh as I remember the vast differences in both ability and cost.

      And for the record, I am familiar with mainframe operation, especially due to my parents being mainframe developers in the 60s and 70s and introducing me to computing in the first place.

      So you've never bought a plane ticket online, nor used online banking?

    36. Re:But Remember - by mmusson · · Score: 1

      Actually I think Web 1.0 was reminiscent of the mainframe terminal environment. Struts immediately jumps to mind. Web 2.0 reminds me more of client / server applications although it's not a perfect analogy.

      It should not be surprising that architectures keep shifting back and forth over the years. Each is in some sense solving certain problems of the former. But it does lead to a certain amount of "rediscovering" something that used to be well understood. I suspect that many of the younger folks would be surprised to know that the virtualization driving the cloud dates back to 1972 and the IBM System/370 range of mainframes.

      --
      SYS 49152
    37. Re:But Remember - by sensationull · · Score: 1, Insightful

      OMG, you mean instead of having the mainframe at your building you have it on teh internetz and instead of owning your own stack of them you rent usage on a much larger stack owned by someone else. You are so right, that is SOOOOOOOO! different. Yes there is more redundancy thanks to better infrastructure that allows for stacks of 'mainframes' in different locations but the vast change is basically the business model.

      If you are really that amazed by this 'new, unprecedented' tech perhaps you should go and work in sales as their KoolAid obviously works quite well on you.

      It's not exactly the same implementation but the model is very similar, just a bit updated.

    38. Re:But Remember - by s.petry · · Score: 1

      Not all cloud applications are using MP. The vast majority are websites or application back-ends. It's not about software using more cores in a single computer, it's software running on the same number of cores, but on many computers

      Which is exactly what DMP is. Thanks for doing some checking before responding.

      And with the cloud, you don't have to own your own mainframes and allocate them as you see fit - you can just request more computers without having to pay for them when you're not using them.

      Which as someone pointed out already, was never an issue. Big companies that wanted a dedicated Main Frame purchased them. Small companies used "Time Share". If you ever heard of the company called EDS, see how they started out. Interesting read, and yes.. I did work for EDS a long long time ago.

      You also don't seem to understand what the cloud actually is. I'm sure you're not doing it on purpose, but there are clear and fantastic advantages of cloud computing over mainframes which clearly separate them in terms of their *shudder* "paradigms".

      < sarcasm on > Obviously I have no knowledge at all, which is why I am a consultant for a living. Having worked as a consultant during the various time periods I have gained no ability to discuss the parallels between the systems. < sarcasm off >

      As I mentioned, "Cloud" does have some purposes (I would not call them "advantages" as you have done. Sounds like you are a cheerleader for a service.). The biggest disadvantage of "Cloud" is that it allows complete morons the ability to think they have the answer to every problem ever posed for distributed compute systems. Honestly, that is the worst part. After some douche bag tells a VP how they will be better off running MS Word in a cloud, there is usually weeks of meetings required to undo that damage. Masses of charts to show why it's not financially better, how it will add to unproductive hours for both users and admins, how it will put their IP at risk, etc...

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    39. Re:But Remember - by dan828 · · Score: 1

      Fat, drunk, and stupid is no way to go through life, boy.

    40. Re:But Remember - by Anonymous Coward · · Score: 1

      Nothing new, it is just zeroes and ones

    41. Re:But Remember - by Skuld-Chan · · Score: 1

      Google has an offline mode that does work great.

    42. Re:But Remember - by Anonymous Coward · · Score: 0

      More capable, maybe - but not more reliable :-)

    43. Re:But Remember - by s.petry · · Score: 1

      Wow. You've not really thought your argument through. With the cloud you are not restricted to the number of mainframes you own. You couldn't flip a switch in the good old days and suddenly a new mainframe would pop out of thin air.

      You don't really seem to know what you're talking about. It's almost as if you've never actually used the cloud for anything decent. Your summation as simply being asynchronous form submissions speaks volumes.

      Wow, to be blunt you are ignorant. Do you really believe that you don't have to have infrastructure to add nodes in the cloud? It's actually a common piece of ignorance. If you don't own the infrastructure, then you are using someone else's infrastructure.

      You don't just snap your fingers and have a new cloud node pop out of thin air as you imply. Someone has to own all of the parts that you are using, someone has to power them up, someone has to manage the layers you don't manage.

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    44. Re:But Remember - by Anonymous Coward · · Score: 0

      honestly, this is a "money ball" scenario. Last ditch attempt to hitch the wagon of M$ onto the tablet revolution.

    45. Re:But Remember - by Anonymous Coward · · Score: 0

      love it!! I tell all my friends who are trying to be cloud resellers - don't do it.

  2. Eggs? by OzPeter · · Score: 4, Insightful

    Basket?

    Or how about "Never outsource your core functionality"?

    --
    I am Slashdot. Are you Slashdot as well?
    1. Re:Eggs? by alphatel · · Score: 1

      Never outsource your core functionality

      Or more specifically, don't cloud your reasons for using it. Know what you are getting before you go there.

      --
      When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
    2. Re:Eggs? by Sir_Sri · · Score: 4, Insightful

      Ah, so there's the question. How much would it cost for you to run a system with 'no' downtime? I'm at a university, some of our labs (not so much in comp sci but generally) have fairly specific requirements about say not losing power, because it would damage/destroy equipment or running experiments.

      But IT is more than just power. In almost 4 years here every year we've had several days of downtime for our main undergraduate server (the one undergrads are supposed to use for various things, and that handles their logins and file storage), and several on the separate but arguably more important staff server, which supposedly does the same thing, but that includes all of our grant applications.

      Causes of our server outages (I'm not an IT guy, this is just what they've told us that I can remember):

      Power failures: Yes we have battery backups, but they're only good for so long, and since none of our equipment suffers permanent damage without power this isn't high priority.

      Networking: We only have two redundant pipes. That, for home use for example, or most businesses is pretty good. For our pipes one goes to a host to the west, one to the east. I'm not specifically familiar with what failed that took our networking offline for 7 or 8 hours but it affected both pipes.

      Storage: bad raid controller on the main fileserver. This has a few cascading effects. If you don't realize it's garbling data it ends up distributing that garble off to the backups or clones. When it crashes (which doesn't take that long after the controller starts getting messy) you may have several backups that need to be repaired. We can't do much to the file system while it's being repaired or rebuilt (which, afaik you should be able to do on most professional grade setups, but for whatever reason our linux guys can't get it to behave).

      Added fun: When the system comes back up, if you tried to access your e-mail while the file system was garbled you probably still can't. And you get no error message about it. It just spits back nothing, as though you have no new mail. The system is 'up' but doesn't work and you have to go into your directory and delete some files that most people have never heard of. It's not hard to do, but because you have no idea that there's a problem the less technically inclined (or just ESL) people in a building full of computer scientists don't always fix it immediately. The net effect is that if the storage controller gets messed up, we're down for 3 or 4 days if not longer.

      And that's just one university department. We have a relatively decent amount of money, and several full time staff for these things. But we probably can't match any cloud service's uptime, even with 7 or 8 hours of downtime regularly, not even close. It's not a trivial calculation, even a 50 or 60 employee outfit will probably have trouble matching Amazon or Azure uptime with a full time IT guy. There's probably a cross over point where you have enough employees to support big enterprise IT infrastructure and manpower, but only support it badly (there's not enough money for proper replication or whatever), and then eventually you get big enough that you just run everything in house anyway because there's definitely no cost advantage to hiring someone. For us, I think we have 5 or 6 IT staff, if we could toss 3 of them, + all of their equipment, you're looking at somewhere around 350, 400k/year to spend on a support contract. I'm guessing, but don't know, that you can get a cloud service for ~20 TB of reasonably reliable file and e-mail storage for less than 350k/year from these guys.

      The big place I see people using cloud services right now (as a sort of flavour of the month) is to augment burst capacity needs. That's a whole other analysis.

    3. Re:Eggs? by phantomfive · · Score: 1

      Fortunately, there is a solution. You can have your own personal cloud. The best part, "access speed is just as fast as a local hard drive."

      --
      "First they came for the slanderers and i said nothing."
    4. Re:Eggs? by gweihir · · Score: 1

      Basket?

      Or how about "Never outsource your core functionality"?

      That would be a good engineering practice. A good business practice is to show your initiative by outsourcing to the cloud and then hope to be promoted away before anything bad happens. It really is time for managers to be liable for the mistakes they make in long-term decisions.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    5. Re:Eggs? by sjames · · Score: 1

      I can see clearly now that the rain is gone...

    6. Re:Eggs? by RightSaidFred99 · · Score: 1

      Totally. I think we all know that homegrown solutions are 100% reliable, right? If you outsource it you could get stuck with some number of nines, and nines are stupid. 100% uptime is where it's at.

      Now, on a related matter these "hard drive" thingies don't fail, do they? I was thinking of maybe putting them in what some guy called a "RAID 0" configuration so the web server for my business is faster.

    7. Re:Eggs? by RightSaidFred99 · · Score: 1

      Good engineering practice? In what world? Outsourcing makes perfect sense in many cases. Providers like Amazon and Microsoft have better uptime than most small businesses would be able to achieve on their own.

    8. Re:Eggs? by gweihir · · Score: 2

      They even may have. It is still a gross mistake, because you need to keep control over core functionality, so you can adjust its parameters to your needs.

      Uptime is not all that counts and is not even very important. What is important is what you can do when the system becomes unavailable. In the cloud case, all you can do is wait and hope. If you kept control, you have various options and can have various recovery strategies, depending on your business needs.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    9. Re:Eggs? by AlienIntelligence · · Score: 1, Informative

      What ghetto assed university do you go to where they cannot get their server straight?

      -AI

      --
      For me, it is far better to grasp the Universe as it really is than to persist in delusion
    10. Re:Eggs? by Anonymous Coward · · Score: 0

      Community College

      Goddamm allergies are acting up again.

    11. Re:Eggs? by durdur · · Score: 1

      Good points. Near 100% uptime is intrinsically hard. And if you think your admins can do it better than those at a dedicated cloud hosting provider .. well, maybe they can, but there's a good chance they can't. Get big enough and you can invest in the hardware, network and support resources to do it right, but that's not cheap.

    12. Re:Eggs? by AlienIntelligence · · Score: 1

      Community College

      Ahhh, well you hit the nail on the head about 100% uptime.

      That's something only "God" will allow. Any disagreements
      with what he doles out, require 1-2 diesel generators and a
      good sized tank of diesel.

      Then the tier level of internet pipes you get in, determine
      whether they give a crap about you being down for any
      period of time.

      Essentially, 100% uptime isn't possible just due to the
      unknown. Which is why everyone brags about 5x 9's etc.

      Almost everything needs to be triple redundant (min) and
      you need to be near a top tier backbone to get a 5x 9's.

      I'm thinkin your school isn't in that ballgame.

      -AI

      --
      For me, it is far better to grasp the Universe as it really is than to persist in delusion
    13. Re:Eggs? by magamiako1 · · Score: 1

      4 9's of uptime allows < 1 hour of downtime in the entire year, just an FYI.

      Most reasonable datacenters can do 4 9's in all aspects whether it be networking or physical power.

    14. Re:Eggs? by magamiako1 · · Score: 1

      8,760 hours in the year
      4 9's (99.99%) uptime is 8759.124 hours.

      That means to achieve 4 9's you can only have < 1 hour of downtime per year. This is possible.

      Microsoft being out for 7+ hours is a nightmare.
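
The arithmetic above generalizes to any availability target. A small sketch, assuming a 365-day year and a 30-day month (the function name `allowed_downtime_hours` is made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in a period at a given availability."""
    return period_hours * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few common targets.
for label, pct in [("three 9's", 99.9), ("four 9's", 99.99), ("five 9's", 99.999)]:
    hours = allowed_downtime_hours(pct)
    print(f"{label} ({pct}%): {hours:.3f} hours/year, {hours * 60:.1f} minutes/year")

# The same function works per month, e.g. a 99.7% target over 30 days.
monthly_budget = allowed_downtime_hours(99.7, period_hours=30 * 24)
print(f"99.7% over a 30-day month: {monthly_budget:.2f} hours")
```

At 99.99% the yearly budget is 8,760 - 8,759.124 = 0.876 hours, a little under 53 minutes, so a seven-hour outage uses up several years' worth.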

    15. Re:Eggs? by Myopic · · Score: 1

      I'm not sure what your point is. Are you saying that private IT installations have lower downtime averages than cloud services? I've never known that to be true -- in fact exactly the opposite. And if that's not what you are saying, then doesn't that pretty much completely raze your point?

    16. Re:Eggs? by s.petry · · Score: 2

      Lets not forget that in order to hit that 4 or 5 9s, you have to have built in maintenance times which are down time but don't count toward your SLA.

      And before you ask, yes.. I believe that most places will stiff a maintenance window when something bad happens and they have down time. So that 4 hour server crash sits in the 4 hour maintenance window and we get 0 down time for the month.
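
A rough sketch of how that accounting plays out, under the simplifying assumption that downtime falling inside an excluded maintenance window is just subtracted before the percentage is computed. Real SLAs define this in their own terms, and `availability` is a hypothetical helper:

```python
def availability(period_hours: float, downtime_hours: float,
                 excluded_hours: float = 0.0) -> float:
    """Percent availability, ignoring downtime covered by excluded windows."""
    counted = max(downtime_hours - excluded_hours, 0.0)
    return 100.0 * (period_hours - counted) / period_hours


MONTH_HOURS = 30 * 24  # assume a 30-day month

# A 4-hour crash, measured honestly...
actual = availability(MONTH_HOURS, downtime_hours=4)
# ...and the same crash booked against a 4-hour maintenance window.
reported = availability(MONTH_HOURS, downtime_hours=4, excluded_hours=4)

print(f"actual:   {actual:.3f}%")
print(f"reported: {reported:.3f}%")
```

The gap between the two numbers is the whole trick: the users saw four hours of outage, while the report shows none.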

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    17. Re:Eggs? by snadrus · · Score: 2

      Or for free, if anyone's interested: http://www.ubuntu.com/cloud

      --
      Science & open-source build trust from peer review. Learn systems you can trust.
    18. Re:Eggs? by SplashMyBandit · · Score: 1

      Yeah, but any competent business that is bigger than *tiny* can get better uptimes - if they have any IT staff that know what they are doing (and have some redundancy built in). Even for small outfits a LAN or dedicated WAN circuits (easy to hire from a telco) is far more robust than relying on the general Internet (there are far more pieces to break, and not under your direct control with the Internet). So no, I would say that Amazon and Microsoft don't have better uptimes than a competent small business - and worse, when there is downtime with a Cloud provider *there is nothing the small business can do about it* (unlike if they control their own resources). It may be cheaper to use the cloud, but it is not more reliable and certainly there is a huge loss of control (which can impact reliability). Once the Cloud gets as reliable as (First World) electricity then it makes sense to switch. The Cloud is still quite far from the 'reliable electricity' model at the moment - unfortunately bosses want to believe in the Cloud bullshit, even when what is promised doesn't yet match reality, so they suck it up. Good Engineering practice is to keep your business running, which is what the parent said. Good financial practice is something else.

    19. Re:Eggs? by Daniel+Phillips · · Score: 1

      On face.

      In the Microsoft Museum of Epic Fail this surely ranks right up there with leaving the London Stock Exchange twisting in the breeze for a whole day, on one of the busiest trading days of the year. Hmm, or the Navy missile cruiser towed back to port. Or actually too many highly amusing and costly epic fails to even begin to list. Maybe we're just lucky nobody had the bright idea of running Windows at a nuclear reactor. Oh wait.

      --
      Have you got your LWN subscription yet?
    20. Re:Eggs? by Sir_Sri · · Score: 1

      University of Western Ontario. Comp sci department.

      Also, that was sort of my point. We *should* be able to get it straight, and can't. I don't really care that much, and in general it's not that big a deal. Students work on their own machines unless they specifically have labs, so the facilities we run are there to be used if people need or want them, but most people don't need or want them. Such is life in a BYOD business, and for us, if you can't BYOD you probably won't have one, because we don't give kids laptops. They pay us to be here after all. (Er, I guess I pay us to be here too, I'm a grad student, so I sort of pay myself for the privilege of being a student).

    21. Re:Eggs? by Sir_Sri · · Score: 1

      Lots of businesses run 5 9's of uptime. You *can* do it. It's not a matter of *can* it's a matter of how much does it cost, and is it more or less economical to pay someone else to do some or all of it.

      We were hit by the "northeast blackout of 2003". Which, where I was (different than where I am now) was without power for 24 hours, most places were 7 or 8, some a lot more than 24. So the power distribution itself didn't manage 4 9's for 24 years averaged up. Our backup generator I think was only good for 8 hours. Whatever it was, we had to go in and shut everything down within a couple of hours of the power not coming back up (this was when I was working with equipment that really cannot lose power, controlled power down was very expensive).

    22. Re:Eggs? by Sir_Sri · · Score: 1

      addendum - I'm not the anonymous coward below. I'm not exactly sure what a community college is, there is a college here in town called Fanshawe and while we get some students from there and we train some of them (and my colleagues train them at Fanshawe) I do not attend that school. Community college is a US concept, and not being part of that education system (fortunately), I have nothing to do with it.

      Universities: academic degrees (e.g. computer science, engineering, physics, medicine) including advanced degrees (masters, doctorates) colleges: tradeskills (auto mechanic, Linux server administration, IIS administration that sort of thing).

    23. Re:Eggs? by Anonymous Coward · · Score: 0

      Not that I doubt your back-of-the-napkin math, but I don't see how throwing IT bodies at the problem improves reliability.

      We've experienced zero downtime the past eight years running two colo's with a single IT guy (me). A single admin means no miscommunication. I'd hazard that number of IT staff correlates positively with the number of unplanned outages, the opposite of your assumption.

    24. Re:Eggs? by NeutronCowboy · · Score: 1

      No one brags about 5 9s, because no one can maintain a system like that. The very best I've seen in my days monitoring some of the top tier companies and their top tier applications is 3 9s. Most companies are happy to hit 1 nine - yes, they're happy about up-time in the 95% area. 5 9s is for people powering up a server and checking ping responses. In other words, it's not a real metric.

      The cold, hard reality no one wants to acknowledge is that it is incredibly hard to keep a modern application working properly. Everyone wants their fancy always-on, always-connected, always-available and easy-to-maintain application to be available all the time, but only very few people know what that takes. Sir_Sri is one of the few people who managed to list the even very basic and common components that can fail. I'll add croaking routers, dug-up cables, fu-bared patches, admins hosing their instance and, in the case of multi-tenant environments (aka The Cloud), upgrades that don't properly account for edge-cases to the list.

      --
      Those who can, do. Those who can't, sue.
    25. Re:Eggs? by sensationull · · Score: 1

      The provider's 9s mean nothing if you need it locally and your upstream links have a flapping route, just ask some NZ schools about their recent Google crApps issues where all their traffic got sent the long and slow way around for a day or two. Funnily all our local gear was fine.

    26. Re:Eggs? by Sir_Sri · · Score: 1

      Depends how much infrastructure you need. We really do have I think it's 6 IT guys. What exactly each of them does I'm not sure. We do have to create accounts for several hundred students every semester though, and they have to maintain all of the local infrastructure no matter what, 20 professors, 150 grad students 500 or so computers in labs. So you need some IT staff for that. You can't give every research group identical computers, in fact, they buy their own equipment for their own needs in some cases (with their own money, that's the whole 'being self funded thing'). So overall we might run 700 machines across I guess two buildings, with 3 main servers, 'staff' 'student' and 'research that shouldn't be mixed with students' (which I've never used so I have no idea what its uptime is), profs will have a 'office' machine, and then research lab (rather than teaching lab) machines as well, as will many grad students. But only those that work on stuff that shouldn't be near things students can touch.

      Now let's say you're talking about IT for 2500 employees. How about 6000? Our university has about 35 000 students, we have about 2500 staff total, and then lab computers. Do you want to be the sole sysadmin for what is probably about 10 000 machines? You have to have a collection of sysadmins (who probably have a hierarchy, but they're still sysadmins). There might be one overall admin, there usually is, but they have to delegate management of various assets to other people if you have enough stuff.

      One IT guy can do a decent job on a 50 or 60 person outfit, even at a few colo's. But managing 500 machines, that's pushing your luck. And that was my point about there being some crossover. Probably a 200-300 person (computer) place is big enough to warrant enterprise IT, but not necessarily big enough to support enterprise IT properly. Think call centres and things like that. Lots of staff, high turnover, (high training), and you don't have the money to pay decent people in house for a lot of stuff either. Whereas a 6000 person outfit is necessarily playing at the enterprise IT level.

    27. Re:Eggs? by CSMoran · · Score: 1

      We've experienced zero downtime the past eight years running two colo's with a single IT guy (me). A single admin means no miscommunication. I'd hazard that number of IT staff correlates positively with the number of unplanned outages, the opposite of your assumption.

      Just don't get hit by a bus, man.

      --
      Every end has half a stick.
  3. Gloat gloat gloat. by SuricouRaven · · Score: 1

    One of the worst things about the cloud is that it can go wrong when someone else screws up, so you get the blame for their mistakes.

    1. Re:Gloat gloat gloat. by gral · · Score: 4, Insightful

      The companies I deal with tend to say things like, we want to go with a company like this so we can get "Support". Which usually means, so we can blame them if something goes wrong.

      --
      Scott Carr
    2. Re:Gloat gloat gloat. by WeatherServo9 · · Score: 2
      This may depend on your specific company or situation. I get the impression our upper management likes the cloud so when things go wrong they can blame someone else (even if only partially). When we were doing things in house, it didn't matter who actually screwed up, ultimately management took the blame. With the cloud, they can now point fingers at someone else and hold up a contract stating this wouldn't happen. We're a company that's still just small enough that we are pretty much always understaffed and don't put enough money into hardware to have proper redundancies so things will go wrong eventually; since moving to the cloud, management can not only point blame elsewhere (it wasn't my people who caused the outage!) but can try (usually successfully) to get some discount or other compensation from the provider when downtime occurs.

      In the end I've found the move to have pros and cons. The pros are that we simply never had the hardware infrastructure to provide the uptime requested of us (yet we were denied budget to build said infrastructure). In theory, our cloud providers can provide that uptime (or so our contract says). Development of our sites has been a nightmare though, the environment seems to lend itself to easily creating all sorts of spaghetti code (not sure yet if that is our relative unfamiliarity with the environment and/or lack of skill from the company we outsourced some of the work to, etc). Really I prefer keeping things in house for more control and flexibility, but I'm outnumbered with that opinion and that definitely isn't the way things are going (at least for us).

    3. Re:Gloat gloat gloat. by Anonymous Coward · · Score: 0

      When management accepts blaming others rather than demanding success, they get CYA. It's management's fault, not IT.

  4. Cloud services not ready by UnknowingFool · · Score: 1

    One of the selling points of using cloud services was that it would be more reliable than managing your own hardware/software. But to date, every single big player has suffered major downtime. I would be hesitant to believe the sales pitch.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
    1. Re:Cloud services not ready by characterZer0 · · Score: 3, Insightful

      Cluster at the application level and have nodes at different providers. If your volume is too high for that, you are big enough to host your own stuff.

      --
      Go green: turn off your refrigerator.
    2. Re:Cloud services not ready by timeOday · · Score: 3, Informative

      I agree, I have nothing against the idea of cloud services, but they do need to work and reputations are based on events like this. After an outage this long, it takes a LOOONG time to earn your way back to five nines (which works out to 5.5 minutes of downtime per year).

    3. Re:Cloud services not ready by vlm · · Score: 3, Insightful

      After an outage this long, it takes a LOOONG time to earn your way back to five nines (which works out to 5.5 minutes of downtime per year).

      Only 84 years per the article, and growing at a rate of a year every 5 minutes.

      That's probably about how long it would take me to trust MS in an enterprise environment.
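
      The 84-year figure can be sanity-checked with a few lines. A back-of-the-envelope sketch, assuming the roughly seven-hour outage from the summary and a simple long-run average (the function name is invented for the example):

```python
def years_to_recover(outage_minutes: float, budget_minutes_per_year: float) -> float:
    """Years of otherwise perfect uptime needed to average back to the budget."""
    return outage_minutes / budget_minutes_per_year


if __name__ == "__main__":
    outage = 7 * 60  # seven hours, in minutes

    # Rounded budget of 5 minutes per year, as used in the comment above.
    print(f"{years_to_recover(outage, 5.0):.1f} years")

    # Unrounded five-nines budget for a 365-day year: 525,600 min * 0.00001.
    exact_budget = 365 * 24 * 60 * 1e-5
    print(f"{years_to_recover(outage, exact_budget):.1f} years")
```

      With the rounded 5-minute budget this gives 84 years; the unrounded budget of about 5.26 minutes brings it down to roughly 80. Either way, every further 5 minutes of outage adds about another year.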

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    4. Re:Cloud services not ready by hawguy · · Score: 4, Insightful

      One of the selling points of using cloud services was that it would be more reliable than managing your own hardware/software. But to date, every single big player has suffered major downtime. I would be hesitant to believe the sales pitch.

      But still, for most companies that are good candidates for cloud offerings, even 8 hours of downtime once a year is probably better than they can guarantee using their own infrastructure. Companies in this range tend to not have redundant servers, offsite backups, disaster recovery sites, etc. Larger companies that can build redundant infrastructure (and staff it properly) are probably better off staying away from the cloud since they can guarantee any level of uptime and redundancy they want to pay for.

      Of course, when a small company Admin spills a cup of coffee in the Exchange server and they are down for 5 days while building a replacement server, it doesn't make the news so you never hear about it...while when a large cloud provider has a 2 hour outage, it's all over the news.

    5. Re:Cloud services not ready by gstoddart · · Score: 2

      After an outage this long, it takes a LOOONG time to earn your way back to five nines (which works out to 5.5 minutes of downtime per year).

      I'd be surprised if Microsoft (or anybody) is actually offering five nines for uptime.

      The fine print often says "well, we don't actually promise anything, and any outage and loss is your problem".

      --
      Lost at C:>. Found at C.
    6. Re:Cloud services not ready by geekoid · · Score: 0

      That's not true at all. Who else had major downtime?

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    7. Re:Cloud services not ready by leonardluen · · Score: 4, Funny

      it's a leap year, they can be down a full day and still claim they were up for 365 days this year!

    8. Re:Cloud services not ready by UnknowingFool · · Score: 4, Informative

      You mean besides Amazon, SalesForce, VMWare, Google Gmail, Yahoo Mail, Apple iCloud? Seriously, who hasn't had downtime?

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
    9. Re:Cloud services not ready by dave420 · · Score: 1

      And there's a very good chance your own hardware/software would also suffer downtime in the same period.

    10. Re:Cloud services not ready by Surt · · Score: 1

      Amazon had major downtime.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    11. Re:Cloud services not ready by Anonymous Coward · · Score: 0

      Azure customer here! They offer 99.9% uptime for storage services, and 99.95% uptime if your compute instance is in at least two separate geographic datacenters. They don't offer five nines, but they do offer a basic SLA.
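
      For scale, here is what those two percentages would allow if measured over a full 365-day year. This is only a sketch: the measurement period is an assumption, and the comment above does not say how the SLA actually counts downtime.

```python
HOURS_PER_YEAR = 365 * 24  # assumed measurement period for this example


def allowed_downtime_hours(sla_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an availability percentage over a period."""
    return period_hours * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    outage_hours = 7  # length of the outage reported in the summary
    for sla in (99.9, 99.95):
        budget = allowed_downtime_hours(sla)
        verdict = "within" if outage_hours <= budget else "over"
        print(f"{sla}%: {budget:.2f} h/year allowed, a 7 h outage is {verdict} budget")
```

      On a yearly basis a 7-hour outage fits inside 99.9% (8.76 hours) but not inside 99.95% (4.38 hours).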

    12. Re:Cloud services not ready by Anonymous Coward · · Score: 0

      Incompetence. You are running exchange. That is your first problem. 5 days to repair? Insane. I run a small company (10 people or so) and there is no way that could even happen. We don't use exchange and we certainly can be back up and running within hours normally. Worst case scenario is an unrecognized problem and/or no one is available for a few hours. We have daily off-site automated backups. Encrypted. About once a month we make a bit for bit image of the entire drive. About once every six months we make a bit for bit backup and send it to a third site.

      We are reliant on our web site being up to bring in money. We do run into issues from time to time like any operation and yet have never been down for any significant length of time. We have had 'total failures' where we were back up and running within a few hours to data corruption type of issues and misconfiguration issues and something went wrong we can't figure it out type of issues. Thing is we have always recovered quickly. At some point in the past we have an image that was working. Reverting back makes it easier to figure out what that issue is.

    13. Re:Cloud services not ready by hawguy · · Score: 1

      Incompetence. You are running exchange. That is your first problem. 5 days to repair? Insane. I run a small company (10 people or so) and there is no way that could even happen. We don't use exchange and we certainly can be back up and running within hours normally. Worst case scenario is an unrecognized problem and/or no one is available for a few hours. We have daily off-site automated backups. Encrypted. About once a month we make a bit for bit image of the entire drive. About once every six months we make a bit for bit backup and send it to a third site.

      We are reliant on our web site being up to bring in money. We do run into issues from time to time like any operation and yet have never been down for any significant length of time. We have had 'total failures' where we were back up and running within a few hours to data corruption type of issues and misconfiguration issues and something went wrong we can't figure it out type of issues. Thing is we have always recovered quickly. At some point in the past we have an image that was working. Reverting back makes it easier to figure out what that issue is.

      The outage I was referring to was a small non-profit that was using Microsoft Small Business Server since they got it for "free", it was acting as their exchange server and file server. Their backups consisted of a copy of the D: drive that someone burnt to DVD from time to time, with no backup of the C: drive (hey, we can reinstall it from the original CD images, why back it up!?).

      It sounds like you are more technical than they were and are in a better situation, but unless you have a DR site that is ready to go, I don't believe you have "hours" of recovery time if something catastrophic happens (fire in the datacenter, accidental fire suppression discharge, leaking toilet on the floor above you dripping onto your server rack, roof collapse during a heavy snow storm, 100 year flood, backhoe accident taking out your internet connection, earthquake, tornado, etc). Depending on what the disaster is, you and your other IT staff might be more concerned with keeping your home and family safe than getting the servers back online.

      If you've had multiple instances where you had a "total outage" due to data corruption issues and misconfiguration issues, I don't think you're in as good of shape as you think you are, you're missing the policies and procedures that are supposed to catch those problems before they cause an outage. The same policy and procedures that make it take a week for a simple firewall change in a large corporate IT environment, but that change has gone through several layers of approval and has been tested on the test system before being promoted to production.

    14. Re:Cloud services not ready by Anonymous Coward · · Score: 0

      And how long was Amazon down?
      (hours)

      Were ALL their datacenters down?
      (no)

      So did they experience any global downtime?
      (no)

      Just because your developers are piss poor at using the tools (amazon services) to create a redundant setup, doesn't mean others are piss poor too.

    15. Re:Cloud services not ready by Anonymous Coward · · Score: 0

      By 4am GMT, Microsoft had figured out what was going on.

      "We have identified the root cause of this incident. It has been traced back to a cert issue triggered on 2/29/2012 GMT," the software giant said.

      It indeed seems to be leap year related, I'd hazard some of their security checks can't cope with the 29th day of February...

    16. Re:Cloud services not ready by thetoadwarrior · · Score: 1

      And downtime didn't exist when people ran their own hardware?

    17. Re:Cloud services not ready by Surt · · Score: 1

      Well, this story is about Azure being down for 7 hours, Amazon was down for 8.
      And since Amazon's outage took out all of the US and EU, if you needed responsive applications in those regions you were hosed, and hosed from 8:40am - 4:50pm, basically a full work day for both services.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  5. So merely days after announcing the G-Cloud... by phonewebcam · · Score: 2

    ...the British Government's Cloud service suffers the inevitable Microsoft kiss of death.

    1. Re:So merely days after announcing the G-Cloud... by Chris+Mattern · · Score: 4, Funny

      The hilarious part of this link is that the article detailing how screwed people are for depending on Microsoft's cloud services is stuffed with rollover ads for...Microsoft's cloud services!

    2. Re:So merely days after announcing the G-Cloud... by courteaudotbiz · · Score: 1
      Yes, just tried that, this is too cool! You talk about bad ads placement! :-)

      The article title is

      Government's G-Cloud service knocked offline by Microsoft Azure cloud computing outage

      And all around the page, an ad that says

      Get in the cloud - Microsoft Office 365

      Just like hearing a Subway commercial while in the restrooms of a McDonalds... Priceless!

    3. Re:So merely days after announcing the G-Cloud... by Anonymous Coward · · Score: 0

      since they have been down most of a day now, will they have to re-label it "Microsoft Office 364"?

    4. Re:So merely days after announcing the G-Cloud... by PhilHibbs · · Score: 1

      ...And all around the page, an ad that says

      Get in the cloud - Microsoft Office 365

      Well, clearly they need to release Microsoft Office 366 that works on leap years.

    5. Re:So merely days after announcing the G-Cloud... by leonardluen · · Score: 1

      clearly judging by the name they only intended it to work 365 days a year

    6. Re:So merely days after announcing the G-Cloud... by JourneymanMereel · · Score: 2

      No... it will still be up for 365 days this year... trouble is... it should have been up for 366.

      --
      Life has many choices. Eternity has two. What's yours?
    7. Re:So merely days after announcing the G-Cloud... by Deathmoo · · Score: 2

      So, that downtime would be a feature, not a bug? :)

    8. Re:So merely days after announcing the G-Cloud... by Myopic · · Score: 1

      What ads? The web has ads?

    9. Re:So merely days after announcing the G-Cloud... by Anonymous Coward · · Score: 0

      It's even more amusing when one considers that Microsoft is the biggest "customer" of Microsoft cloud services.

  6. 2/29/2012 by MacBrave · · Score: 5, Interesting

    Leap year strikes again?

    1. Re:2/29/2012 by jtownatpunk.net · · Score: 1

      That was my first thought.

    2. Re:2/29/2012 by guygo · · Score: 1

      My thought, too. You'd think that after the Y2K madness coders would have learned to adopt more robust calendar implementations.

    3. Re:2/29/2012 by Anonymous Coward · · Score: 1

      Yes, according to the details I've read so far it has to do with a certificate issue regarding today's date. I assume their management platform uses signed certificates to access/control their nodes/clusters of servers and apparently today isn't a valid date, so it's not allowing it.

      Hilarious.

    4. Re:2/29/2012 by ColdWetDog · · Score: 1

      My thought, too. You'd think that after the Y2K madness coders would have learned to adopt more robust calendar implementations.

      Yeah, like the Mayan Long Date!

      --
      Faster! Faster! Faster would be better!
    5. Re:2/29/2012 by the_other_chewey · · Score: 5, Informative

      From the service dashboard:

      "4:00 AM UTC We have identified the root cause of this incident. It has been traced back to a cert issue triggered on 2/29/2012 GMT."

      So yeah, a leap day bug sounds probable.
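
      The dashboard quote does not say what the actual defect was. As an illustration only, one common way a leap-day bug arises is computing "one year from today" by bumping the year field, which produces a date that does not exist. The function names below are made up, and falling back to Feb 28 is just one possible policy:

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    """Bump the year and keep month/day. Breaks on Feb 29."""
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    """Same idea, but fall back to Feb 28 when the target year has no Feb 29."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only reachable for Feb 29: the following year is never a leap year.
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        print(naive_one_year_later(leap_day))
    except ValueError as exc:
        print(f"naive version failed: {exc}")
    print(f"safe version: {safe_one_year_later(leap_day)}")
```

      Code like the naive version passes every test run on the other 1,460 days of a four-year cycle, which is why this class of bug tends to surface in production.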

    6. Re:2/29/2012 by Talderas · · Score: 1

      So who is it that isn't recognizing 2/29 as a valid date? The platform? The certificate?

      --
      "Lack of speed can be overcome. In the worst case by patience." --Znork
    7. Re:2/29/2012 by phantomfive · · Score: 2

      Microsoft has had leap year related problems before. Like last leap year. You'd think they'd have learned.

      --
      "First they came for the slanderers and i said nothing."
    8. Re:2/29/2012 by SplashMyBandit · · Score: 1

      Some coders are just shit and will never learn if you tell them (we lead them to water but they just don't listen - I've supplied working date-correct code in the past but some folks [usual Visual Basic schmucks] still want to do crappy hacks instead of doing it right).

      If you are working with dates you should never work with an internal representation that is Gregorian (although, of course, you need to display Gregorian, you just don't use that for your internal date representation). For example, the Java standard library has some clunky date handling but the internal representation is in the *Julian* Calendar. This is just a count of days since some date in the past (turns out to be way in the past, where the Gregorian and Julian calendars are synchronized on that one date). That means date calculations become very easy (which is why java.util.Calendar can do all sorts of nifty calculations and get it right every time no matter whether it is an every-4-years leap year or every 400-years not-a-leap year).
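
      The day-count idea can be sketched in a few lines. This uses Python rather than the Java mentioned above, and Python's own day ordinal (day 1 is 0001-01-01 in the proleptic Gregorian calendar) rather than a Julian count, so treat it as an illustration of the principle and not of how java.util.Calendar works internally:

```python
import calendar
from datetime import date


def days_between(start: date, end: date) -> int:
    """Difference in days, computed on integer day counts, not on y/m/d fields."""
    return end.toordinal() - start.toordinal()


if __name__ == "__main__":
    # Feb 28 -> Mar 1 spans two days in a leap year and one day otherwise.
    for year in (2000, 2012, 2013, 2100):
        gap = days_between(date(year, 2, 28), date(year, 3, 1))
        print(f"{year}: leap={calendar.isleap(year)}, Feb 28 -> Mar 1 = {gap} day(s)")
```

      Because the subtraction happens on plain integers, the leap rules (divisible by 4, except century years not divisible by 400) are handled once, in the conversion, instead of in every calculation.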

      Coders are also very crap at dealing with Timezones. Even if they are aware of timezones most do not realise that one location can have multiple timezones (such as Standard Time and Daylight Time for a single timezone band). The key for internationalized stuff is always do calculations and storage in UTC *only*, and then convert to the local display timezone on output.
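
      A minimal sketch of the "store UTC, convert on output" rule. The two fixed offsets stand in for one location's daylight and standard time; they and the function names are assumptions for the example, and a real application would use a proper timezone database instead of hard-coded offsets:

```python
from datetime import datetime, timedelta, timezone

# One location, two offsets depending on the season (hard-coded for illustration).
NZDT = timezone(timedelta(hours=13), "NZDT")  # daylight time
NZST = timezone(timedelta(hours=12), "NZST")  # standard time


def to_storage(local_dt: datetime) -> datetime:
    """Normalize an offset-aware local time to UTC before storing it."""
    if local_dt.tzinfo is None:
        raise ValueError("refusing to store a datetime with no timezone attached")
    return local_dt.astimezone(timezone.utc)


def to_display(stored_utc: datetime, display_tz: timezone) -> datetime:
    """Convert a stored UTC instant to the viewer's timezone, on output only."""
    return stored_utc.astimezone(display_tz)


if __name__ == "__main__":
    entered = datetime(2012, 2, 29, 9, 30, tzinfo=NZDT)
    stored = to_storage(entered)
    print("stored:      ", stored.isoformat())
    print("shown (NZDT):", to_display(stored, NZDT).isoformat())
    print("shown (NZST):", to_display(stored, NZST).isoformat())
```

      Note that the stored UTC value falls on Feb 28 even though the user entered a time on Feb 29, which is exactly the kind of boundary where date checks done in the wrong zone go wrong.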

      The last thing people don't commonly realise is that there are some dates without timezone. Most points in time should have an attached timezone (+0 for UTC) but some should never have a timezone. A birthday is a classic example. Your birthday is the same date no matter what timezone you are in, but this does not mean that a timezone should be attached to your birthday, since the actual point in time may be the same but at the same point in time the date is different around the world. Example, in timezone +13 (where I am now) it could be my birthday but in England (+0) or the US (-6 in some parts) it will be the day before so is not my birthday, despite being the same moment in time. In this way a birthday should not have a timezone attached.
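
      The birthday case can be sketched the same way: keep the birthday as a plain calendar date with no timezone, and only bring a timezone in when asking "is it the birthday here, right now?". The offsets mirror the +13, +0 and -6 examples above; the March 1 birthday and the function name are hypothetical:

```python
from datetime import date, datetime, timedelta, timezone


def is_birthday(birthday: date, instant: datetime, viewer_tz: timezone) -> bool:
    """True if the viewer's local calendar date at `instant` matches the birthday."""
    local_date = instant.astimezone(viewer_tz).date()
    return (local_date.month, local_date.day) == (birthday.month, birthday.day)


if __name__ == "__main__":
    birthday = date(1980, 3, 1)  # a plain date: no time, no timezone

    plus_13 = timezone(timedelta(hours=13))
    minus_6 = timezone(timedelta(hours=-6))

    # One single moment in time: 02:00 on March 1 at UTC+13.
    now = datetime(2012, 3, 1, 2, 0, tzinfo=plus_13)

    for label, tz in (("+13", plus_13), ("+0", timezone.utc), ("-6", minus_6)):
        print(f"UTC{label}: birthday today? {is_birthday(birthday, now, tz)}")
```

      The same instant is a birthday at +13 and still the day before at +0 and -6, so attaching a timezone to the stored birthday itself would only make it wrong for everyone else.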

      If you think this is esoteric junk that devs shouldn't need to worry about then you're a bad developer (if you do coding at all). Turns out that even the pros get this wrong. When the F-22 fighter first crossed the International Date Line their software crashed and they lost most of their flight computers (but not all). Fortunately they could just make out their tanker and could follow it back to Hawaii. It also appears that the Windows Azure software platform may be choking because of the leap year - if so then that is woeful (but regrettably not abnormal for low-time Windows coders). Getting time and dates right does matter!

      Understanding time and dates is *fundamental* for professional software development. If you feel a bit hazy on the subjects covered here then it is time to do some research and experiment with the extra functionality your development tools offer (like I said, Java does this properly, so it can be worth checking how they thought about this stuff).

    9. Re:2/29/2012 by Anonymous Coward · · Score: 0

      So yeah, a leap day bug sounds probable.

      Could just be a simple cert expiring...

    10. Re:2/29/2012 by Anonymous Coward · · Score: 0

      From the service dashboard:

      "4:00 AM UTC We have identified the root cause of this incident. It has been traced back to a cert issue triggered on 2/29/2012 GMT."

      So yeah, a leap day bug sounds probable.

      Leap days. How do they work?

    11. Re:2/29/2012 by Anonymous Coward · · Score: 0

      Looks like it. I got this from the Azure team just after cancelling my subscription today. Notice the date.

      "This mail is confirmation that your subscription to 3-Month Free Trial has been cancelled on Thursday, March 01, 2012. Contact one of our team members today at http://windowsazure.com/Support if you have any questions, or would like assistance. Thank you for using Microsoft Online Services. Windows Azure"

      I'm on the East Coast and cancelled at about 9pm EST 2/29/2012. If you go sign up for a free subscription and cancel it you should see the same thing.

  7. People never learn by Anonymous Coward · · Score: 0

    Never trust Microsoft. For anything. They can't even manage water vapor, for crying out loud.

  8. To quote the lady in the commercial... by Pollux · · Score: 4, Funny

    Yay, cloud!

    1. Re:To quote the lady in the commercial... by Anonymous Coward · · Score: 0

      Yay, cloud!

      Fixed that for you:

      Yay, cloud!

      (Quoth a Microsoft admin with a cert expiring on a leap day: "I just don't know what went wrong!")

  9. Now they're slashdotted, too... by Sqr(twg) · · Score: 4, Funny

    This is not helping, guys!

    1. Re:Now they're slashdotted, too... by Anonymous Coward · · Score: 1

      That's fine - they just have to go into the Azure service management and spin up new instances... oh wait...

  10. Wait by afidel · · Score: 1

    Wait, so Azure isn't down just the admin functionality is? Who gives a crap. Man, I can't spin up a new VM for 8 hours, boo hoo. This isn't an admin nightmare, the VM's being down for 8 hours would absolutely be a nightmare but the only admins this is a nightmare for are the poor guys working for MS trying to fix whatever the code monkeys screwed up =)

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    1. Re:Wait by fuzzyfuzzyfungus · · Score: 1

      Given that one of the major selling points of 'cloud' is the ability to swiftly spin up (and down) instances as you do or don't require them, that's a bigger deal than it might otherwise be.

      If you are doing a BYO Server thing, or a conventional static-sized hosting package, and buying to fit largely static demand, you may never have touched the power button after you first shoved it in the rack and fired it up. However, if you are doing the cloud thing and not spinning stuff up and down pretty frequently, you are probably overpaying.

    2. Re:Wait by Sez+Zero · · Score: 2

      We have a vendor that provides software distribution through Azure. It is completely down; no software and not even the web-based administration panel.

      So it isn't just the ability to fire up new VMs, but (from my experience) seems to be a complete platform failure for some customers.

    3. Re:Wait by glassware · · Score: 2

      I concur with what others have said. There are numerous services, being provided by Azure, that are completely unreachable, and have been so for longer than seven hours.

    4. Re:Wait by im_thatoneguy · · Score: 2

      To be specific Microsoft said about 2.8% of customers lost storage/hosting.

    5. Re:Wait by Anonymous Coward · · Score: 0

      I have over 5 deployments, with redundant instances, lost them all, not just the management controls

    6. Re:Wait by Anonymous Coward · · Score: 0

      Still counts as only one customer. Oh the stats!

  11. last time by phantomfive · · Score: 5, Informative

    Last time a Microsoft cloud product went down, users sustained real data loss. Of course, Microsoft claimed it couldn't happen with Azure.

    --
    "First they came for the slanderers and i said nothing."
    1. Re:last time by Anonymous Coward · · Score: 1

      A customer described the problem to The Register as an 'admin nightmare' and said they couldn't understand how such an important system could go down.

      This customer is new to the concept of Microsoft, aren't they?

      Hell, they're new to the concept of the internet in general, most likely.

    2. Re:last time by wstrucke · · Score: 1

      The system that lost users' data was aptly named "Danger".

  12. it's OK, just explain it in the blog by alen · · Score: 1

    like google does when something goes wrong. just explain how you're going to change things and why it happened and it will all be OK

    1. Re:it's OK, just explain it in the blog by Anonymous Coward · · Score: 0

      This is Microsoft, are things ever OK?

    2. Re:it's OK, just explain it in the blog by Anonymous Coward · · Score: 0

      I agree.

  13. Credibility by hism · · Score: 2, Interesting

    At this point, the best way to keep their credibility from further deteriorating is to provide good reports on what is going on. E.g., not like PSN, more like Amazon. Currently that Azure dashboard doesn't even load for me... has it been slashdotted or something?

    As an aside: whenever a cloud system goes down, people come out to rag on the reliability of the cloud. While I'm also annoyed by the marketing guys throwing around "just put it in the cloud!!" as much as anyone else, and agree some applications make no sense living in the cloud, I'd also like to point out that for some people, doing the admin work in-house results in the same amount or more headaches.

  14. Is real failover redundancy a pipedream? by swb · · Score: 1

    It seems like even the biggest guys can't make it work reliably, and presumably given the high profile of these services, they're not afraid to throw money and smart people at these problems.

    1. Re:Is real failover redundancy a pipedream? by medcalf · · Score: 1

      Well, the real problem is that you can never eliminate human error. When combined with the difficulties and costs of maintaining a proper test environment (full duplicate of production, essentially), the odds of something going wrong are always going to be non-zero. Then when you add the interconnectivity that clouds require on top of that, the odds that that something that goes wrong will make everything go wrong all at once becomes non-zero as well. So failure modes for well-designed cloud services tend to be fewer, but more catastrophic, than for non-cloud environments.

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
  15. Feature Suggestion! by fuzzyfuzzyfungus · · Score: 5, Funny

    Since the image that "Azure" and "Cloud" conjures up is more "sky" than "cloud", it would be my suggestion that Microsoft simply register chickenlit.tl and set up an Azure service status monitor/report page there.

    They could have an adorable cartoon chicken that, when the system is working normally, runs around scratching and pecking (speed dependent on load). When downtime occurs, it would begin squawking about how the sky is falling. What could make failure more endearing?

    Just to add that Microsoft touch, they could do the entire thing as a Microsoft Agent ActiveX control!

  16. Upgrade to Win 8 Beta? by Anonymous Coward · · Score: 1

    They thought it was ready.

  17. To the cloud! by Howard+Beale · · Score: 4, Funny

    Well...maybe not right now...

  18. BCoD by tsmithnj · · Score: 2

    It's the Blue Cloud of Death!

    1. Re:BCoD by Sulphur · · Score: 2

      It's the Azure Cloud of Death!

      FTFY

  19. This is why NASDAQ isn't using windows anymore... by Anonymous Coward · · Score: 0

    London stock exchange also is using Linux...

  20. The thing about clouds... by ThisIsAnonymous · · Score: 2

    When it rains, it poors...

  21. down sides of centralization and remote admins by Anonymous Coward · · Score: 0

    down sides of centralization and remote admins.

    Some times you are better off with local admins and systems.

    what is better: all your sites (or a big chunk of them) down, or just one?

    local admins or centralization with remote admins that do not know about each site local software setups?

  22. basket by Anonymous Coward · · Score: 0

    Yet another example of why the Cloud is not ready for production. No way I'm putting my eggs in there. Maybe development/testing, but never production.

  23. Ah, the cloud... by ErichTheRed · · Score: 4, Insightful

    It's funny how those of us who bring up issues of data security and service resiliency are dismissed as just trying to protect our jobs.

    Like so many other things, the actual technical underpinnings of "the cloud" are great, and have been standard fare for years. Virtual machines + flexible networking are a godsend for systems guys tasked with getting capacity for a new project up and going yesterday. I love being able to build and rip down entire test environments just to try something out...that used to mean a rack of physical servers, switchgear, etc. tied up while it was being used. That's why everyone's slowly coming around to the "private/hybrid cloud" model, which is really just code for "VMs + network capacity + something to tie it all together + maybe some external hosting".

    The problem is that "the cloud" is very badly misunderstood. As soon as a CIO sees "virtual, on-demand capacity without those pesky physical on-site machines and IT staff, for a fixed cost per compute-hour" everything else takes a back seat. Then, it's "why do we need IT staff on-site, everything's being taken care of in the cloud." Public clouds like Amazon or Azure are great for startups who can't really afford their own data centers, or even bigger businesses to offload some of the nonessential stuff. When you start looking at hosting everything though, the marketing hype of the cloud sometimes distracts people from realities that they have to contend with.

    Also, I'm not saying that businesses who go the private cloud or traditional hosting/outsourcing route won't have downtime -- they will. However, having onsite staff and infrastructure means you can work those staff until they fix the problem, and you have control over them. Most sane outsourcing contracts have SLAs in them stating that the vendor will expend X amount of effort to fix your problems. Cloud provider agreements, unless specifically mentioned otherwise, are "as is, where is, best effort restoration with no warranty." OK, maybe some providers will give you an SLA, but all that does is buy you free service at a later date if they violate it...it doesn't bring your application back online. You still have no choice but to sit and wait around for the provider to fix whatever's wrong...just ask Amazon EC2 customers about what happened during their last outage...

    Companies need to draw sane boundaries around hosted systems, and decide what is critical and what can be offloaded. Do I care about a set of development/test machines that get used once a month? Probably a lot less than the critical database/application servers that run my core business. Comfort level, cost per minute of downtime vs. cost of dedicated resources and other factors need to be carefully considered before jumping into the cloud with both feet.

    1. Re:Ah, the cloud... by geekoid · · Score: 2

      Just so you know, the data is still accessible in Azure, it's the management console that's down. That's still bad, but let's deal with the actual facts.

      A) the cloud doesn't need to mean offsite. It often is, but the philosophy can be brought in house.
      B) redundancy.

      Companies should completely adopt the cloud philosophy, but keep onsite system redundancy, which is still cheaper and easier than current non-cloud solutions.

      The desktops should just be cloud machines. Note, I don't say dumb terminals because there is some use for local data, just not application data. Dumb terminals rely on centralized storage and processing. Cloud computers do the majority of the processing.

      I got to say, getting a new computer, and not needing to do a recovery, or build a system instance is pretty damn good.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    2. Re:Ah, the cloud... by Anonymous Coward · · Score: 0

      Just so you know, the data is still accessible in Azure, it's the management console that's down. That's still bad, but let's deal with the actual facts.

      A) the cloud doesn't need to mean offsite. It often is, but the philosophy can be brought in house.
      B) redundancy.

      Companies should completely adopt the cloud philosophy, but keep onsite system redundancy, which is still cheaper and easier than current non-cloud solutions.

      The desktops should just be cloud machines. Note, I don't say dumb terminals because there is some use for local data, just not application data. Dumb terminals rely on centralized storage and processing. Cloud computers do the majority of the processing.

      I got to say, getting a new computer, and not needing to do a recovery, or build a system instance is pretty damn good.

      God I wish people would stop saying cloud every other word. At this point I have no fucking idea what it is they're actually referring to. The term is meaningless drivel at this point. Ask any two people what "cloud" means.

    3. Re:Ah, the cloud... by Anonymous Coward · · Score: 0

      Dude, why do you make stuff up? I see it in many of your threads, short quick answers with no references and most of your answers are just flat out wrong.

      Services ARE down, not just the admin interface. And being able to manage those services and expand them at will is one of the main purposes for many users of cloud services, so if they cannot expand and manage things, they are effectively limping along.

      Quotes I found..
      Microsoft is also experiencing problems with the availability of Azure Compute offerings and, as of 1.30pm, is "still troubleshooting this issue and verifying the most probable cause".

      The recently launched G-Cloud CloudStore is built on an Azure platform. G-Cloud representatives announced on Twitter two hours ago that a "power outage on Microsoft Azure means CloudStore is temporarily unavailable. Patch [is] being applied so will update when normal service resumed."

      At 5 a.m. GMT, Microsoft said less than 3.8 percent of hosted services had been affected, and measures had been taken to stop the problem "from spreading across the production environment."

      In addition, Azure customers in the north and south central U.S. as well as northern Europe may be experiencing some performance problems, according to a message on the dashboard posted at 10:55 a.m. GMT.

    4. Re:Ah, the cloud... by serviscope_minor · · Score: 1

      A) the cloud doesn't need to mean offsite. It often is, but the philosophy can be brought in house.

      Then it's a meaningless term which means "a bunch of servers".

      The desktops should just be cloud machines.

      I literally don't know what you mean. The desktops should host virtual servers for the various services?

      I got to say, getting a new computer, and not needing to do a recovery, or build a system instance is pretty damn good.

      I'm not sure what this has to do with the cloud. Presumably you are setting up your machines to PXE boot some standard image (with a VM?), since that's the smallest level of setup I can think of. That's not exactly new functionality...

      --
      SJW n. One who posts facts.
    5. Re:Ah, the cloud... by Anonymous Coward · · Score: 0

      It's funny how those of us who bring up issues of data security and service resiliency are dismissed as just trying to protect our jobs.

      That's because you guys always try to make us use Windows.

  24. Advice by DickBreath · · Score: 4, Informative

    Use the MCSE mantra:
    1. Perform virus scan.
    2. If that doesn't work, find a different program that will display a reassuring green graphic.
    3. If that doesn't work, reboot.
    4. If that doesn't work, reformat, reinstall.
    5. If that doesn't work, GOTO 1.

    Microsoft wouldn't know anything about data center running if it were chase aftering them at full speedo.

    Google this: "Microsoft Sidekick / Danger"

    http://techcrunch.com/2009/10/10/t-mobile-sidekick-disaster-microsofts-servers-crashed-and-they-dont-have-a-backup/

    https://www.pcworld.com/article/173470/microsoft_redfaced_after_massive_sidekick_data_loss.html

    http://www.appleinsider.com/articles/09/10/11/microsofts_danger_sidekick_data_loss_casts_dark_on_cloud_computing.html

    --

    I'll see your senator, and I'll raise you two judges.
    1. Re:Advice by Anonymous Coward · · Score: 0

      Microsoft wouldn't know anything about data center running if it were chase aftering them at full speedo.

      Google this: "Microsoft Sidekick / Danger"

      In their defense, they *bought* Danger, and the Danger services weren't running on Azure (or Microsoft technologies, I thought). From what I hear from friends in The Know, Danger's stuff was pretty much held together with tape. Microsoft just happened to be holding the bag when shit hit the fan.

  25. Is Azure free? by nurb432 · · Score: 1

    if so, that's the breaks. If not, then there should be contractual SLAs and penalties involved.

    --
    ---- Booth was a patriot ----
    1. Re:Is Azure free? by courteaudotbiz · · Score: 1

      ...there should be contractual SLAs and penalties involved

      Do you really think Microsoft would put a gun to their own head like that, assuming they learned from their past?

      I think they provide the service "As-is and with best-effort service recovery". Read the fine print, I'm sure you'll find something like that.

  26. Thunk by koan · · Score: 1

    One more nail in the Cloud coffin.

    --
    "If any question why we died, Tell them because our fathers lied."
    1. Re:Thunk by geekoid · · Score: 2

      Yes, just like flat tires are putting nails in the auto industry coffin.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    2. Re:Thunk by Chris+Mattern · · Score: 2

      If you had "flat tires" that put thousands of cars out of service all at once, then flat tires *would* be putting nails in the auto industry coffin.

  27. Not like Salesforce, yet by mattr · · Score: 1

    I had an outage on Salesforce for 1 week and they did absolutely nothing regarding giving me any free account time or anything except "Sorry".
    Their explanation was that a massive multiterabyte log file had to be processed, since the corruption they had extended to their backup.
    Shouldn't ever happen.
    This was last Autumn.
    All boy scouts should take away this: Cloud promises are made to be broken.

    1. Re:Not like Salesforce, yet by gweihir · · Score: 1

      Given how boastful and grand these claims are, this really is not a surprise to anybody competent. Complex systems fail. They fail in complex ways. Redundancy helps in some ways, but makes things worse in others, by increasing complexity.

      Also keep in mind that when outsourcing IT, the IT people suddenly have different business goals than you do. As long as they stay afloat, they do not really care whether you go under. In-house IT is different. They are sitting in the same boat. And any sane management will make sure they have all the benefits of being in this boat and so a huge motivation of keeping it going. Unfortunately, many managers just see IT as a problem.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:Not like Salesforce, yet by mattr · · Score: 1

      Yes of course. I agree with you. I tried Salesforce with a client where it would have helped with up-front costs. But I am not convinced even this is a good argument in most cases.

      The interesting part is that Salesforce uses a monolithic, unique system that you cannot back up yourself or run on your own cloud as far as I know. Well actually I heard there is some way but it is so secret I could never get real info about it. They have secret APIs rolled out for special customers so you don't know all the possibilities of the system. And while there are some very interesting things about it - and I spent a lot of time becoming proficient in Apex their managed language - I don't think they put enough effort into the administration of the back end.

      The support engineers are extremely helpful. But they aren't actually administering the system, they are a help desk. The real admins are hidden behind an opaque wall and there are very frequent announcements of one system or another becoming less responsive for some minutes. I guess they are working hard but to me the proper level of service is "fanatical" and I don't feel that at Salesforce. I tried it thinking it would be an interesting market and it might still be but I think any cloud to be trusted has to be a transparent, open cloud backed up by your own systems at the very least.

  28. Great uptime! by gmuslera · · Score: 5, Funny

    Put your servers in the Azure cloud to have an uptime of 9.999999999%

    1. Re:Great uptime! by courteaudotbiz · · Score: 1

      You misplaced the "."... Oh wait...

    2. Re:Great uptime! by sourcerror · · Score: 1

      It's more than four nines, it must be really good!
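
      For anyone who wants the arithmetic behind the "nines" joke, here is a rough sketch. The function name is made up for illustration, and it assumes a 365-day year; it just converts an uptime percentage into the hours of downtime per year that the figure allows.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year implied by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Three nines, four nines, and the figure from the joke above.
    for pct in (99.9, 99.99, 9.999999999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours down per year")
```

      Lots of nines only help when they are on the right side of the decimal point: the figure in the joke leaves the service down for roughly nine tenths of the year.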

  29. Resiliency vs. Control by medcalf · · Score: 1
    Clouds are, in a sense, all about using tight control to gain efficiency. Control requires centralization. But this introduces failure modes that are catastrophic: rather than degrading performance overall or seeing point failures, everything is perfect until everything is gone. Resiliency — the ability to survive failures and still function to some degree — requires decentralization both of infrastructure and of decision making power. So attempts to become more efficient, past a certain point, inevitably result in the destruction of the system.

    This is not just an IT observation. The same thing happens with biodiversity (fewer species means greater risk that a key part of a food chain will collapse and take the entire chain with it), the economy (ever notice how failures are getting bigger as government steps in more to prevent failures?), and any other complex system. Once a system is too big for a single human mind — and specifically the one in charge of the system — to contain its complexity and understand its failure modes, failure becomes inevitable. The fewer people allowed to understand and make decisions about the system, the more catastrophic the failures when they occur. The more complex the system, the more likely it is for the failures to occur. Which is to say, any complex system is at increased risk of catastrophic failure as it grows in complexity and as it becomes more centralized. Combine the two, and you're just waiting for the disaster to happen.

    --
    -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
  30. Cloud ain't so bad by Martz · · Score: 5, Insightful

    I wrote a comment on slashdot a while back which questioned the sensibleness of running services in the cloud. I used to be a sceptic.

    Since then I've used Rackspace Cloud and found that it's actually a very good idea, for certain things.

    The benefits of using a cloud system are scalability and no commitment- it's not about reliability or higher availability - but you do get a little win in those areas.

    To give some examples, I was recently able to play around with mysql clustering. I followed a mysql clustering howto and set up a mysql cluster with load balancers. Once I was finished geeking about, I saved the VMs to the file storage and deleted the cloud instances. Total cost: £/$2-3 maximum. I hadn't previously been able to do this; I would have had to rent a dedicated server which would serve websites, email etc. I couldn't really use the dedicated server to play with new technology in case it had a negative impact on the live systems. I did have a development box for a while, but it essentially doubled my costs without making any more money, just offering some protection.

    Now I have staging/development instances in the cloud - and no commitments to them - I don't have to worry about a £250 monthly bill or sign a 12 month contract to get my own box. I can fire up some resources, use them, and throw it away when I'm done.

    The upshot is that I can play around with other people's cool open source software without risk of buggering something up on my live box, and the costs are insignificant since I'm only renting it per hour. I can try something new, if it works great - it might go/stay in production. If not, delete it and move onto the next cool thing.

    If I need high availability, I would use Rackspace, Amazon, Azure, and I'd ensure that I have a plan to deal with a major outage with any of the providers. Each have APIs, so in theory I could create new instances automagically and failover between different cloud providers with a quick DNS change, while keeping costs low.

    To recap, the cloud isn't all about high availability - no matter what the marketing says. It's about scaling systems and running resources for small amounts of time, and is perfectly suited to services which have peak demand (ticket sales for example).

    1. Re:Cloud ain't so bad by PPH · · Score: 1

      Correct about the availability. Cloud services are a really cheap way to rent processing/storage resources.

      If I need high availability, I would use Rackspace, Amazon, Azure, and I'd ensure that I have a plan to deal with a major outage with any of the providers. Each have APIs,

      Unless the API is proprietary (or just non standard) and the cloud operator introduces some systemic fault* into their services. What then?

      Apps targeted at LAMP services (for example) don't necessarily suffer from these problems, because not every provider is installing the same patches at the same time (or even running the same configurations). So you gain reliability from a sort of genetic diversity by distributing your app across several cloud providers.

      *I didn't read all TFAs, but this is what it seems to me just happened to Azure.

      --
      Have gnu, will travel.
    2. Re:Cloud ain't so bad by Martz · · Score: 1

      If the cloud operator is experiencing any sort of fault, they get dropped from serving the site/services and any DNS pointing to them gets changed to any of the other providers, and more instances spin up. This is either done manually or scripted to monitor the state of the cloud providers.

      The point is that using any one provider for anything, cloud or not, isn't a good idea.
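
      A minimal sketch of the scripted version, under some loud assumptions: the provider names are plain strings, `is_healthy` stands in for whatever monitoring check you actually run, and `update_dns` is a hypothetical hook for your DNS provider's API. None of these are real provider calls.

```python
from typing import Callable, Optional, Sequence


def pick_healthy(providers: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first provider that passes its health check, or None."""
    for name in providers:
        if is_healthy(name):
            return name
    return None


def failover(current: str,
             providers: Sequence[str],
             is_healthy: Callable[[str], bool],
             update_dns: Callable[[str], None]) -> str:
    """Keep the current provider while it is healthy; otherwise repoint DNS."""
    if is_healthy(current):
        return current
    others = [p for p in providers if p != current]
    target = pick_healthy(others, is_healthy)
    if target is None:
        return current  # nobody is healthy, so there is nowhere to go
    update_dns(target)  # hypothetical hook: call your DNS provider's API here
    return target


if __name__ == "__main__":
    # Fake health data standing in for real monitoring checks.
    status = {"rackspace": True, "amazon": True, "azure": False}
    changes = []
    active = failover("azure", ["azure", "rackspace", "amazon"],
                      lambda p: status[p], changes.append)
    print(active, changes)
```

      In practice the DNS change only takes effect as fast as the record's TTL allows, so a short TTL would need to be in place before the outage, not after.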

    3. Re:Cloud ain't so bad by PPH · · Score: 1

      The point is that using any one provider for anything, cloud or not, isn't a good idea.

      Right. You'd think people would have learned about the hazards of monocultures by now.

      --
      Have gnu, will travel.
    4. Re:Cloud ain't so bad by Martz · · Score: 1

      The point is that using any one provider for anything, cloud or not, isn't a good idea.

      Right. You'd think people would have learned about the hazards of monocultures by now.

      Awesome monoculture reference :)

    5. Re:Cloud ain't so bad by dkf · · Score: 1

      If I need high availability, I would use Rackspace, Amazon, Azure, and I'd ensure that I have a plan to deal with a major outage with any of the providers. Each have APIs,

      Unless the API is proprietary (or just non standard) and the cloud operator introduces some systemic fault* into their services. What then?

      Warm up the lawyers.

      Seriously, if you've got a critical part of your business based on a single supplier, you're vulnerable to problems and so should have some other means of protecting yourself. Cloud computing is just yet another way that this can happen, but it's not particularly special in this regard. At the commoditized service end (i.e., IaaS) you can protect yourself by having multiple suppliers. For a specialized service, you have fewer options (but can potentially gain more benefit from that service, of course).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    6. Re:Cloud ain't so bad by Terrasque · · Score: 1

      So you'd make a cloud.. of cloud providers? Woah.. +1 "Yo dawg" to you, mate!

      --
      It's The Golden Rule: "He who has the gold makes the rules."
  31. The Daily Show by ISoldat53 · · Score: 1

    I wonder if this is what is causing the Daily Show to post a maintenance sign on login?

  32. When Clouds go down... by AB3A · · Score: 1

    It's called Fog.

    --
    Nearly fifty percent of all graduates come from the bottom half of the class!
  33. Azure down.... by non-plus · · Score: 0

    so, now that the Azure cloud is down and the news has hit Slashdot - the "service dashboard" has now been "slashdotted"

    Network Error (tcp_error)

    A communication error occurred: "Operation timed out"
    The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.

    For assistance, please raise a ticket through the CSC Help Desk (E-mail: CSS_Internal_Help_Desk@csc.com), and provide the information on this page for Proxy: CSC-CHD-CDC-1

    ya'll killin' me :-)

  34. my money goes on.... by Anonymous Coward · · Score: 0

    ....a loop of death as cause for the outage! ^^

  35. Leap year by Anonymous Coward · · Score: 0

    29 February and unexpected downtime hummm

  36. Cloud Redundancy by Anonymous Coward · · Score: 0

    I have no empathy for any company who relies solely on a single provider. It seems as though nothing is ever 100% reliable, and for those companies who rely on outside service providers, they need to understand that no external company will ever value the service as much as you do.

    For a while now, I have contemplated the necessity for a data layer which provides replication and failover, between two ENTIRELY separate clouds (think Azure and AWS/EC2). I just keep waiting for someone to do the legwork of developing this (I'm distracted on other projects).

  37. "This should never happen" ... Stupid by gweihir · · Score: 2

    People that believe the cloud is not at risk of downtime are just stupid and deserve exactly what they get. The cloud not only has the normal risks any comparable infrastructure has, but also suffers from additional risks because of complex network connectivity, complex usage patterns and untried system administration patterns.

    People that still think this now are not only stupid but unwilling to learn, as the Amazon outage last year clearly showed the risks. In addition, Amazon is very likely more competent than Microsoft at this by any sane metric.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  38. Why not make it an Apple story? by Anonymous Coward · · Score: 0

    Apple's iCloud service is apparently using Azure (it's been reported many times), and since it's down, iCloud should in theory be down.

    So how come no one's going full-on for page views by saying Apple's iCloud service is down and then just mentioning in passing that it's because Microsoft's Azure is down?

    Right now we just have people commenting about cloud services. Why not add in the Android and Apple fanboys as well to the discussion?

    Page views, people!

  39. any chance... by Trailer+Trash · · Score: 1

    I could fix this with a $35 payment to someone?

  40. Cloud or just more Fog blown up your asses? by Anonymous Coward · · Score: 0

    I'll take care of my own data storage, thank you....

  41. Re:The cloud runs Linux. by Anonymous Coward · · Score: 0

    I run cloud servers that are 24/7/365 with uptimes of well over 1000 days. Microsoft servers to the best of my knowledge can not do that.

    They can but this year there are 366 days..

  42. Cloud needs to be rebooted (twice) by Anonymous Coward · · Score: 0

    Anyone with an interest in meteorology knows that weather can be unpredictable and unreliable.
    No one *really* expects anything more from microsoft any more, do they???

  43. The Cloud Laws will Come when everybody is there by Anonymous Coward · · Score: 0

    Obama and his crew will want mandatory, no warrant backdoors. Prices will go up.

  44. Big companies running on Azure by Anonymous Coward · · Score: 0

    Is comedy central http://www.thedailyshow.com/ hosted on there?

  45. Azure Service Dashboard by Anonymous Coward · · Score: 0
  46. Low Frequency, High Amplitude by dcollins · · Score: 1

    My thinking on clouds and downtime is that it's pushing failures to be less frequent, but much higher-impact when they happen. That is, instead of 1 hour down a year (and N man-hours of work lost), you get 20 hours down every decade (and 1000*N lost man-hours or somesuch). Which is bad because we (psychologically and economically) are truly terrible at evaluating or dealing with once-in-a-generation huge catastrophes -- yet we seem to be arranging more of them all the time.
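
    To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures above (one outage a year costing N man-hours, versus one outage a decade costing 1000*N). The function name is made up for illustration, and N is set to 1 so the results read as multiples of N.

```python
def expected_annual_loss(outages_per_year: float,
                         man_hours_per_outage: float) -> float:
    """Average man-hours lost per year: outage frequency times outage impact."""
    return outages_per_year * man_hours_per_outage


N = 1.0  # assumption: measure losses in units of N man-hours

# Frequent but small: one outage a year, N man-hours each.
small_and_frequent = expected_annual_loss(1.0, N)

# Rare but huge: one outage a decade, 1000 * N man-hours each.
rare_and_huge = expected_annual_loss(1 / 10, 1000 * N)

if __name__ == "__main__":
    print(f"small and frequent: {small_and_frequent:.0f} x N per year")
    print(f"rare and huge:      {rare_and_huge:.0f} x N per year")
```

    With those illustrative figures the rare outage is not only harder to plan for, it also costs far more per year on average, because the impact grows faster than the frequency shrinks.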

    Central planning and power relations and all that, I guess.

    --
    We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
  47. Office 365 by TheNinjaroach · · Score: 3, Funny

    I've always had to laugh at the name "Office 365" -- the fact this happened on Leap Day amuses me to no end.

    --
    I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
  48. Blue sky of death by Anonymous Coward · · Score: 0

    When Windows was on your machine, you had blue screen of death.
    Now, online, the microsoft cloud is gone so I guess you get a blue sky of death.

  49. Maybe it's *because* it's Leap Day? by zooblethorpe · · Score: 3, Insightful

    I've always had to laugh at the name "Office 365" -- the fact this happened on Leap Day amuses me to no end.

    In light of Excel's horribly buggy code for handling Leap Day, I have to wonder if Microsoft's problems here might not be because it's Leap Day? Whaddaya bet Azure comes back up all fine and dandy once the date rolls over to 1 March instead of 29 February? I'm actually serious about this conjecture, this is not just an attempt at humor.
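
    To illustrate the general class of bug (not a claim about what actually broke in Azure), here is a sketch of how naive "same date next year" arithmetic falls over on 29 February. The function names are made up for illustration; `date.replace` is the standard library call.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    """Naive approach: bump the year and keep the month and day unchanged."""
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    """Fall back to 28 February when the target year has no 29 February."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_one_year_later(leap_day)
    except ValueError as err:
        print("naive version failed:", err)
    print("safe version gives:", safe_one_year_later(leap_day))
```

    Code like the naive version works on 1460 days out of every 1461, which is exactly why it tends to survive testing.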

    On a different angle, does anyone else find it amusingly ironic that this service is named Azure, and now it's blue-screened? They've only gone up one letter -- now it's the ASOD.

    Cheers,

    --
    "What in the name of Fats Waller is that?"
    "A four-foot prune."
    1. Re:Maybe it's *because* it's Leap Day? by Dr.Dubious+DDQ · · Score: 1
      "does anyone else find it amusingly ironic that this service is named Azure, and now it's blue-screened?"

      No more so than naming their "cloud computing" platform after the color of a cloudless sky...

      ("Hey! The Emperor has no Clouds!")

    2. Re:Maybe it's *because* it's Leap Day? by Dan+East · · Score: 1
      --
      Better known as 318230.
  50. Gah! MY EYES! by zooblethorpe · · Score: 1

    Microsoft wouldn't know anything about data center running if it were chase aftering them at full speedo.

    Well, THANK YOU so very much for putting that image in my head! All that brought to mind was Ballmer chasing people while wearing nothing but a Speedo.

    Where's the brain bleach?

    --
    "What in the name of Fats Waller is that?"
    "A four-foot prune."
  51. Please wait for updates to finish installing... by Anonymous Coward · · Score: 0

    Had to be critical updates to the cloud.

  52. What should this matter? by Trogre · · Score: 1

    This outage shouldn't affect Slashdot readers, since everyone here will be aware of these two fundamental principles of IT:

    1. Never trust Microsoft
    2. Never trust cloud services with anything important

    --
    "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  53. They Probably Did This ON PURPOSE by Anonymous Coward · · Score: 0

    Microsoft has always been about the desktop and securing their market. The cloud takes things away from the desktop and their corporate clients.

    step 1: identify cloud as potential enemy
    step 2: create cloud of own
    step 3: advertise so everyone joins cloud
    step 4: randomly "lose the cloud" for hundreds/thousands/millions of cloud users - everyone gets scared of the "cloud" idea across the board, incl. other cloud service customers

    step 5: PROFIT! sell more fully independent PCs and servers!

  54. Megafail.com by symbolset · · Score: 1

    Lots of people couldn't access data or services that were hosted in Azure. The UK government for one, who had just migrated to Office 365 a few weeks ago. The Daily Show for another. Many others are reported. We're going to have news about these failures for weeks. Many millions of dollars were lost by services hosted in the Azure cloud that were down all day today for Leap Day - perhaps a billion dollars or more of sales. Everyone affected has a huge loss of face they're going to have to recover gradually over time, so the losses compound.

    Azure service management is still down over 24 hours after onset of the incident (link in TFS). They'll stand it up again eventually but they haven't yet. This is bad but it's not "lost data" bad yet. It's not "Danger" bad yet. Over the next few days we'll find out what actual data loss occurred, what transactions in flight were lost, which hosted databases were munged. Most simple hosting customers will be unaffected and they will skew the results so that Microsoft can say "only a few customers had severe issues." Though the prime enterprise customers with 10,000-100,000 users were totally hosed because they were most active, they're only one customer each so their customer count doesn't count in the PR scheme.

    The biggest loss is the loss of confidence. Azure hadn't failed this badly in public before and now it has. This is the failbar other services will have to get over to differentiate their service and some hosted cloud providers are now breathing a sigh of relief because this is a really low bar. "We haven't failed this bad yet!" will be their advertising slogan. When pressured for a competitive argument against Azure they're going to ask: "On Leap Day then what?" That's the five word closing argument for a whole lot of cloud services tomorrow.

    Azure becomes the "Azure screen of death." Ironically, Azure is the color of a cloudless sky. Perhaps the name is prophetic.

    --
    Help stamp out iliturcy.
  55. Re:The cloud runs Linux. by Eponymous+Hero · · Score: 1

    but do you run 24/7/366, that's the question here =P

    --
    insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
  56. data loss by Frank+T.+Lofaro+Jr. · · Score: 2

    When Amazon had that outage it was just thought to be an outage, but it turned out data was lost.

    Disruption is bad enough, but data loss is way worse, since people and businesses likely won't have their own backups, and loss of data, even a low percentage, can easily KILL a business.

    Lose an order or a customer's records or a customer's data and you likely lose a customer and get a bad reputation.

    Lose business records and it might be impossible to exist.

    --
    Just because it CAN be done, doesn't mean it should!
  57. I wonder what their SLA is? by Anonymous Coward · · Score: 0

    If it's five 9's then that's gone!
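
    For scale, here is a rough back-of-the-envelope sketch of that arithmetic. It assumes a 366-day year (2012) and the roughly seven-hour outage quoted in the summary; the function names are made up for illustration, and nothing here reflects Azure's actual SLA terms.

```python
HOURS_PER_YEAR = 366 * 24  # assumption: measure over leap year 2012


def allowed_downtime_minutes(nines: int, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Minutes of downtime per year permitted by an availability of N nines."""
    unavailability = 10 ** -nines  # five nines -> 0.00001
    return hours_per_year * 60 * unavailability


def availability_after_outage(outage_hours: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Best achievable yearly availability once a single outage has occurred."""
    return 1 - outage_hours / hours_per_year


five_nines_budget = allowed_downtime_minutes(5)   # a little over five minutes
after_seven_hours = availability_after_outage(7)  # roughly 99.92%

print(f"five-nines budget: {five_nines_budget:.2f} minutes per year")
print(f"availability after a 7 h outage: {after_seven_hours:.4%}")
```

    So a single seven-hour outage uses up several decades' worth of a five-nines budget and leaves the year at about three nines.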

  58. 28 days in February . . . . by Anonymous Coward · · Score: 0

    28 days in February . . . . . should be enough for ANYBODY.

  59. But our SLA guarantees... by Anonymous Coward · · Score: 0

    at least 9 5s of uptime!

  60. Use epoch seconds? by fsck! · · Score: 1

    How can date manipulation bring down a mission critical asset for 7 hours? Maybe someone can explain how you could accidentally write code that breaks this badly on Leap Day. I've never written anything that stores the data internally as anything other than epoch seconds or epoch milliseconds, precisely because it seems like a can of worms. I understand this is the norm for many Microsoft projects, right?
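
    One way it can happen, sketched in Python purely as an illustration of the class of bug (this is a guess, not a claim about what actually broke in Azure): code computes "one year from today" by bumping the year field, which produces a date that does not exist when today is 29 February. The helper names below are hypothetical.

```python
from datetime import date, datetime, timezone


def naive_one_year_later(d: date) -> date:
    """Bump the year field directly; raises ValueError when d is 29 February."""
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    """Same idea, but fall back to 28 February when the target year has no leap day."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


leap_day = date(2012, 2, 29)

# The naive version fails only on this one day every four years.
try:
    naive_one_year_later(leap_day)
    naive_failed = False
except ValueError:
    naive_failed = True

safe_result = safe_one_year_later(leap_day)

# Epoch-based arithmetic never constructs an invalid calendar date:
# adding 365 days' worth of seconds always lands on a real instant.
start = datetime(2012, 2, 29, tzinfo=timezone.utc)
epoch_result = datetime.fromtimestamp(start.timestamp() + 365 * 86400, tz=timezone.utc)

print(naive_failed, safe_result, epoch_result.date())
```

    Storing timestamps as epoch seconds sidesteps the invalid-date problem, but anything that has to mean "same calendar day next year" (an expiry date, say) still needs an explicit rule for 29 February.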

  61. MS PR will solve this! by Anonymous Coward · · Score: 0

    MS unconfirmed source said "Don't worry...we've scheduled a leap year remembrance event next year on Feb 29th"