Slashdot Mirror


Azure Failure Was a Leap Year Glitch

judgecorp writes "Microsoft's Windows Azure cloud service was down much of yesterday, and the cause was a leap year bug as the service failed to handle the 29th day of February. Faults propagated, making this a severe outage for many customers, including the UK Government's recently launched G-cloud service."

7 of 247 comments

  1. Re:Same Story / Different Day by firex726 · · Score: 5, Interesting

    What is with MS and their apparent inability to cope with leap years?

  2. It wasn't just Microsoft... by Anonymous Coward · · Score: 5, Interesting

    ...they just had the most publicly catastrophic failure. I just noticed that all of the Google Chat messages I received yesterday were sent to me at various times on December 31, 1969.

    And it also seems that I didn't even receive any of them until today, March 1, implying that they were incapable of even sending them yesterday.
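
    For what it's worth, December 31, 1969 is what a zero Unix timestamp looks like when rendered in a US time zone. A minimal sketch, assuming (and this is only a guess about the cause) that the message timestamps defaulted to 0; the UTC-5 offset is picked purely for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: a message whose timestamp was never set
# and therefore defaulted to 0 seconds since the Unix epoch.
missing_timestamp = 0

utc_view = datetime.fromtimestamp(missing_timestamp, tz=timezone.utc)

# Fixed UTC-5 offset, chosen only as an example of a US time zone.
eastern = timezone(timedelta(hours=-5))
local_view = utc_view.astimezone(eastern)

print(utc_view.isoformat())    # 1970-01-01T00:00:00+00:00
print(local_view.isoformat())  # 1969-12-31T19:00:00-05:00
```

    Any zone west of Greenwich pushes the epoch back into the last day of 1969, which is why this date is a common symptom of a missing or zeroed timestamp.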

  3. Re:Same Story / Different Day by UnknowingFool · · Score: 5, Interesting

    No, it came from Freescale in a driver that Toshiba used. Not many know that the original Zune was a Toshiba Gigabeat with a new UI and outer shell.
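
    As I remember the widely circulated account, the driver turned a day count into a year with a loop that never advanced on day 366 of a leap year. A rough Python rendition from memory, not the original C; the function names, the 1980 origin year, and the iteration cap are my assumptions for the sketch:

```python
from datetime import date

def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def day_number(d, origin_year=1980):
    # 1-based day count: January 1 of the origin year is day 1.
    return (d - date(origin_year, 1, 1)).days + 1

def year_from_days_buggy(days, origin_year=1980, max_iterations=1000):
    """Returns the year, or None where the uncapped loop would spin forever."""
    year = origin_year
    iterations = 0
    while days > 365:
        iterations += 1
        if iterations > max_iterations:
            return None  # cap added for the sketch; the real loop had none
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # BUG: when days == 366 in a leap year nothing changes,
            # so the loop condition stays true indefinitely.
        else:
            days -= 365
            year += 1
    return year

def year_from_days_fixed(days, origin_year=1980):
    year = origin_year
    while True:
        days_in_year = 366 if is_leap_year(year) else 365
        if days <= days_in_year:
            return year
        days -= days_in_year
        year += 1

print(year_from_days_buggy(day_number(date(2008, 12, 30))))  # 2008
print(year_from_days_buggy(day_number(date(2008, 12, 31))))  # None (would hang)
print(year_from_days_fixed(day_number(date(2008, 12, 31))))  # 2008
```

    The last day of a leap year is the only input that trips it, which fits a device that works all year and then hangs on December 31.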

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  4. Microsoft Never Has Been Good At Time by Greyfox · · Score: 4, Interesting

    Dealing with time is hard, but it's been amusing to watch them experience problems solved by UNIX decades earlier. Daylight saving time was a constant problem for them in the early days, though they seem to have mostly got that ironed out. Every so often they seem to have a regression for a piece of new hardware. Maybe they'll eventually get it right.

    Funnily enough, I used to work at IBM doing OS/2 tech support. OS/2 and Windows NT share a common heritage, so a lot of the behind-the-scenes problems I witnessed in OS/2 were (and sometimes still are) problems with Windows. I'm not sure if this is one of them, but I got a call once from a guy who was trying to use his OS/2 system to track satellites. The problem was, the OS/2 timer API specified that you could set milliseconds but it didn't seem to work. I tracked it down to a timing driver which tracked two separate interrupts. The first interrupt happened every few milliseconds and would update the clock millis when that happened. However, if the system was busy it was possible to not handle that interrupt. There was also a system periodic interrupt every 1 second. When that occurred, the system hard-reset the milli time and incremented the seconds. So you could set the millis, but the clock would become inaccurate 1 second later. Just one example of how time has been a thorn in my side for my entire career. I wrote an APAR up on it which was promptly closed "Working as Designed." Dunno if he ever got it fixed...
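
    A toy model of that behavior, in case it helps picture it. This is not the OS/2 driver, just the two-interrupt scheme as described above; the class and method names are made up for the sketch:

```python
class ToyClock:
    """Toy model of a clock driven by a fast interrupt and a 1-second interrupt."""

    def __init__(self):
        self.seconds = 0
        self.millis = 0

    def set_millis(self, value):
        # What the caller expected the timer API to honor.
        self.millis = value % 1000

    def on_fast_interrupt(self, elapsed_ms):
        # Fires every few milliseconds; can be missed when the system is busy.
        self.millis = min(self.millis + elapsed_ms, 999)

    def on_second_interrupt(self):
        # Fires once per second: hard-resets millis and increments seconds,
        # discarding whatever offset the caller had set.
        self.millis = 0
        self.seconds += 1

clock = ToyClock()
clock.set_millis(500)           # caller aligns the clock to a half second
clock.on_fast_interrupt(4)      # millis is now 504, as expected
before = (clock.seconds, clock.millis)
clock.on_second_interrupt()     # the periodic interrupt wipes the offset
after = (clock.seconds, clock.millis)

print(before)  # (0, 504)
print(after)   # (1, 0)
```

    The 500 ms offset survives only until the next one-second interrupt, which matches what the caller saw.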

    --
    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  5. Re:Same Story / Different Day by UnknowingFool · · Score: 4, Interesting

    According to the details I know, it had to do with certificate validation. So part of Azure is using some code that doesn't use standard Windows APIs. Not shocking is that MS does not conform to standards. Shocking is that they don't conform to their own standards.
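
    I don't have the actual code, so take this as an assumption about how a certificate expiry date could go wrong on a leap day: compute "valid for one year" by bumping the year and keeping the month and day. A minimal sketch with hypothetical function names:

```python
from datetime import date

def one_year_later_naive(start):
    # Assumed buggy pattern: add 1 to the year, keep month and day as-is.
    return date(start.year + 1, start.month, start.day)

def one_year_later_safe(start):
    try:
        return date(start.year + 1, start.month, start.day)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; falling back to
        # Feb 28 is a policy choice made for this sketch.
        return date(start.year + 1, 2, 28)

issued = date(2012, 2, 29)

try:
    one_year_later_naive(issued)
    naive_failed = False
except ValueError:
    naive_failed = True  # February 29, 2013 does not exist

print(naive_failed)                 # True
print(one_year_later_safe(issued))  # 2013-02-28
```

    On any other day of the year the naive version works fine, which is how this kind of bug gets through testing.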

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  6. Re:Same Story / Different Day by jc42 · · Score: 5, Interesting

    What is with MS and their apparent inability to cope with leap years?

    I would like to know the same thing. This seems to be systemic.

    Yeah; it's systemic. Or at least it used to be a few years back, and I wouldn't be surprised if they haven't fixed the basic problem yet. The problem is fairly simple: Windows' internal clock is in local time.

    I've found that this is all you need to tell a programmer with experience writing date/time code. Any software whose internal clock is in local time will be buggy, and it will never be completely fixed. Attempts to fix bugs will merely introduce bugs elsewhere in the chains of date/time handling. The sensible solution is to adopt a "universal time" internally, and convert at the last stage when you present the date/time to a human user. Yes, you theoretically can work with local time internally, but (teams of) humans can't actually make this work in practice. The best they can do is make it work in the "normal" cases. Bug fixes then tend to just move the time bugs around to different places in the code. But it can be very difficult to get management to accept this and agree to UT-only internally.
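
    A small illustration of why local time is a poor internal representation: two instants an hour apart produce the same wall-clock reading when US clocks fall back. This is only a sketch; the fixed EDT/EST offsets and the `render` helper are assumptions chosen to keep it self-contained:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for US Eastern time on either side of the
# November 4, 2012 switch from daylight to standard time.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Internal representation: unambiguous UTC instants.
before_fallback = datetime(2012, 11, 4, 5, 30, tzinfo=timezone.utc)
after_fallback = datetime(2012, 11, 4, 6, 30, tzinfo=timezone.utc)

def render(instant, zone):
    # Convert to local time only at the last stage, for display.
    return instant.astimezone(zone).strftime("%Y-%m-%d %H:%M")

wall_1 = render(before_fallback, EDT)
wall_2 = render(after_fallback, EST)

print(wall_1)  # 2012-11-04 01:30
print(wall_2)  # 2012-11-04 01:30
print(after_fallback - before_fallback)  # 1:00:00
```

    Stored as UTC, the two events sort and subtract correctly; stored as "01:30 local", they are indistinguishable.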

    Java also used to specify local time internally (and may still do so, but I haven't used it in years). I worked on a number of projects where, after repeated date/time disasters at every switch to/from DST and every Feb 29, Java was abandoned and everything was rewritten in a language (usually C++) whose libraries supported a UT timestamp and didn't have all those time bugs.

    Does anyone know if MS Windows has introduced a UT internal time yet? If not, then we can reliably predict that such bugs will continue to plague their users.

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  7. Re:Who could have foreseen a leap year coming? by VortexCortex · · Score: 3, Interesting

    In all fairness, Microsoft never figured anyone would still be using this service by the time a leap year rolled around.

    Ah, that explains why Zunes went dark on New Year's 2009...

    Think about this. You're a software dev, and you use a MS C++ compiler. They wrote their standard libs, including the "time.h" / <ctime> code... you use their time libraries.
    Now two things:
    0. MS employs some real nut-jobs that can't even use the standard time functions and instead write their own for each project...
    or
    1. MS doesn't even trust their own compiler / libraries to do the right thing?

    It scares me to think that MS makes operating systems... IMHO, they should get back to BASICs.