Azure Failure Was a Leap Year Glitch
judgecorp writes "Microsoft's Windows Azure cloud service was down much of yesterday, and the cause was a leap year bug, as the service failed to handle the 29th day of February. Faults propagated, making this a severe outage for many customers, including the UK Government's recently launched G-cloud service."
Anyone remember trying to turn on their Zune 3.5 years ago? That didn't work so well either.
If they can't handle an exception that has been around for 2,000 years, what about newer exceptions? It would be interesting to see what happens next June 30.
30 years ago, Arthur David Olson started engineering a solution to this problem that persists to this day, and which he supported personally for all but the last few months. The systems I have that run his software have never even burped through legislative changes of the calendar, leap-seconds, and the Century leap-year day, which is a separate cycle from the 4-year one.
Bruce Perens.
Some of the common leap year bugs that I've seen over the years:
1. A matrix with the number of days per month:
e.g. smallint dayspermonth[12]={31,28,31,30,31,30,31,31,30,31,30,31};
Indexing into the matrix for February (index 1) ignores leap years.
2. A matrix with 365 elements to represent a year's worth of something:
e.g. smallint hightemps[365];
This usually doesn't fail until Dec 31 of a leap year, when hightemps[mydate.dayofyear()-1] points to a non-existent element.
Of course, if dayofyear is calculated using the matrix in the prior bug, it will fail invisibly since that will be incorrect
as well.
3. Quick-n-dirty subtract one year math:
e.g. Convert date to char in YYYYMMDD format, convert char to int, subtract 10000, convert back to a char and then date.
Why people do this when dateadd(year,-1,mydate) is that easy, I have no clue. But it breaks horridly when
you use it to determine "one year ago today" from Feb 29.
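For anyone who wants to see these three fail, here is a minimal Python sketch. The function names are made up for illustration, and clamping Feb 29 to Feb 28 in `one_year_ago` is just one possible policy (an assumption on my part, not something the thread settles).

```python
import calendar
from datetime import date

# Bug 1: a fixed table knows nothing about leap years.
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def days_in_month_naive(year, month):
    # Ignores the year entirely, so February is always 28.
    return DAYS_PER_MONTH[month - 1]

def days_in_month(year, month):
    # calendar.monthrange returns (weekday of day 1, number of days).
    return calendar.monthrange(year, month)[1]

# Bug 2: a per-day array needs 366 slots in a leap year.
def slots_needed(year):
    return 366 if calendar.isleap(year) else 365

# Bug 3: "subtract 10000 from YYYYMMDD" to go back one year.
def one_year_ago_naive(d):
    n = int(d.strftime("%Y%m%d")) - 10000
    s = str(n)
    # Raises ValueError for Feb 29, because e.g. 2011-02-29 does not exist.
    return date(int(s[:4]), int(s[4:6]), int(s[6:8]))

def one_year_ago(d):
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in the previous year.
        # Assumed policy: fall back to Feb 28.
        return d.replace(year=d.year - 1, day=28)
```

With these, `days_in_month_naive(2012, 2)` gives 28 while `days_in_month(2012, 2)` gives 29, and `one_year_ago_naive(date(2012, 2, 29))` raises `ValueError` where `one_year_ago` returns Feb 28, 2011.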
Yeah, it was a really stupid bug, especially when you consider the OS provides a very useful set of APIs for dealing with it: basically, convert a SYSTEMTIME (day/month/year/hh/mm/ss) into a FILETIME (a 64-bit count of 100-nanosecond intervals, similar in spirit to time_t), do your math (the compiler will handle the 64-bit computations for you) and convert it back. Two OS calls.
If you're having to do leap year calculations, or any sort of date calculations at all, stop. The OS or library will probably already have a set of functions for doing date calculations without you having to do it manually. Given how easy they are to screw up, far better to leave it to someone else.
Hell, given Windows worked fine, I don't even want to know what Azure is doing - the fundamental OS and runtimes all handle leap year date calculations with aplomb. Heck, that might be some of the oldest code in the kernel these days because it was written a long time ago, works well and has been thoroughly debugged through the decades.
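The same convert, do the math, convert back pattern can be sketched outside Win32 too. This is a Python analogue using the standard library's date ordinals, not the SYSTEMTIME/FILETIME calls themselves, and `add_days` is a name invented for this example.

```python
from datetime import date

def add_days(d, days):
    # Step 1: convert the calendar date to a single integer.
    # toordinal() counts days, with 0001-01-01 as day 1.
    n = d.toordinal()
    # Step 2: plain integer arithmetic, no calendar rules needed here.
    n += days
    # Step 3: let the library turn the integer back into a date.
    return date.fromordinal(n)
```

The library takes care of both the 4-year rule and the century rule: one day after Feb 28 comes out as Feb 29 in 2012 and 2000, but as Mar 1 in 2011 and 1900.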