The Exact Cause of the Zune Meltdown
An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."
Here's your 500 year plan:
1900 - multiple of 100, not a multiple of 400, no leap day.
2000 - was a multiple of 100, but also a multiple of 400 so we still had a leap day.
2100 - see above
2200 - not a multiple of 400, no leap day.
2300 - not a multiple of 400, no leap day
2400 - multiple of 400, so have the leap day anyway.
It's an open source driver from Freescale.
It is driver code supplied by the manufacturer of the hardware platform on which the Zune and a couple of other devices are built. This platform includes a real-time clock which counts seconds since midnight and days since 1/1/1980. Considering that hardware component prices are cut-throat, there is probably no quality management for the software whatsoever. If it appears to work, it ships.
This was written by the Freescale guys, not MS, where it would make sense for the device manufacturer to ship their own date/time code.
This code is actually from the Windows CE OAL (OEM Abstraction Layer), part of the code that reads the current time from the RTC. As such, the implementation is hardware-dependent, which is why there isn't a standard implementation of this function for Windows CE.
In addition, this code is in a portion of Windows CE source code provided by a device's BSP developer, not by Microsoft. In most cases, Windows CE BSP developers start with sample BSPs written by a processor's manufacturer -- in this case, Freescale -- and then improve it.
It turns out that this bug is specific to the Freescale's BSP -- sample Windows CE BSPs for other procesors don't have it -- and other Freescale devices using Windows CE will only have this issue if their developers used this code verbatim. Since sample BSPs provided by processor manufacturers are often of poor quality, many Windows CE developers typically rewrite such functions. In other words, the impact of this particular bug may be quite limited, which may be why there haven't been reports of this issue on other devices.
In this particular case, though, Microsoft (or a contractor) was the Zune's BSP developer, so they certainly should have caught this.
This is kernel-level code -- part of the OEM Abstraction Layer -- that is used to read the current time from the RTC, hence it is hardware-specific. RTCs on other processors, or Freescale-based devices using external RTCs, may implement the OemGetRealTime () function differently than Freescale has done here (the buggy ConvertDays () function is just a helper function).
I think slashdot ate your < in the breaking line.
Because cleanup doesn't have access to the local variables of the calling function. This means they need to be passed in. The result is a very obscure function that takes in half a dozen or more variables and gets difficult to maintain since it's purpose makes absolutely no sense without the context in the calling function (not to mention easy to have bugs- forget to check just one pointer for null before using it and you're into undefined behavior, which may only occur in rare error conditions making it difficult to test for). Using a cleanup function like that just isn't practical.
I still have more fans than freaks. WTF is wrong with you people?
It's really probably not. Most of the basic calendar functions in libc (or glibc or dietlibc or uLibc) were written for 8 MHz machines running with 1 MB of system memory -- they'd do just fine on your embedded system.
It's a bug in the Slashdot software, eating "less than" and "greater than" characters in "Plain Old Text" mode.
No need to hard-code, there's an established algorithm for computing this.
Why not call it by its name: Zeller's Congruence.
Or from a more basic standpoint...
People make mistakes.
When testing leap year for a data set you like to see if you have a Febuary 29th and A March 1st, as well the days of the week are updated after the leap day. December 31st isn't on the top of days to check for leap year code.
Secondly coding for date times even with good prebuilt libraries is a pain. Unfortunately Time and Date are not really good mathematical functions. 365 days a year except for every 4 years where there is 366 The subset of year is split to months where each value is different of having 28, 29, 30, 31 days in it. Then we have 7 days weeks, which do not divide nicely with any other greater time unit (except for the 28 day month, which is only happens once a year... except for a leap year) Now each day has 24 hours, split into 2 12 hour segments, each hour is split up with 60 minutes and then 60 second per minute. Then finally after the second we can start using the Metric niceity in programming. Oh! Oh! don't forget about TIme Zones, and Daylight savings time (which is different per country, state, and follows political lines more then geographic lines.), And if you are going at high speeds for aerospace applications those Crazy Einstine theories come into play.
Now no one really goes with the same approach to follow all these crazy rules and having a common library is still tricky because we all do different math calculations, also when you do a time++, do you want it one more second like in Unix/Linux OS development or one extra day like in Microsoft SQL. Then when you get these values sorted or a quick search/filter. and you may need to sort them etc. American Time Format doesn't do a good job at this. So we need to switch it to European formats. All in all it is a lot of tough coding all of it is tough to QA Because you need to test all the times to truely know that there is no bugs in it.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Ok, I called your bluff. I actually went and searched for it.
The VERY top link is this slashdot article which states:
"We've all heard the story of Microsoft's battle cry of "DOS ain't done till Lotus won't run". Adam Barr investigates the myth, interviewing various Microsoft and Lotus old-timers (including Mitch Kapor), and finds no basis for its legitimacy or any case of 1-2-3 actually not running. Whom to blame for Lotus Notes is not discussed."
I checked the next few links and they pretty much all pointed to the same article, namely this one. One site even described it as a "complete and utter annihilation of the myth".
I actually thought you were disagreeing with me, but now I see you were pointing out that people have been claiming the same thing for years and it was just as unfounded then as it is now. Thank you, I couldn't have said it better myself.
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill