The Exact Cause of the Zune Meltdown
An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."
Just before anybody claims to have a foolproof solution to leap years, make sure you test against the year 2100. It's a multiple of four, but also a multiple of 100 that's not a multiple of 400... and therefore NOT a leap year.
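For anyone who wants to check the 2100 case themselves, here is a minimal sketch of the full Gregorian rule in C. The function name is mine, not something from the leaked driver.

```c
#include <stdbool.h>

/* Gregorian rule: every fourth year is a leap year, except century
 * years, unless the century year is also divisible by 400.
 * (Hypothetical helper name, not taken from the driver source.) */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}
```

With this, 2000 and 2008 come out as leap years, while 1900 and 2100 do not.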
It's an open source driver from Freescale.
Amazon link eh? meh.
Try this link for your "sampling" : Deep C Secrets.
Took only 15 seconds for that link. Enjoy.
It is driver code supplied by the manufacturer of the hardware platform on which the Zune and a couple of other devices are built. This platform includes a real-time clock which counts seconds since midnight and days since 1/1/1980. Considering that hardware component prices are cut-throat, there is probably no quality management for the software whatsoever. If it appears to work, it ships.
Windows Mobile is best described as a subset of platforms based on a Windows CE underpinning. Currently, Pocket PC (now called Windows Mobile Classic), SmartPhone (Windows Mobile Standard), and PocketPC Phone Edition (Windows Mobile Professional) are the three main platforms under the Windows Mobile umbrella. Each platform utilizes different components of Windows CE, as well as supplemental features and applications suited for their respective devices.
So, every smartphone/PDA that currently uses Windows Mobile uses some form of CE.
This was written by the Freescale guys, not MS, where it would make sense for the device manufacturer to ship their own date/time code.
      integer function f_isleap(Year)
c
c     Return 0 if a year is NOT a leap year and 1 otherwise.
c
      implicit none
c
c     Purpose
c
c     Description: Every fourth year is a leap year,
c     but NOT when divisible by 100, except if the year
c     is divisible by 400.
c
      integer Year
      if ((MOD(Year,400).eq.0) .or.
     %    ((MOD(Year,4).eq.0) .and.
     %     (MOD(Year,100).ne.0))) then
         f_isleap = 1
      else
         f_isleap = 0
      endif
      return
      end
But of course FORTRAN is not fancy enough for super cool C# coders.
This code is actually from the Windows CE OAL (OEM Abstraction Layer), part of the code that reads the current time from the RTC. As such, the implementation is hardware-dependent, which is why there isn't a standard implementation of this function for Windows CE.
In addition, this code is in a portion of Windows CE source code provided by a device's BSP developer, not by Microsoft. In most cases, Windows CE BSP developers start with sample BSPs written by a processor's manufacturer -- in this case, Freescale -- and then improve it.
It turns out that this bug is specific to Freescale's BSP -- sample Windows CE BSPs for other processors don't have it -- and other Freescale devices using Windows CE will only have this issue if their developers used this code verbatim. Since sample BSPs provided by processor manufacturers are often of poor quality, many Windows CE developers typically rewrite such functions. In other words, the impact of this particular bug may be quite limited, which may be why there haven't been reports of this issue on other devices.
In this particular case, though, Microsoft (or a contractor) was the Zune's BSP developer, so they certainly should have caught this.
This is kernel-level code -- part of the OEM Abstraction Layer -- that is used to read the current time from the RTC, hence it is hardware-specific. RTCs on other processors, or Freescale-based devices using external RTCs, may implement the OemGetRealTime () function differently than Freescale has done here (the buggy ConvertDays () function is just a helper function).
I think slashdot ate your < in the breaking line.
Wow. That link is to a book from a good web site: Free eBooks.
Other free books about C and C++: Free C and C++ books
Because cleanup doesn't have access to the local variables of the calling function. This means they need to be passed in. The result is a very obscure function that takes half a dozen or more variables and gets difficult to maintain, since its purpose makes absolutely no sense without the context in the calling function (not to mention that it is easy to introduce bugs: forget to check just one pointer for null before using it and you're into undefined behavior, which may only occur in rare error conditions, making it difficult to test for). Using a cleanup function like that just isn't practical.
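For comparison, the alternative being defended here is presumably the single-exit goto cleanup idiom, where the cleanup code sits in the same function and can see its locals. A rough sketch in C follows; the function and its job (reading the first bytes of a file) are made up purely for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: read the first `len` bytes of a file into a
 * newly allocated buffer. Returns 0 on success, -1 on any failure.
 * Every error path jumps to one cleanup label that can see the locals. */
int read_prefix(const char *path, size_t len, char **out)
{
    int rc = -1;
    FILE *fp = NULL;
    char *buf = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
        goto cleanup;

    buf = malloc(len);
    if (buf == NULL)
        goto cleanup;

    if (fread(buf, 1, len, fp) != len)
        goto cleanup;

    *out = buf;   /* ownership passes to the caller */
    buf = NULL;   /* so the cleanup below does not free it */
    rc = 0;

cleanup:
    free(buf);    /* free(NULL) is a no-op */
    if (fp != NULL)
        fclose(fp);
    return rc;
}
```

Nothing has to be passed anywhere, and each resource is released in exactly one place.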
I still have more fans than freaks. WTF is wrong with you people?
It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible: most embedded devices do not have a lot of ROM space, and what they have is measured in MBs, not GBs. Remember, not all the world is cell phones and MP3 players. In that case writing your own leap year function is the correct answer: existing calendar libraries likely have far more functionality than you need and would blow out your size. Given a choice between statically linking an entire calendaring library and writing a simple IsLeapYear function, writing the leap year function is the correct choice for that environment.
I still have more fans than freaks. WTF is wrong with you people?
It's really probably not. Most of the basic calendar functions in libc (or glibc or dietlibc or uClibc) were written for 8 MHz machines running with 1 MB of system memory -- they'd do just fine on your embedded system.
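As a rough illustration of leaning on libc instead, here is a sketch that turns "days since 1/1/1980" into a calendar date with the standard gmtime(). It assumes time_t counts seconds since 1970-01-01 UTC (true on POSIX systems, not guaranteed by the C standard), and the function name and the zero-based day convention are my own choices, not anything from the driver.

```c
#include <stdbool.h>
#include <time.h>

/* 1980-01-01 00:00:00 UTC as seconds since 1970-01-01:
 * 3652 days * 86400 seconds. Assumes a POSIX-style time_t. */
#define EPOCH_1980_UNIX 315532800L
#define SECONDS_PER_DAY 86400L

/* Hypothetical helper: days == 0 means 1 Jan 1980.
 * Returns false if the C library cannot represent the date. */
bool date_from_days_since_1980(long days, int *year, int *month, int *day)
{
    time_t t = (time_t)EPOCH_1980_UNIX + (time_t)days * SECONDS_PER_DAY;
    struct tm *utc = gmtime(&t);

    if (utc == NULL)
        return false;

    *year = utc->tm_year + 1900;  /* tm_year counts from 1900 */
    *month = utc->tm_mon + 1;     /* tm_mon is 0-based */
    *day = utc->tm_mday;
    return true;
}
```

Under that convention, day 10592 is 31 December 2008 and day 10593 is 1 January 2009. Whether gmtime() is even available in a Windows CE OAL is a separate question.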
It's a bug in the Slashdot software, eating "less than" and "greater than" characters in "Plain Old Text" mode.
Metallica?
Damn you young kids these days. /me cranks motorhead back up while yelling to "git off his lawn!"
...
It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible
He failed. In the function in question he had the number of days since Jan 1, 1980. At the end of the loop, he was supposed to have the number of years since 1980 plus the number of days since the beginning of the current year. His solution was to iterate the year beginning from 1980, check to see if it's a leap year, then subtract 365 or 366 days accordingly. The loop would supposedly continue until the desired state is achieved but, because of the bug, it became an infinite loop on the last day of a leap year, when exactly 366 days remain.
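To make the failure concrete, here is a sketch in C of a loop shaped like the one described, next to one possible fix. This is a reconstruction from the description, not the verbatim driver source. The names, the 1-based day count (day 1 = 1 Jan 1980) and the iteration cap, which is only there so the hang can be observed without actually hanging, are all assumptions of mine.

```c
#include <stdbool.h>

#define ORIGIN_YEAR 1980

bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Loop shaped like the one described above. Returns false if it is
 * still spinning after max_iter passes (the cap is an addition for
 * demonstration; the described code has no such escape). */
bool convert_days_buggy(long days, int *year, int *day_of_year, int max_iter)
{
    int y = ORIGIN_YEAR;
    int iter = 0;

    while (days > 365) {
        if (++iter > max_iter)
            return false;              /* stuck */
        if (is_leap_year(y)) {
            if (days > 366) {
                days -= 366;
                y += 1;
            }
            /* days == 366 in a leap year: nothing changes, loop repeats */
        } else {
            days -= 365;
            y += 1;
        }
    }
    *year = y;
    *day_of_year = (int)days;
    return true;
}

/* One possible fix: compare against the length of the current year,
 * so day 366 of a leap year ends the loop. */
void convert_days_fixed(long days, int *year, int *day_of_year)
{
    int y = ORIGIN_YEAR;

    for (;;) {
        int len = is_leap_year(y) ? 366 : 365;
        if (days <= len)
            break;
        days -= len;
        y += 1;
    }
    *year = y;
    *day_of_year = (int)days;
}
```

Under the 1-based convention, 31 December 2008 is day 10593. The buggy loop reaches year 2008 with 366 days left and never leaves, while the fixed loop returns 2008 and day 366. Both agree again on day 10594 (1 January 2009), which matches the devices recovering the next day.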
Not only was his function not "lean and mean" but it actually gets more expensive to run every year that passes :)
I'm also curious as to why 1980 is the epoch, but that's not as important.
Warning: Opinions known to be heavily biased.
But why use it if you do not have to?
Computer science books and even my highschool basic programming class mentioned it's not proper programming to use a GOTO. Is there any computer science professor that supports GOTO statements in programs?
GOTOs are not inherently bad. At some level, they're unavoidable. (Have you ever programmed in assembly language?)
As for CS professors, Donald Knuth uses and defends GOTO statements:
http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf
(His code comments often make reference to why he uses GOTOs. For example, in his implementation of the classic Adventure game, he writes: "By the way, if you don't like |goto| statements, don't read this. (And don't read any other programs that simulate multistate systems.)" In http://www-cs-faculty.stanford.edu/~knuth/programs/15puzzle-korf1.w, he writes: "as a full professor with tenure, I don't have to worry about being fired when I use |goto| statements.")
Not every programmer is good, and a good programmer will refrain from making the program harder to read or more difficult to debug when junior programmers modify it later.
GOTOs by their very nature encourage bad programming as much as pointers do.
Pointers encourage bad programming? No offense, but do you have any real programming experience beyond your "highschool basic programming class"?
http://www.anythingbutipod.com/archives/2009/01/zune-bug-actually-a-freescale-bug-affecting-toshiba-gigabeats-too.php
http://www.popularculturegaming.com -- my blog about the culture of videogame players
Or from a more basic standpoint...
People make mistakes.
When testing leap-year handling on a data set, you like to check that you have a February 29th and a March 1st, and that the days of the week are updated correctly after the leap day. December 31st isn't at the top of the list of days to check for leap-year code.
Secondly, coding for dates and times is a pain even with good prebuilt libraries. Unfortunately, time and date are not really well-behaved mathematical functions. There are 365 days in a year, except every fourth year, when there are 366. The year is split into months of 28, 29, 30, or 31 days. Then we have 7-day weeks, which do not divide evenly into any larger unit of time (except the 28-day month, which only happens once a year... except in a leap year). Each day has 24 hours, split into two 12-hour halves; each hour has 60 minutes, and each minute has 60 seconds. Only below the second do we finally get the metric nicety in programming. Oh! Oh! Don't forget about time zones and daylight saving time (which differs per country and state, and follows political lines more than geographic lines). And if you are going at high speeds for aerospace applications, those crazy Einstein theories come into play.
Now, no one really takes the same approach to following all these crazy rules, and having a common library is still tricky because we all do different calculations. Also, when you do a time++, do you want one more second, as in Unix/Linux OS development, or one extra day, as in Microsoft SQL? Then you may need to sort these values or run a quick search/filter on them. American time format doesn't do a good job at this, so we need to switch to European formats. All in all, it is a lot of tough coding, and all of it is tough to QA, because you need to test all the times to truly know that there are no bugs in it.
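For what it's worth, "test all the times" is cheap at day granularity. Below is a brute-force sketch in C that feeds every day from 1 Jan 1980 through a chosen end year to a converter and compares the result with a simple stepping counter. The converter, the names and the 1-based day convention are assumptions for illustration, and the check shares its leap-year rule with the converter, so it would not catch a mistake in that rule itself.

```c
#include <stdbool.h>

bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Converter under test (hypothetical): 1-based day count since
 * 1 Jan 1980 -> calendar year and 1-based day of that year. */
void convert_days(long days, int *year, int *day_of_year)
{
    int y = 1980;

    for (;;) {
        int len = is_leap_year(y) ? 366 : 365;
        if (days <= len)
            break;
        days -= len;
        y += 1;
    }
    *year = y;
    *day_of_year = (int)days;
}

/* Walk every day from 1 Jan 1980 through 31 Dec last_year and count
 * how often the converter disagrees with the stepping counter. */
long count_mismatches(int last_year)
{
    long mismatches = 0;
    long days = 0;

    for (int y = 1980; y <= last_year; y++) {
        int len = is_leap_year(y) ? 366 : 365;
        for (int doy = 1; doy <= len; doy++) {
            int got_year, got_doy;

            days += 1;
            convert_days(days, &got_year, &got_doy);
            if (got_year != y || got_doy != doy)
                mismatches += 1;
        }
    }
    return mismatches;
}
```

Running it through 2100 covers every December 31st and every leap-day boundary in the range. A converter that hangs would stall the run instead of producing a count, which is at least hard to miss.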
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Ok, I called your bluff. I actually went and searched for it.
The VERY top link is this slashdot article which states:
"We've all heard the story of Microsoft's battle cry of "DOS ain't done till Lotus won't run". Adam Barr investigates the myth, interviewing various Microsoft and Lotus old-timers (including Mitch Kapor), and finds no basis for its legitimacy or any case of 1-2-3 actually not running. Whom to blame for Lotus Notes is not discussed."
I checked the next few links and they pretty much all pointed to the same article, namely this one. One site even described it as a "complete and utter annihilation of the myth".
I actually thought you were disagreeing with me, but now I see you were pointing out that people have been claiming the same thing for years and it was just as unfounded then as it is now. Thank you, I couldn't have said it better myself.
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
I'm also curious as to why 1980 is the epoch, but that's not as important.
MS-DOS defines the epoch as Jan 1, 1980.
This should be the proper version of the quote:
I know from the actual Novell developers (I worked for Novell in 1991-92) that on multiple occasions, Microsoft modified a new DOS version between the last beta and the actual release, in such a way that Novell's NetWare client drivers stopped working.
Terje
"almost all programming can be viewed as an exercise in caching"
How exactly do you think divisions are implemented? Even in silicon? Did you realize that the number of cycles for a DIV instruction is high and depends on the operand size?
Ever wondered why the x86 DIV was 14 cycles for 8-bit operands, 22 cycles for 16-bit and 38 cycles for 32-bit? (Hint: 6 cycles of constant data access + 1 cycle per bit in the subtract/shift loop.)
And if your teacher told you that a few years ago, before processors had a hardware divide instruction that implemented the loop in silicon, then the Pascal runtime had to implement division by a series of subtractions and halvings...
Now, if he told you that it just subtracted (without halving) then, yeah, he was wrong...
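For the curious, the subtract/shift loop looks roughly like this when written out in C. This is a sketch of textbook restoring division, one pass per bit of the operand, and makes no claim about how any particular x86 part or Pascal runtime actually did it. The function name is made up.

```c
#include <stdint.h>

/* Restoring shift-and-subtract division on 32-bit unsigned operands.
 * One loop pass per bit, which is why the cost grows with operand
 * size. The caller must not pass divisor == 0. */
uint32_t shift_subtract_divide(uint32_t dividend, uint32_t divisor,
                               uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint64_t rem = 0;  /* wider than 32 bits so the shift cannot overflow */

    for (int bit = 31; bit >= 0; bit--) {
        /* Shift the next dividend bit into the running remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint32_t)1 << bit;
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```

Dividing 100 by 7 this way gives quotient 14 and remainder 2 after 32 passes, no matter how small the numbers are.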