Slashdot Mirror


The Exact Cause of the Zune Meltdown

An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."

28 of 465 comments

  1. Warning, Y2.1K bug. by LostCluster · · Score: 3, Informative

    Just before anybody claims to have a foolproof solution to leap years, make sure you test against the year 2100. It's a multiple of four, but also a multiple of 100 that's not a multiple of 400... and therefore NOT a leap year.

    1. Re:Warning, Y2.1K bug. by LostCluster · · Score: 5, Informative

      Here's your 500 year plan:

      1900 - multiple of 100, not a multiple of 400, no leap day.
      2000 - was a multiple of 100, but also a multiple of 400 so we still had a leap day.
      2100 - see above
      2200 - not a multiple of 400, no leap day.
      2300 - not a multiple of 400, no leap day
      2400 - multiple of 400, so have the leap day anyway.
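      For anyone who wants to check their own code against the table above, here is a minimal Python sketch of the same rule. The function name is_leap is only illustrative; it is not taken from the driver.

```python
def is_leap(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The century years from the table above, plus two ordinary years.
for year in (1900, 2000, 2100, 2200, 2300, 2400, 2008, 2009):
    print(year, "leap day" if is_leap(year) else "no leap day")
```

      Testing the century years explicitly is the point: a plain "divisible by 4" check passes every year from 1901 through 2099.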

    2. Re:Warning, Y2.1K bug. by kybred · · Score: 5, Informative

      No need to hard-code, there's an established algorithm for computing this.

      Why not call it by its name: Zeller's Congruence.
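      Worth keeping in mind: Zeller's congruence computes the day of the week for a Gregorian date. The leap-year rule is folded into its arithmetic rather than exposed as a yes/no test. A minimal Python sketch, with an illustrative function name and a weekday table that follows the usual convention for the formula (0 = Saturday):

```python
WEEKDAYS = ("Saturday", "Sunday", "Monday", "Tuesday",
            "Wednesday", "Thursday", "Friday")


def zeller_weekday(year, month, day):
    """Day of the week for a Gregorian date via Zeller's congruence."""
    # January and February count as months 13 and 14 of the previous year.
    if month < 3:
        month += 12
        year -= 1
    k = year % 100   # year within the century
    j = year // 100  # zero-based century
    h = (day + (13 * (month + 1)) // 5 + k + k // 4 + j // 4 + 5 * j) % 7
    return WEEKDAYS[h]


print(zeller_weekday(2008, 12, 31))
print(zeller_weekday(1980, 1, 1))
```

      So it is a neighbouring algorithm rather than a replacement for a leap-year predicate.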

    3. Re:Warning, Y2.1K bug. by PIBM · · Score: 2, Informative

      Insightful?

      I thought it was a joke. We are talking about embedded devices with very limited resources to spare, and the code was already quite readable, as well as being written the way the rule is usually described.

      Didn't you first learn that it's a leap year every 4 years, then later someone taught you that it isn't every 100 years even if divisible by 4, and then that there's an exception if it's divisible by 400, where it's still a leap year? Or did they teach you in the reverse order?

      Also, what if the chosen divisors (100 / 400) were not multiples of each other? Reversing the order of the tests would not work in that case.

  2. "Leaked"...? by Anonymous Coward · · Score: 5, Informative

    It's an open source driver from Freescale.

    1. Re:"Leaked"...? by Anonymous Coward · · Score: 4, Informative

      Whose job is worth leaking a driver for a dumb Microsoft player?

      The code is not specific to the Zune. It is specific to the MC13783 PMIC RTC that is used in many different pieces of hardware.

      Do we know how this ended up on the net?

      The authors (Freescale Semiconductor, Inc.) released the source under the terms of the GPL.

      Also, has anybody else noticed that the source code seems to be nicely written (bar the bug)? Somewhat surprising for Microsoft; I always assumed their code was written by a bunch of children.

      Microsoft didn't write the code. It was written by Freescale Semiconductor, Inc.

  3. Re:If this interests you by Creepy+Crawler · · Score: 3, Informative

    Amazon link eh? meh.

    Try this link for your "sampling" : Deep C Secrets.

    Took only 15 seconds for that link. Enjoy.

  4. Re:Import calendar? by Anonymous Coward · · Score: 5, Informative

    It is driver code supplied by the manufacturer of the hardware platform on which the Zune and a couple of other devices are built. This platform includes a real-time clock which counts seconds since midnight and days since 1/1/1980. Considering that hardware component prices are cut-throat, there is probably no quality management for the software whatsoever. If it appears to work, it ships.

  5. Re:Why is this a surprise? by panoptical2 · · Score: 3, Informative
    As Wikipedia would have it here...

    Windows Mobile is best described as a subset of platforms based on a Windows CE underpinning. Currently, Pocket PC (now called Windows Mobile Classic), SmartPhone (Windows Mobile Standard), and PocketPC Phone Edition (Windows Mobile Professional) are the three main platforms under the Windows Mobile umbrella. Each platform utilizes different components of Windows CE, as well as supplemental features and applications suited for their respective devices.

    So, every smartphone/PDA that currently uses Windows Mobile uses some form of CE.

  6. Re:Why write any date/time code? by p0tat03 · · Score: 5, Informative

    This was written by the Freescale guys, not MS, and it makes sense for the device manufacturer to ship their own date/time code.

  7. No problem since 1966 in FORTRAN by slugmass · · Score: 2, Informative

          integer function f_isleap(year)
          implicit none
    c
    c Purpose :: Return 1 if year is a leap year and 0 otherwise.
    c
    c Description: Every fourth year is a leap year, but NOT when
    c divisible by 100, except if the year is divisible by 400.
    c
          integer year
          if ((mod(year,400).eq.0) .or.
         %    ((mod(year,4).eq.0) .and.
         %     (mod(year,100).ne.0))) then
             f_isleap = 1
          else
             f_isleap = 0
          endif
          return
          end

    But of course FORTRAN is not fancy enough for super cool C# coders.

  8. Probably Not A Widespread Issue by nato10 · · Score: 5, Informative

    This code is actually from the Windows CE OAL (OEM Abstraction Layer), part of the code that reads the current time from the RTC. As such, the implementation is hardware-dependent, which is why there isn't a standard implementation of this function for Windows CE.

    In addition, this code is in a portion of Windows CE source code provided by a device's BSP developer, not by Microsoft. In most cases, Windows CE BSP developers start with sample BSPs written by a processor's manufacturer -- in this case, Freescale -- and then improve it.

    It turns out that this bug is specific to Freescale's BSP -- sample Windows CE BSPs for other processors don't have it -- and other Freescale devices using Windows CE will only have this issue if their developers used this code verbatim. Since sample BSPs provided by processor manufacturers are often of poor quality, many Windows CE developers rewrite such functions. In other words, the impact of this particular bug may be quite limited, which may be why there haven't been reports of this issue on other devices.

    In this particular case, though, Microsoft (or a contractor) was the Zune's BSP developer, so they certainly should have caught this.

  9. Re:Import calendar? by nato10 · · Score: 5, Informative

    This is kernel-level code -- part of the OEM Abstraction Layer -- that is used to read the current time from the RTC, hence it is hardware-specific. RTCs on other processors, or Freescale-based devices using external RTCs, may implement the OemGetRealTime () function differently than Freescale has done here (the buggy ConvertDays () function is just a helper function).

  10. Re:Sad code, sad article by chalkyj · · Score: 5, Informative

    I think slashdot ate your < in the breaking line.

  11. Free Book web site. by Futurepower(R) · · Score: 3, Informative

    Wow. That link is to a book from a good web site: Free eBooks.

    Other free books about C and C++: Free C and C++ books

  12. Re:Regardless of whatever code in it is faulty by AuMatar · · Score: 4, Informative

    Because cleanup doesn't have access to the local variables of the calling function. This means they need to be passed in. The result is a very obscure function that takes in half a dozen or more variables and gets difficult to maintain, since its purpose makes absolutely no sense without the context in the calling function (not to mention that it's easy to introduce bugs: forget to check just one pointer for null before using it and you're into undefined behavior, which may only occur in rare error conditions, making it difficult to test for). Using a cleanup function like that just isn't practical.

    --
    I still have more fans than freaks. WTF is wrong with you people?
  13. Re:Import calendar? by AuMatar · · Score: 3, Informative

    It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible; most embedded devices do not have a lot of ROM space, and what they have is measured in MBs, not GBs. Remember, not all the world is cell phones and mp3 players. In that case writing your own leap year function is the correct answer: existing calendar libraries likely have far more functionality than you need and would blow out your size. Given a choice between statically linking an entire calendaring library and writing a simple IsLeapYear function, writing the leap year function is the correct choice for that environment.

  14. Re:Import calendar? by profplump · · Score: 4, Informative

    It's really probably not. Most of the basic calendar functions in libc (or glibc or dietlibc or uClibc) were written for 8 MHz machines running with 1 MB of system memory -- they'd do just fine on your embedded system.
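    As a rough illustration of leaning on a library instead of hand-rolling the arithmetic, here is a Python analogue (not libc, and not what the driver does). It assumes the day counter is 0-based, so that day 0 is 1 January 1980; the thread does not say how the real RTC counts.

```python
import calendar
from datetime import date, timedelta

EPOCH = date(1980, 1, 1)  # assumption: day 0 is 1 January 1980


def date_from_day_count(days_since_epoch):
    """Let the standard library handle month lengths and leap years."""
    return EPOCH + timedelta(days=days_since_epoch)


print(calendar.isleap(2008), calendar.isleap(2100))
print(date_from_day_count(10592))
```

    The trade-off argued in this subthread still applies: this is only an option when the library fits in the space available.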

  15. Re:Sad code, sad article by gnasher719 · · Score: 4, Informative

    It's a bug in the Slashdot software, eating "less than" and "greater than" characters in "Plain Old Text" mode.

  16. Re:Wow. by Barny · · Score: 3, Informative

    Metallica?

    Damn you young kids these days. /me cranks motorhead back up while yelling to "git off his lawn!"

    --
    ...
    /me sighs
  17. Re:Import calendar? by TrekkieGod · · Score: 3, Informative

    It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible

    He failed. In the function in question he had the number of days since Jan 1, 1980. At the end of the loop, he was supposed to have the number of years since 1980 plus the number of days since the beginning of the current year. His solution was to iterate over the years starting from 1980, check whether each is a leap year, then subtract 365 or 366 days accordingly. The loop was supposed to continue until the remaining day count fit within the current year but, because of the bug, it became an infinite loop on the last day of a leap year.

    Not only was his function not "lean and mean" but it actually gets more expensive to run every year that passes :)
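    The loop described above can be sketched in Python as follows. This is a reconstruction from the description in this thread, not the driver source. The function names, the 1-based day count (1 = 1 January 1980), and the iteration cap are all assumptions made for the demo.

```python
ORIGIN_YEAR = 1980  # epoch year mentioned in the thread


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def convert_days_buggy(days, max_iterations=1000):
    """Loop as described; returns None if it fails to terminate.

    The iteration cap exists only so the hang can be observed safely.
    """
    year = ORIGIN_YEAR
    for _ in range(max_iterations):
        if days <= 365:
            return year, days
        if is_leap(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366: nothing changes, so the loop would spin forever
        else:
            days -= 365
            year += 1
    return None


def convert_days_fixed(days):
    """One possible fix: compare against the length of the current year."""
    year = ORIGIN_YEAR
    while True:
        days_in_year = 366 if is_leap(year) else 365
        if days <= days_in_year:
            return year, days
        days -= days_in_year
        year += 1


# Day 10593 is 31 December 2008 under the 1-based assumption.
print(convert_days_buggy(10592))
print(convert_days_buggy(10593))
print(convert_days_fixed(10593))
```

    Both versions still walk year by year from 1980, so the cost grows with every year that passes; the fix only removes the hang.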

    I'm also curious as to why 1980 is the epoch, but that's not as important.

    --

    Warning: Opinions known to be heavily biased.

  18. Re:Regardless of whatever code in it is faulty by 1729 · · Score: 2, Informative

    But why use it if you do not have to?

    Computer science books and even my highschool basic programming class mentioned it's not proper programming to use a GOTO. Is there any computer science professor that supports GOTO statements in programs?

    GOTOs are not inherently bad. At some level, they're unavoidable. (Have you ever programmed in assembly language?)

    As for CS professors, Donald Knuth uses and defends GOTO statements:

    http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

    (His code comments often make reference to why he uses GOTOs. For example, in his implementation of the classic Adventure game, he writes: "By the way, if you don't like |goto| statements, don't read this. (And don't read any other programs that simulate multistate systems.)" In http://www-cs-faculty.stanford.edu/~knuth/programs/15puzzle-korf1.w, he writes: "as a full professor with tenure, I don't have to worry about being fired when I use |goto| statements.")

    Not every programmer is good and a good programmer will refrain from making the program harder to read or more difficult to debug when jr programmers modify it later.

    GOTOs by their very nature encourage bad programming as much as pointers do.

    Pointers encourage bad programming? No offense, but do you have any real programming experience beyond your "highschool basic programming class"?

  19. Not only a zune bug toshiba gigabeat affected too by bigbigbison · · Score: 3, Informative
    --
    http://www.popularculturegaming.com -- my blog about the culture of videogame players
  20. Re:Import calendar? by jellomizer · · Score: 4, Informative

    Or from a more basic standpoint...
    People make mistakes.

    When testing leap year handling for a data set, you like to see that you have a February 29th and a March 1st, and that the days of the week are updated correctly after the leap day. December 31st isn't at the top of the list of days to check for leap year code.

    Secondly, coding for dates and times is a pain even with good prebuilt libraries. Unfortunately, time and date are not really well-behaved mathematical functions. There are 365 days in a year, except every 4th year, which has 366. The year is split into months, each of which has 28, 29, 30, or 31 days. Then we have 7-day weeks, which do not divide evenly into any larger time unit (except the 28-day month, which only happens once a year... except in a leap year). Each day has 24 hours, split into two 12-hour segments; each hour has 60 minutes, and each minute 60 seconds. Only below the second can we finally start using the metric nicety in programming. Oh, and don't forget about time zones and daylight saving time (which differs per country and state, and follows political lines more than geographic ones). And if you are going at high speeds for aerospace applications, those crazy Einstein theories come into play.
    Now, no one really takes the same approach to following all these crazy rules, and having a common library is still tricky because we all do different calculations. Also, when you do a time++, do you want one more second, as in Unix/Linux OS development, or one extra day, as in Microsoft SQL? Then you may need to sort these values or do a quick search/filter. American date format doesn't do a good job at this, so we need to switch to European formats. All in all, it is a lot of tough coding, and all of it is tough to QA, because you need to test all the times to truly know that there are no bugs in it.
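    To make the testing point concrete, here is a small Python sketch that lists the boundary dates worth feeding to any date-conversion routine, with December 31st included alongside the obvious February ones. The helper names are illustrative, and the choice of years is just an example.

```python
import calendar
from datetime import date


def boundary_dates(year):
    """Dates around the places where leap-year handling tends to break."""
    days = [date(year, 1, 1), date(year, 2, 28),
            date(year, 3, 1), date(year, 12, 31)]
    if calendar.isleap(year):
        days.append(date(year, 2, 29))
    return sorted(days)


def day_of_year(d):
    """1-based day number within the year (1 = 1 January)."""
    return d.timetuple().tm_yday


# A leap century year, an ordinary leap year, and a non-leap century year.
for year in (2000, 2008, 2100):
    for d in boundary_dates(year):
        print(d.isoformat(), day_of_year(d))
```

    The last day of a leap year is the only date whose day number is 366, which is exactly the case that slipped through here.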

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  21. Re:Let's make sure this gets installed everywhere by neokushan · · Score: 4, Informative

    Ok, I called your bluff. I actually went and searched for it.

    The VERY top link is this slashdot article which states:

    "We've all heard the story of Microsoft's battle cry of "DOS ain't done till Lotus won't run". Adam Barr investigates the myth, interviewing various Microsoft and Lotus old-timers (including Mitch Kapor), and finds no basis for its legitimacy or any case of 1-2-3 actually not running. Whom to blame for Lotus Notes is not discussed."

    I checked the next few links and they pretty much all pointed to the same article, namely this one. One site even described it as a "complete and utter annihilation of the myth".

    I actually thought you were disagreeing with me, but now I see you were pointing out that people have been claiming the same thing for years and it was just as unfounded then as it is now. Thank you, I couldn't have said it better myself.

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
  22. Re:Import calendar? by lewiscr · · Score: 2, Informative

    I'm also curious as to why 1980 is the epoch, but that's not as important.

    MS-DOS defines the epoch as Jan 1, 1980.

  23. Dos ain't done till Netware won't run by Terje+Mathisen · · Score: 3, Informative

    This should be the proper version of the quote:

    I know from the actual Novell developers (I worked for Novell in 1991-92) that on multiple occasions, Microsoft modified a new Dos version between the last beta and the actual release, in such a way that Novell's Netware client drivers stopped working.

    Terje

    --
    "almost all programming can be viewed as an exercise in caching"
  24. Re:Thats crazy thinking! by 7+digits · · Score: 2, Informative

    How exactly do you think divisions are implemented? Even in silicon? Did you realize that the number of cycles for a DIV instruction is high and depends on the operand size?

    Ever wondered why the x86 DIV was 14 cycles for 8-bit operands, 22 cycles for 16-bit and 38 cycles for 32-bit? (Hint: 6 cycles constant data access + 1 cycle per bit in the subtract/shift loop.)

    And if your teacher told you that a few years ago, before processors had a hardware divide instruction that implemented the loop in silicon, then the Pascal runtime had to implement division by a series of subtractions and halvings...

    Now, if he told you that it just subtracted (without halving) then, yeah, he was wrong...
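    The subtract/shift loop mentioned above can be sketched in Python as a textbook restoring division, one iteration per bit of the operand. This is a generic illustration with made-up names, not a description of any particular CPU's microcode, and the cycle model simply restates the figures quoted in this comment.

```python
def shift_subtract_divide(dividend, divisor, bits):
    """Unsigned restoring division: one shift and compare per bit."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def quoted_cycle_count(bits):
    """Cycle model from the comment: 6 constant cycles plus 1 per bit."""
    return 6 + bits


print(shift_subtract_divide(100, 7, 8))
print([quoted_cycle_count(bits) for bits in (8, 16, 32)])
```

    The loop count depends only on the operand width, not on the values, which is why the quoted timings rise in steps with the operand size.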