Slashdot Mirror


The Exact Cause of the Zune Meltdown

An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."

23 of 465 comments

  1. Warning, Y2.1K bug. by LostCluster · · Score: 3, Informative

    Just before anybody claims to have a foolproof solution to leap years, make sure you test against the year 2100. It's a multiple of four, but also a multiple of 100 that's not a multiple of 400... and therefore NOT a leap year.

    1. Re:Warning, Y2.1K bug. by LostCluster · · Score: 5, Informative

      Here's your 500 year plan:

      1900 - multiple of 100, not a multiple of 400, no leap day.
      2000 - multiple of 100, but also a multiple of 400, so we still had a leap day.
      2100 - multiple of 100, not a multiple of 400, no leap day.
      2200 - not a multiple of 400, no leap day.
      2300 - not a multiple of 400, no leap day.
      2400 - multiple of 400, so have the leap day anyway.
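
The rule in the list above fits in a few lines of C. This is only a minimal sketch; the function name `is_leap_year` is made up for illustration and is not taken from the driver.

```c
#include <stdbool.h>

/* Gregorian leap-year rule: every 4th year is a leap year,
   except century years, unless the century year is also
   divisible by 400. */
bool is_leap_year(int year)
{
    if (year % 400 == 0)
        return true;   /* 2000, 2400 */
    if (year % 100 == 0)
        return false;  /* 1900, 2100, 2200, 2300 */
    return year % 4 == 0;
}
```

Checking it against every year in the list (1900 through 2400) is a cheap sanity test before trusting it anywhere else.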

    2. Re:Warning, Y2.1K bug. by kybred · · Score: 5, Informative

      No need to hard-code, there's an established algorithm for computing this.

      Why not call it by its name: Zeller's Congruence.
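
For reference, Zeller's congruence in its usual form gives the day of the week for a Gregorian date. It is not itself a leap-year test, so a leap-year check is still needed separately. A minimal sketch, with a hypothetical function name:

```c
/* Zeller's congruence for the Gregorian calendar.
   month is 1..12, day is 1..31.
   Returns 0 = Saturday, 1 = Sunday, 2 = Monday, ..., 6 = Friday. */
int zeller_day_of_week(int year, int month, int day)
{
    /* January and February count as months 13 and 14 of the previous year. */
    if (month < 3) {
        month += 12;
        year -= 1;
    }
    int k = year % 100;  /* year within the century */
    int j = year / 100;  /* zero-based century */
    return (day + (13 * (month + 1)) / 5 + k + k / 4 + j / 4 + 5 * j) % 7;
}
```

With this numbering, 31 December 2008 (the day the Zunes froze) comes out as 4, a Wednesday.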

  2. "Leaked"...? by Anonymous Coward · · Score: 5, Informative

    It's an open source driver from Freescale.

    1. Re:"Leaked"...? by Anonymous Coward · · Score: 4, Informative

      Whose job is worth leaking a driver for a dumb Microsoft player?

      The code is not specific to the Zune. It is specific to the MC13783 PMIC RTC that is used in many different pieces of hardware.

      Do we know how this ended up on the net?

      The authors (Freescale Semiconductor, Inc.) released the source under the terms of the GPL.

      Also, has anybody else noticed that the source code seems to be nicely written (bar the bug)? Somewhat surprising for Microsoft; I always assumed their code was written by a bunch of children.

      Microsoft didn't write the code. It was written by Freescale Semiconductor, Inc.

  3. Re:If this interests you by Creepy+Crawler · · Score: 3, Informative

    Amazon link eh? meh.

    Try this link for your "sampling" : Deep C Secrets.

    Took only 15 seconds for that link. Enjoy.

  4. Re:Import calendar? by Anonymous Coward · · Score: 5, Informative

    It is driver code supplied by the manufacturer of the hardware platform on which the Zune and a couple of other devices are built. This platform includes a real-time clock which counts seconds since midnight and days since 1/1/1980. Considering that hardware component prices are cut-throat, there is probably no quality management for the software whatsoever. If it appears to work, it ships.

  5. Re:Why is this a surprise? by panoptical2 · · Score: 3, Informative
    As Wikipedia would have it here...

    Windows Mobile is best described as a subset of platforms based on a Windows CE underpinning. Currently, Pocket PC (now called Windows Mobile Classic), SmartPhone (Windows Mobile Standard), and PocketPC Phone Edition (Windows Mobile Professional) are the three main platforms under the Windows Mobile umbrella. Each platform utilizes different components of Windows CE, as well as supplemental features and applications suited for their respective devices.

    So, every smartphone/PDA that currently uses Windows Mobile uses some form of CE.

  6. Re:Why write any date/time code? by p0tat03 · · Score: 5, Informative

    This was written by the Freescale guys, not MS. It makes sense for a hardware manufacturer to ship its own date/time code.

  7. Probably Not A Widespread Issue by nato10 · · Score: 5, Informative

    This code is actually from the Windows CE OAL (OEM Abstraction Layer), part of the code that reads the current time from the RTC. As such, the implementation is hardware-dependent, which is why there isn't a standard implementation of this function for Windows CE.

    In addition, this code is in a portion of Windows CE source code provided by a device's BSP developer, not by Microsoft. In most cases, Windows CE BSP developers start with sample BSPs written by a processor's manufacturer -- in this case, Freescale -- and then improve it.

    It turns out that this bug is specific to Freescale's BSP -- sample Windows CE BSPs for other processors don't have it -- and other Freescale devices using Windows CE will only have this issue if their developers used this code verbatim. Since sample BSPs provided by processor manufacturers are often of poor quality, many Windows CE developers rewrite such functions. In other words, the impact of this particular bug may be quite limited, which may be why there haven't been reports of this issue on other devices.

    In this particular case, though, Microsoft (or a contractor) was the Zune's BSP developer, so they certainly should have caught this.

  8. Re:Import calendar? by nato10 · · Score: 5, Informative

    This is kernel-level code -- part of the OEM Abstraction Layer -- that is used to read the current time from the RTC, hence it is hardware-specific. RTCs on other processors, or Freescale-based devices using external RTCs, may implement the OemGetRealTime () function differently than Freescale has done here (the buggy ConvertDays () function is just a helper function).

  9. Re:Sad code, sad article by chalkyj · · Score: 5, Informative

    I think slashdot ate your < in the breaking line.

  10. Free Book web site. by Futurepower(R) · · Score: 3, Informative

    Wow. That link is to a book from a good web site: Free eBooks.

    Other free books about C and C++: Free C and C++ books

  11. Re:Regardless of whatever code in it is faulty by AuMatar · · Score: 4, Informative

    Because cleanup doesn't have access to the local variables of the calling function. This means they need to be passed in. The result is a very obscure function that takes in half a dozen or more variables and gets difficult to maintain, since its purpose makes absolutely no sense without the context in the calling function (not to mention it's easy to introduce bugs: forget to check just one pointer for null before using it and you're into undefined behavior, which may only occur in rare error conditions, making it difficult to test for). Using a cleanup function like that just isn't practical.

    --
    I still have more fans than freaks. WTF is wrong with you people?
  12. Re:Import calendar? by AuMatar · · Score: 3, Informative

    It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible- most embedded devices do not have a lot of rom space- what they have is measured in MBs, not GBs. Remember not all the world is cell phones and mp3 players. In that case writing your own leap year function is the correct answer- existing calendar libraries likely have far more functionality than you need and would blow out your size. Given a choice between statically linking an entire calendaring library and writing a simple IsLeapYear function, writing the leap year function is the correct choice for that environment.

    --
    I still have more fans than freaks. WTF is wrong with you people?
  13. Re:Import calendar? by profplump · · Score: 4, Informative

    It's really probably not. Most of the basic calendar functions in libc (or glibc or dietlibc or uClibc) were written for 8 MHz machines running with 1 MB of system memory -- they'd do just fine on your embedded system.
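
As a rough illustration of leaning on libc instead of hand-rolling the arithmetic, standard `mktime()` normalizes out-of-range fields, so a day count can be handed to it directly. This is a sketch under assumptions: the helper name is hypothetical, the count is taken to be 1-based (day 1 = 1 January 1980), and whether a given Windows CE C runtime ships `mktime()` at all is not something this thread establishes.

```c
#include <time.h>

/* Hypothetical helper: turn a 1-based day count since 1980-01-01
   into a calendar date by letting mktime() normalize the fields.
   Returns 0 on success, -1 if mktime() cannot represent the time. */
int days_since_1980_to_date(int days, int *year, int *month, int *day)
{
    struct tm t = {0};
    t.tm_year = 1980 - 1900;  /* tm_year counts from 1900 */
    t.tm_mon = 0;             /* January */
    t.tm_mday = days;         /* deliberately out of range; mktime fixes it up */
    t.tm_hour = 12;           /* midday, so a DST shift cannot change the date */
    t.tm_isdst = -1;          /* let the library decide about DST */

    if (mktime(&t) == (time_t)-1)
        return -1;

    *year = t.tm_year + 1900;
    *month = t.tm_mon + 1;
    *day = t.tm_mday;
    return 0;
}
```

Called with day 10593, this should report 31 December 2008, the last day of a leap year and the date the hand-written loop choked on.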

  14. Re:Sad code, sad article by gnasher719 · · Score: 4, Informative

    It's a bug in the Slashdot software, eating "less than" and "greater than" characters in "Plain Old Text" mode.

  15. Re:Wow. by Barny · · Score: 3, Informative

    Metallica?

    Damn you young kids these days. /me cranks motorhead back up while yelling to "git off his lawn!"

    --
    ...
    /me sighs
  16. Re:Import calendar? by TrekkieGod · · Score: 3, Informative

    It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible

    He failed. In the function in question he had the number of days since Jan 1, 1980. At the end of the loop, he was supposed to have the number of years since 1980 + the number of days since the beginning of the current year. His solution was to iterate the year beginning from 1980, check to see if it's a leap year, then subtract 365 or 366 days accordingly. The loop would supposedly continue until the desired state is achieved but, because of the bug, became an infinite loop at the end of leap years.

    Not only was his function not "lean and mean" but it actually gets more expensive to run every year that passes :)
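
The loop described above could look roughly like the sketch below, with the exit condition written so the last day of a leap year terminates. It is a reconstruction from the description in this comment, not the Freescale source; the names `split_days` and `is_leap_year` and the 1-based day count (day 1 = 1 January 1980) are assumptions.

```c
#include <stdbool.h>

bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Split a 1-based day count since 1980-01-01 into a year and a
   1-based day within that year. */
void split_days(int days, int *year, int *day_of_year)
{
    int y = 1980;

    for (;;) {
        int year_length = is_leap_year(y) ? 366 : 365;

        /* Stop as soon as the remaining days fit in the current year.
           This covers day 366 of a leap year, which is the case the
           comment above says never terminated. */
        if (days <= year_length)
            break;

        days -= year_length;
        y++;
    }

    *year = y;
    *day_of_year = days;
}
```

The cost still grows by one iteration per year since 1980, as noted above, but every input now makes progress or exits.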

    I'm also curious as to why 1980 is the epoch, but that's not as important.

    --

    Warning: Opinions known to be heavily biased.

  17. Not only a zune bug toshiba gigabeat affected too by bigbigbison · · Score: 3, Informative
    --
    http://www.popularculturegaming.com -- my blog about the culture of videogame players
  18. Re:Import calendar? by jellomizer · · Score: 4, Informative

    Or from a more basic standpoint...
    People make mistakes.

    When testing leap year for a data set, you like to see that you have a February 29th and a March 1st, and that the days of the week are updated correctly after the leap day. December 31st isn't at the top of the list of days to check for leap year code.

    Secondly, coding for dates and times is a pain even with good prebuilt libraries. Unfortunately, time and date are not really well-behaved mathematical functions. There are 365 days in a year, except for (roughly) every 4th year, when there are 366. The year is split into months that have 28, 29, 30, or 31 days. Then we have 7-day weeks, which do not divide evenly into any larger time unit (except the 28-day month, which only happens once a year... except in a leap year). Each day has 24 hours, split into two 12-hour segments; each hour has 60 minutes, and each minute 60 seconds. Only below the second can we finally start using the metric nicety in programming. Oh! Oh! Don't forget about time zones and daylight saving time (which differs per country and state, and follows political lines more than geographic lines). And if you are going at high speeds for aerospace applications, those crazy Einstein theories come into play.
    Now, no one really takes the same approach to following all these crazy rules, and having a common library is still tricky because we all do different calculations. Also, when you do a time++, do you want one more second, as in Unix/Linux OS development, or one extra day, as in Microsoft SQL? Then you may need to sort these values or do a quick search/filter, and American date format doesn't do a good job at this, so we need to switch to European formats. All in all, it is a lot of tough coding, and all of it is tough to QA, because you need to test all the times to truly know that there are no bugs in it.
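
For a day counter, "test all the times" is actually affordable: there are only a few thousand days per decade. The sketch below walks every day count and compares a conversion routine against a date that is simply stepped forward one day at a time. Everything here is hypothetical illustration (the names, the 1-based day count from 1 January 1980, and the conversion routine under test); none of it is taken from the driver.

```c
#include <stdbool.h>

static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Routine under test: split a 1-based day count since 1980-01-01
   into a year and a 1-based day of that year. */
static void split_days(int days, int *year, int *day_of_year)
{
    int y = 1980;
    for (;;) {
        int year_length = is_leap_year(y) ? 366 : 365;
        if (days <= year_length)
            break;
        days -= year_length;
        y++;
    }
    *year = y;
    *day_of_year = days;
}

/* Check every day count from 1 to last_day. Returns the first day
   count where split_days() disagrees with the stepped date, or 0
   if they agree everywhere. */
int first_mismatch(int last_day)
{
    int expected_year = 1980;
    int expected_doy = 1;

    for (int d = 1; d <= last_day; d++) {
        int y, doy;
        split_days(d, &y, &doy);
        if (y != expected_year || doy != expected_doy)
            return d;

        /* Step the expected date forward by one day. */
        int year_length = is_leap_year(expected_year) ? 366 : 365;
        if (expected_doy == year_length) {
            expected_year++;
            expected_doy = 1;
        } else {
            expected_doy++;
        }
    }
    return 0;
}
```

A sweep like this hits every December 31st along the way, including the leap-year ones. A routine that loops forever would hang the sweep rather than fail it, so a real harness would also want a timeout.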

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  19. Re:Let's make sure this gets installed everywhere by neokushan · · Score: 4, Informative

    Ok, I called your bluff. I actually went and searched for it.

    The VERY top link is this slashdot article which states:

    "We've all heard the story of Microsoft's battle cry of "DOS ain't done till Lotus won't run". Adam Barr investigates the myth, interviewing various Microsoft and Lotus old-timers (including Mitch Kapor), and finds no basis for its legitimacy or any case of 1-2-3 actually not running. Whom to blame for Lotus Notes is not discussed."

    I checked the next few links and they pretty much all pointed to the same article, namely this one. One site even described it as a "complete and utter annihilation of the myth".

    I actually thought you were disagreeing with me, but now I see you were pointing out that people have been claiming the same thing for years and it was just as unfounded then as it is now. Thank you, I couldn't have said it better myself.

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
  20. Dos ain't done till Netware won't run by Terje+Mathisen · · Score: 3, Informative

    This should be the proper version of the quote:

    I know from the actual Novell developers (I worked for Novell in 1991-92) that on multiple occasions, Microsoft modified a new DOS version between the last beta and the actual release, in such a way that Novell's Netware client drivers stopped working.

    Terje

    --
    "almost all programming can be viewed as an exercise in caching"