Slashdot Mirror


The Exact Cause of the Zune Meltdown

An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."

20 of 465 comments

  1. Re:Old by LostCluster · · Score: 3, Interesting

    Yep, but it deserves to be covered so that everybody hears it. It's not just a laugh-at-Microsoft story, but also a lesson to aspiring programmers to watch their step when it comes to timekeeping. Gotta get a mention to the people who look at /. at work, gotta get a mention to the people who visit weeknights, gotta mention it for the weekend crowd.

  2. Re:Import calendar? by gcnaddict · · Score: 2, Interesting

    Well, that might explain why the same clock doesn't exist in the subsequent Zunes, but who knows.

    I'm more disturbed by one of the comments and the subsequent reply.

    --
    Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
  3. QA?? by Gorimek · · Score: 2, Interesting

    This kind of bug is where TDD shines. If you don't write any code unless you have a test that forces you to, it's very hard to produce this bug type.

    (TDD = Test Driven Development)
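    A minimal test-first sketch in plain C, assuming no test framework. The names day_to_year and run_day_to_year_tests are hypothetical, and the day numbering (day 1 == 1 Jan 1980) is an assumption. Each table entry is the kind of boundary case TDD pushes you to write before the code that satisfies it; the last day of a leap year is one of the first boundaries you run into.

```c
#include <stddef.h>

/* Hypothetical function under test: day 1 == 1 Jan 1980 (assumed numbering). */
static int is_leap_year(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Returns the year and stores the 1-based day within that year. */
static int day_to_year(int day, int *day_of_year)
{
    int year = 1980;
    for (;;) {
        int len = is_leap_year(year) ? 366 : 365;
        if (day <= len)
            break;
        day -= len;
        year++;
    }
    *day_of_year = day;
    return year;
}

/* Table of edge cases, each written before the code that satisfies it.
   Returns the number of failing cases (0 means all pass). */
static int run_day_to_year_tests(void)
{
    static const struct { int day, year, doy; } cases[] = {
        {     1, 1980,   1 },  /* first day of the epoch */
        {   366, 1980, 366 },  /* last day of a leap year */
        {   367, 1981,   1 },  /* rollover after a leap year */
        {   731, 1981, 365 },  /* last day of a common year */
        { 10593, 2008, 366 },  /* 31 Dec 2008, the day the Zune 30 froze */
        { 10594, 2009,   1 },  /* 1 Jan 2009 */
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int doy;
        int year = day_to_year(cases[i].day, &doy);
        if (year != cases[i].year || doy != cases[i].doy)
            failures++;
    }
    return failures;
}
```

    Calling run_day_to_year_tests() from whatever harness is available and expecting 0 is enough to keep the 31 December case covered on every build.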

  4. Bigger bugs have gotten through on Windows CE by msgmonkey · · Score: 5, Interesting

    For example, I had some code I developed on Windows CE 4.2 .NET that kept hanging on calls to the FindWindow() function.

    It turns out that trying to find a window by class name hangs (this version of) CE every time, even though you would think a bug in such a heavily used function would have been caught by the CE team.

    So no I'm not surprised at all that this bug got through.

  5. Sad code, sad article by gnasher719 · · Score: 3, Interesting

    Neither the original code nor the various corrections in the article capture what the algorithm is supposed to do, and as a result they produce code that is too complicated.

    The essence of the algorithm is this: we start with the number of days since 1/Jan/1980, with the first day having the number one. We want to end up with the correct year and a day number relative to the first day of that year, with the first day again having the number one. So we set year = 1980. As long as day is greater than the number of days in that year, we can't have the right value yet, so we adjust day and year accordingly. This produces a very simple loop:

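    /* day: 1-based count of days since 1 Jan 1980; year starts at 1980 */
    for (;;) {
        int daysInYear = IsLeapYear (year) ? 366 : 365;
        if (day <= daysInYear) break;   /* day now falls within year */
        day -= daysInYear;
        year += 1;
    }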

    This is what Knuth called an "N + 1/2" loop: a loop pattern where a more or less substantial bit of code has to execute at the start of each iteration before we can decide whether to exit or continue. Following the "N + 1/2 loop" pattern lets us avoid repeating the same code (possibly with small changes) entirely. And that was exactly the problem here: the same calculation appeared twice, but slightly differently (one copy used a fixed 365 days, the other made the count depend on whether the year was a leap year). The solutions given in the article all contain repeated code: either two loop exits or a duplicated calculation of the number of days in a year.
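    To make the failure mode concrete, here is an illustrative reconstruction (not the leaked driver source) of a loop that tests the year length in two separately written places, the structure the comment argues against. The function name and the max_iter guard are hypothetical additions so the demo returns -1 instead of hanging; the day numbering (day 1 == 1 Jan 1980) is assumed.

```c
static int is_leap_year(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Returns the year, or -1 if the loop stops making progress.
   max_iter is a hypothetical guard so the demo terminates. */
static int duplicated_logic_year(int days, int max_iter)
{
    int year = 1980;
    int iter = 0;

    while (days > 365) {                /* first copy: assumes a 365-day year */
        if (++iter > max_iter)
            return -1;                  /* stuck: days and year never change */
        if (is_leap_year(year)) {
            if (days > 366) {           /* second copy: the leap-year length */
                days -= 366;
                year += 1;
            }                           /* days == 366: nothing changes */
        } else {
            days -= 365;
            year += 1;
        }
    }
    return year;
}
```

    On day 10593 (31 December 2008 under this numbering) the loop reaches year 2008 with days == 366: the outer test says "keep going" while the leap-year branch finds nothing to subtract, so no iteration changes anything. The single-exit loop computes the year length once and cannot disagree with itself this way.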

  6. Re:Regardless of whatever code in it is faulty by concernedadmin · · Score: 5, Interesting

    Lines 122, 521, 690, 710, and 748 scare me; gotos in C code...

    They've used one form of a goto that's actually quite readable and useful. Would you rather have:

    if (condition1 && condition2) {
        /* boilerplate code with a return */
    }

    if (issue1 || issue2) {
        /* same repeated boilerplate code with a return */
    }

    or

    if (condition1 && condition2) {
        goto cleanup;
    }

    if (issue1 || issue2) {
        goto cleanup;
    }

    cleanup:
    /* just one instance of this code,
       no need to duplicate the effort */

    Believe it or not, there are good reasons to use goto, and Microsoft happened to use it for the right reason here. The Linux kernel also uses this practice to improve the readability of its code.
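    A self-contained sketch of the single-exit cleanup pattern described above. The function process() and its two scratch buffers are hypothetical; the point is that every failure path jumps to one label, which releases whatever has been acquired so far.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: duplicate `src` into two scratch buffers and return
   their combined length, or -1 on any failure. Every exit goes through one
   cleanup label, so the release code is written exactly once. */
static int process(const char *src)
{
    int result = -1;
    char *a = NULL;
    char *b = NULL;
    size_t len;

    if (src == NULL)
        goto cleanup;

    len = strlen(src);
    a = malloc(len + 1);
    if (a == NULL)
        goto cleanup;

    b = malloc(len + 1);
    if (b == NULL)
        goto cleanup;

    memcpy(a, src, len + 1);
    memcpy(b, src, len + 1);
    result = (int)(strlen(a) + strlen(b));

cleanup:
    free(b);  /* free(NULL) is a no-op, so this is safe on every path */
    free(a);
    return result;
}
```

    Initialising the pointers to NULL is what lets a single label serve every exit: whichever resources were never allocated are simply skipped by free().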

  7. Not so uncommon by fermion · · Score: 2, Interesting

    Functions that are only used once in a great while are the devil to test. I think anyone who has programmed in any complex environment will have to admit to one of these silly bugs, and maybe even to letting one reach production.

    What I see here is a really convoluted piece of code for a really simple task. Many constants are written as bare literals. If there is a #define ORIGINYEAR, then why not #define DAYSPERYEAR and #define DAYSPERLEAPYEAR? The first is used only once, while the others are each used twice.

    In any case, the fundamental problem is a failure to encapsulate data. This is quite a common error in code architecture. This function knows a lot of things it does not need to know: it knows about leap years and the number of days in a year, and all this confuses the reader. The function already pays the overhead of a function call (to IsLeapYear), so why not make that overhead work for us by returning not the proxy leap-year boolean, but what we actually want: the number of days in this year.

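    /* The only place that knows about leap years; IsLeapYear() is the driver's existing helper */
    static int howmanydaysthisyear(int year)
    {
        return IsLeapYear(year) ? 366 : 365;
    }

    int daysperyear;
    for (;;)
    {
        daysperyear = howmanydaysthisyear(year);
        if (days > daysperyear)
        {
            days -= daysperyear;
            year++;
        }
        else
        {
            break;
        }
    }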

    In this version, all days-per-year and leap-year information is encapsulated in a single function, and the calling function needs to know about neither. This, I think, is writing quality into code rather than depending on QA to catch mistakes common to novice programmers. No guarantee this will work as is; it is just pseudocode, and I haven't checked the logic completely.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
  8. Re:Why is this a surprise? by winphreak · · Score: 1, Interesting

    Also, most GPS units use some form of CE, but afaik they're all old versions that are pretty cut down to do only the basics. So I really doubt the driver issue would apply to them either.

    --
    "I'm a well-wisher, in that I don't wish you any specific harm."
  9. Calendrical Calculations by kabloom · · Score: 4, Interesting

    The proper way to do this would be with division and modulus, which gives you a nice constant time solution even if you're still using your Zune in 2108. They ought to read Calendrical Calculations by Nachum Dershowitz and Ed Reingold and learn how to do this properly.
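    A minimal sketch of the division-and-modulus idea, assuming the same 1-based day count used elsewhere in this thread (day 1 == 1 Jan 1980) and the proleptic Gregorian calendar. It peels off 400-, 100-, 4- and 1-year blocks, the same decomposition CPython's pure-Python datetime module uses for ordinal-to-date conversion; it is not taken from the book, and day_to_year_divmod is a hypothetical name.

```c
/* Day counts of a 400-, 100- and 4-year Gregorian block. */
enum { DAYS_400Y = 146097, DAYS_100Y = 36524, DAYS_4Y = 1461 };

/* Days from 1 Jan 0001 to 1 Jan `year` (proleptic Gregorian calendar). */
static long days_before_year(long year)
{
    long y = year - 1;
    return y * 365 + y / 4 - y / 100 + y / 400;
}

/* day: 1-based count with day 1 == 1 Jan 1980 (assumed numbering).
   Stores the year and the 1-based day within it; no loop over years. */
static void day_to_year_divmod(long day, long *year, long *day_of_year)
{
    long n = days_before_year(1980) + (day - 1);  /* 0-based days since 1 Jan 0001 */

    long n400 = n / DAYS_400Y; n %= DAYS_400Y;
    long n100 = n / DAYS_100Y; n %= DAYS_100Y;
    long n4   = n / DAYS_4Y;   n %= DAYS_4Y;
    long n1   = n / 365;       n %= 365;

    *year = n400 * 400 + n100 * 100 + n4 * 4 + n1 + 1;
    if (n1 == 4 || n100 == 4) {
        /* Last day of a leap year: the divisions overshoot into the next
           year, so step back to day 366 of the previous one. */
        *year -= 1;
        *day_of_year = 366;
    } else {
        *day_of_year = n + 1;
    }
}
```

    The one remaining branch handles exactly the case that hung the Zune: the last day of a leap year (n1 == 4) or of a 400-year cycle (n100 == 4). Everything else is a fixed number of divisions, whatever the year.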

  10. Re:Warning, Y2.1K bug. by Anonymous Coward · · Score: 2, Interesting

    That doesn't seem to be the best way to write that code: I would imagine something more like:

    static int IsLeapYear(int Year)
    {
        if ((Year % 400) == 0) return 1;
        if ((Year % 100) == 0) return 0;
        if ((Year % 4) == 0) return 1;

        return 0;
    }

  11. WWSD? by qazwart · · Score: 2, Interesting

    Way back in the pre-Cambrian days when I actually was a decent C programmer, there was a book chock-full of algorithms. I can't remember now if it was the "Stevens" book or the "Stevenson" book. It was our bible. Our guide. The holiest book on our bookshelf. Whenever we got the yen to do some programming, we always took out the "Stevens" book and asked ourselves "What Would Stevens Do?"

    These days, is there not one such book or place where someone can say "Gee, I have to write some code that will calculate the date, day of week, and year from a fixed day. I wonder if I can look up this bit of code in some reference book and do it right the first time?"

    And then the second question: why in the heck does the Zune care a fig about today's date? I believe there's some other device on the market that rhymes with "Shapple ShiPod" that does something similar to the Zune and yet doesn't care one whit about today's date. I won't claim that particular device is error-free, but I bet you a couple of doughnuts that it won't freeze up the day before a big holiday because it doesn't realize that 2008 has 366 days in it.

  12. Re:Modified Julian Day by rcw-home · · Score: 2, Interesting

    OP refers to usage #1 (a linear mapping of dates from c. 4700 BC onward).

    You'd seriously use this for doing calculations between two dates on a modern calendar? You'd convert beginning-of-the-day midnight to middle-of-the-day noon and back again? You'd flip a coin and decide whether or not to store dates internally in a common timezone? You'd add in your own leap years when necessary? (Which brings us back to this bug: please look at what exactly the faulty source code was trying to do!)

    There are very good reasons for internally storing dates as ordinal. But unless there is a good reason not to, please use your operating system's (or SQL database's, etc) native format/epoch for it, and please use their code, not your own, for calculating those dates. And if you do find yourself in a position where you're the one writing that code for others' benefit, please be at least as pedantic as I have been in this thread. Society at large is counting on you.
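    Where the standard C library is available (typically application code rather than a driver), a sketch of that advice: let gmtime() do the leap-year arithmetic instead of hand-rolled loops. It assumes the input is seconds since the Unix epoch in UTC; year_and_day is a hypothetical helper name.

```c
#include <time.h>

/* Splits a Unix timestamp (UTC) into year and 1-based day of year using
   the C library. Returns 0 on success, -1 if gmtime() fails. */
static int year_and_day(time_t t, int *year, int *day_of_year)
{
    struct tm *parts = gmtime(&t);  /* static buffer: not thread-safe */
    if (parts == NULL)
        return -1;
    *year = parts->tm_year + 1900;      /* tm_year counts years since 1900 */
    *day_of_year = parts->tm_yday + 1;  /* tm_yday is 0-based */
    return 0;
}
```

    For 1230681600 (2008-12-31 00:00:00 UTC) this yields year 2008, day 366. On POSIX systems gmtime_r() is the reentrant alternative.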

  13. Re:Regardless of whatever code in it is faulty by QRDeNameland · · Score: 4, Interesting

    The addition of a single bool avoids both the specialized cleanup() function and the goto:

    bool needs_cleanup = false;

    if (condition1 && condition2) {
        needs_cleanup = true;
    }

    if (issue1 || issue2) {
        needs_cleanup = true;
    }

    if (needs_cleanup) {
        // clean up local vars exactly as you would have done
        // under the cleanup: label with the goto
    }

    --
    Momentarily, the need for the construction of new light will no longer exist.
  14. Here's what TDD would have really done by SuperKendall · · Score: 2, Interesting

    This kind of bug is where TDD shines.

    I'm not so sure. Let's look at the timeline without TDD:

    1) Microsoft writes method (say, one hour).
    2) Microsoft discovers bug on December 31st, 2008
    3) Microsoft spends one hour fixing bug (assuming documentation and source control and test of fix)

    Now let's look at that timeline with TDD:

    1) Microsoft writes method (let's say one hour again)
    2) Microsoft writes test for method. Test includes random dates but not December 31st, 2008. One hour.
    3) Microsoft discovers bug on December 31st, 2008
    4) Microsoft spends one hour fixing bug (assuming documentation and source control and test of fix)
    5) Microsoft updates test (one more hour to make sure all cases are caught)

    Basically, they would have spent more time on both ends and very likely still not prevented the error. You could perhaps say the fix would take less time since there's less testing to be done, but that's not really true, as you have to verify that the (also simple) changes to the test suite are correct as well...

    The only advantage that TDD would have given is one more chance for the developer to think about the possible edge case after the method was written. But I would argue that with anything that fundamental more time should have gone into initial development, and TDD is the death of a thousand cuts in terms of time to write and maintain tests. Over time that gets unwieldy - I'm a believer in tests when they are meaningful and light and do not detract too much from time spent improving the code instead of tests.

    Indeed I can also see where TDD could well have caused this bug. Many TDD proponents would write the test first, and then code to it - which is just the kind of thinking that lets you relax when you should be at your most vigilant, actually writing the code that does the work. I find it a lot easier to consider possibilities when I am staring at a piece of code that does some work as opposed to compiling and coding a list of potential issues into a test.

    Another potential issue is that tests tend to be written by the programmer who produced the original code, and of course the natural urge is to produce tests that fit the code as is, since the general thinking is to prevent bugs from future changes. I'm a huge believer (from experience) in the value of having a QA department that does nothing but write test code and make sure the code always passes it. It works far better than having programmers manage their own tests, and it really produces quality results. Unfortunately, it has the same ossifying effect: refactoring gets harder as you go along, because tests must be altered along with the code.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
    1. Re:Here's what TDD would have really done by Gorimek · · Score: 2, Interesting

      There is some confusion here

      Now let's look at that timeline with TDD:

      1) Microsoft writes method (let's say one hour again)
      2) Microsoft writes test for method. Test includes random dates but not December 31st, 2008. One hour.

      That is not Test Driven Development.

      In TDD, you actually let the tests drive the development. You first write a test, then the simplest code that will satisfy it. Then add more tests/assertions and modify the code. Rinse and repeat until you've run out of edge cases. For a function like this, you'd probably do 3-5 iterations. Sounds like a lot, but it shouldn't take more than 10-20 minutes.

      Of course there can still be bugs, but it's much less likely when every line you write is in direct response to, and is executed and verified by, a test. Actually executing the code, rather than looking at it and thinking "yeah, that looks right" gives much better results. I've done both models for years and would never go back.

      Writing tests after the code is a fine practice too. It's much better (and easier) than writing no tests, but TDD is a quantum leap beyond that. It is an acquired skill though. If you're not learning it from someone who knows the techniques, it's hard to get very effective with it.

  15. Re:Probably Not A Widespread Issue by petermgreen · · Score: 2, Interesting

    Different RTC chips measure time in different ways. This particular one used a time of day and a day count, afaict. Others, however, give you a time broken down into hours, minutes, seconds, days, months and years.

    So if the API was designed around the latter style of RTC chip, the hardware vendor would have to write code to convert to the format the API expects, and when writing driver code you generally can't just go and call your regular libraries.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  16. Never use a while loop in a driver by Anonymous Coward · · Score: 1, Interesting

    Never use a while loop in a driver. Loops used in drivers should have a defined exit condition.

    for(;;) is while(1). If you mean while(1), code while(1). Use for() when you want to enumerate something.

    End of story.
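    One way to honour the "defined exit condition" rule is to give the year loop a hard upper bound and report an error instead of spinning. A sketch only: MAX_YEARS_FROM_ORIGIN and day_to_year_bounded are hypothetical names, the 200-year bound is an arbitrary assumption, and the day numbering (day 1 == 1 Jan 1980) is assumed.

```c
/* Hypothetical sanity bound: a device clock should never be this far past 1980. */
#define MAX_YEARS_FROM_ORIGIN 200

static int is_leap_year(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Converts a 1-based day count (day 1 == 1 Jan 1980) into year and day of
   year. The loop runs at most MAX_YEARS_FROM_ORIGIN times; returns 0 on
   success and -1 on invalid or out-of-range input. */
static int day_to_year_bounded(int day, int *year, int *day_of_year)
{
    int y = 1980;

    if (day < 1)
        return -1;  /* invalid input */
    for (int i = 0; i < MAX_YEARS_FROM_ORIGIN; i++) {
        int len = is_leap_year(y) ? 366 : 365;
        if (day <= len) {
            *year = y;
            *day_of_year = day;
            return 0;
        }
        day -= len;
        y++;
    }
    return -1;  /* out of range: report an error instead of spinning */
}
```

    Even if a later edit broke the loop body, the worst case would be an error return after a bounded number of iterations rather than a frozen device.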

  17. Re:Talked-about layoffs at MSFT by Jaime2 · · Score: 2, Interesting

    Microsoft is famous for "stack-ranking" employees at review time. This means that somebody in every group will get a "worst of the group" review and somebody will get "best of the group". If you are in the bottom 10% at MSFT, you are never going to get a raise and you will eventually quit.

  18. It's called OR by rdnetto · · Score: 2, Interesting

    Couldn't you just do this:

    if ((condition1 && condition2) || (issue1 || issue2)) {
        // do cleanup
    }

    --
    Most human behaviour can be explained in terms of identity.
  19. Re:Regardless of whatever code in it is faulty by dkf · · Score: 2, Interesting

    The addition of a single bool avoids both the specialized cleanup() function and the goto:

    [...]

    The problem with that is that it tends (in real code) to either greatly increase the depth of nesting of the code (making it harder to understand) or, worse yet, spray a great collection of various flags indicating the various types of cleanup required and what conditions have and haven't been checked. Making conditions more complex isn't a good plan at all for maintenance, since it increases the difference between the pseudo-code for the function (i.e. the level that you think about, without all the grotty code for handling failure modes and obscure cases) and the real code...

    --
    "Little does he know, but there is no 'I' in 'Idiot'!"