Slashdot Mirror


The Exact Cause of the Zune Meltdown

An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."

35 of 465 comments

  1. Import calendar? by TurtleBlue · · Score: 5, Insightful

    "From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off."

    I can't remember the last time a QA department was asked to test date functions... but then again, I can't remember the last time anyone wrote their own Leap Year calendaring calculator from scratch.

    I'm sure there are a hundred reasons to do it (licensing being one of them) but really, when was the last time you didn't just import calendaring from another library and call it a day?

    Please clarify to me if this is something at the hardware driver level: I honestly don't know. If this were me, my own bosses wouldn't ask "Why didn't QA catch this", as much as "why are you wasting time writing your own calendar code? And then why didn't you flag it as functionality that needed to be tested?"

    1. Re:Import calendar? by nedlohs · · Score: 2, Insightful

      For fuck's sake, how do you manage to read that part and not read the part before it: "source code for the bad driver"?

      The QA person/people testing the driver for the real time clock better damn well be testing date and time stuff.

    2. Re:Import calendar? by TurtleBlue · · Score: 5, Insightful

      Thanks - that makes a tad more sense. I see everyone running around blaming Microsoft for the code since their name is on the product, even if it was a 3rd party vendor. They certainly are still liable for all the busted Zunes, but I couldn't imagine Microsoft didn't have *some* C leap-year code sitting around that actually worked, and could be compiled for any chip they wanted.

      Microsoft still has to take the hit up front, but then they'll sue or "renegotiate contracts" with the vendor that supplied the bad driver code, based on what it costs them.

      I'm still shocked that the manufacturer couldn't dig up *some* free/open calendaring code that was around pre-2004. But hey, at least we know they were honest about not ripping off some other source code and calling it their own.

    3. Re:Import calendar? by jc42 · · Score: 4, Insightful

      [Microsoft] designed it and sell it as a unit even if parts are from other places. They have a responsibility to test it.

      No, they don't. They have several decades of success selling untested software. Their customers have given them a very strong message: "We don't care about quality or reliability. We only want to buy whatever Microsoft sells. Don't tell us about something from another vendor that's higher quality; we aren't interested."

      They've become a huge success. They're obviously doing what their customers want. They'd be stupid to waste good money on silly things like testing that won't increase their sales.

      Or, as others here keep pointing out, Microsoft's only "responsibility" is to its shareholders. Their ongoing success has hugely rewarded their shareholders. They are obviously Doing It Right. They have no other responsibilities.

      If there are signs that this fuss has a lasting effect on their profitability, they'll do something about it. But it'll probably die off in a short time, and only a few geeks (non-customers) will remember it. How many people can name even one piece of software that died on Jan 1, 2000? Customers won't remember this one, either. Most of them will never even hear about it. So Microsoft's management isn't worried about it.

      And anyway, the problem is "fixed" now (for another 3.99 years). So why even bother discussing it?

      (What, me cynical? ;-)

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  2. Re:Warning, Y2.1K bug. by Gothmolly · · Score: 2, Insightful

    No need to hard-code, there's an established algorithm for computing this.

    --
    I want to delete my account but Slashdot doesn't allow it.
  3. Re:Warning, Y2.1K bug. by narcberry · · Score: 3, Insightful

    I think you missed the point.

    --
    Modding me -1 troll doesn't make me wrong.
  4. That code is real bad code! by canuck57 · · Score: 2, Insightful

    Looking at that code, it never had effective code review or Q/A. If I was the manager responsible I would be looking up those who signed off on the code in the last review. I spotted not one but 4 issues in that code, and would not doubt more exist. Second off, there are much simpler ways of doing this in the C libraries, and simplicity has value.

    But the design, I suspect, is very flawed. Why not use asctime() and rely on its more proven calculations of leap year and the like via the OS libraries?

    And when you see something like this, you know someone's brain was in the off position:

    day -= 366;
    year += 1;
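
    For context, those two lines sit inside a loop that turns a running day count into a year. A minimal sketch of a loop that always makes progress is below. The function name `year_from_days`, the caller-supplied origin year, and the 1-based day count are assumptions for illustration, not the driver's actual interface.

```c
/* Leap-year rule for the Gregorian calendar. */
int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/*
 * Convert a 1-based day count (day 1 = January 1 of origin_year) into a
 * calendar year. On return *days holds the 1-based day within that year.
 * Hypothetical helper: name, signature and origin handling are assumptions.
 */
int year_from_days(int origin_year, int *days)
{
    int year = origin_year;

    for (;;) {
        int days_in_year = is_leap_year(year) ? 366 : 365;

        /* Stop as soon as the remaining count fits inside this year,
         * including day 366 of a leap year. */
        if (*days <= days_in_year)
            break;

        *days -= days_in_year;
        year += 1;
    }
    return year;
}
```

    With an origin of 1980, a day count of 10593 comes out as day 366 of 2008, i.e. December 31 of a leap year, and the loop exits instead of spinning.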

  5. Re:Old by larry+bagina · · Score: 5, Insightful

    Comments in the last zune slashdot story (yesterday?) were just as detailed as this "story". Maybe slashdot editors should read their own site. Or maybe I should start submitting all +5 comments for their own story.

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  6. MOD PARENT UP Re:Why write any date/time code? by exphose · · Score: 5, Insightful

    Exactly, just goes to show the dangers in not QA'ing the whole codebase including supplied drivers. You can't trust your own code so you QA it, so why should you trust your partner's code?

    1. Re:MOD PARENT UP Re:Why write any date/time code? by JoeMerchant · · Score: 3, Insightful

      But really, no one can be expected to QA every single line of code that's shipped through their device

      All depends on the level of concern - for a music player, what the hell, it's only the company's reputation that's riding on it... (now, if the company isn't already a laughingstock, maybe this might matter.) If this were code on a Mars Surveyor mission whose failure would set back an entire program by 2 years or more - I'd be checking every line of code, everywhere, three times.

  7. Re:Regardless of whatever code in it is faulty by KiltedKnight · · Score: 3, Insightful

    Ever written code for an OS or device driver? You use them there... frequently... as "get me the frack out of here because of a fatal error"...

    Never mind that if done properly, there is nothing wrong with using a goto statement... just make sure that you only move in one direction... ideally "down" towards the end of the function, not somewhere else in the whole program.
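
    A minimal sketch of that forward-only pattern in C. The function `fill_and_sum` and its buffers are made up for illustration; the point is the single exit label that every failure path jumps down to.

```c
#include <stdlib.h>

/*
 * Hypothetical example: allocate two scratch buffers, use them, and
 * release everything through one exit path. Returns 0 on success and
 * -1 on failure.
 */
int fill_and_sum(size_t n, long *sum_out)
{
    int rc = -1;            /* assume failure until the work completes */
    int *a = NULL;
    int *b = NULL;
    size_t i;

    if (n == 0 || sum_out == NULL)
        goto out;           /* nothing allocated yet; free(NULL) is safe */

    a = malloc(n * sizeof *a);
    if (a == NULL)
        goto out;

    b = malloc(n * sizeof *b);
    if (b == NULL)
        goto out;

    *sum_out = 0;
    for (i = 0; i < n; i++) {
        a[i] = (int)i;
        b[i] = 2 * (int)i;
        *sum_out += a[i] + b[i];
    }
    rc = 0;

out:
    /* Every jump lands here, below its origin, so cleanup runs once. */
    free(b);
    free(a);
    return rc;
}
```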

    --
    OCO is Loco
  8. Re:Bigger bugs have gotten through on Windows CE by Anonymous Coward · · Score: 2, Insightful

    If you dealt with Platform Builder you could see the Windows CE source code. There were some amazing bugs in there. One of the more relevant ones that I saw in Windows CE 4.x was the daylight savings changeover code.

    They had a function that returned whether or not time needed to be added for daylight savings. The function would have a list of changeover times for various regions. If the time was past the first changeover time of the year the function would add time; if the time was past the second changeover time in the year the function would remove time.

    The end result was the product shipped with time that was incorrect for the entire Southern Hemisphere.
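
    The Southern Hemisphere case is the one where daylight saving starts late in the calendar year and ends early in the next, so the interval wraps around New Year. A rough sketch of a check that handles both orderings follows. The function name `in_dst` and the day-of-year representation are invented for illustration; this is not the Windows CE code.

```c
/*
 * Return 1 if day-of-year `day` falls inside daylight saving time.
 * `start` is the day DST begins, `end` the day it ends (exclusive).
 * Hypothetical helper: real changeovers also depend on time of day
 * and on the year, which this sketch ignores.
 */
int in_dst(int day, int start, int end)
{
    if (start <= end) {
        /* Northern-style rule: DST is one block inside the year. */
        return day >= start && day < end;
    }
    /* Southern-style rule: DST wraps past December 31, so it covers
     * the tail of the year and the head of the next one. */
    return day >= start || day < end;
}
```

    With made-up changeover days of 280 and 90, day 10 counts as daylight saving time and day 200 does not, which is the ordering a "first changeover adds, second removes" rule gets backwards.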

  9. Re:Regardless of whatever code in it is faulty by jd142 · · Score: 2, Insightful

    Why not this:

    function cleanup():void
    {
        // do something
    }

    if (condition1 && condition2) {
        cleanup();
    }

    if (issue1 || issue2) {
        cleanup();
    }

    If something is done exactly the same way twice, that's a function. Heck, if something is sufficiently complicated that it makes the main code easier to read, that's a function too in my book.

    Of course, I prefer my braces to line up vertically, so what do I know?

  10. Re:Why write any date/time code? by cbhacking · · Score: 3, Insightful

    Ah, thank you. This explains better why the 2nd-gen and 3rd-gen Zunes didn't suffer this problem; they were completely designed and developed in-house.

    --
    There's no place I could be, since I've found Serenity...
  11. Re:Warning, Y2.1K bug. by gcnaddict · · Score: 2, Insightful

    The code in the Freescale driver actually covers this. Check the IsLeapYear() function in the code (line 162):

    static int IsLeapYear(int Year)
    {
        int Leap;

        Leap = 0;
        if ((Year % 4) == 0) {
            Leap = 1;
            if ((Year % 100) == 0) {
                Leap = (Year%400) ? 0 : 1;
            }
        }

        return (Leap);
    }

    --
    Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
  12. Re:Sad code, sad article by Anonymous Coward · · Score: 1, Insightful

    Actually, that's what Knuth wouldn't call a loop, since it only runs once. That's a single = not a == in the if there, so it would only terminate if daysInYear is 0, and that gets set to 366 or 365.

  13. Re:Warning, Y2.1K bug. by jrumney · · Score: 2, Insightful

    It's easy to imagine such software in a general computing environment, especially in the financial and insurance arenas where people might be taking a long term view or processing historical data, but in a single purpose consumer electronics product?

  14. Re:Warning, Y2.1K bug. by PIBM · · Score: 5, Insightful

    Actually, it's far from being good. In 99% of the cases you will do 3 modulo operations, in 0.75% you will do 2 modulos and in 0.25% you will do 1 modulo, for an average modulo cost of 2.9875 per run.

    With the initial solution, you have 1 modulo in 75% of the cases, 2 modulos in 24% of the cases, and 3 modulos in 1% of the cases, for a total average modulo cost of 1.26 per run.
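
    Those averages can be checked by counting modulo operations over one full 400-year cycle. The sketch below assumes short-circuit evaluation and two test orders: divisibility by 4 first (taken here to be the "initial solution") or by 400 first. The function names are made up for the count.

```c
/* Number of % operations when testing 4, then 100, then 400. */
int mods_four_first(int year)
{
    if (year % 4 != 0)
        return 1;           /* not divisible by 4: done after one test */
    if (year % 100 != 0)
        return 2;           /* ordinary leap year */
    return 3;               /* century year: needs the % 400 test too */
}

/* Number of % operations when testing 400, then 100, then 4. */
int mods_four_hundred_first(int year)
{
    if (year % 400 == 0)
        return 1;           /* divisible by 400: done after one test */
    if (year % 100 == 0)
        return 2;           /* other century years */
    return 3;               /* everything else needs the % 4 test too */
}

/* Sum a per-year cost function over one 400-year Gregorian cycle. */
int total_over_cycle(int (*cost)(int))
{
    int total = 0;
    int year;

    for (year = 2000; year < 2400; year++)
        total += cost(year);
    return total;
}
```

    The totals come to 504 and 1195 operations, i.e. 1.26 and 2.9875 per call, assuming years are spread evenly across the cycle.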

  15. Re:Warning, Y2.1K bug. by 4D6963 · · Score: 2, Insightful

    The problem with CS graduates is too often that they try to reinvent something (poorly) when they should do some digging first, not so much that they're unable to work out a solution.

    --
    You just got troll'd!
  16. Re:Calendrical Calculations by larry+bagina · · Score: 2, Insightful

    Division (and modulo) is much slower than other integer math on the ARM.
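
    One way to sidestep the divide for the common case, sketched under the assumption of non-negative years: the divisible-by-4 test only needs the low two bits, so three calls out of four never reach a real modulo. `is_leap_year_mask` is a hypothetical name, and compilers often make this transformation for constant divisors on their own.

```c
/*
 * Leap-year test that uses a bit mask for the divisible-by-4 check.
 * Assumes year >= 0.
 */
int is_leap_year_mask(int year)
{
    if ((year & 3) != 0)
        return 0;               /* low two bits set: not divisible by 4 */
    if (year % 100 != 0)
        return 1;               /* divisible by 4 but not a century year */
    return year % 400 == 0;     /* century years need divisibility by 400 */
}
```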

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  17. Not QA's fault by Sleepy · · Score: 3, Insightful

    "evidence of QA.. slacking off"

    These comments routinely come from two groups:

    1) Software Developers
    2) Joe the Plumber

    Or put another way: elitism or ignorance.

    If a software division is letting QA "test" all on their own, that's a recipe for disaster... and it's the head of engineering at fault.

    See, software testing does not occur in a vacuum, no more than developers code without a list of requirements from Sales or Marketing.

    Engineering takes the requirements and uses them to produce an agreed-upon set of specifications.

    QA follows the same model... they take the software specs and derive a set of effective tests.... tests which are agreed upon by Engineering, and signed off on.

    When I did QA, it was mostly for startups who lacked this kind of process. The result was QA was always 2 steps behind software that continually morphed: hardware changed, or the customer changed their mind. I'm not placing the blame on any 1 group here... I come from Support, then QA, and now develop. Startups can be rough.

    But at the end of the day, not documenting and agreeing on what the product and tests should be will cost you big time.. maybe 7 out of 10 times.

  18. Braindead comments by Anonymous Coward · · Score: 1, Insightful

    Is it just me or does stuff like this really annoy everyone:

    //Calculate current day of the week
    dayofweek = GetDayOfWeek(days);

    That's basically like saying:
    // See if i is greater than zero
    if( i > 0 ) ....

    Adding unnecessary comments is worse than not commenting at all; it just dirties up the code.

  19. Re:Warning, Y2.1K bug. by Bozdune · · Score: 4, Insightful

    If my code's still running in 2100, our society has got way bigger problems than me not figuring leap years correctly.

  20. LOL Algorithms by Khyber · · Score: 1, Insightful

    Yes, they're handy, but this is just fucking ridiculous.

    If year = (input year here) Then February has 29 days.

    Forget trying to calculate it out. I can rattle off the leap years without thinking. Make a list. Knowing how most software lasts (consumer anyways,) you only need to account for maybe 12 leap years, if your software even lasts that long to begin with.

    Doing things the complicated way when a simple list and table would've sufficed is just beyond me.

    Probably why I don't program, either. At least, not for any business purposes.

    --
    Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
  21. Re:Let's make sure this gets installed everywhere by neokushan · · Score: 4, Insightful

    When has Microsoft ever actually done that? Apple has released updates that DELIBERATELY bricked devices (jailbroken iphones for one), but that's ok, yet when a Microsoft device breaks due to a very obvious bug (obvious in that it's obvious it IS a bug, not obvious in that it really should have been noticed - bugs do happen in pretty much ALL software) that has a stupidly simple fix (Let it drain the battery then turn it on again), suddenly the Conspiracy theories are out in full force and they're once again branded as the most Evil Corporation on the planet? Please.

    There's so much you can bash Microsoft for (legitimately), why do you feel the need to actually make shit up?

    Besides, from all the reports I've read so far, Windows 7 is actually looking to be a worthy Upgrade (if you're a windows user, that is - for anyone else, your mileage may vary) and I don't just mean from Vista, I mean from XP as well.

    But no, it's easier to just hate the large, monolithic, rich company than accept that sometimes shit just happens.

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
  22. Re:Regardless of whatever code in it is faulty by Blakey+Rat · · Score: 2, Insightful

    Ok, so I have a loop within a loop, and I need to drop out of BOTH loops because of some error condition in the inner loop:

    for( x = 0; x < 10000; x++ )
    {
        for( y = 0; y < 100; y++ )
        {
            if( error_condition ) goto fatal_error;
        }
    }

    fatal_error:
    CleanupAndQuit();

    The only alternative is to create a boolean flag and check it each go through the loop, but that's retarded. And note that even JavaScript, which is a modern language designed with all the lessons learned from C, also has a way to break to a specific label, added exactly because it lacked a GOTO.

  23. Re:Warning, Y2.1K bug. by lyml · · Score: 3, Insightful

    That's just silly, readability trumps over using modulo a thousand times. Always, with no exceptions.

  24. Re:Why is this a surprise? by ozphx · · Score: 4, Insightful

    Lots of things use Windows CE, which is fine.

    The problem is with the Freescale Semiconductor's* RTC driver. So if you aren't using that specific chip and driver then CE is unaffected.

    * No, this doesn't excuse MS from proper QA.

    --
    3laws: No freebies, no backsies, GTFO.
  25. Re:Let's make sure this gets installed everywhere by gutter · · Score: 4, Insightful

    Ok, I'm getting sick of this claim. There is no proof that Apple has ever deliberately bricked devices. This is completely unfounded.

    In fact, go back and look at the reports of iPhones breaking, and you'll see that most of them started working again with a later OS release. About the only thing that happens on upgrade with jailbroken phones these days is that they are locked again.

    --
    Check out DRM-free movies at http://www.bside.com
  26. Re:Warning, Y2.1K bug. by Anonymous Coward · · Score: 3, Insightful

    Yuk. Unreadable tripe. It's basically the same algorithm, implemented very poorly. Try:

    def isLeapYear(year):
            return (year % 4 == 0 and (not year % 100 == 0)) or (year % 400 == 0)

  27. Re:Regardless of whatever code in it is faulty by weicco · · Score: 2, Insightful

    This is the structure I used when I was writing stuff in C some ten years ago.

    do {
            if (condition1)
                    break;
            do_something();

            if (condition2)
                    break;
            do_something_else();
    }
    while(0);

    cleanup();

    --
    You don't know what you don't know.
  28. A simpler reason by kaiwai · · Score: 2, Insightful

    How about a much simpler reason - it plain well sucks giant donkey balls.

    We're talking about a device which only works with Windows, is only available in a small number of countries (I don't give a shit about the music service - you can put music on it without a fucking music service so the need to 'roll out the service' is a bullshit excuse) and the software sucks balls.

    It's a top to bottom epic failure - and it's in the mold of Microsoft NEVER to learn from these failures or, more correctly, learn from its rivals who are making gains. Then again, Microsoft is kinda like a mini-America: the world uses metric, the US uses imperial. The world uses 240V, and the US uses 110V, etc. etc.

  29. Re:Warning, Y2.1K bug. by jcr · · Score: 3, Insightful

    COBOL really shouldn't have been allowed to survive past 1990.

    I disagree. I wouldn't want to write a 3D CAD program in it, but COBOL is still a fine choice when the task at hand is to account for a million monthly utility bills. The built-in BCD arithmetic is very well suited for implementing financial applications, and COBOL isn't susceptible to buffer-overflow bugs like the C-based languages are.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  30. Re:Let's make sure this gets installed everywhere by db32 · · Score: 4, Insightful

    First of all, this was a braindead stupid bug. Unbelievably poor implementation of what should have been a fairly simple thing leads to an infinite loop on special days. Just looking at the damned loop without actually tracing through every possibility reveals an infinite loop at first glance. This was mindbogglingly stupid.

    Second...Apple didn't "deliberately brick" devices. Your bias here is unbelievable. What Apple did was fix a bug that was allowing people to jailbreak and that caused problems for jailbroken phones. They fixed a security flaw, which caused something that took advantage of that security flaw to cease to function correctly. Now, personally I would like it if the iPhone didn't require jailbreaking to open it up, but fixing the flaw that allows people to break your security model is not "deliberately bricking". WGA is deliberately bricking, where it arbitrarily decides that you are invalid and shuts you off. In both cases it is incorrect usage of the word "brick" since either device can be easily recovered. So...to recap. Apple fixed a security flaw that caused bad news for people jailbreaking. Microsoft told your computer to call home every day so they could arbitrarily decide if you were valid or not and then shut you off if you weren't.

    It is easier to hate the large monolithic rich company that uses illegal business practices, breaks the standards, and buys off the DoJ to avoid punishment (Go look at MS political contributions to either party before the trial...virtually nil...then the year they get busted...they contribute big bucks to both sides and walk away with a wrist slap). Trust me...big time criminals don't need cheerleaders like you to help them out. People like you are like the wife that gets her ass kicked and says "no, but he really loves me, he really is a good guy".

    --
    The only change I can believe in is what I find in my couch cushions.
  31. Re:Let's make sure this gets installed everywhere by neokushan · · Score: 2, Insightful

    So MACOS X is bug free? What about any licensed version of Unix? What about...oh I don't know...just about every piece of commercial software out there? Bugs happen, it's a sad shame, but it happens and no software is ever "bug free". This is why they measure bugs not by raw number, but by number per XXX lines of code - if you can keep your average below a set number, you're doing well.

    I'm sure you're happy to let Linux be a tad buggy because it's open source and thus "free", but there's quite a few licensed distros you're supposed to pay for, what about those?

    You should have properly researched your choice for buying Windows 95 at the time. Maybe it wasn't the right software for you, maybe you should have bought a Mac or installed Linux, but from what you're saying, you expect a piece of software to be absolutely perfect when you pay for it, so perfect that future, better versions of the OS should never be needed, so I'm fairly sure you would be bitching and complaining now that you had to upgrade no matter what you decided to buy.

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill