The Exact Cause of the Zune Meltdown

Posted by kdawson on Sunday January 4, 2009 @11:05AM from the off-by-one-every-four dept.

An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."

14 of 465 comments

Wow. by LeadLine · 2009-01-04 11:08 · Score: 4, Funny

It wasn't a bug! It was an unexpected feature!
Microsoft is taking a stance against teenagers blowing their ears out with loud music.
1. Re:Wow. by Vadatajs · 2009-01-04 11:19 · Score: 3, Funny
  
  Teenagers these days are too young to remember Metallica.
2. Re:Wow. by Yvan256 · 2009-01-04 11:22 · Score: 4, Funny
  
  They're too busy listening to NiMHica.
3. Re:Wow. by Anonymous Coward · 2009-01-04 12:00 · Score: 5, Funny
  
  No they didn't, no they didn't, lalalala, I can't hear you. That piece of shit was definitely not any Metallica I know.
QA team slacking off... by feepness · 2009-01-04 11:23 · Score: 4, Funny

From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off.
MSFT's QA team hasn't been slacking off. They haven't slacked on since about the mid 90s.
Regardless of whatever code in it is faulty by scourfish · 2009-01-04 11:34 · Score: 5, Funny

Lines 122, 521, 690, 710, and 748 scare me; gotos in C code...
Re:Sad code, sad article by xlv · 2009-01-04 11:53 · Score: 5, Funny

for (;;) {
    int daysInYear = IsLeapYear (year) ? 366 : 365;
    if (day = daysInYear) break;
    day -= daysInYear; year += 1;
}
This is what Knuth called an "N + 1/2" loop
No, this is what Knuth would call an infinite loop as there's no way to terminate the loop except on the last day of each year...
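For comparison, here is a minimal sketch of a version that does terminate, assuming the intended test in the quoted loop was `day <= daysInYear` rather than the `=` shown. It is a Python transliteration, not the driver's code: the name `split_days` is made up for illustration, and `calendar.isleap` from the standard library stands in for `IsLeapYear`.

```python
import calendar


def split_days(day, year):
    """Turn a 1-based day count starting in `year` into (year, day_of_year)."""
    while True:
        days_in_year = 366 if calendar.isleap(year) else 365
        # Assumed comparison: stop as soon as `day` fits inside the current
        # year, which includes day 366 of a leap year.
        if day <= days_in_year:
            break
        day -= days_in_year
        year += 1
    return year, day
```

With this test, `split_days(366, 2008)` returns `(2008, 366)` instead of spinning, and `split_days(367, 2008)` rolls over to `(2009, 1)`.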
Re:Probably Not A Widespread Issue by TheSunborn · 2009-01-04 11:58 · Score: 3, Funny

I still wonder: why is code that translates a number of days into a year hardware dependent?
Getting the number of seconds since the epoch is hardware dependent, but translating that into other time measurements should not be,
unless they are building a time machine.
Re:Modified Julian Day by rcw-home · 2009-01-04 12:00 · Score: 4, Funny

I highly recommend that in cases like this, programmers be good Catholics and abide by the decree of Pope Gregory XIII. Software written to work with modern dates should use Gregorian, not Julian. Or did you mean ordinal?
From the article you linked to: The use of Julian date to refer to the day-of-year (ordinal date) is usually considered to be incorrect, however it is widely used that way in the earth sciences and computer programming.
Obligatory XKCD by Failed+Physicist · 2009-01-04 12:04 · Score: 3, Funny

http://xkcd.com/376/
Re:Warning, Y2.1K bug. by Anonymous Coward · 2009-01-04 12:29 · Score: 5, Funny

For Slashdotters you lot seem pretty confident the Zune is going to be around for a while.
Re:Old by Goldberg's+Pants · 2009-01-04 13:18 · Score: 4, Funny

First to finish is not always a good thing. Just ask your girlfriend.
Re:Warning, Y2.1K bug. by Anonymous Coward · 2009-01-04 15:14 · Score: 5, Funny

I can't help but imagine how I would be directed by work to "solve" this problem.
First, they would tell me that it's too difficult, expensive, and complicated to implement the correct solution. Even if I gave them a working prototype, they wouldn't change their minds.
Then they would tell me "just assume every 100th year is not a leap year." So I would do that instead. In the time from 2100 to 2400, they would say that "a better solution is due to come out next quarter." They would say this every quarter for 299 years.
In 2399, they would finally give me permission to fix the problem. But the leap year-calculating code works, and they don't want me to mess with it. Instead, they'd tell me to add a test when the program starts to see what year it is. If it's 2400, then it will refuse to run. (We'll definitely have a better solution in place by Q1. Definitely.)
But the program often runs for an extended period of time without being restarted, so it's possible that someone will start it in December 2399 and it will still be running in February/March 2400. Management has a simple fix for this one: calculate the average run time for the program, add a margin of error, and use that to determine the actual "upper limit" on when the program is allowed to start. My boss would be really excited about this, because it would allow us to refine our earlier not-after-January-1st estimate to be "completely accurate."
Unfortunately, we don't know the average run time for this program. So I'm told to add code to it to track when it starts and ends and store the results in a file. When the program starts, it examines that file (in addition to recording its own start time), calculates the average run time, adds 10% (there are still director-level meetings about whether we should round up to the nearest hour or day), and subtracts that value from February 28th, 2400. If the current timestamp is greater than or equal to the result we got from that, the program won't start.
That's pretty good, but my boss would be worried about the program crashing. If that happens, after all, we won't know the program's end time -- never mind that it's November by now and there's no chance of getting useful data no matter what -- so instead of logging an end time, the program logs a heartbeat every minute. Now, you can determine when the program ended -- to within a minute! -- simply by looking at the heartbeat timestamps. When you encounter a gap of more than 1 minute (plus a small margin of error), you know the program ended. This has the bonus, my boss tells me, of simplifying the design by only requiring you to log one type of message to the file. He also assures me that this "telemetry data" has the potential to be "really useful for data mining." He talks about adding information on CPU time consumed, memory in use, I/Os, all sorts of stuff, then putting it in a database to be retrieved later. I manage to talk him out of it by pointing out that "the better solution [with which I am completely uninvolved] will be out in just a few months, so you should just make sure it makes it into that instead."
Not that I'm bitter.
Re:Warning, Y2.1K bug. by rilian4 · 2009-01-04 16:03 · Score: 3, Funny

Tested your algorithm. It breaks where y%100 is not 0, at least in Python 2.5x using IDLE on Windows.
I found the following more accurate:
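def leapyear(y):
    # Each flag starts True and flips to False when y IS divisible.
    y4=True
    y100=True
    y400=True
    if y%4==0:
        y4=False
    if y%100==0:
        y100=False
    if y%400==0:
        y400=False
    # Leap if (divisible by 4 and not by 100) or divisible by 400.
    ly=(not y4) and (y100) or (not y400)
    return ly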
Might not be the most efficient but it works as far as I can see.
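The flag logic above reduces to a single boolean expression. A minimal sketch, with `is_leap_year` as an illustrative name, cross-checked against the standard library's `calendar.isleap`:

```python
import calendar


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Spot check against the standard library over a range of years.
mismatches = [y for y in range(1600, 2801) if is_leap_year(y) != calendar.isleap(y)]
```

The century cases are the ones worth testing by hand: 1900 and 2100 are not leap years, 2000 and 2400 are.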

--

...quicker, easier, more seductive the darkside is...but more powerful, it is not.