Slashdot Mirror


Software Error Likely Killed MGS Spacecraft

Aglassis writes "NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode. Unfortunately, after that the spacecraft's radiator was pointed at the sun which overheated the battery and destroyed it. Contact was lost with the Mars Global Surveyor spacecraft in November 2006. NASA will form an internal review board to determine formally the cause of the loss of the spacecraft and what remedial actions are needed for future missions."

3 of 199 comments

  1. YACCS -Yet Another Computer Corkup in Space by Ancient_Hacker · · Score: 4, Informative
    Just one more example of how Computer Science isn't quite up to the reliability requirements of Space:
    • A missing comma in a Do-loop statement caused the first mission to Mars rocket to go off course and blow up.
    • The space-shuttle programs had a race condition that caused the first launch to be scrubbed.
    • The space-shuttle re-entry program had one important variable off by a factor of -4, causing the first re-entry to be a bit wobbly.
    • An Ariane guidance program had multiple basic design glitches that caused the first launch to blow up.
    • The F-16 autopilot worked very well, until the plane was deployed to Australia, where on its way there it bounced off the equator.
    • The LEM landing program didn't protect itself from spurious radar data, causing the computer to get behind.

    Aero and space are very unforgiving of human coding errors.

    1. Re:YACCS -Yet Another Computer Corkup in Space by Fishbulb · · Score: 5, Informative

      The F-16 didn't "bounce off the equator". Before it ever flew, the computer flipped the plane over in simulation when it crossed the equator, due to a bug that incorrectly handled southern latitudes. Additionally, since the computer "flip" happened instantaneously, and the F-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot (and the F-16 would have happily continued on its way).

      http://portal.acm.org/ft_gateway.cfm?id=163293&type=pdf&coll=GUIDE&dl=GUIDE&CFID=11154656&CFTOKEN=19136062

  2. Re:Better than a metric-English conversion error by iamlucky13 · · Score: 4, Informative

    It wasn't one engineer. It was a team effort. And it wasn't a very simple matter of "forgetting". Several factors combined, including re-use of code from the MGS mission (a conversion factor was in the old code, but not recognized when the code was adapted for the doomed MCO) and budget constraints that limited pre-flight testing (so the bug was missed, and in fact might still have been missed even with more testing). The effects of the bug were also subtle enough that 3 minor main engine firings were conducted without enough error showing up to reveal the problem. It wasn't until the long orbital insertion firing that the error in the trajectory became noticeable, and by then it was too late. The team's first clue something was wrong was when the spacecraft didn't radio home after the engine burn.

    The details are really convoluted, but the Wikipedia page on the mission has a decent write-up explaining how the mistake was made, with additional resources cited. The PDF paper giving a perspective from the MCO team is particularly revealing, if you've got some time on your hands.
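
    For anyone wondering how a dropped conversion factor can stay quiet for so long, here is a rough sketch in Python. It assumes the widely reported form of the MCO mix-up (impulse produced in pound-force seconds but read as newton-seconds). The function names and sample values are hypothetical and have nothing to do with the actual ground or flight software.

```python
# Hypothetical illustration only; not MCO or MGS code.
# 1 lbf*s = 0.45359237 kg * 9.80665 m/s^2 * 1 s = 4.4482216152605 N*s (exact by definition).
LBF_S_TO_N_S = 4.4482216152605


def impulse_to_si(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


def unmodeled_impulse(impulses_lbf_s) -> float:
    """Total impulse (N*s) missing from the model when lbf*s values are read as N*s.

    Each value is off by the same factor, so the absolute error grows
    with the size and number of the firings.
    """
    return sum(impulse_to_si(i) - i for i in impulses_lbf_s)


if __name__ == "__main__":
    small_firings = [0.5, 0.4, 0.6]  # made-up sample values, in lbf*s
    print(unmodeled_impulse(small_firings))
```

    The point is only that every small event is wrong by the same factor, so the absolute error stays small until the inputs get large or enough of them accumulate.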