Slashdot Mirror


Mars Global Surveyor Died from Single Bad Command

wattsup writes "The LA Times reports that a single wrong command sent to the wrong computer address caused a cascade of events that led to the loss of the Mars Global Surveyor spacecraft last November. The command was an orientation instruction for the spacecraft's main communications antenna. The mistake caused a problem with the positioning of the solar power panels, which in turn caused one of the batteries to overheat, shutting down the solar power system and draining the batteries some 12 hours later. 'The review panel found the management team followed existing procedures in dealing with the problem, but those procedures were inadequate to catch the errors that occurred. The review also said the spacecraft's onboard fault-protection system failed to respond correctly to the errors. Instead of protecting the spacecraft, the programmed response made it worse.'"

11 of 141 comments

  1. Emmentaler vs. Gruyere by DingerX · · Score: 4, Insightful

    One bad command started the chain, but it needed a series of system failures to kill it. In other words, a slight misalignment of the solar panels (or whatever it was) may have been a necessary cause, but not sufficient. The thing needed a safe-mode that wasn't safe, and battery logic that failed to consider environmental variables. All the conditions lined up.

    It's like saying that a mid-air collision occurred because two jetliners were assigned the same altitude and jetway in opposite directions at the same time. Yeah, but A) How they got that assignment is kinda complicated and B) any number of traffic control and collision avoidance systems have to fail too.

    1. Re:Emmentaler vs. Gruyere by v1 · · Score: 2, Insightful

      In most complex problems where catastrophic failure occurs, the problem manifests as a result of multiple smaller failures that combine in an unfortunate way, or as a chain reaction. By nature, people will want to narrow down the problem so they can identify a "cause". This is sometimes not appropriate, as we see here, where a collection of less critical failures led to catastrophe, and avoiding any one of them would have prevented disaster. It's a bit like team theory... after losing a game the coach does not go looking for the one player that lost them the game - it's a team effort and everyone is involved and bears some responsibility. Unless someone made a blatant and major mistake that was responsible for the vast majority of the fallout that resulted, you have to accept that no one was "at fault". In this case several people made mistakes that by themselves are minor, but that combined in a way that proved fatal to the craft. It's not anyone's fault, these things happen. All you can do to prevent this from happening again is to tighten up procedures to try to lower the number of minor failures that will occur (and you must accept you will never get them all) and to institute more review/backstopping to make it more likely that not only will minor problems be identified and fixed, but also that the result of complex interrelated events is predicted and prepared for.

      --
      I work for the Department of Redundancy Department.
  2. You nits by Bastard+of+Subhumani · · Score: 3, Insightful

    The mistake caused a problem with the positioning of the solar power panels
    What was it this time, degrees vs radians?
    --
    Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
  3. *Design* flaw by CatoNine · · Score: 3, Insightful

    From TFA: ".... That exposed one of the batteries to direct sunlight, causing it to overheat." So even a small navigation error or small mechanical failure could cause this thing to overheat. It should have been constructed more robust.

    1. Re:*Design* flaw by dsanfte · · Score: 2, Insightful

      This has been standard procedure for many decades.


      And yet, it failed.
      --
      occultae nullus est respectus musicae - originally a Greek proverb
  4. Re:Give NASA a break by Anonymous Coward · · Score: 3, Insightful

    "did YOU get $40 million a year out of those desktop photos?"

    MGS mapped targeted parts of the surface of Mars at much higher resolution than any previous mission. Among other things, it was responsible for finding the gullies that are probably signs of water being expelled recently at the surface. The length of the mission allowed it to detect changes at these sites, suggesting the process is still occurring today.

    What you really seem to be saying is that exploration of Mars by any means is a big waste of money, in which case, if you're going to complain about any specific mission as an example, MGS should be way down the list, because it was comparatively cheap, long-lived, and successful.

  5. Re:Give NASA a break by dreamchaser · · Score: 2, Insightful

    I propose that you get a clue. It's hard to place a value on science like this, but the advancement of knowledge and working towards getting off this rock are both highly valuable fields of endeavor.

    As for your 'proposal'...did you just pull those numbers out of your ass? If anything, costs increase over the years due to rising wages and inflation. $220 million was *dirt cheap* by space mission standards.

    We waste far more money on subsidies and entitlements in the US than we spend on science like this.

  6. More robust == heavier by mangu · · Score: 3, Insightful

    It should have been constructed more robust.


    So, which scientific experiment would you remove in order to put additional heat shielding? No, the thermal shielding and other protection systems are just right for a spacecraft that had to travel a hundred million kilometers.


    What really failed was the ground-based software, which didn't have a good enough thermal model, and the technical support team. Equipment may fail, operators may commit errors, but there should be enough experienced engineers around to do a correct analysis to catch those errors. Downgrading of the engineering team is the true problem here. Look at what happened to Columbia. It broke up on reentry because of a failure that had happened on take-off, was caught on video, but not analyzed correctly.


    NASA isn't alone in these failures; perhaps one could say they set the pace for the rest of the industry. The lack of a good thermal model is typical of a whole generation of engineers used to doing everything in Excel. With the current CPUs one has at each desktop, it wouldn't be so hard to do a correct thermal model of the spacecraft, but it would mean solving a system of partial differential equations in C++, something very few engineers are able to do, even when given an extensive library.

  7. Re:It wasn't a single wrong command by Jerry+Beasters · · Score: 3, Insightful

    Smart people make mistakes, deal with it.

  8. Better, faster, cheaper - the reality by kilodelta · · Score: 3, Insightful

    NASA has been on this kick of doing quick, reduced cost and inexpensive projects for some time now. They really have no choice since Congress will only give them funding for unmanned and low cost missions.

    So occasionally you get the stunning successes, e.g. the Mars rovers Spirit and Opportunity. Considering they were only supposed to last 90 sols and they're somewhere out to 1075 or more sols, it means that Steve Squyres is currently the star of NASA.

    But more likely you get the devastating failures.

    It's really sad that we blow a few billion a month on our little Iraq and Afghanistan ventures yet sciences take a back seat.

  9. Re:Give NASA a break by brassman · · Score: 3, Insightful

    > did YOU get $40 million a year out of those desktop photos? I didn't.

    So divide that by 200 million (roughly) to get your share.

    I got two quarters' worth. Heck, you can't get a comic book for fifty cents.

    --
    "Ain't no right way to do a wrong thing."