Slashdot Mirror


Mars Global Surveyor Died from Single Bad Command

wattsup writes "The LA Times reports that a single wrong command sent to the wrong computer address caused a cascade of events that led to the loss of the Mars Global Surveyor spacecraft last November. The command was an orientation instruction for the spacecraft's main communications antenna. The mistake caused a problem with the positioning of the solar power panels, which in turn caused one of the batteries to overheat, shutting down the solar power system and draining the batteries some 12 hours later. 'The review panel found the management team followed existing procedures in dealing with the problem, but those procedures were inadequate to catch the errors that occurred. The review also said the spacecraft's onboard fault-protection system failed to respond correctly to the errors. Instead of protecting the spacecraft, the programmed response made it worse.'"

8 of 141 comments

  1. Re:*Design* flaw by dsanfte · · Score: 2, Interesting

    Some temperature monitors on critical, exposed devices would also help. All you need is the CPU temperature diode present on just about every motherboard sold today. In fact, how about many of them, arranged at strategic positions on the spacecraft hull to give real-time temperature information to the satellite's computer? I guess complicated ideas like these get ignored in favor of simpler solutions, like relying on large chains of command and bureaucratic procedures carried out 30 light-minutes from the point of failure.

    --
    occultae nullus est respectus musicae - originally a Greek proverb
  2. Give NASA a break by patio11 · · Score: 2, Interesting

    It worked for a decade at a cost of a piddling $220 million, plus $20 million a year in upkeep. At a hair over $40 million a year, that's much, much less wasteful than most NASA missions. (Yeah, I suppose you could consider whether the return was worth it. Heh, who are we kidding -- did YOU get $40 million a year out of those desktop photos? I didn't.)

    I propose that next time NASA spend $150 million on the construction phase, which is just a slush fund for defense contractors anyhow, and then issue the lethal command before launch. Then we'd save a decade's worth of upkeep costs and the $65 million launch budget. NASA could even have a $10 million prize going to the person who most creatively identified a possible fatal error, since that's the only fun part of these missions for people who aren't rocket scientists and we wouldn't want to skimp on it.

  3. Impressive by hcdejong · · Score: 4, Interesting

    Not the error itself, but the fact NASA was able to figure out what happened in such detail, when the spacecraft it happened to is not giving any diagnostic information and cannot be examined directly.

  4. Re:It wasn't a single wrong command by MichaelSmith · · Score: 4, Interesting

    So, if the procedures were better, this wouldn't have happened. If the fault protection system was better, this wouldn't have happened. If the designers had predicted this exact problem might occur this wouldn't have happened.

    TFA:

    over the years budgets and staff had been cut "in an effort to operate the mission as economically as possible."

    MGS was well into bonus time in the sense that the original goals had been reached. The project was running on a reduced budget and this made a mistake inevitable. I can't help thinking that at a higher level this was considered to be a good thing. When you have new missions to run and a fixed budget to run them on you want your old missions to stop so that you can draw a line under it and go on to the next thing.

    The last thing management want is to have to decide to shut the spacecraft down because they don't have the budget for operations on the ground. Reducing the budget is a way of inducing the shutdown.

  5. Re:*Design* flaw by maxume · · Score: 3, Interesting

    They don't get to build the best damn space probe they can build, they get to build the best damn space probe they can build for $X. Thermal management isn't easy; controlling orientation allows them to spend money on the stuff they are interested in, rather than insulation and shielding.

    --
    Nerd rage is the funniest rage.
  6. Typical multiple-factor catastrophe by mattr · · Score: 4, Interesting

    There are an awful lot of posts here that disparage the people who have built and operated this system. To me it looked very much like the explanation for an aircraft accident. The easy failure modes are all known, so the really hard ones are left. In aircraft accidents, and it seems space accidents now too, a fatal result is generally the result of a number of seemingly disparate factors including system states, environmental state, and human impressions of what is going on.

    In one major aircraft accident I know a lot about, the (Airbus) jet crashed in part because it ended up being a tug of war between a human pilot and a robot autopilot that should have been disengaged, causing an up-and-down roller coaster ride. There were lots of other distracting things that were maybe wrong or maybe not, but a key part was the difficulty in knowing what state the machine was in.

    It was a similar situation with this accident, it seems, and though the misuse of metric units caused another recent accident it appears that these incidents have elements in common. They are also made more probable, it strikes me, by funding pressures and by the way that operating these systems involves radical commands while the systems lack enough power to be self-aware enough to preserve themselves.

    I am not going to do any more guessing because the people involved can probably figure it out themselves, and it seems that these combined factor accidents at least are not costing human lives, while they are adding to knowledge about how not to make the accident the next time.

    In that regard my hope is that some of the money being spent on Mars can be used to improve autonomous robotic systems to reduce accidents both on Mars and on Earth.

  7. Re:It wasn't a single wrong command by osu-neko · · Score: 2, Interesting

    Score: 5 Interesting for a really, really lame theory?

    There are many ways to end a mission. The one best for NASA is to close it, point to its budget, and wait for the cries of underfunding, using the closure of a perfectly good mission as evidence that the agency is truly underfunded based on its needs.

    The worst is for the mission to fail in some spectacular fashion, making people wonder if they should be giving these bozos any more money.

    So, you're telling me you think NASA would intentionally end a mission in the worst possible way for its future prospects of getting money when it had the option of doing it in the way most likely to get it more money instead?

    That's kinda stupid, don't you think? Shouldn't a conspiracy theory at least have the conspirators doing something that benefits them rather than cutting their own throats?

    --
    "Convictions are more dangerous enemies of truth than lies."
  8. Re:*Design* flaw by Tablizer · · Score: 2, Interesting

    Personally if I was designing a spacecraft I'd duplicate the (presumably quite small) onboard computer and radio hardware, because it seems quite common for software/electronics failures to result in loss of communications. Having two processors running different software, each capable of reprogramming the other one if it became broken, would seem like a sensible route to take.

    Or maybe have a small back-up battery in the center of the probe where it cannot be heated by the sun even if the probe gets pointed in the wrong direction. However, this won't necessarily solve the problem of a damaged main battery via sun exposure. Perhaps design the craft so that any wrong orientation is not fatal. However, this cranks up the costs.

    We may have reached a threshold in unmanned exploration where the operations and software are more expensive than building and launching the hardware itself (at least for going to Mars). This may mean that it is actually cheaper to let failures slip through every now and then. For example, it may be cheaper to have 5 probes with a 40% chance of failure than 2 probes with a 10% chance of failure. However, such an approach may result in national embarrassment. Optimum science per dollar and national pride may be in conflict here.
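
    A rough way to check the probe-count example, as a minimal sketch only: it assumes each probe fails independently with the stated probability, and the function names are made up for illustration. No per-probe costs are given in the comment, so none are modeled.

```python
def expected_successes(probes: int, p_fail: float) -> float:
    """Expected number of working probes, assuming independent failures."""
    return probes * (1.0 - p_fail)


def prob_at_least_one(probes: int, p_fail: float) -> float:
    """Chance that at least one probe works, assuming independent failures."""
    return 1.0 - p_fail ** probes


if __name__ == "__main__":
    # The two fleets from the comment: (number of probes, failure probability).
    for probes, p_fail in [(5, 0.40), (2, 0.10)]:
        print(
            f"{probes} probes at {p_fail:.0%} failure: "
            f"expected successes = {expected_successes(probes, p_fail):.2f}, "
            f"P(at least one works) = {prob_at_least_one(probes, p_fail):.5f}"
        )
```

    Under those assumptions the five cheaper probes are expected to yield 3.0 working missions against 1.8 for the two careful ones, while the chance of at least one success is nearly the same in both cases (about 0.990). Whether that is actually cheaper depends on per-probe costs, which are not stated here.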