Mars Global Surveyor Died from Single Bad Command
wattsup writes "The LA Times reports that a single wrong command sent to the wrong computer address caused a cascade of events that led to the loss of the Mars Global Surveyor spacecraft last November. The command was an orientation instruction for the spacecraft's main communications antenna. The mistake caused a problem with the positioning of the solar power panels, which in turn caused one of the batteries to overheat, shutting down the solar power system and draining the batteries some 12 hours later. 'The review panel found the management team followed existing procedures in dealing with the problem, but those procedures were inadequate to catch the errors that occurred. The review also said the spacecraft's onboard fault-protection system failed to respond correctly to the errors. Instead of protecting the spacecraft, the programmed response made it worse.'"
Of course, these things do happen. All we can do is find out why, and stop it from happening again.
The preliminary official report is available from here. The summary conclusions are:
* A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006. The incorrect memory load resulted in the following unintended actions:
** Disabled the solar array positioning limits.
** Corrupted the HGA's pointing direction used during contingency operations.
* A command sent to MGS on November 2, 2006 caused the solar array to attempt to exceed its hardware constraint, which led the onboard fault protection system to place the spacecraft in a somewhat unusual contingency orientation.
* The spacecraft contingency orientation with respect to the sun caused one of the batteries to overheat.
* The spacecraft's power management software misinterpreted the battery over temperature as a battery overcharge and terminated its charge current.
* The spacecraft could not sufficiently recharge the remaining battery to support the electrical loads on a continuing basis.
* Spacecraft signals and all functions were determined to be lost within five to six orbits (ten-twelve hours) preventing further attempts to correct the situation.
* Due to loss of power, the spacecraft is assumed to be lost and all recovery operations ceased on January 28, 2007.
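The first bullet (a parameter update written to the wrong memory address) is the kind of mistake a ground-side check can catch before uplink. Here is a minimal sketch of that idea, not the actual MGS ground software: the memory map, parameter names, addresses, and the `check_load`/`apply_load` helpers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    address: int  # start address in the (hypothetical) parameter table
    size: int     # number of memory words the parameter occupies


# Hypothetical memory map: names and addresses are made up for this sketch.
MEMORY_MAP = {
    "hga_contingency_pointing": Parameter("hga_contingency_pointing", 0x1000, 2),
    "solar_array_soft_limits": Parameter("solar_array_soft_limits", 0x1002, 2),
}


class LoadRejected(Exception):
    """Raised when a memory load does not match the memory map."""


def check_load(name, address, words):
    """Reject a load whose address or length disagrees with the memory map."""
    param = MEMORY_MAP.get(name)
    if param is None:
        raise LoadRejected(f"unknown parameter {name!r}")
    if address != param.address:
        raise LoadRejected(
            f"{name} lives at {param.address:#06x}, load targets {address:#06x}"
        )
    if len(words) != param.size:
        raise LoadRejected(
            f"{name} is {param.size} words long, load carries {len(words)}"
        )
    return param


def apply_load(memory, name, address, words):
    """Write the words only after the check passes, then read them back."""
    param = check_load(name, address, words)
    for offset, word in enumerate(words):
        memory[param.address + offset] = word
    readback = [memory[param.address + i] for i in range(param.size)]
    if readback != list(words):
        raise LoadRejected(f"readback mismatch for {name}")
```

The point of the design is that the operator states intent by name, and the address is cross-checked against a single table instead of being trusted as typed. A load meant for the HGA pointing parameter but aimed at the solar array limits would be refused before touching memory.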
"Goodness me, how unlike the FBI to abuse the trust of the American public." -- The Onion
Some temperature monitors on critical, exposed devices would also help. All you need is the CPU temperature diode present on just about every motherboard sold today.
I looked at the actual report on the NASA website; it said "the spacecraft's power management software misinterpreted the battery over temperature as a battery overcharge and terminated its charge current."
There was a temperature monitor on the critical, exposed component. Furthermore, the information from the sensor was used in a sensible manner: Li-poly/Li-ion batteries can catch fire under some circumstances (see also: Sony laptop batteries), so if your Li-poly battery overheats while being charged, you stop charging it (because you'd rather have a flat battery than an exploded battery).
After the craft stopped charging the battery it never started charging the battery again. The battery ran down and the craft stopped working.
The obvious question is: why didn't charging resume after the battery had cooled down? It might not have cooled down (as it was hot in the first place due to being exposed to the sun) or the system might have been waiting for a 'resume charging' command from ground control, which was never received as the high-gain antenna was in the wrong position.
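To make the "why didn't charging resume?" question concrete, here is a rough sketch of a charge controller that keeps over-temperature and full-charge cutoffs as separate conditions and resumes on its own once the battery cools. It is not how the MGS power software worked; the class, thresholds, and voltages are all assumptions picked for illustration.

```python
class ChargeController:
    """Toy charge controller with separate over-temperature and full-charge holds.

    All thresholds are assumed values for this sketch.
    """

    def __init__(self, cutoff_temp_c=40.0, resume_temp_c=30.0,
                 full_voltage=32.0, recharge_voltage=30.0):
        if resume_temp_c >= cutoff_temp_c:
            raise ValueError("resume temperature must be below the cutoff")
        if recharge_voltage >= full_voltage:
            raise ValueError("recharge voltage must be below the full voltage")
        self.cutoff_temp_c = cutoff_temp_c
        self.resume_temp_c = resume_temp_c
        self.full_voltage = full_voltage
        self.recharge_voltage = recharge_voltage
        self.charging = True
        self.hold_reason = None  # None, "full", or "overtemp"

    def update(self, temp_c, voltage):
        """Feed one sensor sample; return True if charging should be on."""
        if self.charging:
            if voltage >= self.full_voltage:
                self.charging, self.hold_reason = False, "full"
            elif temp_c >= self.cutoff_temp_c:
                self.charging, self.hold_reason = False, "overtemp"
        elif self.hold_reason == "overtemp":
            # Hysteresis: resume only once the battery is clearly cool again.
            if temp_c <= self.resume_temp_c and voltage < self.full_voltage:
                self.charging, self.hold_reason = True, None
        elif self.hold_reason == "full":
            # Resume once the battery has discharged below the recharge level.
            if voltage < self.recharge_voltage:
                self.charging, self.hold_reason = True, None
        return self.charging
```

Of course, if the battery never cools because it is sitting in direct sunlight, a controller like this stays in the over-temperature hold too. The gap between the two temperature thresholds only stops the charger from chattering on and off near the cutoff.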
Personally, if I were designing a spacecraft, I'd duplicate the (presumably quite small) onboard computer and radio hardware, because it seems quite common for software/electronics failures to result in loss of communications. Having two processors running different software, each capable of reprogramming the other one if it became broken, would seem like a sensible route to take.
Just my $0.02.
In Soviet Russia perfect probe sends lens cap code back to you!
A wiki link to help with the lens part.
http://en.wikipedia.org/wiki/Venera_program
Domestic spying is now "Benign Information Gathering"
http://nssdc.gsfc.nasa.gov/database/MasterCatalog?sc=1996-062A
I realize this was dirt cheap by space mission standards. A laptop encrusted with diamonds which costs $80,000 is dirt cheap by laptop-encrusted-with-diamonds standards. That *doesn't make it worth the money*. I know we waste far more than $40 million a year on many things -- and, logically, every one of them except one can be justified by "We waste more money on another program, don't cut *my* hobby horse!"
It's interesting that you draw the distinction between subsidies/entitlements and science, since NASA is a fairly naked subsidy directly to defense contractors, who make all of the really expensive bits. I'm all for giving Lockheed Martin money when it's required, but let's be honest and get ourselves something which blows up in a suitably impressive manner when we do, OK? Similarly, I might even be persuaded that the US federal government should fund science projects -- great, then *fund science*! Don't blow $160 million just to accelerate a tin can out of the atmosphere to get a few close-up pictures of rocks. $160 million could fund an awful lot of real science down here, much of which would produce actual results (or, alternatively, you could fund research gazing into the Clear Blue Sky, which is *still* cheap when you do it somewhere in atmosphere).
Help poke pirates in the eyepatch, arr.
It's not a lack of a thermal model (such things DO exist for most spacecraft), and they DO spend quite a lot of time on both modeling and test (thermal balance), where they shine artificial sunlight on the spacecraft in a vacuum chamber while it's operating to verify that the model works. http://mpfwww.jpl.nasa.gov/martianchronicle/martianchron7/mgs.html
a nchron2/marschro29.html
In fact, because MGS used aerobraking, which heats the spacecraft during the dips into the atmosphere, I'll bet the thermal model for MGS is better than most.
But the previous poster is right: ultimately it's a budget issue. If you designed the spacecraft to handle every eventuality, it would be too heavy to launch. If you did ground analysis for every conceivable situation, there aren't enough engineers in the world to finish the job before the "every two years" launch opportunity. At some point, you rely on judgement: hordes of people in reviews shooting at your design, and you figure you've covered 99.9% of the stuff... time to ship and shoot.
Let's also remember that this puppy has been going for >10 years, which means it was designed 15 years ago... call it 1990. Somehow I don't think the thermal design engineers at Martin Marietta and JPL were rookies using Excel for the first time, and I suspect that they are fully capable of understanding how to numerically solve partial differential equations, and the limitations of those numerical methods. They can also solve them analytically. My gosh, with a slide rule, even.
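For anyone wondering what "the limitations of those numerical methods" looks like in practice, here is a toy example: one explicit finite-difference step for 1D heat conduction, which is only stable when alpha*dt/dx^2 stays at or below 0.5. This is a textbook scheme on a made-up rod, nothing like a real spacecraft thermal model; the function name and parameters are invented for illustration.

```python
def step_heat_1d(temps, alpha, dx, dt):
    """Advance a 1D temperature profile by one explicit (FTCS) time step.

    temps: node temperatures; the two end nodes are held fixed.
    alpha: thermal diffusivity, dx: node spacing, dt: time step
    (all in consistent units).
    """
    r = alpha * dt / dx ** 2
    # The explicit scheme blows up numerically if r exceeds 0.5.
    if r > 0.5:
        raise ValueError(f"unstable step: alpha*dt/dx^2 = {r:.3f} > 0.5")
    new = list(temps)
    for i in range(1, len(temps) - 1):
        new[i] = temps[i] + r * (temps[i - 1] - 2 * temps[i] + temps[i + 1])
    return new
```

Halve the node spacing and the largest stable time step drops by a factor of four, which is one reason fine-grained thermal models get expensive fast.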
http://mpfwww.jpl.nasa.gov/martianchronicle/marti
FWIW, C++ is hardly necessary.. This kind of thing is really the domain of good old FORTRAN. Good optimizing compilers, well validated numerical codes, etc.