Slashdot Mirror


Mars Global Surveyor Died from Single Bad Command

wattsup writes "The LA Times reports that a single wrong command sent to the wrong computer address caused a cascade of events that led to the loss of the Mars Global Surveyor spacecraft last November. The command was an orientation instruction for the spacecraft's main communications antenna. The mistake caused a problem with the positioning of the solar power panels, which in turn caused one of the batteries to overheat, shutting down the solar power system and draining the batteries some 12 hours later. 'The review panel found the management team followed existing procedures in dealing with the problem, but those procedures were inadequate to catch the errors that occurred. The review also said the spacecraft's onboard fault-protection system failed to respond correctly to the errors. Instead of protecting the spacecraft, the programmed response made it worse.'"

141 comments

  1. It wasn't a single wrong command by 91degrees · · Score: 4, Informative
    It was a whole series of errors. Either that or every accident ever is caused by a single minor fault. Here's what the article says

    The review panel found that the management team followed procedures in dealing with the problem but that the procedures "were inadequate to catch the errors that occurred."

    The review also said the spacecraft's onboard fault protection system failed to respond to the errors. Instead of protecting the spacecraft, the programmed response made it worse.
    So, if the procedures were better, this wouldn't have happened. If the fault protection system was better, this wouldn't have happened. If the designers had predicted this exact problem might occur this wouldn't have happened.

    Of course, these things do happen. All we can do is find out why, and stop it from happening again.
    1. Re:It wasn't a single wrong command by cheater512 · · Score: 1

      Yes but one thing was bad and everything else did its job and the whole thing fell apart.

    2. Re:It wasn't a single wrong command by maestroX · · Score: 1

      Of course, these things do happen. All we can do is find out why, and stop it from happening again.
      Misalignment of the solar panels should have been handled properly in *any* case, as the machine is relying on solar power mainly. These things do happen at a local car shop -- we're talking NASA with smart people.
    3. Re:It wasn't a single wrong command by roaddemon · · Score: 5, Funny

      "Either that or every accident ever is caused by a single minor fault."

      I agree. Otherwise WWII was caused by Hitler's mom having one too many drinks the night she met his dad.

    4. Re:It wasn't a single wrong command by Darth_brooks · · Score: 3, Funny

      I agree. Otherwise WWII was caused by Hitler's mom having one too many drinks the night she met his dad.

      How can you come up with such a woefully shortsighted and limited in scope analysis? Honestly. There are at least two theories to work under for the cause of World War II.

      WWII was caused by a series of reactions several billion years ago between amino acids. Or it was started 5000 years ago when God created Eve for Adam. Everything else in between is just a smattering of minor details.

      --
      There are some people that if they don't know, you can't tell 'em.
    5. Re:It wasn't a single wrong command by MichaelSmith · · Score: 4, Interesting

      So, if the procedures were better, this wouldn't have happened. If the fault protection system was better, this wouldn't have happened. If the designers had predicted this exact problem might occur this wouldn't have happened.

      TFA:

      over the years budgets and staff had been cut "in an effort to operate the mission as economically as possible."

      MGS was well into bonus time in the sense that the original goals had been reached. The project was running on a reduced budget and this made a mistake inevitable. I can't help thinking that at a higher level this was considered to be a good thing. When you have new missions to run and a fixed budget to run them on you want your old missions to stop so that you can draw a line under it and go on to the next thing.

      The last thing management want is to have to decide to shut the spacecraft down because they don't have the budget for operations on the ground. Reducing the budget is a way of inducing the shutdown.

    6. Re:It wasn't a single wrong command by Jerry+Beasters · · Score: 3, Insightful

      Smart people make mistakes, deal with it.

    7. Re:It wasn't a single wrong command by Anonymous Coward · · Score: 0

      Why must spacecraft always have a 'self-destruct' command?

    8. Re:It wasn't a single wrong command by Poltras · · Score: 1

      Yes, one minor error and everything else went according to plan. gnak gnak gnak

    9. Re:It wasn't a single wrong command by TapeCutter · · Score: 1

      Yep smart people all right, and careful too. It was planned to last two years and they ended up crashing it by accident on the tenth. Ten years is a long time, maybe they got bored and wanted to go map another planet?

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    10. Re:It wasn't a single wrong command by Anonymous Coward · · Score: 0
      Everything else in between is just a smattering of minor details.


      It's been mostly underage?

    11. Re:It wasn't a single wrong command by DerekLyons · · Score: 2, Informative

      The last thing management want is to have to decide to shut the spacecraft down because they don't have the budget for operations on the ground.

      A nice theory, but one that fails to coincide with the facts. NASA routinely shuts down missions for lack of budget.
    12. Re:It wasn't a single wrong command by osu-neko · · Score: 2, Interesting

      Score: 5 Interesting for a really, really lame theory?

      There are many ways to end a mission. The one best for NASA is to close it, point to its budget, and wait for the cries of underfunding, using the closure of a perfectly good mission as evidence that the agency is truly underfunded based on its needs.

      The worst is for the mission to fail in some spectacular fashion, making people wonder if they should be giving these bozos any more money.

      So, you're telling me you think NASA would intentionally end a mission in the worst possible way for its future prospects of getting money when it had the option of doing it in the way most likely to get it more money instead?

      That's kinda stupid, don't you think? Shouldn't a conspiracy theory at least have the conspirators doing something that benefits them rather than cutting their own throats?

      --
      "Convictions are more dangerous enemies of truth than lies."
    13. Re:It wasn't a single wrong command by The_Wilschon · · Score: 1

      Does this invoke Godwin's Law? IANALawyer, so I'm not sure of the applicability in this situation.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
    14. Re:It wasn't a single wrong command by Nethead · · Score: 1

      Your first name wouldn't happen to be Valentine? Maybe you have a hidden agenda in wanting the MGS to fail. Hitting a little close to home, maybe?

      --
      -- I have a private email server in my basement.
    15. Re:It wasn't a single wrong command by networkBoy · · Score: 1

      I don't think so, as it was not used so much as an attack as a funny... borderline though.
      -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    16. Re:It wasn't a single wrong command by toddestan · · Score: 1

      Godwin's law is simply that Nazi Germany will eventually come up in any discussion on the internet, given enough time. It says nothing about context. So it does apply here.

    17. Re:It wasn't a single wrong command by Anonymous Coward · · Score: 0

      This makes you wonder if it was an accident or if it was on purpose but made to look like an error.

      conspiracy! conspiracy!

    18. Re:It wasn't a single wrong command by edbarbar · · Score: 1

      The error path almost never works in software. Software is just not sofisticated [sic] enough.

      --
      Ed Barbar, President and General Manager, Furnit USA
    19. Re:It wasn't a single wrong command by chris.evans · · Score: 1

      if radiosig != true { move_antenna() }
      if mainsvcc desired {
        ++solarpanel // adjust solar panel
      }
      solved! :-)

    20. Re:It wasn't a single wrong command by Boronx · · Score: 1

      Does falling on your face invoke the law of gravity?

    21. Re:It wasn't a single wrong command by Linagee · · Score: 1

      Satellite commits suicide. Proof of AI, news at eleven.

  2. Emmentaler vs. Gruyere by DingerX · · Score: 4, Insightful

    One bad command started the chain, but it needed a series of system failures to kill it. In other words, a slight misalignment of the solar panels (or whatever it was) may have been a necessary cause, but not sufficient. The thing needed a safe-mode that wasn't safe, and battery logic that failed to consider environmental variables. All the conditions lined up.

    It's like saying that a mid-air collision occurred because two jetliners were assigned the same altitude and jetway in opposite directions at the same time. Yeah, but A) How they got that assignment is kinda complicated and B) any number of traffic control and collision avoidance systems have to fail too.

    1. Re:Emmentaler vs. Gruyere by Anonymous Coward · · Score: 0

      This is always the case when looking at a failure in hindsight.
      It is not that the procedures are all bad, but something happened that was not covered completely. A well-known mid-air collision happened because one pilot followed the ATC directions and the other followed the anti-collision system. But the two acted in opposite directions.

    2. Re:Emmentaler vs. Gruyere by DingerX · · Score: 1

      When the anti-collision system kicks in and issues a Resolution Advisory, it's because ATC has failed. When the box on board effectively says "ATC has failed in its task, please start climbing." then "Dude, ATC has failed, climb a hell of a lot more, now", only more laconically and imperatively, you don't continue to listen to ATC and dive.

      So, yeah, like Ueberlingen, here you had a chain of events that results in a catastrophic failure. Was a "bad command" to blame? Only if the system has zero tolerance for "bad commands", and this one did not. There were four months between when the "bad command" was issued, and the situation got bad enough for the MGS to go into safe mode. Why did it have to reach such a level before being caught? And then "safe mode" really needs to be safe. You can't predict every situation, but this failed pretty spectacularly.

      Still, ten years is pretty good for an unmanned research spacecraft, and it's outstanding for anything to do with Mars.

    3. Re:Emmentaler vs. Gruyere by GTMoogle · · Score: 2, Funny

      Oh, Gruyere, definitely. Emmentaler is fine, but certainly its popularity can be attributed merely to the region's inadequate defense of the name, allowing cheap knockoffs to proliferate.

      *cricket* *cricket*

      What?

    4. Re:Emmentaler vs. Gruyere by v1 · · Score: 2, Insightful

      In most complex problems where catastrophic failure occurs, the problem manifests as a result of multiple smaller failures that combine in an unfortunate way, or as a chain reaction. By nature, people will want to narrow down the problem so they can identify a "cause". This is sometimes not appropriate, as we see here, where a collection of less critical failures led to catastrophe, any of which having been avoided would have prevented disaster. It's a bit like team theory... after losing a game the coach does not go looking for the one player that lost them the game - it's a team effort and everyone is involved and bears some responsibility. Unless someone made a blatant and major mistake that was responsible for the vast majority of the fallout that resulted, you have to accept that no one was "at fault". In this case several people made mistakes that by themselves are minor, but combined in such a way proved fatal to the craft. It's not anyone's fault, these things happen. All you can do to prevent this from happening again is to tighten up procedures to try to lower the number of minor failures that will occur (and you must accept you will never get them all) and to institute more review/backstopping to make it more likely that not only minor problems will be identified and fixed, but also that the result of complex interrelated events is predicted and prepared for.

      --
      I work for the Department of Redundancy Department.
  3. That'll Teach 'Em by Anonymous Coward · · Score: 5, Funny

    That'll teach those NASA folks to stop just using "sudo" when a command doesn't work under regular user permissions...

    1. Re:That'll Teach 'Em by ilovecheese · · Score: 0

      Nah, it was just another bad windows update ;)

    2. Re:That'll Teach 'Em by Tibor+the+Hun · · Score: 4, Funny

      You've just lost thousands of Windows folks...
      su..do...?

      --
      If you don't know what AltaVista is (was), get off my lawn.
    3. Re:That'll Teach 'Em by Anonymous Coward · · Score: 4, Funny

      ku!

    4. Re:That'll Teach 'Em by Spudtrooper · · Score: 4, Funny

      Mars Global Surveyor wants to commit seppuku: Cancel or Allow?

    5. Re:That'll Teach 'Em by shannara256 · · Score: 1

      Sudo means Simon Says.

  4. Oblig. by TehBlahhh · · Score: 3, Funny

    It was the Tamil Tigers that hacked it, and inserted this insidious command! The threat of terrorists is everywhere! This would have been prevented if we had kept up the war on terror.

    1. Re:Oblig. by Anonymous Coward · · Score: 1, Funny

      NASA has been militarized, and given the singular task of bringing Democracy to Mars.

    2. Re:Oblig. by Dogtanian · · Score: 1

      Yes, you were joking; but I've said this before. What's to stop hostile parties from hacking, DOSsing or simply hijacking your average space probe?

      --
      "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
    3. Re:Oblig. by maxume · · Score: 1

      $$$. I have no idea what hitting the right spot with a signal strong enough to cause problems (let alone with properly encoded commands) would take, but it strikes me as likely being non-trivial, with little benefit gained (because the bad publicity can be controlled by claiming that there was a measurement problem or whatever).

      --
      Nerd rage is the funniest rage.
    4. Re:Oblig. by uncoveror · · Score: 1

      The only "Mars probes" that haven't crashed are in Arizona.

      --
      The Uncoveror: It's the real news.
    5. Re:Oblig. by WindBourne · · Score: 1

      The lack of an antenna with enough power to send the commands? Not knowing the sequence to start it? Not knowing exactly where the probe is? I would be VERY surprised if anybody would spend the millions required to pull off something like this, except for maybe another country. And that would then amount to an attack on America. While this admin is incapable of doing a war correctly, I suspect that damn few countries want the havoc that we seem to cause.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    6. Re:Oblig. by Dogtanian · · Score: 1

      When this thought occurred to me, I *was* thinking of foreign governments; particularly during the cold war era. Not to mention the fact that some of the probes out there were launched quite a long time ago and won't have modern standards of "security".

      --
      "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
  5. Bad command or filename by anss123 · · Score: 1

    C:\>

    1. Re:Bad command or filename by robably · · Score: 5, Funny

      C:\>
      You know, that looks like the emoticon for an egghead with a beard, frowning. Very appropriate.
    2. Re:Bad command or filename by NeilTheStupidHead · · Score: 1, Funny

      Egghead? No, no. It's the over developed cranium of our new martian overlords.

      --
      Lose: misplace or fail || Loose: not bound together
    3. Re:Bad command or filename by Dasher42 · · Score: 4, Funny

      *sniff*

      You just made a beautifully appropriate commentary on a common fixture of my childhood. Dude.

    4. Re:Bad command or filename by srussia · · Score: 1

      C:\>
      You know, that looks like the emoticon for an egghead with a beard, frowning. Very appropriate. Or tilting your head the other way, a monkey wearing a dunce cap... equally appropriate.
      --
      Set your phasers on "funky"!
    5. Re:Bad command or filename by Zaiff+Urgulbunger · · Score: 1

      C:\>
      You know, that looks like the emoticon for an egghead with a beard, frowning. Very appropriate.
      Or tilting your head the other way, a monkey wearing a dunce cap... equally appropriate.
      Or a *really* happy dude wearing a pointy hat at a jaunty angle!
    6. Re:Bad command or filename by jo7hs2 · · Score: 1

      Looks like a big-nosed wizard to me.

    7. Re:Bad command or filename by oleerich · · Score: 1

      If you take the inverse you get a dude with a potato nose and a partyhat.....

  6. You nits by Bastard+of+Subhumani · · Score: 3, Insightful

    The mistake caused a problem with the positioning of the solar power panels
    What was it this time, degrees vs radians?
    --
    Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
    1. Re:You nits by Anonymous Coward · · Score: 0

      If it's American engineers it's probably hogsheads and rods instead.

    2. Re:You nits by harry666t · · Score: 1, Funny

      > What was it this time, degrees vs radians?

      Grads.

    3. Re:You nits by Anonymous Coward · · Score: 0

      Read about it all here.

    4. Re:You nits by Anonymous Coward · · Score: 0

      If it's in space, the odds are very good that it was put there by American engineers.

  7. Crispy fried... by Anonymous Coward · · Score: 0

    ...martian spacecraft. The ultimate geek's meal.

    Sorry, my mind's wandering.

    1. Re:Crispy fried... by Anonymous Coward · · Score: 0

      Sorry, my mind's wandering.
      Quick! Shoot it!!
  8. *Design* flaw by CatoNine · · Score: 3, Insightful

    From TFA: ".... That exposed one of the batteries to direct sunlight, causing it to overheat." So, even a small navigation error or small mechanical failure could already cause this thing to overheat. It should have been constructed more robustly.

    1. Re:*Design* flaw by dsanfte · · Score: 2, Interesting

      Some temperature monitors on critical, exposed devices would also help. All you need is the CPU temperature diode present on just about every motherboard sold today. In fact, how about many of them, arranged at strategic positions on the spacecraft hull to give real-time temperature information to the satellite's computer? I guess complicated ideas like these get ignored in favor of simpler solutions, like relying on large chains of command and bureaucratic procedures carried out 30 light-minutes from the point of failure.

      --
      occultae nullus est respectus musicae - originally a Greek proverb
    2. Re:*Design* flaw by Mike1024 · · Score: 4, Informative

      Some temperature monitors on critical, exposed devices would also help. All you need is the CPU temperature diode present on just about every motherboard sold today.

      I looked at the actual report on the NASA website; it said "the spacecraft's power management software misinterpreted the battery over temperature as a battery overcharge and terminated its charge current."

      There was a temperature monitor on the critical, exposed component. Furthermore, the information from the sensor was used in a sensible manner: Li-poly/li-ion batteries can catch fire under some circumstances (see also: sony laptop batteries) so if your li-poly battery overheats while being charged you stop charging it (because you'd rather have a flat battery than an exploded battery).

      After the craft stopped charging the battery it never started charging the battery again. The battery ran down and the craft stopped working.

      The obvious question is: why didn't charging resume after the battery had cooled down? It might not have cooled down (as it was hot in the first place due to being exposed to the sun) or the system might have been waiting for a 'resume charging' command from ground control, which was never received as the high-gain antenna was in the wrong position.
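
The behaviour described in the report, where an over-temperature reading was handled like an overcharge and charging never came back, can be illustrated with a small sketch. This is not MGS flight software: the thresholds, the names, and the hysteresis rule are all assumptions made up for illustration. The idea is simply to keep the two causes distinct and to re-enable charging once the battery has cooled.

```python
from dataclasses import dataclass

# All limits below are hypothetical values chosen for illustration only.
OVER_TEMP_C = 45.0      # stop charging at or above this temperature
RESUME_TEMP_C = 35.0    # resume only after cooling to this or below (hysteresis)
FULL_CHARGE = 1.0       # state of charge treated as "full"


@dataclass
class ChargeController:
    charging: bool = True
    reason: str = "ok"

    def update(self, temp_c: float, state_of_charge: float) -> bool:
        """Decide whether to charge, keeping the two fault causes distinct."""
        if temp_c >= OVER_TEMP_C:
            self.charging, self.reason = False, "over_temperature"
        elif state_of_charge >= FULL_CHARGE:
            self.charging, self.reason = False, "fully_charged"
        elif self.reason == "over_temperature" and temp_c > RESUME_TEMP_C:
            # Still cooling down: stay off until at or below the resume threshold.
            self.charging = False
        else:
            self.charging, self.reason = True, "ok"
        return self.charging
```

Because the controller records why it stopped, a later cool reading with a partly drained battery switches charging back on without waiting for a command from the ground. Whether that would have helped MGS depends on whether the battery could cool at all in that orientation.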

      Personally if I was designing a space craft I'd duplicate the (presumably quite small) onboard computer and radio hardware, because it seems quite common for software/electronics failures to result in loss of communications. Having two processors running different software, each capable of reprogramming the other one if it became broken, would seem like a sensible route to take.

      Just my $0.02.

      --
      "Goodness me, how unlike the FBI to abuse the trust of the American public." -- The Onion
    3. Re:*Design* flaw by Detritus · · Score: 1
      You should win the Olympic medal for jumping to conclusions.

      Almost all spacecraft do have a large number of temperature sensors that are connected to the spacecraft telemetry system. They are used to detect equipment problems and thermal management issues. This has been standard procedure for many decades.

      --
      Mea navis aericumbens anguillis abundat
    4. Re:*Design* flaw by dsanfte · · Score: 2, Insightful

      This has been standard procedure for many decades.


      And yet, it failed.
      --
      occultae nullus est respectus musicae - originally a Greek proverb
    5. Re:*Design* flaw by maxume · · Score: 3, Interesting

      They don't get to build the best damn space probe they can build, they get to build the best damn space probe they can build for $X. Thermal management isn't easy; controlling orientation allows them to spend money on the stuff they are interested in, rather than insulation and shielding.

      --
      Nerd rage is the funniest rage.
    6. Re:*Design* flaw by FSWKU · · Score: 1

      ...large chains of command and bureaucratic procedures carried out 30 light-minutes from the point of failure.

      Which (sadly enough) makes NASA exactly the same as every other workplace with >50 people...
      --
      "So after all this, you make my case for me. To end this stalemate, you must die..."
    7. Re:*Design* flaw by Anonymous Coward · · Score: 0

      I'd go further than that. I'd have two spacecraft, but one in a simulation chamber on the ground. Feed all instructions to the simulator FIRST, then go live (maybe a day later, or even better after the simulator has reached an equilibrium condition). If you did indeed have a duplicate of the spacecraft, this fault would have been observed FIRST on the ground system.

    8. Re:*Design* flaw by DogDude · · Score: 1

      That's easy for you to say, and maybe you're right. But you have to remember that NASA is driving remote-controlled cars, complete with all kinds of sensing equipment MILLIONS OF MILES AWAY. I for one am amazed by this project, and I have the utmost respect for NASA.

      --
      I don't respond to AC's.
    9. Re:*Design* flaw by shawnce · · Score: 1

      Not easy to simulate the environment accurately or fully. In the end it was an environmental factor that triggered the slide to death.

    10. Re:*Design* flaw by ckedge · · Score: 1

      > Personally if I was designing a space craft

      If you were designing a space craft it would be 2000 lbs overweight and require a $1-billion launch vehicle instead of a $100-million launch vehicle, and you'd get your ass fired.

      What kind of moron "quips offhand" that he'd have been smart enough to fix/predict all 50,000 imaginary possible 5-factor problems (remember, this is just the one of 50,000 that occurred) as compared to the $100-million dollars worth of PhD-Years that someone else already spent on the problem.

    11. Re:*Design* flaw by Tablizer · · Score: 2, Interesting

      Personally if I was designing a space craft I'd duplicate the (presumably quite small) onboard computer and radio hardware, because it seems quite common for software/electronics failures to result in loss of communications. Having two processors running different software, each capable of reprogramming the other one if it became broken, would seem like a sensible route to take.

      Or maybe have a small back-up battery in the center of the probe where it cannot be heated by the sun even if the probe gets pointed in the wrong direction. However, this won't necessarily solve the problem of a damaged main battery via sun exposure. Perhaps design the craft so that any wrong orientation is not fatal. However, this cranks up the costs.

      We may have reached a threshold in unmanned exploration where the operations and software are more expensive than building and launching the hardware itself (at least for going to Mars). This may mean that it is actually cheaper to let failures slip through every now and then. For example, it may be cheaper to have 5 probes with a 40% chance of failure than 2 probes with a 10% chance of failure. However, such may result in national embarrassment. Optimum science per dollar and national pride may be in conflict here.
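
The numbers in that example can be checked with a little arithmetic. The sketch below assumes each probe fails independently, which the comment does not state, and says nothing about cost. It only compares the expected number of working probes and the chance that at least one survives.

```python
def fleet_stats(n_probes: int, p_fail: float) -> tuple[float, float]:
    """Return (expected successes, probability at least one probe succeeds).

    Assumes each probe fails independently with probability p_fail.
    """
    expected = n_probes * (1.0 - p_fail)
    at_least_one = 1.0 - p_fail ** n_probes
    return expected, at_least_one


cheap = fleet_stats(5, 0.40)    # five probes, 40% failure each
careful = fleet_stats(2, 0.10)  # two probes, 10% failure each
print(f"5 probes @ 40%: expected {cheap[0]:.2f}, P(any) {cheap[1]:.4f}")
print(f"2 probes @ 10%: expected {careful[0]:.2f}, P(any) {careful[1]:.4f}")
```

Under those assumptions the five riskier probes give 3.0 expected successes against 1.8 for the two careful ones, while the chance of losing every probe is about 1% in both cases.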

    12. Re:*Design* flaw by jespley · · Score: 1

      "The command and data-handling system is built around two redundant flight computers that run in parallel." Albee, A. et al., Overview of the Mars Global Surveyor Mission, Journal of Geophysical Research, VOL. 106, NO. E10, PAGES 23,291-23,316, OCTOBER 25, 2001

    13. Re:*Design* flaw by pookemon · · Score: 1

      "Li-poly/li-ion batteries can catch fire under some circumstances (see also: sony laptop batteries) so if your li-poly battery overheats while being charged you stop charging it (because you'd rather have a flat battery than an exploded battery)."

      Has anyone tested LiPo's to see how well they burn in space?

      --
      dnuof eruc rof aixelsid
    14. Re:*Design* flaw by sjames · · Score: 1

      It WAS more robust. The craft was well beyond its design lifetime when it was finally lost. All batteries lose capacity over time. I don't know for sure, but it wouldn't surprise me that if this had happened within its design lifetime, the remaining battery would have been adequate to maintain contact and allow for corrective commands from the ground such as rotate x degrees and resume charging.

      Everyone wants everything built robust enough to last till the end of life on earth or longer, but nobody wants to pay for it.

    15. Re:*Design* flaw by Jonathan_S · · Score: 1

      We may have reached a threshold in unmanned exploration where the operations and software are more expensive than building and launching the hardware itself (at least for going to Mars). This may mean that it is actually cheaper to let failures slip through every now and then. For example, it may be cheaper to have 5 probes with a 40% chance of failure than 2 probes with a 10% chance of failure. However, such may result in national embarrassment.
      One possible method to try to address this (other than just letting some probes die) is to accept some inefficiencies and split probes into standard and mission specific pieces (at least for orbital probes).

      You could build a base orbiter design around a common chassis, using standardized software to handle things like orbital maneuvering/orientation, power generation/storage/allocation, and communications. There would be reserved weight / volume for mission specific hardware and sensors. That base software (and standard sensors) would have most of the fail safe code of the probe, to handle things like loss of comms, improper orientation, thermal limits, partial power failure, etc.
      Hopefully that would allow the common software to be improved over time and allow future missions to focus on the sensors they want to install, without having to worry about reinventing fail safe modes for the base orbiter.

      However, there would certainly be downsides to doing this.
      For one thing, it would put a crimp on the size and placement of sensors or other mission hardware because they would have to be built around the volume/mass and probably mass distribution limits of the basic chassis.
      For another you might have to work around the base chassis having too little power generation/storage for the mission hardware; or conversely waste mass, and thus raise the cost of delivery, by having much larger solar panels and batteries than you require.

      It would really only be practical if you wanted to build quite a few fairly similar probes.
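
The split described above, a common chassis that owns the fail-safe logic plus mission-specific payloads that fit inside a reserved budget, could be sketched roughly as follows. Every class name, budget figure, and method here is hypothetical and only meant to show the shape of the idea, not any real spacecraft bus.

```python
from abc import ABC, abstractmethod


class Payload(ABC):
    """Mission-specific hardware; each mission supplies its own subclass."""

    mass_kg: float
    power_w: float

    @abstractmethod
    def safe(self) -> None:
        """Put the instrument into a low-power, safe state."""


class BaseOrbiter:
    """Shared chassis: owns the budgets and the fail-safe logic."""

    def __init__(self, payload_mass_kg: float, payload_power_w: float):
        self.payload_mass_kg = payload_mass_kg
        self.payload_power_w = payload_power_w
        self.payloads: list[Payload] = []
        self.mode = "nominal"

    def install(self, payload: Payload) -> None:
        used_mass = sum(p.mass_kg for p in self.payloads) + payload.mass_kg
        used_power = sum(p.power_w for p in self.payloads) + payload.power_w
        if used_mass > self.payload_mass_kg or used_power > self.payload_power_w:
            raise ValueError("payload exceeds the reserved mass or power budget")
        self.payloads.append(payload)

    def enter_safe_mode(self, cause: str) -> None:
        # Common fail-safe path, written once and reused by every mission.
        for p in self.payloads:
            p.safe()
        self.mode = f"safe:{cause}"


class Camera(Payload):
    """Example instrument with made-up mass and power figures."""

    mass_kg, power_w = 20.0, 15.0

    def __init__(self):
        self.on = True

    def safe(self) -> None:
        self.on = False
```

The `install` check is where the downside mentioned above shows up: an instrument that does not fit the reserved mass or power budget is simply rejected, however useful it might be.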
  9. Newbie programmers take notice. by Anonymous Coward · · Score: 0

    ... It makes me actually want to go over my implementation of a FSM (finite state machine, not flying spaghetti monster) for a program I'm working on. It's amazing how these errors mirror a software error: some small bug/hidden "feature" propagates into one or multiple big problems. Your system fails because of something as simple as an if statement.
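
One common way to make an FSM harder to break with a stray if statement is to drive it from an explicit transition table and refuse anything the table does not list. The sketch below uses invented states and events purely as an illustration; it is not taken from any program mentioned here.

```python
# Hypothetical states and events, chosen only to illustrate the idea.
TRANSITIONS = {
    ("nominal", "fault_detected"): "safe_mode",
    ("safe_mode", "ground_recovery"): "nominal",
    ("safe_mode", "power_critical"): "survival",
}


def step(state: str, event: str) -> str:
    """Return the next state, refusing transitions the table does not define."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None
```

With this layout every legal transition is visible in one place, and an unexpected state/event pair fails loudly instead of falling through to whichever branch happens to match.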

    1. Re:Newbie programmers take notice. by Anonymous Coward · · Score: 0

      Your system fails, because something so simple as an if statement.

      Man, that's deep.

  10. Give NASA a break by patio11 · · Score: 2, Interesting

    It worked for a decade at a cost of a piddling $220 million, plus $20 million a year in upkeep. At a hair over $40 million a year, that's much, much less wasteful than most NASA missions. (Yeah, I suppose you could consider whether the return was worth it. Heh, who are we kidding -- did YOU get $40 million a year out of those desktop photos? I didn't.)

    I propose that next time NASA spend $150 million on the construction phase, which is just a slush fund for defense contractors anyhow, and then issue the lethal command before launch. Then we'd save a decade worth of upkeep costs and the $65 million launch budget. NASA could even have a $10 million prize going to the person who most creatively identified a possible fatal error, since that's the only fun part of these missions for people who aren't rocket scientists and we wouldn't want to skimp on it.

    1. Re:Give NASA a break by Anonymous Coward · · Score: 3, Insightful

      "did YOU get $40 million a year out of those desktop photos?"

      MGS mapped targeted parts of the surface of Mars at much higher resolution than any previous mission. Among other things, it was responsible for finding the gullies that are probably signs of water being expelled recently at the surface. The length of the mission allowed it to detect changes at these sites, suggesting the process is still occurring today.

      What you really seem to be saying is that exploration of Mars by any means is a big waste of money, in which case, if you're going to complain about any specific mission as an example, MGS should be way down the list, because it was comparatively cheap, long-lived, and successful.

    2. Re:Give NASA a break by dreamchaser · · Score: 2, Insightful

      I propose that you get a clue. It's hard to place a value on science like this, but the advancement of knowledge and working towards getting off this rock are both highly valuable fields of endeavor.

      As for your 'proposal'...did you just pull those numbers out of your ass? If anything costs increase over the years due to rising wages and inflation. $220 million was *dirt cheap* by space mission standards.

      We waste far more money on subsidies and entitlements in the US than we spend on science like this.

    3. Re:Give NASA a break by brassman · · Score: 3, Insightful

      > did YOU get $40 million a year out of those desktop photos? I didn't.

      So divide that by 200 million (roughly) to get your share.

      I got two quarters' worth. Heck, you can't get a comic book for fifty cents.

      --
      "Ain't no right way to do a wrong thing."
    4. Re:Give NASA a break by Nethead · · Score: 1

      I'm willing to bet that 40 million of us got more than a dollar a year worth of postcards. I know I feel I did.

      --
      -- I have a private email server in my basement.
    5. Re:Give NASA a break by patio11 · · Score: 1

      Splendid, then we should have no problem privatizing the next probe and funding it with postcard sales.

  11. I knew it! by Anonymous Coward · · Score: 0

    It was a bad idea to begin with by adding the crash.exe file.

  12. The actual report by Mike1024 · · Score: 5, Informative

    The preliminary official report is available from here. The summary conclusions are:

    * A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006. The incorrect memory load resulted in the following unintended actions:
    ** Disabled the solar array positioning limits.
    ** Corrupted the HGA's pointing direction used during contingency operations.
    * A command sent to MGS on November 2, 2006 caused the solar array to attempt to exceed its hardware constraint, which led the onboard fault protection system to place the spacecraft in a somewhat unusual contingency orientation.
    * The spacecraft contingency orientation with respect to the sun caused one of the batteries to overheat.
    * The spacecraft's power management software misinterpreted the battery over temperature as a battery overcharge and terminated its charge current.
    * The spacecraft could not sufficiently recharge the remaining battery to support the electrical loads on a continuing basis.
    * Spacecraft signals and all functions were determined to be lost within five to six orbits (ten-twelve hours) preventing further attempts to correct the situation.
    * Due to loss of power, the spacecraft is assumed to be lost and all recovery operations ceased on January 28, 2007.
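
    The first finding above (a parameter written to the wrong memory address) is the sort of mistake a ground-side check could in principle reject before uplink. A minimal sketch, assuming a table of known parameters; the names, addresses and sizes below are invented for illustration and are not taken from the MGS flight software or ground system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    address: int  # start address in spacecraft memory (hypothetical)
    size: int     # length in bytes (hypothetical)


# Hypothetical memory map; real flight software would have its own layout.
MEMORY_MAP = {
    "HGA_CONTINGENCY_POINTING": Param("HGA_CONTINGENCY_POINTING", 0x1000, 8),
    "SOLAR_ARRAY_SOFT_LIMITS": Param("SOLAR_ARRAY_SOFT_LIMITS", 0x1008, 8),
}


def check_upload(name: str, address: int, payload: bytes) -> None:
    """Raise ValueError unless the write lands exactly on the named parameter."""
    param = MEMORY_MAP[name]
    if address != param.address:
        raise ValueError(
            f"{name}: address {address:#x} does not match expected {param.address:#x}"
        )
    if len(payload) != param.size:
        raise ValueError(
            f"{name}: payload is {len(payload)} bytes, expected {param.size}"
        )
```

    A check like this only helps if the table itself is right, so it moves the risk rather than removing it.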

    --
    "Goodness me, how unlike the FBI to abuse the trust of the American public." -- The Onion
    1. Re:The actual report by jacksonj04 · · Score: 1

      So hang on... they *overwrote* the memory which contained the contingency operations plan and the hardware limitations data for the solar array? Surely that's bad design, you shouldn't be able to overwrite something like that (Unless the hardware limits plan on changing mid-mission). NASA fault protection modules evidently don't do their job too well :-/

      --
      How many people can read hex if only you and dead people can read hex?
    2. Re:The actual report by MichaelSmith · · Score: 1

      Hmmm. When they went looking for it the MGS wasn't where they expected it to be. Hard to see how the failure mode they describe would have made it change its trajectory by a significant degree.

    3. Re:The actual report by gfilion · · Score: 2, Informative

      The preliminary official report is available from here.

      Thanks for the link. The report is only three pages long and very interesting to read. The cause (quoted below) is really stunning; I wonder what the probability of this sequence of events happening is.

      The LM team performed a fault analysis to determine the cause of the spacecraft anomaly. An LM spacecraft engineer ultimately determined that the likely cause of the anomaly was an incorrect parameter upload that had occurred 5 months earlier (June 2006). A direct memory command to update the HGA's positioning for contingency operations was mistakenly written to the wrong memory address in the spacecraft's onboard computer. This resulted in the corruption of two independent parameters and had dire consequences for the spacecraft. The first parameter error caused one solar array to be driven against its hard stop, leading the MGS fault-protection system to incorrectly believe it had a stuck gimbal, causing MGS to enter contingency mode. Upon entry into contingency mode, the spacecraft's orientation was such that one of the batteries was directly exposed to the sun. This caused the battery to overheat which in turn gave a false indication of an overcharged battery and led to the premature termination of battery charging on each subsequent orbit. Even though the remaining battery continued to be charged, it was not being charged sufficiently to support the full electrical load, which was normally supported by both batteries. The end result was that both batteries were depleted, probably within 12 hours. The second parameter error caused the HGA to point away from the Earth when the spacecraft was, in fact, properly oriented to communicate to Earth. Communication from the spacecraft to the ground was therefore impossible, and the unsafe thermal and power situation could not be identified by the MGS's ground controllers.
    4. Re:The actual report by gfilion · · Score: 2, Informative

      So hang on... they *overwrote* the memory which contained the contingency operations plan and the hardware limitations data for the solar array? Surely that's bad design, you shouldn't be able to overwrite something like that (Unless the hardware limits plan on changing mid-mission). NASA fault protection modules evidently don't do their job too well :-/

      Actually, they had to correct a previous error by writing directly to memory. I believe that writing directly to memory is not a standard operating procedure. The PDF report linked by the GP states that:

      [...] The HGA parameter was actually updated on the two redundant control systems at two different times. The updates were commanded with slightly different (operator input) precision. This difference in precision, while numerically inconsequential, resulted in an inconsistency between the computer memories. A full memory readout taken at a later date revealed the difference between the two positioning angles, which warranted a correction by the operations team. During the effort to correct the inconsistency, the operations team specified incorrect memory addresses. The incorrect memory addresses caused the command upload to enter data into erroneous memory locations, resulting in the consequences described above.
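
      To illustrate how a "numerically inconsequential" difference in operator precision can still show up as a mismatch in a memory readout, here is a small sketch. It assumes the angle is stored as an 8-byte IEEE double, which is my assumption and not something the report states; the angle values are made up:

```python
import struct


def to_memory(angle_deg: float) -> bytes:
    """Encode an angle as 8 big-endian bytes (storage format is an assumption)."""
    return struct.pack(">d", angle_deg)


# The same angle typed in with different precision on the two redundant sides.
side_a = to_memory(float("57.2958"))
side_b = to_memory(float("57.29578"))

# Numerically the two values are practically identical...
numerically_close = abs(57.2958 - 57.29578) < 1e-3

# ...but a byte-for-byte comparison of the memory images flags a difference.
images_match = side_a == side_b
```

      Here numerically_close is True while images_match is False, which is roughly the situation the operations team then set out to correct.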
    5. Re:The actual report by Tablizer · · Score: 1

      Actually, they had to correct a previous error by writing directly to memory. I believe that writing directly to memory is not a standard operating procedure. The PDF report linked by the GP states that:

      Either you don't allow change to the safety programs, or you risk breaking them if you do allow changes. Changing as much as possible dynamically has proven very useful in the past. One problem with the Huygens Titan lander was that the radio broadcast code was in firmware instead of software, and a Doppler shift logic error was later detected. They had to replan large parts of the mission to work around this problem because of that. Because the orbiter has a complex pinball-like orbit, this was expensive. If it had been reprogrammable, then they could have fixed the Doppler handler and kept the same mission plan.

      I suppose they should be extra careful when commands modify safety systems, but bleep happens and as the article stated, they had cut down operations expenditures to save money. This may be a factor.

    6. Re:The actual report by ScrewMaster · · Score: 1

      A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006.

      To me, that sounds like they need some "managed code" up there. Windows SE.NET (Space Edition) would have prevented this, I'm sure.

      --
      The higher the technology, the sharper that two-edged sword.
  13. An old error strikes back! by Gazzonyx · · Score: 1
    Deleted from TA:

    In a tragic comedy of errors, NASA accidentally sends the Mars Global Surveyor a confirmation to execute "con/con". Microsoft explains that this will be patched in TerraWindows (TM), and for the moment their only suggestion is to "...do the Microsoft '1,2 shuffle'; sigh heavily and do a hard reboot..."

    John Dvorak has been contacted as a possible candidate to go manually reboot the Surveyor, but has yet to accept the proposition.


    *ducks*

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  14. Solar panel caused battery to overheat ? by Ace905 · · Score: 0, Troll

    The article mentions that a new round of global-warming may be taking place on Mars - does this lend any credence to the theory that global warming is an unavoidable solar event? Maybe Mars and Earth switch off and on in turns - making one hospitable to life while the other becomes a desolate barren wasteland. Maybe we all just need to move 35 Million Miles away.

    Sometimes I feel like I need to.

    Also, The slashdot write-up says a, 'wrong command to the wrong computer address'. It was the right command, to the wrong computer address. If you're going to just play 'telephone' entering stories, pay attention. You made it more complicated and wrong. Maybe you should go work for NASA; got some diapers and surgical tubing?

    ---
    Diapers and surgical tubing!

    --

    Ace
    1. Re:Solar panel caused battery to overheat ? by Ambitwistor · · Score: 0, Offtopic

      The article mentions that a new round of global-warming may be taking place on Mars - does this lend any credence to the theory that global warming is an unavoidable solar event? No.

      There are good reasons to believe that the warming on Mars is not due to the Sun (here and here). There are even better reasons to believe that most of the warming on Earth is not due to the Sun (e.g., here and a bunch of essays here).
  15. Are you telling me ... by WrongSizeGlass · · Score: 1

    ... that NASA doesn't have an undo command? I guess they really have cut their budget.

    1. Re:Are you telling me ... by dominious · · Score: 1

      I guess they really have cut their budget.
      This must be true! In fault tolerance, when the system reaches a bad state, it should undo by
      using backward error recovery: saved variables in past checkpoints which will allow a rollback to a good state.
      This is of course costly in terms of processing and data storage, and may not always succeed:
      "Please ignore incoming missile."
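
      For anyone who hasn't seen backward error recovery, a toy sketch of the checkpoint-and-rollback idea follows. The class, the state keys and the validity rule are all hypothetical; this is not how any real spacecraft does it:

```python
import copy


class Checkpointed:
    """Holds a state dict plus a stack of saved copies to roll back to."""

    def __init__(self, state: dict):
        self.state = state
        self._checkpoints = []

    def checkpoint(self) -> None:
        # Deep copy so later changes cannot alter the saved snapshot.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self) -> None:
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = self._checkpoints.pop()


def apply_update(system: Checkpointed, update: dict, is_valid) -> bool:
    """Apply update; if the resulting state fails is_valid, restore the snapshot."""
    system.checkpoint()
    system.state.update(update)
    if not is_valid(system.state):
        system.rollback()
        return False
    return True
```

      The catch, as with MGS, is that the bad state has to be recognized as bad: here that means is_valid must actually notice the problem, and nothing on MGS flagged the June upload for five months.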
  16. Impressive by hcdejong · · Score: 4, Interesting

    Not the error itself, but the fact NASA was able to figure out what happened in such detail, when the spacecraft it happened to is not giving any diagnostic information and cannot be examined directly.

  17. wrong parameter? by advocate_one · · Score: 4, Funny

    /sudo shutdown -h now sent instead of /sudo shutdown -r now

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    1. Re:wrong parameter? by jsiren · · Score: 1
      What they actually did, but are afraid to admit, was the canonical thinko:

      $ sudo /etc/init.d/sshd stop ; start
      Stopping sshd...Connection closed by remote host.
      Uh-oh. Honestly, I was thinking of the complete syntax...

      $ ssh critter
      ssh: Connection refused.
      Oh crud.
      --
      Usage: km/h for speed (kilometers per hour); kph for very slow impulses (kilopond hours).
  18. Ah, the good old days of APM by Anonymous Coward · · Score: 0

    Heh, seems I am not the only one to have a problem with the Linux/ACPI combo.
    (I am kidding, honestly)

  19. So then by Impy+the+Impiuos+Imp · · Score: 0, Offtopic

    > Fuk Li, manager

    Hehehe

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  20. What's in a name.. by owlnation · · Score: 2, Funny

    Admittedly offtopic, but...

    Somehow I find it reassuring that NASA employs someone called "Dolly Perkins". It has that warm cosy 1950's feeling of Golden Age Space Exploration. Now, if only we could get the astronauts named "Buck", "Rock", or "Trent".

    1. Re:What's in a name.. by i_want_you_to_throw_ · · Score: 1

      I AM Dolly Perkins you insensitive clod!!!!

    2. Re:What's in a name.. by mangu · · Score: 1
      I find it reassuring that NASA employs someone called "Dolly Perkins". It has that warm cosy 1950's feeling of Golden Age Space Exploration.


      So, what do you make of "Fuk Li, manager of the Mars exploration program at JPL"?...

  21. Good code Is For Old People by AHuxley · · Score: 2, Informative
    In Capitalist West you send sloppy code to perfect probe.
    In Soviet Russia perfect probe sends lens cap code back to you!


    A wiki link to help with the lens part.
    http://en.wikipedia.org/wiki/Venera_program

    --
    Domestic spying is now "Benign Information Gathering"
    1. Re:Good code Is For Old People by Tablizer · · Score: 1

      Quote: The Venera 9 and 10 landers had two cameras each. Only one functioned because the lens covers failed to separate from the second camera on each lander. The design was changed for Venera 11 and 12, but this made the problem worse and all cameras failed on those missions. Venera 13 and 14 were the only landers on which all cameras worked properly; although ironically, the lens cap on Venera 14 landed exactly in the way of the soil compression probe...

      Rotten luck. As I read it elsewhere (but in pieces), the first set of probes had their lens caps get stuck. So on the second set (probe 2.0 if you will), they changed the design to make softer lens caps that would be easier to push off, but the heat of Venus caused the new caps to melt and stick on the lens itself. With the third set (probe 3.0) the lens caps came off properly, but happened to fall in the way of the soil experiment arm, blocking it. I would hate to be in the Lens Cap Department. It would be like working for Iraqi War Planning.

  22. More robust == heavier by mangu · · Score: 3, Insightful
    It should have been constructed more robust.


    So, which scientific experiment would you remove in order to put additional heat shielding? No, the thermal shielding and other protection systems are just right for a spacecraft that had to travel a hundred million kilometers.


    What really failed was the ground-based software, that didn't have a good enough thermal model, and the technical support team. Equipment may fail, operators may commit errors, but there should be enough experienced engineers around to do a correct analysis to catch those errors. Downgrading of the engineering team is the true problem here. Look at what happened to Columbia. It blew up on reentry because of a failure that had happened on take-off, was caught on video, but not analyzed correctly.


    NASA isn't alone in these failures, perhaps one could say they set the pace for the rest of the industry. The lack of a good thermal model is typical of a whole generation of engineers used to doing everything in Excel. With the current CPUs one has at each desktop, it wouldn't be so hard to do a correct thermal model of the spacecraft, but it would mean solving a system of partial differential equations in C++, something very few engineers are able to do, even when given an extensive library.

    1. Re:More robust == heavier by Anonymous Coward · · Score: 1, Informative

      Not a lack of thermal model (such things DO exist for most spacecraft), and they DO spend quite a lot of time in both modeling and test (thermal balance) where they shine artificial sunlight on the spacecraft in a vacuum chamber while it's operating to verify that the model works. http://mpfwww.jpl.nasa.gov/martianchronicle/martianchron7/mgs.html

      In fact, because MGS used aerobraking, which heats the spacecraft during the dips into the atmosphere, I'll bet the thermal model for MGS is better than most.

      But, the previous poster is right..ultimately it's a budget issue.. if you designed the spacecraft to handle every eventuality, it would be too heavy to launch. if you did ground analysis for every conceivable situation, there aren't enough engineers in the world to finish the job before the "every two years" launch opportunity. At some point, you rely on judgement.. hordes of people in reviews shooting at your design, and you figure you've covered 99.9% of the stuff... time to ship and shoot.

      let's also remember that this puppy has been going for >10 years, which means it was designed 15 years ago... Call it 1990. Somehow I don't think the thermal design engineers at Martin Marietta and JPL were rookies using Excel for the first time, and I suspect that they are fully capable of understanding how to numerically solve partial differential equations, and the limitations of those numerical methods. They can also solve them analytically.. my gosh, with a slide rule, even.

      http://mpfwww.jpl.nasa.gov/martianchronicle/martianchron2/marschro29.html

      FWIW, C++ is hardly necessary.. This kind of thing is really the domain of good old FORTRAN. Good optimizing compilers, well validated numerical codes, etc.

    2. Re:More robust == heavier by mangu · · Score: 1
      let's also remember that this puppy has been going for >10 years, which means it was designed 15 years ago... Call it 1990. Somehow I don't think the thermal design engineers at Martin Marietta and JPL were rookies using Excel for the first time


      Exactly my point, more so considering that not so many people used Excel at that time. I have been working with commercial satellites since 1984 and, although I've never had direct contact with NASA, we have the same suppliers.


      Thermal models are the least developed studies in spacecraft design. They *do* test them in thermal vacuum chambers, of course, but that's before it's launched. The reason for these comparatively crude thermal models is that we have had very good ways of simulating mechanical systems involving ordinary differential equations for decades, but things like thermal or diffusion models need partial differential equations, whose solution is orders of magnitude more complex.
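
      To make the ODE-versus-PDE point concrete, here is about the simplest PDE solver there is: an explicit finite-difference (FTCS) scheme for the 1D heat equation u_t = alpha * u_xx on a rod with fixed end temperatures. It is only a sketch with made-up numbers, nowhere near a spacecraft thermal model:

```python
def step_heat_1d(u, r):
    """One explicit FTCS step; r = alpha * dt / dx**2 must not exceed 0.5."""
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for r > 0.5")
    new = list(u)
    # End points are boundary conditions and stay fixed.
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def solve_heat_1d(u0, r, steps):
    """Advance the initial temperature profile u0 by the given number of steps."""
    u = list(u0)
    for _ in range(steps):
        u = step_heat_1d(u, r)
    return u


# Rod held at 0 at one end and 100 at the other, interior starting cold.
profile = solve_heat_1d([0.0, 0.0, 0.0, 0.0, 100.0], r=0.25, steps=500)
```

      After enough steps the profile settles to a straight line between the two end temperatures. Even this toy has a stability limit on the time step, which hints at why real 3D models with radiation are so much more work than the mechanical ones.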


      The big advances in CPU power in the last decades would have been a great opportunity for advancing studies in thermal and other models that need lots of CPU, but, unfortunately, we have had this "dilbertization" of engineering, where only management skills are valued.


      The fact that you mention FORTRAN is *very* significant, IMHO. Because this shows how little effort has been made in developing new engineering models lately. Fortran is, essentially, a dead language. No young engineers and scientists are putting much effort into Fortran, it hasn't been taught in mainstream undergraduate engineering courses for at least 20 years. But we still have large scale systems modeled in Fortran, because very few young people are doing such studies.


      If there existed a wide interest in the science and engineering of modeling physical phenomena, there would exist no need to use old FORTRAN code. People would be developing new optimized libraries for physical models using C++ or Python or Ruby. After all, if FORTRAN is so intrinsically effective for optimizing numerical calculations, then why does nobody use it for 3D games?


      Companies invest so much in developing new hardware for simulations in games, if Fortran were able to generate significantly better code, don't you think it would be the preferred platform for game development? No, my friend, FORTRAN, or Fortran if you prefer the new name convention, is essentially dead. Only old codgers who haven't been able to evolve with the times use it. And, unfortunately for NASA, those old codgers, who developed the thermal models of spacecraft that everybody uses today, haven't been able to use all the new power available in CPUs today to improve that thermal model.


      NASA's problem isn't the budget, throwing more money at it wouldn't do anything. Their problem is that no one values the true engineers that got us to the Moon. Think what people like those could accomplish today, if they used improved tools like Python and the current CPUs, instead of FORTRAN and slide rules.

    3. Re:More robust == heavier by Anonymous Coward · · Score: 0

      There are quite advanced thermal solvers out there, and have been for decades (see SINDA). The engineers don't need to write the Fortran code for solving partial differential equations for each model. Currently, tools available allow engineers to build geometric models that are translated into conductance/capacitance networks with orbital heat loads applied and solved without a line of code being written by the engineer (see Thermal Desktop). This is not a failure of available tools but rather a failure to think of everything, which in my opinion is the goal of engineering design. You can never really think of everything, but the things you don't think of had better be really unlikely, or they'll bite you in the ass.
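
      For readers unfamiliar with conductance/capacitance networks, the idea is to lump the spacecraft into nodes with heat capacities, connect them with conductances, and integrate dT/dt = (Q + sum of G * (T_other - T)) / C for each node. The sketch below uses simple explicit Euler steps and linear conductances only (real tools also handle radiation, which goes as T^4); the node names and numbers are invented and this is not how SINDA or Thermal Desktop are implemented:

```python
def simulate(temps, caps, loads, links, boundary, dt, steps):
    """Explicit Euler integration of a lumped thermal network.

    temps:    node -> initial temperature in K
    caps:     node -> heat capacity in J/K (only these nodes are integrated)
    loads:    node -> applied heat load in W
    links:    list of (node_a, node_b, conductance in W/K)
    boundary: node -> fixed temperature in K (e.g. a radiator sink)
    """
    t = dict(temps)
    t.update(boundary)
    for _ in range(steps):
        flow = {n: loads.get(n, 0.0) for n in caps}
        for a, b, g in links:
            q = g * (t[b] - t[a])  # heat flowing into a from b, in W
            if a in flow:
                flow[a] += q
            if b in flow:
                flow[b] -= q
        for n in caps:
            t[n] += dt * flow[n] / caps[n]
    return t


# One "battery" node with a 10 W load, tied to a 250 K sink through 0.5 W/K.
result = simulate(
    temps={"battery": 250.0},
    caps={"battery": 1000.0},
    loads={"battery": 10.0},
    links=[("battery", "sink", 0.5)],
    boundary={"sink": 250.0},
    dt=10.0,
    steps=5000,
)
```

      With these made-up values the battery node settles at sink temperature plus Q/G, i.e. 250 + 10/0.5 = 270 K. Change the load (say, by pointing the battery at the sun) and the steady state moves accordingly.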

    4. Re:More robust == heavier by Anonymous Coward · · Score: 0

      Hmmm. large systems of partial differential equations.. seems like a pretty classic High Performance Computing (HPC) application. And, there *is* substantial work being done in FORTRAN in that area. Why? Lots of reasons: there are validated codes to start with; the problems are typically well structured, so one doesn't need the abstraction and dynamicism (for lack of a better term) of OO. They're compute bound, and FORTRAN compilers typically generate faster programs, because of being able to take advantage of the array structure, strides, etc. There was quite a discussion of just this on the beowulf list a few weeks ago (http://www.beowulf.org/ should find it)
      >>If there existed a wide interest in the science and engineering of modeling physical phenomena, there would exist no need to use old FORTRAN code. People would be developing new optimized libraries for physical models using C++ or Python or Ruby. After all, if FORTRAN is so intrinsically effective for optimizing numerical calculations, then why nobody uses it for 3D games?

      There IS such interest, but there's also a realization that "if it works, don't break it".. refactoring and rewriting code isn't always worth the investment. You might also take a look at the NASA sponsored work at JPL on integrating mechanical,thermal, dynamic, and electromagnetic models. IMOS is but one example. They ARE running this stuff on modern computers, using modern languages. It's hardly a dead area with no development.. My comment was addressed to the age of MGS.. it was designed in the late 80s early 90s.. life has moved much farther since then, and a spacecraft being designed today would use better models.

      With respect to 3D graphics, etc. FORTRAN is painful for such applications.. they are characterized by lots of dynamic allocation as you're concerned with the interactions of a continuously varying cast of objects being rendered or visualized. And, indeed, a more OO approach is used in these areas. And, there is some effort to bring graphics techniques into the computational modeling area. For instance, there's a big effort to be able to take mechanical drawings for manufacturing into the FEM codes. The problem is that what makes a good model for SolidWorks or Pro/E doesn't necessarily make a good model for thermal, mechanical, or electromagnetic purposes.

      As far as trying to "remodel" an old spacecraft using new techniques.. it's a bit tricky, and you have a resource allocation problem.. only so many thermal engineers to go around, and you have to decide where to put them.. modeling a 5 years past life expectancy spacecraft for which you don't have a lot of good data to start with, or modeling the spacecraft you're going to launch in 3 or 4 years from now.

      Why FORTRAN and its ilk are not taught in school is another issue entirely. CS departments choose languages for pedagogy.. they're not "trade schools", because they (correctly) figure that if you can learn to engineer software in 2 or 3 different languages (it doesn't matter which), you've learned the core skill of design and implementation, and the "coding" (which is a tiny part of the overall task) doesn't much depend on the language of implementation. And, there's always fashion to consider. FORTRAN isn't fashionable, Ruby on Rails is. 30 years ago, Pascal was fashionable.. and forward thinking CS departments used it, notwithstanding that it's not much used today. However, the skills developed by folks in that program are still useful, aren't they?

  23. HAL: there is a problem with the AE35 antenna by Anonymous Coward · · Score: 0

    Are they sure the computer didn't reposition the antenna deliberately?

  24. It all started... by Anonymous Coward · · Score: 0

    when that damned butterfly began to flap its wings.

  25. The command by bl8n8r · · Score: 1

    [root@surveyor]# dd if=/dev/urandom of=/dev/solar_panels

    --
    boycott slashdot February 10th - 17th check out: altSlashdot.org
  26. Fewer than I expected... by WgT2 · · Score: 1

    There sure are fewer MS jokes than I expected.

  27. Batteries Overheating by ptelligence · · Score: 2, Funny

    Guess they weren't aware of the recall on those Dell batteries.

  28. Nope, those are the real numbers by patio11 · · Score: 2, Informative

    http://nssdc.gsfc.nasa.gov/database/MasterCatalog?sc=1996-062A

    I realize this was dirt cheap by space mission standards. A laptop encrusted with diamonds which costs $80,000 is dirt cheap by laptop-encrusted-with-diamonds standards. That *doesn't make it worth the money*. I know we waste far more than $40 million a year on many things -- and, logically, every one of them except one can be justified by "We waste more money on another program, don't cut *my* hobby horse!"

    It's interesting that you draw the distinction between subsidies/entitlements and science, since NASA is a fairly naked subsidy directly to defense contractors, who make all of the really expensive bits. I'm all for giving Lockheed Martin money when it's required, but let's be honest and get ourselves something which blows up in a suitably impressive manner when we do, OK? Similarly, I might even be persuaded that the US federal government should fund science projects -- great, then *fund science*! Don't blow $160 million just to accelerate a tin can out of the atmosphere to get a few close up pictures of rocks. $160 million could fund an awful lot of real science down here, much of which would produce actual results (or, alternatively, you could fund research gazing into the Clear Blue Sky, which is *still* cheap when you do it somewhere in the atmosphere).

    1. Re:Nope, those are the real numbers by inviolet · · Score: 1

      Drat, where's my mod points when I need them to un-Troll a great post like yours.

      --
      FATMOUSE + YOU = FATMOUSE
    2. Re:Nope, those are the real numbers by Anonymous Coward · · Score: 0

      Mod parent up!

    3. Re:Nope, those are the real numbers by Anonymous Coward · · Score: 0
      > I'm all for giving Lockheed Martin money when its required, but lets be honest and get ourselves something which blows up in a suitably impressive manner when we do, OK?

      Hey, we gave you the Space Shuttle. It's built by contractors in all 50 states to ensure continued Congressional funding, it costs twice as much as a Mars probe every time it launches, and it doesn't just "blow up in a suitably impressive manner", it also kills people when it does!

      Now that the government's requirements are met, can we please do some interplanetary science? :)

  29. Hearken Ye Back by Aquitaine · · Score: 1

    That command was:

    win

  30. Obligatory Simpson's reference by sizzzzlerz · · Score: 1

    D'oh!

  31. Better, fast, cheaper - the reality by kilodelta · · Score: 3, Insightful

    NASA has been on this kick of doing quick, reduced cost and inexpensive projects for some time now. They really have no choice since Congress will only give them funding for unmanned and low cost missions.

    So occasionally you get the stunning successes, e.g. the Mars rovers Spirit and Opportunity. Considering they were only supposed to last 90 sols and they're somewhere out to 1075 or more sols, it means that Steve Squyres is currently the star of NASA.

    But more likely you get the devastating failures.

    It's really sad that we blow a few billion a month on our little Iraq and Afghanistan ventures yet sciences take a back seat.

    1. Re:Better, fast, cheaper - the reality by JWW · · Score: 1

      Yeah, and Mars Global Surveyor was considered a stunning success too! This is an old spacecraft that was operating beyond its mission's original lifespan. While it's sad it can't continue its mission, it did achieve its original goals and then some.

      I say NASA should mark this as a success, take the lessons learned from this mission and build the next probe to send to Mars to take Global Surveyor's place, as was part of the plan anyway.

      I really hate seeing all the harping on NASA here when this was a REALLY successful mission.

    2. Re:Better, fast, cheaper - the reality by Anonymous Coward · · Score: 0

      The competitions also have to bear some of the "faster, better, cheaper" blame. Teams propose more and more science for the dollar, resulting in missions that are less and less realistic. These proposals let everyone deny the realities of budget cuts, and they win. The budgets balloon once the grown-ups get a look at the mission, and...I'm just depressing myself at this point.

      Squyres is undoubtedly the star, but I can't help but wonder if all the heavy vetting on MER was worthwhile. At the risk of heresy: If something lasts 10X beyond its success criteria, isn't that overbuilt?

      I know I'm riding the fence here, but I can't help it.

    3. Re:Better, fast, cheaper - the reality by Thomas+Shaddack · · Score: 1
      At the risk of heresy: If something lasts 10X beyond its success criteria, isn't that overbuilt?

      You spelled "properly engineered" wrong.

  32. The killer command was by Eradicator2k3 · · Score: 0

    POKE 59458,PEEK(59458)OR 32

    --
    Mr. T pitied this fool on 27 July 1992.
  33. Today in News... by Anonymous Coward · · Score: 0

    Fuk Li, manager of the Mars exploration program at JPL, was reported to say, "Fuk Mi..."

  34. By any chance... by Anonymous Coward · · Score: 0

    ...was it controlled by a Commodore PET?

    1. Re:By any chance... by Tablizer · · Score: 1

      ...was it controlled by a Commodore PET?

      I don't get the joke. Did PET have battery problems?

  35. Typical multiple-factor catastrophe by mattr · · Score: 4, Interesting

    There are an awful lot of posts here that disparage the people who have built and operated this system. To me it looked very much like the explanation for an aircraft accident. The easy failure modes are all known, so the really hard ones are left. In aircraft accidents, and it seems space accidents now too, a fatal result is generally the result of a number of seemingly disparate factors including system states, environmental state, and human impressions of what is going on.

    In one major aircraft accident I know a lot about, the (Airbus) jet crashed in part because it ended up being a tug of war between a human pilot and a robot autopilot that should have been disengaged, causing an up-and-down roller coaster ride. There were lots of other distracting things that were maybe wrong or maybe not, but a key part was the difficulty in knowing what state the machine was in.

    It seems this accident was a similar situation, and although the misuse of metric units caused another recent accident, these incidents appear to have elements in common. It strikes me that they are also made more probable by funding pressures, and by the way operating these systems involves radical commands while the systems lack enough power to be self-aware enough to preserve themselves.

    I am not going to do any more guessing because the people involved can probably figure it out themselves, and it seems that these combined factor accidents at least are not costing human lives, while they are adding to knowledge about how not to make the accident the next time.

    In that regard my hope is that some of the money being spent on Mars can be used to improve autonomous robotic systems to reduce accidents both on Mars and on Earth.

    1. Re:Typical multiple-factor catastrophe by PPH · · Score: 1
      It all depends on how one evaluates possible failures and deals with them. There are two schools of thought when it comes to this: One is that systems must be designed to deal with a certain combination of faults, regardless of their probability of occurring. The other attempts to estimate the probability of various combinations of faults, determine their effect on the overall system, and deal only with those where the probability of a significant outcome exceeds some threshold. Both approaches have their problems. The latter analysis is difficult due to the complexities of dealing with the tail of a bell curve. Both have problems in handling hidden cause-effect relationships that system designers might miss.

      Example: Many years ago, the 747 systems designers debated the elimination of certain backup engine ignition systems, arguing that the probability of the loss of all 4 engines and their respective generating systems was infinitesimal. The non-probabilistic school of thought agreed, based upon careful design which had separated engine control systems physically, keeping a single event from affecting multiple engines. Then a 747 flew through a volcanic ash cloud in Alaska, causing all engines to shut down. They were restarted with the backup systems that engineers had sought to eliminate.
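
      A rough sketch of why the "infinitesimal" figure can mislead: multiplying per-engine failure probabilities assumes the engines fail independently, and a single shared cause (an ash cloud, say) breaks that assumption. The function names and all the numbers below are made up for illustration; they are not real 747 reliability data.

```python
def p_all_fail_independent(p_single: float, n_engines: int) -> float:
    """Probability that every engine fails, assuming failures are independent."""
    return p_single ** n_engines


def p_all_fail_with_common_cause(p_single: float, n_engines: int,
                                 p_common: float) -> float:
    """Same question, but allowing for one shared event (probability p_common)
    that takes out every engine at once. If that event does not occur, the
    engines are again treated as independent."""
    return p_common + (1.0 - p_common) * p_single ** n_engines


# Hypothetical per-flight numbers, chosen only to show the scale of the effect.
P_SINGLE = 1e-5   # assumed chance that one engine fails on its own
P_COMMON = 1e-7   # assumed chance of a shared event such as an ash encounter

independent = p_all_fail_independent(P_SINGLE, 4)
with_common = p_all_fail_with_common_cause(P_SINGLE, 4, P_COMMON)
print(f"independent only:  {independent:.3e}")
print(f"with common cause: {with_common:.3e}")
```

      With these assumed inputs the common-cause term dominates the result completely, which is the "hidden cause-effect relationship" problem in miniature.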

      Interesting side note: At about this same time, the Air Force bomber command was lobbying for resources to update its fleet. They based their budget request upon the need to follow a nuclear missile strike against the Soviet Union (or whoever) with bombers to 'clean up' missed targets. When Boeing analysts crunched the numbers and included the effects of aircraft flying through the inevitable dust and smoke clouds, the probability of mission success didn't look so good anymore. The Air Force officials told the analysts to shut the hell up.

      --
      Have gnu, will travel.
    2. Re:Typical multiple-factor catastrophe by sjames · · Score: 1

      A big factor not being considered in this discussion is that the spacecraft was well past its design lifetime. I don't know all of the details of the design, but it's quite likely that at one time the unaffected battery WOULD have been adequate to maintain contact and allow for corrective action.

      The loss really was a combination of many small things, including the increasingly fragile state of the Surveyor itself. This is not any sort of embarrassment for NASA (or shouldn't be) or any particular person's great failure. In fact, that it lasted this long is quite an accomplishment.

  36. I'm sorry, Dave by nanosquid · · Score: 1

    Hal. Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.

  37. VxWorks == no protected memory?!? by mosel-saar-ruwer · · Score: 1


    A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006.

    I am well aware that you can do some nifty things in VxWorks, but at some point, shouldn't you be using an OS [like QNX, Integrity, or, gasp!, WinCE] that offers a little more memory protection?

    Especially if you're writing code in a language with pointers?

    1. Re:VxWorks == no protected memory?!? by Anonymous Coward · · Score: 0

      If you have a sensible ground system, you program a command specifically for writing the parameter you want to the correct memory location. Instead of using a predefined command to uplink the new HGA parameter, these fools decided to do a direct memory write to a raw memory address.
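
      A minimal sketch of the difference, assuming a ground system that keeps a table of named parameters. Every name, address, and limit here is invented for illustration; none of it comes from the actual MGS ground software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    """One uplinkable parameter: where it lives and what values are sane."""
    address: int   # hypothetical onboard memory address
    low: float     # lowest acceptable value
    high: float    # highest acceptable value


# Hypothetical parameter table. The operator never types an address;
# the table is the only place addresses appear.
PARAMS = {
    "HGA_CONTINGENCY_AZIMUTH_DEG": ParamSpec(address=0x4A20, low=0.0, high=360.0),
    "SOLAR_ARRAY_SOFT_STOP_DEG": ParamSpec(address=0x4A24, low=-90.0, high=90.0),
}


def build_param_update(name: str, value: float) -> tuple[int, float]:
    """Turn a named parameter update into an (address, value) pair,
    rejecting unknown names and out-of-range values before uplink."""
    if name not in PARAMS:
        raise KeyError(f"unknown parameter: {name}")
    spec = PARAMS[name]
    if not spec.low <= value <= spec.high:
        raise ValueError(f"{name}={value} outside [{spec.low}, {spec.high}]")
    return spec.address, value


# The operator asks for a parameter by name; the address comes from the table.
print(build_param_update("HGA_CONTINGENCY_AZIMUTH_DEG", 123.5))
```

      A raw memory write skips both checks: a mistyped address is just another number, and nothing on the ground knows which parameter it was supposed to be.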

      Posting anonymously so The Man doesn't get me...

  38. Command leaked out in special report by elgatozorbas · · Score: 1

    HLT

  39. Orbiting brick by Tekoneiric · · Score: 1

    It sucks when you're hacking the firmware in your gadgets and brick them.

    --
    *It's not what you can do for the Dark Side but what the Dark Side can do for you!*
  40. So... everybody makes mistakes by guruevi · · Score: 1

    It is great they found out what caused the problem, but that isn't going to bring the craft back to life. And for all you people that comment here, they should've done this and that... get out of school, find a job in aeronautics and see if you can do it better, if you can they'll happily accept you and still, in your design, you'll make errors.

    I have worked for different large and small companies, everybody makes mistakes. I've seen all connections for a large datacenter going down because somebody made a mistake updating a single firewall, I've seen professional cooling solution designers install a triple-redundant system for said datacenter which went down completely because the datacenter didn't produce enough heat yet and one of the regulators had a wrong offset which caused the cooling to freeze.

    I have seen people insert commands in a mainframe which hung the whole thing and it took a few IBM engineers to start it up again.

    Yes, people make mistakes and hopefully they'll learn from it. We shouldn't be outraged about it or fire them, because those mistakes are basically paid-for education. If you can do it better, they'll hire you, if you can't, STFU. Space is large, and those devices are just like servers, a single mistake can bring them down. The problem is the ping time is 60 minutes, so before you even get a response from a system, it's an hour later. The sun is a powerful source of energy and in an hour you can get sun-burnt in summer on earth, try sunbathing on the planet Mercury or just in outer space, and see the difference after an hour, that's what we're talking about.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  41. Another slashdot, another troll by BlueParrot · · Score: 1

    I am pointing this out simply for completeness. I normally ignore trolls, but this matter is sufficiently important to warrant a proper response. Before you start dismissing the world's scientists as incompetent, you could at least read the Wikipedia articles on the matter before assuming the vast majority of the scientific community and every meteorological institution on earth are incompetent enough that they all fail to MEASURE the solar irradiance. Global warming has been vigorously discussed in the scientific community since the '80s, and weather forecasts have been around since god knows when. We have good records of how much energy the sun has been putting out (in many ways better than the temperature record).

    Here, have a look: http://en.wikipedia.org/wiki/Image:Solar-cycle-data.png That is the solar irradiance over the last 35 years or so. Now don't come and tell me about a time lag, because that graph stays fairly constant, yet global warming has been accelerating steadily (even when the solar radiation has been on the declining part of a cycle). Now try to explain how the planet's temperature can not only increase, but do so at a steadily increasing rate, while the solar irradiance remains constant or even decreases.

    Oh, and just in case you are going to claim it reduces the CO2 content in the oceans by heating them... http://en.wikipedia.org/wiki/Ocean_acidification

    Furthermore, explain why the following shows a steady increase in perfect correlation with the rate of fossil fuel consumption and deforestation, yet doesn't show a single sudden peak at the dates of major volcanic eruptions: http://en.wikipedia.org/wiki/Image:Mauna_Loa_Carbon_Dioxide.png

    Furthermore, do explain why the following graph shows a fairly good correlation between temperature and solar activity while CO2 remained fairly constant, only for that correlation to break down completely once CO2 starts to really shoot off: http://en.wikipedia.org/wiki/Image:Temp-sunspot-co2.svg

    If you could also explain why solar variation, which should allegedly affect both Earth and Mars, could cause one of them to start heating up at a different time, and why some of the solar system's planets and moons have even experienced cooling during the same period, that would be nice too. It would also be interesting to know why we would miss so heavily on CO2's potential as a greenhouse gas, given that the absorption spectrum of CO2 is known to several significant figures of accuracy, and the spectrum of radiation emitted by the earth has been carefully measured by satellites in orbit.

    1. Re:Another slashdot, another troll by Ace905 · · Score: 1

      You sure have a lot of evidence to argue against such a cut and dry subject.

      Mods, put the parent back up; it's obvious this has been a discussion from the first second. Discussion != troll.

      --

      Ace
  42. Re:wrong parameter? No, wrong command! by chris_sawtell · · Score: 1

    /sudo shutdown -h now
    bash: /sudo: No such file or directory

    That would not cause any problems whatsoever.

  43. Original post not a troll, responses are. by Ace905 · · Score: 1

    Wow, I take a commonly discussed 'question' about global warming and reference it, as if it is, you know, discussed by people, and I'm called a Troll.

    All of your links to other web sites appear, to me, to be Trolling. Am I one of the scientists debating the causes of global warming? No. So I'm not going to look at your chosen data sets and do the math - I'm not qualified to. Is global warming a linear process? No, and all of the scientists agree on that. Until someone wants to prove it's caused by CO2 - then the data is linear, running along with the linear correlation between food production and global warming. Maybe producing food is causing global warming.

    Why don't the MODS here think for themselves? I know I do when I'm a moderator. You should lose your account just for using 'poster is a troll' in the subject of your message. I'm not encouraging you to talk to me at all, believe me. I'm saying it's possible that the atmospheres and weather of Earth and Mars have some connection to each other. Maybe man-made global warming triggers the onset of a hospitable environment on Mars.

    There I said it again, ban me.

    --

    Ace