Slashdot Mirror


Mars Failures: Bad luck or Bad Programs?

HobbySpacer writes "One European mission is on its way to Mars and two US landers will soon launch. They face tough odds for success. Of 34 Mars missions since the start of the space age, 20 have failed. This article looks at why Mars is so hard. It reports, for example, that a former manager on the Mars Pathfinder project believes that "Software is the number one problem". He says that since the mid-70s "software hasn't gone anywhere. There isn't a project that gets their software done."" Or maybe it has to do with being an incredible distance, on an inhumane climate. Either or.

12 of 389 comments

  1. The software motto... by Xentax · · Score: 4, Insightful

    ...is "garbage in, garbage out" right? One of the mottos anyway.

    If you underestimate the resources you need to do software right, of course you'll have problems -- either getting it done on time, or getting the quality to the level it needs to be (or both).

    That problem is hardly unique to the space programs. And of course, it would be a little tricky trying to upload a software patch to a hunk of solar-powered metal a few million miles away.

    I wonder how much NASA et al. really tap the resources they should be tapping -- I mean, there ARE areas of industry where mission-critical or life-critical software has been developed and deployed for some time now. Maybe it's just a question of getting the right kind of experience in-house...

    Xentax

    --
    You shouldn't verb words.
  2. Re:We landed on the moon with 512 bytes of RAM by Niles_Stonne · · Score: 5, Insightful

    I think that is part of the difficulty...

    With 512 BYTES of ram you can literally look at the entire contents. You can be aware of every single bit on the system.

    Now, where we have gigabytes of ram, and even more other storage, it is simply impossible to sort through every bit. Thus errors roll in.
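To put rough numbers on the point above, here is a small sketch. It takes the comment's 512-byte figure at face value and uses one gibibyte as a stand-in for "gigabytes of ram"; both constants and the helper names are illustrative assumptions.

```python
# The 512-byte figure is the comment's own number, taken at face value.
SMALL_RAM_BYTES = 512
# One gibibyte, an assumed stand-in for "gigabytes of ram".
GIB_BYTES = 2**30


def bits(n_bytes):
    """Number of individual bits in a memory of n_bytes bytes."""
    return n_bytes * 8


def size_ratio(big_bytes, small_bytes):
    """How many times the small memory fits into the big one."""
    return big_bytes // small_bytes


print(bits(SMALL_RAM_BYTES))                   # 4096
print(size_ratio(GIB_BYTES, SMALL_RAM_BYTES))  # 2097152
```

So 512 bytes is 4096 bits, which one person could plausibly inspect by hand, while a single gibibyte is about two million times larger.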

    I'm not sure what to do about it, but I see why there is difficulty.

    --
    Sticks and Stones may break my bones, but copyright will always protect me.
  3. Rocket Science is hard by fname · · Score: 5, Insightful

    Well, there are a lot of reasons things go wrong. Landing a spacecraft on a different planet is inherently difficult, and when you read about how MER-1 and MER-2 will land, it's amazing that they can work at all.

    The flip side is this: after Mars Observer spectacularly failed in 1993 ("Martians"), NASA started to go with faster, cheaper, better. The idea was, instead of a single $1 billion mission every 5 years with a 90% chance of success, why not 2 $200 million missions every two years, with an 80% chance of success? Everyone loves this idea when it works (Pathfinder), but when a cheap spacecraft fails, the public doesn't care if it cost $10 million or $10 billion; all we know is that NASA is wasting money.

    So, the answer is, NASA has hit some bad luck. But the idea of faster, cheaper, better is ultimately a cost-effective one, so if we can solve these software problems (I mean, can't someone independently design a landing simulator?), and NASA can get 80-90%, we'll be getting a lot more science for the dollar. But NASA-haters will always have some missions to point to as a "waste" of money, and try to cut funding as it's mismanaged; other space junkies will insist that anything under 100% is unacceptable, and costs should double to move from 80% to 100%. I don't know which attitude is more damaging.
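The "more science for the dollar" argument can be checked with back-of-the-envelope arithmetic. The sketch below uses the comment's own figures and assumes a reading of them: one $1 billion mission every 5 years at 90%, versus two missions of $200 million each every 2 years at 80%, compared over a 10-year horizon. The function name and the horizon are illustrative choices, and mission outcomes are assumed to be independent.

```python
def program_outlook(missions_per_cycle, cycle_years, cost_per_mission_musd,
                    p_success, horizon_years=10):
    """Return (missions flown, total cost in $M, expected successes)."""
    missions = missions_per_cycle * horizon_years // cycle_years
    total_cost = missions * cost_per_mission_musd
    expected_successes = missions * p_success
    return missions, total_cost, expected_successes


# Figures from the comment; the 10-year horizon is an assumption.
flagship = program_outlook(1, 5, 1000, 0.90)
cheap = program_outlook(2, 2, 200, 0.80)

# Chance that at least one cheap mission fails, assuming independent outcomes.
p_some_cheap_failure = 1 - 0.80 ** cheap[0]

print(flagship)
print(cheap)
print(round(p_some_cheap_failure, 3))
```

Under these assumptions both programs cost $2 billion per decade, but the cheap program expects 8 successes against 1.8 for the flagship program. The catch is the last number: there is roughly an 89% chance that at least one cheap mission fails publicly, which is exactly the perception problem described above.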

    NASA has a "good" track record since Observer; unfortunately, the highest-profile missions have generally failed. If MER-1 and MER-2 are both successful, and SIRTF flies this summer, then everyone should get off the back of NASA's unmanned program for a while.

  4. Tough assignment... by Kjella · · Score: 4, Insightful

    Seriously. Space is tough, as the US has experienced with both Challenger and Columbia, and those were only meant to reach orbit. Going even further away in space is tougher. So much can go wrong, and so little can be done to correct it. Certainly a few blunders like the feet-to-meter bug are huge, but they try. I'm not so sure any private corporation that had been asked to do the same would fare any better. They are pushing limits, where you fail and (hopefully) learn from your mistakes.

    Which is why we should continue to try. Giving up, saying "space travel is just too costly and risky" is a big cop-out. If we could send people to a different stellar object (the moon) in 1969 with the equivalent of a pocket calculator but not now, what does that say of our technology? Or sociology? Sure you could take the narrow-minded approach and say "and what does that bring us? The ability to jump from rock to rock in our solar system?" If so, you might as well ask why people decided to go to the poles (just ice) or whatever. You're still missing the point.

    Kjella

    --
    Live today, because you never know what tomorrow brings
  5. NASA Management Practices and Quality of Software by ChuckDivine · · Score: 5, Insightful

    In my years at NASA Goddard I saw a dysfunctional management operate in ignorance of reality.

    There was much praise of the employee who "went the extra mile", "put in long hours" and "served the customer" (that applied to contractor employees). There was also very little thought paid to the consequences of those practices.

    What's the first thing to go when you're tired? It's not your body -- it's your mind. That's right -- if you're staying at work until you're feeling tired, you're making mistakes that need to be corrected later. The tireder you are, the more mistakes. The tireder you are, the less you can actually do.

    I witnessed people who wore their exhaustion as a badge of honor and who, when they got into management, insisted that others emulate their bad example. The result that I saw was people who should have been kept out of management becoming increasingly dominant. This was accentuated by the "faster, better, cheaper" ideology promulgated by former NASA administrator Goldin. This ideology was used to get rid of more experienced (and thus costly) people who were aware of the consequences of trying to squeeze more work out of fewer people.

    It could take a long time for NASA to recover from this culture. The failure of projects in the past few years and the crash of Columbia could be turning points -- or they could be used by incompetents to justify even more dysfunctional behavior.

    --
    "Beer is proof God loves us and wants us to be happy." -- B. Franklin
  6. Programmers by Cujo · · Score: 4, Insightful

    Yes, programmers have erred. To err is human; to allow errors to propagate into mission failures is a failure of systems engineering, and I think that is where the real blame lies. A lot of the problem is that spacecraft systems engineers often have a very amateurish grasp of software, if any at all.

    For example, on Mars Climate Orbiter, a junior programmer failed to properly understand the requirements. However, systems engineering:

    1. Failed to properly identify the thruster force data as a critical interface.
    2. Failed to demand proper, thorough and timely verification ON BOTH SIDES OF THE INTERFACE.
    3. Failed to make sure the requirements were properly understood by the implementers.
    4. Ignored or missed prima-facie evidence that the interface wasn't working (closely related to 1).
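One way to treat thruster data as a critical interface is to refuse bare numbers and require every value to carry its unit, so a mismatch fails loudly on either side. The sketch below assumes the commonly reported mismatch of pound-force seconds versus newton-seconds. The `Impulse` class and `consume_impulse` function are hypothetical names for illustration, not anything from the actual mission software.

```python
from dataclasses import dataclass

# Newton-seconds per pound-force second (1 lbf = 4.4482216152605 N).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse that always travels with its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def consume_impulse(impulse) -> float:
    """Receiving side of the interface: accepts only unit-tagged values."""
    if not isinstance(impulse, Impulse):
        raise TypeError("bare numbers are not accepted at this interface")
    return impulse.to_newton_seconds()


print(consume_impulse(Impulse(1.0, "lbf*s")))
```

A bare `consume_impulse(1.0)` raises `TypeError` instead of being silently read in the wrong unit, which is the kind of check on both sides of the interface that item 2 asks for.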
    --

    Helium balloons want to be free.

  7. It's really quite simple by foxtrot · · Score: 5, Insightful

    Space Exploration isn't easy.

    Look at the Space Shuttle. The space shuttle has never had a catastrophic computer failure-- but every line of code on that truck has survived review by a group of programmers. They've examined it, line by line, multiple times, in order to ensure that it's exactly right, because the cost of failure is 7 astronauts and a multimillion dollar orbiter.

    The new Mars programs, however, are part of the streamlined "do it on the cheap" NASA. NASA put the Mars Rover down using mostly off-the-shelf and open-source software and a small amount of home-brew stuff. No matter how good open source software gets, it still hasn't undergone the level of review that the Space Shuttle code has seen. No matter how popular an off-the-shelf package is, it's not cost-effective for the manufacturer to give it that sort of treatment. NASA can't afford to do that level of code review because that costs them the ability to do some other program.

    NASA is simply trying to do more with less in the unmanned launches, and the cost of that is we need to expect some failures. These failures are unfortunately very visible...

    -JDF

  8. Disagreeing with Hemos by AntiFreeze · · Score: 4, Insightful
    Quoth Hemos: Or maybe it has to do with being an incredible distance, on an inhumane climate. Either or.

    I have to really disagree with this. NASA is used to dealing with alien climates and terrain and astronomical distances. NASA is also used to dealing with problems. They have some of the best problem solvers out there, and when something goes wrong, they tend to pinpoint why. When NASA says A, B, and C are the causes of failure, I believe them. When NASA cannot figure out why something went wrong, I worry.

    What I'm trying to say is, distance and inhuman conditions shouldn't have that much of an effect on how well a probe works. We built Voyagers I and II, didn't we? They worked even better than expected. And they encountered climates and conditions which make Mars look easy.

    NASA has dealt with so many varying circumstances and climates over the years, and been so blunt about their mistakes, I find it hard to believe that they would blame the failures of an entire class of missions on something "easy." And yes, blaming failures on software is an easy way out; how many times have you heard someone say "Oh! It must be the software!" when something doesn't go as expected?

    Now, I know this guy doesn't speak for NASA as a whole, but as a NASA trained administrator, and the head of some very large projects, I'm willing to take his opinions at face value. If he says it looks like software has really been a cause of failure, who am I to laugh at his expertise and belittle his explanations? I might not like his explanation, but I buy it.

    --

    ---
    "Of course, that's just my opinion. I could be wrong." --Dennis Miller

  9. Re:I disagree, Mr. Editor by Hal-9001 · · Score: 4, Insightful

    Software errors didn't just cause problems with the Mars landers--they caused a total loss of the spacecraft. We are just lucky that we made those errors before attempting a manned mission to Mars.

    Regarding the losses of the two space shuttles, it is hardly fair to compare hardware failure to software failure. The physical behavior of a mechanical system is not deterministic--stress something hard enough and it will break, and it is impossible to predict when a particular part will fail in advance. You can do lots of testing to get a sense of when, on average, a part will fail under certain conditions, and you can design and engineer as best as possible for something to work even if a part fails, but parts will fail and sometimes hardware failures are irrecoverable.

    Software, on the other hand, is completely deterministic. With error-checking and proper testing, it is possible, at least in principle, to write software that will not fail. Software failure that results in loss of life is simply inexcusable.

    --
    "It take 9 months to bear a child, no matter how many women you assign to the job."
  10. Software is Hard by Teckla · · Score: 4, Insightful

    Most PHB's haven't figured it out yet: SOFTWARE IS HARD. It's amazingly complicated. It's also notoriously hard to come up with realistic estimates.

    PHB's also haven't figured out that developers aren't interchangeable widgets. If you know C, it doesn't mean you'll be immediately productive in Korn shell scripting, and vice-versa.

    PHB's also haven't figured out that experience is key. There are exceptions, but generally speaking, a young hotshot isn't going to be as productive as an experienced professional. Sure, the young hotshot might get v1.0 done first, but it'll be buggy, unreliable, unscalable, hard to maintain, etc.

    The "problem with software" is almost entirely a management issue, imho.

    -Teckla

  11. Re:We landed on the moon with 512 bytes of RAM by EvilTwinSkippy · · Score: 4, Insightful
    means that your desktop probably (Anyone want to do the math?) has more computing power than all the deep space explorers ever launched, combined.

    Yes, but can your computer recover from a triple memory failure? Can you rewire your computer remotely to fall back on a redundant system? Frankly I keep the covers off my case to keep my CPU from overheating.

    State of the art is not always measured in Gigahertz.

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  12. Re:I disagree, Mr. Editor by AKnightCowboy · · Score: 4, Insightful
    Software is NEVER deterministic in an operating environment. Just because you can put it on a bench and test the snot out of it does not certify its behavior in the real world. I have written many programs that work perfectly in testing, only to have a user punch in an unexpected value and bring things to a crashing halt.

    That's just bunk. As a programmer writing software for spacecraft you must be able to anticipate every possible value and account for it. Every condition should be able to be gracefully handled by an error checking routine. There is zero room for failure. If that means it takes 20 years to write, test, rewrite, and retest the perfect program, then so be it. When human life is involved price is not an object. (well, within reason of course since there's a dollar value on human life in the space program, but the negative publicity value is astronomically more than the dollar value of the loss of human life.)
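As a rough illustration of "every condition gracefully handled by an error checking routine", here is a minimal sketch of an input check that never crashes on an unexpected value and instead falls back to a safe default. The command, its limits, and the names `parse_burn_seconds`, `SAFE_BURN_SECONDS` and `MAX_BURN_SECONDS` are all hypothetical; real flight software would be far more involved.

```python
import math

SAFE_BURN_SECONDS = 0.0   # hypothetical safe default: do not fire
MAX_BURN_SECONDS = 120.0  # hypothetical upper limit for a single burn


def parse_burn_seconds(raw):
    """Return (burn_seconds, status); never raises on bad input."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return SAFE_BURN_SECONDS, "rejected: not a number"
    if math.isnan(value) or math.isinf(value):
        return SAFE_BURN_SECONDS, "rejected: not finite"
    if not 0.0 <= value <= MAX_BURN_SECONDS:
        return SAFE_BURN_SECONDS, "rejected: out of range"
    return value, "ok"


for raw in ["12.5", "abc", None, "nan", "-3", "1e999"]:
    print(repr(raw), parse_burn_seconds(raw))
```

This only covers the inputs someone thought to reject, which is the earlier poster's objection: the routine is as good as the list of conditions its authors anticipated.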