Mars Failures: Bad luck or Bad Programs?
HobbySpacer writes "One European mission is on its way to Mars and two US landers will soon launch. They face tough odds for success. Of 34 Mars missions since the start of the space age, 20 have failed. This article looks at why Mars is so hard. It reports, for example, that a former manager on the Mars Pathfinder project believes that "Software is the number one problem". He says that since the mid-70s "software hasn't gone anywhere. There isn't a project that gets their software done."" Or maybe it has to do with being an incredible distance away, in an inhospitable climate. Either or.
Make it simple. The original software (as on the moonshots) was very simple control loops: no OS, no overhead, just a simple program doing a VERY simple job over and over. Read stick, fire retros as appropriate.
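A "read stick, fire retros" program really is that small: one fixed-period loop, no OS, no scheduler. The sketch below is a hedged illustration, not actual Apollo code. The gravity value is approximate lunar gravity, while the retro deceleration, target sink rate, loop period and the function names `decide` and `descend` are hypothetical. The two physics lines stand in for real sensors and thrusters so the loop can run on a desk.

```python
LUNAR_G = 1.62        # m/s^2, approximate lunar surface gravity
RETRO_ACCEL = 3.0     # m/s^2 of deceleration while the retros fire (hypothetical)
TARGET_SINK = 2.0     # m/s, desired descent rate (hypothetical)
DT = 0.1              # s, fixed loop period; no OS, no scheduler


def decide(sink_rate: float) -> bool:
    """The entire 'flight software': fire the retros if sinking too fast."""
    return sink_rate > TARGET_SINK


def descend(altitude: float, sink_rate: float) -> float:
    """Run the loop until touchdown and return the touchdown sink rate (m/s)."""
    while altitude > 0.0:
        fire = decide(sink_rate)                          # read sensor, decide
        accel = LUNAR_G - (RETRO_ACCEL if fire else 0.0)  # positive = sinking faster
        sink_rate += accel * DT                           # simulated vehicle response
        altitude -= sink_rate * DT
    return sink_rate


print(f"touchdown at {descend(500.0, 20.0):.2f} m/s")
```

This is plain bang-bang control: the sink rate ends up oscillating in a narrow band around the target. The point is how little code is involved, not that this is a good landing controller.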
Also, big, bulky solid-state parts are far less susceptible to radiation than many mega-tiny chips are. By writing (and testing) the software in the simplest manner, building a VERY specific piece of hardware out of solid-state components, and doing lots of unit testing, you're more likely to get there.
That's the same reason the 486 was the only space-rated Intel processor for quite a long time (not sure if that's still true).
I'd rather go on "slower," simpler hardware that does a very specific job... and that you can repair with a soldering iron.
meh
Of course, the stupid metric conversion problem only accounted for one of the failures, but it's indicative of a larger problem. There's obviously a shortcoming in quality control and verification if such an obvious mistake could be overlooked. What less obvious problems are we missing altogether? Most of the failures occurred during the orbital entry phase, during which they shut off the transmitter, and therefore they don't have up-to-the-second data on the reason for the failure. Sure, they likely wouldn't have much of an opportunity to save the mission, but they would have a good chance of figuring out what the problem actually was so it could be fixed the next time around. Instead, we're left to guess. Cost concerns are always given as the reason, but how much have we really "saved"? An extra million $$ to keep the transmitter on would probably have paid for itself a long time ago.
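One cheap defense against the Mars Climate Orbiter class of mistake (ground software reported thruster impulse in pound-force seconds where newton-seconds were expected) is to stop passing bare numbers around and carry the unit with the value. A minimal sketch, assuming a hypothetical `Impulse` type that stores everything internally in newton-seconds; the conversion factor is exact, since 1 lbf is defined as 0.45359237 kg times 9.80665 m/s^2.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # exact: 1 lbf = 0.45359237 kg * 9.80665 m/s^2


@dataclass(frozen=True)
class Impulse:
    """Hypothetical unit-safe impulse; always stored in newton-seconds (SI)."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion happens once, at the point where the unit is known.
        return cls(value * LBF_S_TO_N_S)

    def __add__(self, other: "Impulse") -> "Impulse":
        # Only another Impulse may be added; a bare float raises TypeError.
        if not isinstance(other, Impulse):
            return NotImplemented
        return Impulse(self.newton_seconds + other.newton_seconds)


burn = Impulse.from_pound_force_seconds(10.0) + Impulse.from_newton_seconds(5.0)
print(f"{burn.newton_seconds:.3f} N*s")
```

The design choice is that a number without a unit simply cannot enter the arithmetic: `Impulse.from_newton_seconds(1.0) + 1.0` fails loudly instead of producing a plausible-looking wrong answer.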
-Restil
Same as the old Volkswagen Beetle, which is still deemed the world's most reliable car: no water cooling, engine management systems, fuel injection, turbos, massive wiring looms, air con, etc.
It's so basic that the error rate is reduced to the point where identifying and fixing errors is trivial, without the need to plug in a single computer or sort through 2 miles of cables looking for a single break.
I digress; technology makes life harder, not easier.
cheers
Thing is, space exploration isn't done with *current* technology. The computing technology used in a lot of aerospace applications is 20-30 years old. There are a number of reasons for this, but the ones I've heard of are:
1. The projects are long-term and have been in development for many years, especially government projects. They can't just up and switch to the latest tech whenever it comes around, or they'll end up like DNF (Duke Nukem Forever) and never see the light of day.
2. The engineers don't trust the latest and greatest. The technology isn't considered mature enough. All the bugs have been worked out in the older tech, so it's more robust, the engineers are more familiar with it, and more often than not, manufacturers have shrunk and simplified the designs significantly since introduction.
It's more likely that you'd find an 8086 processor in the space shuttle than a Pentium 4, unless someone brings a laptop aboard. It wasn't all that long ago that NASA put ads on websites and in geek magazines appealing for old 8086 processors for spare parts. I haven't heard anything since, so either they found a supplier, or they're too busy piecing together the Columbia.
Programmers get paid to do their job to the best of their ability, just like any other employee.
When not even the best programmers can get it right, it might be time to start thinking that there's a hard problem in there; docking pay isn't the way to fix it.
Perhaps one of the reasons that the software isn't getting done on time is that much of the system is written from the ground up. Perhaps it would be better to design a common, open source spacecraft platform. So many of the basic tasks that spacecraft software must perform are essentially identical. The main differences for critical spacecraft systems would be the hardware. If a general purpose OS and spacecraft toolkit were designed, then the main things that would have to be written from scratch for different missions would be drivers for the hardware and various configuration settings.
I'm not sure how suitable RT Linux would be from a technical/performance standpoint, but a highly portable open source OS would offer a flexibility and availability that would make adoption much easier.
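The split proposed here (shared flight logic written once, with only drivers and configuration changing per mission) can be sketched with interfaces. Everything in this sketch is hypothetical: the `PowerSensor` and `LoadSwitch` driver interfaces, the load-shedding rule, the `MissionConfig` fields and the bench drivers are illustrations, not part of any real spacecraft toolkit or of RT Linux.

```python
from dataclasses import dataclass
from typing import Protocol


class PowerSensor(Protocol):
    """Driver interface each mission implements for its own hardware."""
    def bus_voltage(self) -> float: ...


class LoadSwitch(Protocol):
    """Driver interface for switching power to subsystems."""
    def set_load(self, name: str, on: bool) -> None: ...


@dataclass(frozen=True)
class MissionConfig:
    """Per-mission configuration settings."""
    min_bus_voltage: float
    nonessential_loads: tuple[str, ...]


def shed_loads_if_low(sensor: PowerSensor, switch: LoadSwitch, cfg: MissionConfig) -> bool:
    """Shared logic, identical on every mission: cut non-essential loads on low voltage."""
    if sensor.bus_voltage() < cfg.min_bus_voltage:
        for load in cfg.nonessential_loads:
            switch.set_load(load, False)
        return True
    return False


# Bench drivers standing in for real hardware during ground testing.
class BenchSensor:
    def __init__(self, volts: float) -> None:
        self.volts = volts

    def bus_voltage(self) -> float:
        return self.volts


class BenchSwitch:
    def __init__(self) -> None:
        self.state: dict[str, bool] = {}

    def set_load(self, name: str, on: bool) -> None:
        self.state[name] = on


cfg = MissionConfig(min_bus_voltage=24.0, nonessential_loads=("camera", "heater_2"))
switch = BenchSwitch()
print(shed_loads_if_low(BenchSensor(22.5), switch, cfg), switch.state)
```

Porting to a new spacecraft would then mean writing new classes that satisfy `PowerSensor` and `LoadSwitch` and a new `MissionConfig`, while `shed_loads_if_low` and its test history carry over unchanged.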
Theoretically, all programs have latent bugs, unless they are too simple to do much.
I've seen the code for some MAJOR blue chip companies and I really do wonder how these people stay in business with the rubbish that they put out. For example, some of the code drops from our clients don't even compile! The reason for all the crap is that it's very easy to cut corners without it being immediately obvious. Typically, the first thing to be dropped when things are getting tight (either time or money) is documentation, quickly followed by testing. Next it's individual features, removed from the requirements one by one.
Since software engineering is still a 'black art' as far as most traditional engineers and project managers are concerned, they don't have any real intuition for when things are starting to look bad. Without looking at the code AND knowing something about it, you won't stand a chance of 'intuiting' whether or not things are going well.
Writing software is an expensive business in both time and money. It's also a very young business without the same 'discipline of implementation' as other areas. Until the process matures and people realise that doing it on the cheap gives you cheap software, things aren't going to change and Mars probes are going to continue to produce craters.
Why wait 100 years? I'm ashamed of most programmers *TODAY*. Stupid three-week IT majors with a background in ASP.NET or some shit...
It used to be that comp.sci was about comp.sci, not about staying up to date with the latest code-monkey scripting language.
There is still a reason why the majority of *real* work is coded in C. It's a simple language that gets things done.
The dot-com bust's VB script kiddies [e.g. three-week IT grads] come and go. True comp.sci'ers stick around longer.
Tom
Someday, I'll have a real sig.
Yeah but... the Apollo 11 LEM computer threw 1202 and 1201 program alarms and restarted several times during the landing.
One line blog. I hear that they're called Twitters now.
Part of it was the fact that they had absolute geniuses working on the problem. Think of it: they designed a system in the late 1970s, tested it on the ground, and had it fly successfully for 20 years without a major "oopsie". Or rather, if a major "oopsie" happened, they had ways around, over, or through it. They spent YEARS developing the flight software for the Shuttle.
Software CAN be done right. It just has to be a priority.
"Learning is not compulsory... neither is survival."
-- Dr. W. Edwards Deming
With the moon missions, there were manned craft, and so every line of code had to be checked and rechecked--and hundreds of guys were on the ground watching everything that happened, twenty-four seven, until the astronauts were safely back on the ground.
Now, windows for a Mars launch come much less frequently. There might be a temptation to rush some of the QA and just cross fingers. Speed of light delay means that NASA can't intervene in most situations--problems are resolved one way or another before anyone on the ground even hears about them.
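To put numbers on the speed-of-light delay: the Earth-Mars distance swings between roughly 55 million km at a close approach and roughly 400 million km near conjunction (approximate figures that vary from one cycle to the next). A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum, m/s (exact by definition)


def light_time_minutes(distance_km: float) -> float:
    """One-way signal travel time, in minutes, for a distance in km."""
    return distance_km * 1000.0 / C_M_PER_S / 60.0


# Approximate Earth-Mars distance extremes in km; actual values vary each cycle.
for label, km in [("close approach", 54.6e6), ("near conjunction", 401e6)]:
    one_way = light_time_minutes(km)
    print(f"{label}: {one_way:.1f} min one way, {2 * one_way:.1f} min round trip")
```

That works out to about 3 to 22 minutes one way, so even in the best case a question-and-answer exchange with the spacecraft takes several minutes, which is why anything that goes wrong during orbit insertion or landing has to be handled on board.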
Moon launch hardware had to last for a few days in space--stressful, busy, lengthy days, but a few days nonetheless. We expect Mars craft to spend months in hard vacuum and harder radiation, and then land successfully without human help, on a planet with higher gravity than the moon...
Just some thoughts. The parent is right--Mars missions are hard because it's far away, and you have to travel through space to get there.
~Idarubicin
According to this page only 3 of 26 missions to Venus have been total failures. When you consider that Venus is a much more hostile environment than Mars, you have to conclude that either Mars is just plain unlucky or mission planners are getting something wrong.
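With samples this small, a rough error bar helps before concluding anything. A minimal sketch using the Wilson score interval (a standard binomial confidence interval; z = 1.96 gives about 95%) on the counts quoted in this thread: 20 of 34 Mars missions "failed" per the story, and 3 of 26 Venus missions were "total failures" per the commenter's page. The two counts use different definitions of failure, so treat the comparison as loose; `wilson_interval` is just an illustrative helper name.

```python
import math


def wilson_interval(failures: int, missions: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = failures / missions
    denom = 1.0 + z * z / missions
    center = (p + z * z / (2 * missions)) / denom
    half = z * math.sqrt(p * (1 - p) / missions + z * z / (4 * missions**2)) / denom
    return center - half, center + half


for name, failed, total in [("Mars", 20, 34), ("Venus", 3, 26)]:
    lo, hi = wilson_interval(failed, total)
    print(f"{name}: {failed / total:.0%} failed, 95% interval {lo:.0%} to {hi:.0%}")
```

On these counts the intervals (roughly 42 to 74 percent for Mars, 4 to 29 percent for Venus) do not overlap, so sampling noise alone is a weak explanation for the gap, though the arithmetic says nothing about the cause.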
-- "Sponges grow in the ocean. I wonder how much deeper the ocean would be if that didn't happen."
However much you may disagree, simple Newtonian dynamics is all it takes to get a space probe from A to B in the vast majority of cases. It's a well-understood problem domain.
Dragging in stuff like chaotic long-term behavior of n-body systems, while an interesting fact in itself and worthy of study, has very little to do with the engineering problem at hand. Ephemerides for all major bodies in the solar system for the coming hundreds of years are known up to uncanny accuracies (metres) and plotting the trajectory of a probe is simply a matter of numerical integration, to put it bluntly.
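To illustrate the "simply a matter of numerical integration" point, here is a minimal sketch, not mission-grade navigation: a probe on a circular 1 AU orbit around the Sun, integrated with the velocity Verlet method. It ignores the planets, solar radiation pressure and everything else real trajectory work takes from the ephemerides; the Sun's gravitational parameter and the astronomical unit are standard values, and the function names are illustrative.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, heliocentric gravitational parameter
AU = 1.495978707e11         # m, astronomical unit


def accel(x: float, y: float) -> tuple[float, float]:
    """Newtonian point-mass gravity of the Sun at position (x, y)."""
    r3 = math.hypot(x, y) ** 3
    return -GM_SUN * x / r3, -GM_SUN * y / r3


def propagate(x: float, y: float, vx: float, vy: float,
              dt: float, steps: int) -> tuple[float, float, float, float]:
    """Velocity Verlet (kick-drift-kick) integration of a two-body orbit."""
    ax, ay = accel(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax        # half kick
        vy += 0.5 * dt * ay
        x += dt * vx               # drift
        y += dt * vy
        ax, ay = accel(x, y)       # gravity at the new position
        vx += 0.5 * dt * ax        # half kick
        vy += 0.5 * dt * ay
    return x, y, vx, vy


# Circular orbit at 1 AU: speed sqrt(GM/r), period 2*pi*sqrt(r^3/GM).
v0 = math.sqrt(GM_SUN / AU)
period = 2 * math.pi * math.sqrt(AU**3 / GM_SUN)
steps = 10_000
x, y, _, _ = propagate(AU, 0.0, 0.0, v0, period / steps, steps)
print(f"after one period: {math.hypot(x - AU, y) / 1000:.0f} km from the start")
```

Velocity Verlet is chosen here because it is simple and keeps orbital energy bounded over long runs; production tools use higher-order integrators and full force models, but the principle is the same.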
Now when someone mixes up metric and imperial units (pound-force seconds versus newton-seconds, in the Mars Climate Orbiter's case), things go awry. But don't claim that this could have been prevented by hiring more mathematicians. It's simply a case of human error, something that happens in the Real World.
Having a high IQ, my friend, is no excuse for making stupid claims about things you don't know anything about.
Perhaps it explains why there should be a manned mission. The main problem with exploring the unknown is that there are a lot of unknown variables out there, and computer technology is not always adaptable to all of them. This is why there are software failures and lost contact. Manned missions give some extra control over the mission and the ability to improvise new solutions to unknown problems, like fixing a broken part using some other material that is available, or realigning the craft so it maintains contact. The big problem with Mars is that a signal telling the craft to do something different takes anywhere from a few minutes to over 20 minutes to get there. A well-trained human would be able to make these decisions and carry out the new instructions in far less time (within seconds). If a manned mission to Mars weren't so expensive, I'm sure manned missions would have a much higher success rate.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
What, the programming teams worked in a vacuum, isolated from each other? You're telling me that the products of their efforts didn't communicate with each other? The programmers should have noticed and/or documented it properly. Personally, if I were a programmer on this project, I would have been VERY surprised if we weren't using SI units, and I would have questioned it strongly. Anybody who's taken any physics courses knows that even in the US, people use SI units. It was not a software problem - the software obviously did what it was told to do.
GIGO.
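On the interface point: one hedged way to make a unit mismatch impossible to miss is to require every record crossing a team boundary to carry an explicit unit, and to convert to SI (or reject the record) right at the boundary. The comma-separated record format, the unit labels and `parse_impulse_record` are hypothetical, not the actual Mars Climate Orbiter file format.

```python
LBF_S_TO_N_S = 4.4482216152605   # exact: 1 lbf = 0.45359237 kg * 9.80665 m/s^2

# Accepted unit labels and their factor to newton-seconds (hypothetical labels).
TO_NEWTON_SECONDS = {"N*s": 1.0, "lbf*s": LBF_S_TO_N_S}


def parse_impulse_record(line: str) -> tuple[str, float]:
    """Parse 'name,value,unit' and return (name, impulse in N*s).

    Raises ValueError for a wrong field count, a non-numeric value,
    or a missing/unknown unit, instead of silently guessing.
    """
    name, value, unit = (field.strip() for field in line.split(","))
    if unit not in TO_NEWTON_SECONDS:
        raise ValueError(f"record {name!r}: unknown or missing unit {unit!r}")
    return name, float(value) * TO_NEWTON_SECONDS[unit]


print(parse_impulse_record("desat_01, 2.0, lbf*s"))
```

The consuming team no longer has to know or assume which units the producer used; a record without a recognized unit is garbage that never gets in.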
Heh, I was a part of a space failure myself. We were using pretty much off-the-shelf equipment, but it passed NASA spec shake and thermal testing. What probably did it in was radiation... in low Earth orbit we figured there wouldn't be much risk of radiation problems.
If we were to do it again, we would probably include some kind of radiation-resistant reset system, because building the whole thing in rad-hard would be very expensive (our budget was $1500 plus donated equipment!). But having a few rad-hard devices to reset the box in case of a crash would probably have been affordable.
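The reset system described here is essentially a watchdog timer: an independent circuit that reboots the computer if the software stops checking in. A minimal software model of the logic, assuming a 3-second timeout and a hypothetical `Watchdog` class; on a real payload the timer would be the separate (ideally rad-hard) hardware part, precisely so that a wedged CPU cannot disable it.

```python
class Watchdog:
    """Software model of an external watchdog timer (hypothetical API)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_kick = 0.0
        self.resets = 0

    def kick(self, now: float) -> None:
        """Called by the healthy main loop to prove it is still running."""
        self.last_kick = now

    def check(self, now: float) -> bool:
        """Return True (and count a reset) if the loop went silent too long."""
        if now - self.last_kick > self.timeout_s:
            self.resets += 1
            self.last_kick = now  # the timer restarts along with the computer
            return True
        return False


def simulate(hang_at: int, duration: int, timeout_s: float = 3.0) -> list[int]:
    """Step once per second; the CPU hangs at t == hang_at. Returns reset times."""
    wd = Watchdog(timeout_s)
    hung = False
    reset_times = []
    for t in range(duration):
        if t == hang_at:
            hung = True            # e.g. a radiation upset wedges the CPU
        if not hung:
            wd.kick(t)             # healthy main loop pets the watchdog
        if wd.check(t):
            reset_times.append(t)
            hung = False           # the reset reboots the computer
    return reset_times


print(simulate(hang_at=10, duration=20))
```

With a hang at t = 10 and the last kick at t = 9, the watchdog fires once the silence exceeds 3 seconds, and the payload is back in service instead of dead for the rest of its life.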
About 100 amateur radio operators contacted our payload, and relayed their GPS coordinates to others using amateur packet radio. At the same time, the GPS unit on board the Spartan satellite transmitted its position to listeners on the ground as well. But had it not crashed after about 17 hours, it is possible that several hundred other amateur radio operators would have used it.
NASA software engineering is actually quite remarkable -- at least for the shuttle program. I read a paper once about how they actually break many of the paradigms of writing code that so many programmers are accustomed to, so that the code is absolutely perfect. Deadlines are met well ahead of schedule and nobody works late. They're not allowed to work late, because the pressure or fatigue could cause an error to occur. The code is personally signed off by the chief software engineer as safe: it won't hurt anyone. Every line of code is fully documented. The code is effectively written twice by two separate teams. This article details some of it at great length: They Write the Right Stuff. I don't disagree with you that maybe the way they write software needs to be reviewed, but it seems that they already go a long way to ensure that happens.
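Writing the code twice is usually called N-version programming: independently developed versions run on the same inputs, and their outputs are compared before anything is acted on. A minimal sketch of the comparison step as a generic majority voter with a tolerance; the function name and tolerance are hypothetical, and this is not a description of the Shuttle's actual redundancy management.

```python
def vote(outputs: list[float], tol: float) -> float:
    """Return the mean of a strict majority of outputs that agree within tol.

    With two versions both must agree; with three, any two can outvote one.
    Raises RuntimeError when no majority exists.
    """
    n = len(outputs)
    for candidate in outputs:
        agreeing = [x for x in outputs if abs(x - candidate) <= tol]
        if 2 * len(agreeing) > n:
            return sum(agreeing) / len(agreeing)
    raise RuntimeError(f"no majority among {outputs}")


print(vote([12.5003, 12.5001, 97.0], tol=0.01))   # third version is outvoted
```

The value of the technique depends on the versions failing independently; two teams working from the same wrong spec will happily agree on the wrong answer.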
I haven't read the whole site in a while, but IIRC, it describes the typical problems with software: underscoping the problem (in the 60s, most people assumed that the computer hardware development would be the majority of the effort), code bloat (the computer required much more memory than originally planned), buggy production code, schedule slips, problems caused by cruft. When the project started, they just waded right into coding with few tools and little awareness of the need for proper engineering practice.
This particular case was made more difficult by the program loading procedure: the program ROM (core rope memory) was made one bit at a time by hand, threading wires through or around tiny magnetic cores, and then embedded in a solid block of epoxy. The write-compile-debug cycle could take weeks. If bugs were discovered late in the schedule, the astronauts just had to work around them. The software developers did have mainframe-based simulators for development, though.
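The core rope idea is easy to model: a bit is 1 if a sense wire was threaded through a given core and 0 if it was routed around it, which is why changing a single bit meant re-weaving. A deliberately simplified sketch, assuming one 16-bit word per core (the real AGC rope threaded many more wires through each core to pack in more data); `weave` and `read` are hypothetical names.

```python
def weave(words: list[int], width: int = 16) -> list[frozenset[int]]:
    """'Manufacture' the rope: for each core, record which bit wires pass through it."""
    return [frozenset(bit for bit in range(width) if (word >> bit) & 1) for word in words]


def read(rope: list[frozenset[int]], address: int) -> int:
    """Pulse one core; only the wires threaded through it pick up a 1."""
    return sum(1 << bit for bit in rope[address])


rope = weave([0o30001, 0o00000, 0o77777])   # octal, as AGC listings used
print([oct(read(rope, a)) for a in range(3)])
```

In this model the program is literally the physical routing of the wires, so "patching" a word means rebuilding the corresponding part of the rope, which is the weeks-long cycle described above.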
With the gigabytes of space available for today's software, I'm surprised that any modern space projects get finished at all.
Just look at the rate of failure for early Moon missions.
It's a hard problem to send a probe to the Moon or Mars. Landing and aerocapture at Mars are dicey things.
There are 10 types of people in this world, those who can count in binary and those who can't.
Here's the problem as I see it: As software and hardware have become more complicated, there's a need to increase testing. Instead, in order to meet NASA's new budgetary requirements, funding in general, and specifically for testing, has gone down. So, it's not possible to completely test all of the hardware AND software, as it should be.
As an analogy: if these probes were commercial airliners, they would never be certified to fly.
I'm not putting all the blame on NASA here, although it is apparent to me that they need to start reporting what things are actually going to cost. Having said that, Congress is equally complicit; they need to come to the realization that it's expensive to do work outside the atmosphere (they apparently don't understand this...)
Your comment about manned vs unmanned makes absolutely no sense. One could buy a hundred or a thousand unmanned planetary missions for what a single manned mission would cost, and there would still be no guarantee that the manned mission would succeed. Yet we could easily afford to have many of those unmanned missions fail.
I say that the manned space program is one of the major contributing factors to the poor Mars success rate. More specifically, the enormous sums of money that the Shuttle and ISS have siphoned from the far more productive unmanned planetary program and flushed down the drain.
Did you know that a law allowing the use of the metric system in the United States was signed by President Johnson? Andrew Johnson!
How ya like dat?
A quote from a recent newspaper article: Language is important. The numbers say it, the metrics say it, the successful projects say it, even some
The rest of us will just have to settle for actually doing this work (satellites, laser eye surgery systems, aircraft, subs, etc.) instead of making yet another kludgy VB system to sell the latest in sportswear or whatever.
Zoe Brain - Rocket Scientist