Space Shuttle Software: Not For Hacks
Jeff Evarts writes: "
This article
in Fast Company talks about
the process the Shuttle Group uses to make software. At first it
seems too predictable: a very cool project but no hacks, no
pizza-and-coke all-nighters, etc. Then, however, it goes on to talk about why: They have an informed customer, they talk to that
customer until they have a very clear idea of what is wanted,
they have a budget focused on prevention, and they focus on
fixing the process and not blaming the individual."
As someone who's done more than his share of late-nighters, it was an interesting view into the mission-critical environment. Maybe there are a few software firms out there that would rather spend some of their money on better processes rather than technical support engineers. Maybe a little more market research and a little less marketing, too. A good read."
These guys are "pretty thorough" the way Vlad the Impaler was "a little unbalanced." Still, you have to wonder how they can claim single-digit errors among thousands of lines of code, but I guess the proof is in the rocket-powered pudding. And lucky for them, their target platform was recently upgraded.
and how slowly they are being developed? I don't mean that it's a bad thing -- it's good that Shuttle program allows them to do it at reasonable pace and with reasonable requirements, but if everyone else wasn't under constant pressure, and if everyone's else software wasn't a victim of feature bloat, dealing with poorly documented and even worse implemented protocols, and never-ending stream of bullshit coming from the management, everyone else would write robust software, too. Well, not really everyone -- some "programmers" wouldn't be able to do anything because they have no skill, no education or are plain dumb, but reasonably geeky and educated programmer can pull something like that in ideal conditions -- and those guys _are_ working in ideal conditions.
Contrary to the popular belief, there indeed is no God.
I worked on some mission-critical/life-critical stuff about 2 years ago. It was aircraft related, and since it was basically carrying the data which made the plane fly it was critical by any definition. The processes we followed was absolutely document driven. User specs were examined, questions asked and the user asked to add definition and clarification for several iterations of the document. Then the software requirements etcetcetc were followed, ech document with quite a bit of iteration. Eventually we found that typically documentation and design would take 50% of the project. Testing would take about 30 to 35%, and the actual implementation hardly took any time at all. Now in the commercial world, I find that the process is VASTLY different. Implementation has started shortly after user specs have hit the desk, before design or documentation has begun. As a result, the system we currently have is very patchy in places. Its mission is a lot less critical, but the bugs slow us down tremendously. The bugs are due to the process. The process is requirements driven, not documentation driven. But it seems that the current system I'm working on has about the same complexity as that I used to work on. Only even though we are supposed to be pushing it all out the door faster, the bugs are slowing us to the point where we have approximately the same rate of progress as the mission-critical project!! Lesson: If you do it by the documentation, you will push it out faster and cleaner (and more bugfree!!!)
It's not just space shuttle code that needs extreme reliabilty. The embedded systems in civilian aircraft are not interrupt-driven because of the reliabilty issues associated with interrupt-driven code - interrupts make the software to hard to debug thoroughly (becuase there are so many combiniations and timings of input signals to test), make faults difficult to replicate and have the potential to go wrong on a spurious set of input signals. This sort of problem doesn't really matter too much in a home or corporate computing environment, but it would be a major disaster if a plane carrying a few hundred people were to crash into a city with a population of a few million, just because of a software error. These things need 100.00 per cent reliability, so obviously software hacks are frowned upon.
Here's NASA's own history on bugs in that software:
- So, despite the well-planned and well-manned verification effort, software bugs exist. Part of the reason is the complexity of the real-time system, and part is because, as one IBM manager said, "we didn't do it up front enough," the "it" being thinking through the program logic and verification schemes. Aware that effort expended at the early part of a project on quality would be much cheaper and simpler than trying to put quality in toward the end, IBM and NASA tried to do much more at the beginning of the Shuttle software development than in any previous effort, but it still was not enough to ensure perfection.
Read the NASA history. They had a 200-page known-bug list in 1983, although they did fix most of them during the long downtime after the Challenger explosion.The Shuttle's user interface is awful. The thing has hex keyboards!. Some astronaut comments include
This project should not be held up as a great example of software engineering. Even NASA doesn't think it is.
I work in the Flight Software (FSW) Verification group in Houston.
The shuttle FSW code is written in something called HAL/S. This stands for High-level Assembly Language / Shuttle. The language was designed to read like mathematics is written. Superscripts like vector bars are actually displayed on the line above, subscripts like indices are displayed on the line below. Vectors and matrices can be operated on naturally, without looping.
We are the only ones with a compiler, because we wrote it ourselves.
Here's a sample:
EXAMPLE:
PROGRAM;
DECLARE A(12) SCALAR;
DECLAREB ARRAY(12) INTEGER INITIAL(0);
DECLARE SCALE ARRAY(3) CONSTANT(0.013, 0.026,0.013);
DECLARE BIAS SCALAR INITIAL(57.296);
DO FOR TEMPORARY I = 0 TO 9 BY 3;
DO FOR TEMPORARY J = 1 TO 3;
A =B SCALE + BIAS;
I+J I+J J
END;
END;
CLOSE EXAMPLE;
I couldn't get the subscripts to line up, but you get the idea.
- "Sweet merciful crap!" Homer J. Simpson
1 Space Shuttle Endeavor
1 Launch Pad
1 Houston Mission Control Station
4 Astronauts
Shine on, you crazy diamond.
I can almost hear the moans from the pizza-and-coke crowd whem they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.
IMHO, software development is full of fun activities. What about analysis and design? In my experience, that's where the creativity really comes into play. Just talking to the customer, understanding the problem and making a working design is really difficult, and hence rewarding when you pull it off.
And what about the process itself? Software development is a young dicipline, where individuals and small groups really can make an impact. Nobody really knows how to make good software. Maybe you'll be the one to find out? As the man says, in the shuttle software group, people use their creativity on improving the process.
And last, but not least, I bet those guys have a really good feeling when they talk to the customer after delivery. Not like some people I know, who just hide. ;)
If you can't see the fun of these other activities, maybe you shouldn't be working in this field...
A)bort, R)etry or S)elf-destruct?
I'm working on a large NASA project now. I have determined that the purpose of this project is not to produce a working software system, but rather to produce a wall full of loose-leaf binders of incomprehensible documentation that no one will ever refer to again.
The process says we must have code reviews - great! But instead of being an analysis of the logic of my code, it turns into a check against the local code formatting standards - "You can't declare two variables with one declaration, use int a; int b; instead of int a,b;" (yes, that's an actual standard around here) instead of "Hey, if foo is true and bar is negative, you're going to dereference a garbage pointer here!"
The forms are observed, but the meaning is forgotten, like Christians going to church on Sunday then cutting people off and flipping them the bird on the drive home.
"Process" won't save us. Which doesn't mean that a certain amount of it can't help, but there is no silver bullet.
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
I happen to work just down the hall from the guys who maintain and upgrade the shuttle Flight Software (FSW), and I can tell you they have a rigorous design, inspection, and test sequence that they go through before they fly new or modified code. The story around here (which I have no reason to doubt) is that the FSW team was one of the first SEI level-5 certified shops in the nation.
I can also tell you that NASA avoids having to make unnecessary changes to the FSW. For example, the new "glass cockpit" recently discussed here on Slashdot: when these upgrades were designed, they chose to design the interface to the new display modules to exactly mimic the interface to the old intruments. In other words, they are true plug-and-play replacements; one significant reason for this was so the flight software didn't have to be modified.
Likewise, people often ask why the shuttle continues to use such antiquated General Purpose Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big reason is that new hardware would almost certainly require massive changes to the flight software. And rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if it ain't broke...
Huzzah! As I type, we just launched Atlantis. Go, baby, go!
--Jim
Some of my most succesful programs (read, they actually worked or there abouts) came about because I was in a funny mood and decided to actually plan it out. From what I hear about in the real world, some (but by all means not all or even most) programmers look down on clients just because they don't know much about programming. They assume that just because they have a certain expertise over others that they somehow know more than them in general.
The good thing about the way software is written here is that the requirements are written down and sorted out before they even do the planning. How many prgrammers, groups, firms etc. can say that. I will admit, though, that a major problem is changing requirements. Something that just happen in the same way for NASA. It might just be better if people decided to wait a bit before jumping in to the programming. They'll save themselves more time and money in the long run.
I never quite understand why it is an act of macho bravado to work all night and live off pizza. It indicates two things 1) A badly run project and 2) poor maintainability in the code.
In one of my previous incarnations I worked on display systems for Air Traffic Control, where the quality level was also very high, where the performance requirements were exacting and the specifications precise.
Some would think that this means simple and boring... Of course not. Having to display a track from reception at the Radar to the display in 1/10th of a second isn't easy by any stretch of the imagination, and to do it so it works 100% of the time means you have to understand the problem properly rather than coding and patching.
If only more projects worked like that then there would be a lot less bugs in the world.
An Eye for an Eye will make the whole world blind - Gandhi