Space Shuttle Software: Not For Hacks
Jeff Evarts writes: "This article in Fast Company talks about the process the Shuttle Group uses to make software. At first it seems too predictable: a very cool project but no hacks, no pizza-and-coke all-nighters, etc. Then, however, it goes on to talk about why: They have an informed customer, they talk to that customer until they have a very clear idea of what is wanted, they have a budget focused on prevention, and they focus on fixing the process and not blaming the individual.
As someone who's done more than his share of late-nighters, I found it an interesting view into the mission-critical environment. Maybe there are a few software firms out there that would rather spend some of their money on better processes than on technical support engineers. Maybe a little more market research and a little less marketing, too. A good read."
These guys are "pretty thorough" the way Vlad the Impaler was "a little unbalanced." Still, you have to wonder how they can claim single-digit errors among thousands of lines of code, but I guess the proof is in the rocket-powered pudding. And lucky for them, their target platform was recently upgraded.
While the caliber of this group seems unbeatable, it's too bad NASA doesn't apply this rigid development model to its unmanned spacecraft. I still don't understand how a difference in units (English vs. metric) managed to go undetected!
The only thing I could think of after hearing that such an error caused a multimillion-dollar craft to crash was IDIOTS - any scientist should be using SI units today.
--Aaron Greenberg
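On the units mix-up above: one way to make that kind of mismatch fail loudly is to carry the unit along with the number instead of passing bare floats around. A minimal Python sketch, assuming an impulse value as the example quantity; the `Impulse` class and its method names are made up for illustration and are not anything NASA is known to use.

```python
from dataclasses import dataclass

# Exact conversion factor: 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical unit-tagged value; always stored internally in N*s."""
    newton_seconds: float

    @classmethod
    def from_si(cls, value: float) -> "Impulse":
        # Value is already in newton-seconds.
        return cls(value)

    @classmethod
    def from_english(cls, value: float) -> "Impulse":
        # Value is in pound-force seconds; convert once, at the boundary.
        return cls(value * LBF_S_TO_N_S)

    def __add__(self, other: object) -> "Impulse":
        # A bare number has no unit, so refuse to guess what it means.
        if not isinstance(other, Impulse):
            raise TypeError("refusing to add a bare number to an Impulse")
        return Impulse(self.newton_seconds + other.newton_seconds)


total = Impulse.from_si(10.0) + Impulse.from_english(1.0)
print(round(total.newton_seconds, 3))  # 14.448
```

The point of the design is that the conversion happens exactly once, where the value enters the program, and anything without a unit is rejected rather than silently treated as SI.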
Most likely not - but automatic verification of programs using logical constructs is a big growth area.
You can test a program with all possible inputs, and have a clean run. But this does not mean the program is 100% reliable. You must prove the program is correct if you want to be sure it is good enough for circumstances such as shuttle or aeroplane flight.
With all the complexities of semaphore control in parallel computing, you really have to make sure a program enters and leaves critical sections at the correct times, without anything else that has been designated mutually exclusive running at the same time.
Many experts believe that some Airbus crashes were caused by incorrect verification.
On a single-processor machine, this is much easier, but how many space shuttles do you know of that only have one CPU?
Have a look at some of the links at Dr. Mark Ryan's page (University of Birmingham) for some more info.
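To make the critical-section point above concrete, here is a small Python sketch using the standard `threading.Lock`. The shared counter and the `worker` function are invented for the example. Note that a clean run of this program is exactly the kind of evidence the comment above warns about: it shows the lock worked this time, not that the program is proven correct.

```python
import threading

counter = 0
lock = threading.Lock()


def worker(iterations: int) -> None:
    """Increment the shared counter, one critical section per increment."""
    global counter
    for _ in range(iterations):
        with lock:        # enter the critical section
            counter += 1  # only one thread at a time executes this line
        # the lock is released here, on leaving the `with` block


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```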
I've worked on programs assessed at Levels 3 and 4, and supposedly the folks I work with now are Level 5 (I know they made 4, but I'm not sure the certification for 5 is finished). I grind my teeth sometimes at the layers of process we have to wade through to get things done - but every six months or so, they make changes to (hopefully) make it better.
The SEI's not just working with software - they're developing models for System Engineering and Integrated Product Development, as well as Personal and Team software process models for small and independent-minded folk. Your tax dollars at work!
I love vegetarians - some of my favorite foods are vegetarians.
The main reason Space Shuttle reliability is not a priority in the software industry in general is that the whole focus of the industry has become the quick buck, the rush to the IPO, the dazzling of the user with endless "features" that have minimal utility. The classic example was Windows 3.1. It was colorful, had lots of features - and barely worked.
The marketroids who set timetables for software projects are another problem. Most of them think any arbitrarily complex piece of software can be designed, implemented and tested in about 3 weeks and get impatient when this doesn't happen. In the shuttle program the engineers are in charge and they determine the timetables.
Yeah, I'm a bitter, angry little coder...
--
Nothing to see here. Mooooove along...
Well, everything I could say about time requirements and budgets has been said a thousand times already.. so I'll just go to what annoyed me about the whole article:
suits.
Seriously. Why is wearing a suit such a huge thing in the business world? I can understand if you're a lawyer and need to impress people with your multi-thousand dollar clothing, or an executive who deals with customers and must appease the customers' sense of what's proper in an executive.. but other than that.. WHY?
It's been proven and re-proven that people are more productive in an environment where they're comfortable. In this particular case the idea seems to be to make your coders as uncomfortable as possible so they can think of nothing but getting the code perfect the first time so they can go home.. but most places (as has been mentioned repeatedly) aren't like that. So why is it that a guy who works in a cubicle and never gets closer to customers than a middle-manager who is in charge of a supervisor who is in charge of the customer service department has to show up to work in a tie?
And the worst part is that the business world seems to think people enjoy this. Sure, it's nice to look good.. if you've got a $2,000 suit you're going to want to wear it on occasion.. but how many of us can honestly say that we feel more productive in it?
Dreamweaver
"If a man hasn't discovered something he will die for, he isn't fit to live" -- MLK, Jr.
When I meet programmers who think that they are cool and tough, I tell them to read Bravo Two Zero by Andy McNab. It's the true story of an SAS (British army special forces) unit that operated behind the lines during the Gulf War. Here in the UK, the SAS is revered by most guys in the way that Navy SEALS are in the US. The book has a lot to teach about programming.
Many people seem to think that special forces troops are so good that they can just be handed a task, left to get it done, and that they will deal with whatever problems arise. Wrong. According to McNab, the True Motto(tm) of the SAS is "check and plan". For example, before approaching an Iraqi military vehicle, they would rehearse opening the vehicle's door: which way the handle turns, whether the handle has to first be pushed in or pulled out, whether the door swings open or slides back, how much force needs to be used, etc. etc. etc. Every little detail is checked like this. And there are backup plans.
Now read the first sentence of the previous paragraph, but substitute "top software programmers" for "special forces troops". You can see my point. Truly good special forces/programmers/professionals all have some things in common: they are focused, disciplined, and methodical. And they don't feel a need to prove how good they are by taking unnecessary chances.
The main article also notes that programming teams such as those used for the Space Shuttle seem good at drawing in women. This is hardly surprising. Women naturally like men who are justifiably confident about what they do.
How well did the eight-man SAS unit perform? They were surrounded by Iraqis, who had armored vehicles. Three were killed. The other five retreated: over 85 km (>2 marathons) in one night with 100 kg (220 lb) of equipment each. About 250 Iraqis were killed along the way, and thousands more were terrorized.
Sara Chan
It is definitely different, and I'm one of those nay-sayers who read and loved the article while thinking "ugh, I could never work there." Obviously, they do things the right way, and the only way it can be done in those circumstances. It's also like the difference between the pioneers who moved west in the US, traversing difficult trails, versus modern yuppies who contract a moving company. Both ways work, and the latter is infinitely safer, saner, easier, smoother, and faster -- but it's also less fun. Like Harry Tuttle said, "I got into this business for the excitement and adventure; get in, get out, move on. Now, your entire place could be on fire and I couldn't turn on a tap without filling out a form 27B stroke 6." I, for one, like working on a million things at once, and seeing a ton of stuff grow quickly (with bugs). If I was the kind of perfectionist these people have to be, I'd still be working out the bugs in the BBS system a friend and I wrote in Pascal in 1984. Hmm... I think I'll go apply for a job at writing software for hospitals.
If I wanted a sig I would have filled in that stupid box.
Our Telcordia subsidiary (formerly BellCore, half of what was once AT&T Bell Labs) is one of those Level 5 organizations - we're all learning from them.
I love vegetarians - some of my favorite foods are vegetarians.
The NASA Space Shuttle project is generally described as being at CMM Level 5 (the Capability Maturity Model of the Software Engineering Institute at Carnegie Mellon). The CMM is basically a system to ensure software quality: the software fits the budget, is what the client actually requested, etc.
Many major companies/consultancies try to aim for CMM Level 3, and most defence contracts require it.
It makes the achievements of the NASA Shuttle program seem all the more impressive.
It doesn't necessarily fulfill the Hackers' development model; however, it does try to ensure software quality.
-- "To ask a question is to show ignorance; Not to ask a question means you'll remain ignorant."
It makes the priority clear. Unlike most corporate work, in which the fear of unspoken criteria is always yielding random results, the shuttle program is sure of its priorities. Of course, it means nothing to someone who signs such a thing without caring. I'm sure someone loves to point out that it hasn't stopped the shuttle itself from problems. But still, the answer is, the black magic is that it makes the priority clearly communicated and acknowledged as communicated.
pyrrho
I haven't either, but it does spark a thought. This is, apparently, a VERY good way of doing things. Why aren't other companies (who don't have NASA contracts) doing this? From what I've heard, the Love bug wouldn't have worked if there hadn't been huge 'undocumented features' in Microsoft Everything big enough to launch the shuttle through. Pointing out that intelligent quality control can be done is a good thing.
Part of one SW development process I've worked successfully with has QA engineers designing the test plan, based on the spec, while the SW engineers write the code. When the code's done, you implement and run the test plan.
If a change is made to part of the code, it ought to be reviewed, and the QA engineer should be present. He can then make some new tests that look specifically at the effects of the change. (And run at least a representative sample of the standard tests.)
My experience was at the application level, on a multimedia authoring and playback system. I'd be tempted to apply similar processes to OS kernel development and testing.
Scenario testing -- what you described -- can find bugs the formal tests didn't, for a hundred users can be more devious than one QA engineer. But you can't rely on it to find bugs at the early stage; it's too random and undirected.
--Timberwoof
I'm in it for the fun, but it's more fun when you win.
What's needed is a "meta-process", a process to develop the software process and keep it directed towards the goal. I would suggest that a democratic meta-process, where developers themselves work together to evolve the procedures they will use, would work better than decrees from clueless management.
Well, that's one set of religions. Others - such as Zen Buddhism - would say that such rules, or "process", are things to ultimately be transcended. The enlightened person, the sage or bodhisattva, does not refrain from killing based on some religious law; he simply acts. The practice of these religions is designed to help lead ordinary people to that state of enlightenment. Perhaps that should be the goal of software development practices, as well - to help lead ordinary programmers into that state where they are enlightened enough to be simply incapable of producing flawed software.
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Don't forget why the Ariane 5 rocket blew up in 1996: a conversion error caused a software shutdown that led to the self-destruct of the rocket.
"The internal SRI software exception was caused during execution of a data conversion from a 64-bit floating-point number to a 16-bit signed integer value. The value of the floating-point number was greater than what could be represented by a 16-bit signed integer. The result was an operand error. The data conversion instructions (in Ada code) were not protected from causing operand errors, although other conversions of comparable variables in the same place in the code were protected."
What was the estimate, about $8,000,000,000 of uninsured losses, including 10 years of work for the scientists with satellites on board?
I wonder how many other maiden voyages have started off so poorly, other than the Titanic, that is.
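For anyone wondering what a "protected" conversion looks like, here is a rough Python sketch of the idea in the quoted report: check the range before narrowing a floating-point value to a 16-bit signed integer. This is not the Ada code from the flight software, and the function names are made up; it only shows two common policies, raising an error or saturating at the limits.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Truncate toward zero, but raise if the result does not fit in 16 bits."""
    truncated = int(value)  # int() itself raises on NaN or infinity
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Truncate toward zero and clamp to the 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


print(to_int16_checked(1234.9))      # 1234
print(to_int16_saturating(40000.0))  # 32767
```

Which policy is right depends on what the value is used for; the sketch only shows that the out-of-range case gets an explicit decision instead of an unhandled exception.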
and how slowly they are being developed? I don't mean that it's a bad thing -- it's good that the Shuttle program allows them to do it at a reasonable pace and with reasonable requirements, but if everyone else wasn't under constant pressure, and if everyone else's software wasn't a victim of feature bloat, of dealing with poorly documented and even worse implemented protocols, and of a never-ending stream of bullshit coming from the management, everyone else would write robust software, too. Well, not really everyone -- some "programmers" wouldn't be able to do anything because they have no skill, no education or are plain dumb, but a reasonably geeky and educated programmer can pull something like that off in ideal conditions -- and those guys _are_ working in ideal conditions.
Contrary to the popular belief, there indeed is no God.
I think the point of that exercise is to promote a sense of well defined accountability and confidence, up and down the management chain. Sure, in theory, the project manager should be ultimately accountable. But all too often she can, post facto, dodge responsibility for failure by (accurately) claiming that other project stakeholders failed to provide their inputs to the project correctly. In Mr. Keller's case, he would not sign the certificate if he felt that failure was a possibility, for any reason. This also gives the decision makers a well defined "emergency brake" that perhaps could have averted a *Challenger* like disaster, where some line managers said STOP, while some higher-ups said GO!
>Likewise, people often ask why the shuttle continues to use such antiquated General Purpose
>Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big
>reason is that new hardware would almost certainly require massive changes to the flight software. And
>rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if
>it ain't broke...
Actually, AFAIK, the main reason is that old 386s are tested, tested and, once more, tested for space use. With newer processors, there are too many unknowns to risk a space shuttle. The line-widths in modern processors are so small that background radiation is beginning to cause problems in space without proper shielding. Probably they are testing 486s and Pentiums right now, but it'll be another ten years before they're ready for extensive space use.
"The Internet, of course, is more than just a place to find pictures of people having sex with dogs." - Time Magazine
I worked on some mission-critical/life-critical stuff about 2 years ago. It was aircraft related, and since it was basically carrying the data which made the plane fly it was critical by any definition. The process we followed was absolutely document-driven. User specs were examined, questions asked and the user asked to add definition and clarification for several iterations of the document. Then the software requirements, etc. followed, each document with quite a bit of iteration. Eventually we found that typically documentation and design would take 50% of the project. Testing would take about 30 to 35%, and the actual implementation hardly took any time at all. Now in the commercial world, I find that the process is VASTLY different. Implementation has started shortly after user specs have hit the desk, before design or documentation has begun. As a result, the system we currently have is very patchy in places. Its mission is a lot less critical, but the bugs slow us down tremendously. The bugs are due to the process. The process is requirements-driven, not documentation-driven. But it seems that the current system I'm working on has about the same complexity as the one I used to work on. Only, even though we are supposed to be pushing it all out the door faster, the bugs are slowing us to the point where we have approximately the same rate of progress as the mission-critical project!! Lesson: If you do it by the documentation, you will push it out faster and cleaner (and more bugfree!!!)
I don't know a software company that wouldn't implement such a strategy to ensure that their software was perfect if they had the budget to do so. As with all things of this nature it comes down to the money vs. quality contest. The better the quality, the more it costs to produce, but unfortunately it's not an even rise up the scale. It may cost you 2X$ to improve quality by 50%, but it might cost you another 4X$ to get the next increase in quality of only 25%. Even the article points out that the Shuttle software is the most expensive in the world and it still is run on old computers. Give me the same scale of budget/time and I'll give you a Windows operating system that a fanatical Linux user would be hard pressed to complain about. Or, even better, I'll use the funds to set up an open source group to make Linux as versatile and useful across the board, from beginners to the "Linux gurus".
I don't work with the FSW people, so I'm not sure about the details of their work flow, but I think it's safe to say that new code goes through several readings, probably both at the pseudo-code and code levels.
Schedule is driven by the planned date for launch, and worked backward from there. For example, if you're going to launch a mission at date L, then the crew begins training at L minus X months, which means that the software has to be ready for the SMS at L minus Y months, which means you have to begin design at L minus Z months, etc. I'm not sure what X, Y, Z and related time deltas are, but I believe they probably start planning at least a couple of years in advance.
--Jim
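The L-minus arithmetic above is easy to sketch in Python with the standard `datetime` module. Everything specific here is a placeholder: the launch date is arbitrary, and the lead times are invented numbers standing in for the X, Y and Z that the comment above says are not known.

```python
from datetime import date, timedelta
from typing import Dict


def back_schedule(launch: date, lead_weeks: Dict[str, int]) -> Dict[str, date]:
    """Return the start date of each milestone, counted back from launch."""
    return {
        name: launch - timedelta(weeks=weeks)
        for name, weeks in lead_weeks.items()
    }


# Placeholder values only; not real Shuttle planning figures.
launch_day = date(2001, 6, 1)
lead_weeks = {
    "crew training begins": 40,
    "software ready for the SMS": 52,
    "design begins": 104,
}

milestones = back_schedule(launch_day, lead_weeks)
for name, day in sorted(milestones.items(), key=lambda item: item[1]):
    print(f"{day.isoformat()}  {name}")
```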
Or this:
clean up later -
too drunk right now */
- eddy the lip
This is the voice of World Control. I bring you Peace.
Sort out that closing italics tag! The front page article only has the first paragraph, and the second paragraph has the tag. All the headlines are italicised!
My god! Where's all my karma going?
However, dropping to your knees and worshipping the brilliant scientist-programmer who wrote the core code your company's business depends on will not make you millions of dollars.
That code still needs to be tested against specifications -- even if the specs are written afterwards -- and (re)engineered so that it can be maintained and expanded as new versions and applications demand. Trust me, it's better to write the code in a comprehensible and maintainable way from the start.
If you have a genius who won't work within the programming *organization*'s process, you're sunk. If your genius sees the process as liberating, freeing his mind to create really good stuff ... then pay him lots of money and stock options.
--
I'm in it for the fun, but it's more fun when you win.
Please, think of the balsa wood and cardboard tubes. For their sake, please don't release such a dangerous tool!
My idea of computer logic came from this: one of my friends is taking a course in computer engineering (in the Netherlands). He once showed me one of his sketches. It was a difficult program, with several factors involved, etc.
But it fit into one simple "logic" line!
On the other hand, another "simple" programme took almost 3 long lines of "logical formulas".
What I meant was that it would be nice to write programmes in this language, and let the computer do its thing writing the code.
Sorry if it sounded too stupid.
Better link for the B-Method (Methode B): http://archive.comlab.ox.ac.uk/formal-methods/b.html
For example:
|--Switch1---lightbulb1---|
|--Switch2-/
This represents two switches in parallel, so lightbulb1 will get juice if either Switch is on. So this is the equivalent of OR.
|--Switch1--Switch2--Light1---|
This is AND.
You can add new rungs and include relays, so that a switch3 could be a relay driven off of lightbulb1. By cascading with relays, you can have states, which can represent steps in a process. Switches can be sensors and lightbulbs can be actuators, so you can build a very simple circuit that can control a multi-step process with safety conditions, such as "only activate the forge if there is a blank in place (detected by a proximity sensor), and the temperature is within certain limits (sensors), and previous steps were completed successfully, and the operator's hands are safely out of the way holding down switches 8 and 9." Instead of wiring all this up as actual circuits, you can connect all of the sensors and actuators to the PLC. That allows you to store your programs, it simplifies the wiring, and you don't need to use actual relays, timers, etc. (You'll still use some relays of course if you need the low voltage coming out of the PLC to activate heavy equipment.)
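The rungs above map directly onto boolean expressions. A rough Python sketch of the same logic follows; it is not PLC code, and the function names, the Celsius units and the temperature limits are all invented for illustration.

```python
def rung_or(switch1: bool, switch2: bool) -> bool:
    """Two switches in parallel: the bulb lights if either one is closed."""
    return switch1 or switch2


def rung_and(switch1: bool, switch2: bool) -> bool:
    """Two switches in series: the bulb lights only if both are closed."""
    return switch1 and switch2


def forge_enabled(
    blank_in_place: bool,
    temperature_c: float,
    previous_steps_ok: bool,
    switch8: bool,
    switch9: bool,
    temp_min_c: float = 900.0,   # hypothetical lower limit
    temp_max_c: float = 1200.0,  # hypothetical upper limit
) -> bool:
    """One long series rung: every safety condition must hold at once."""
    temperature_ok = temp_min_c <= temperature_c <= temp_max_c
    hands_clear = rung_and(switch8, switch9)  # both hands on their switches
    return blank_in_place and temperature_ok and previous_steps_ok and hands_clear


print(rung_or(False, True))                           # True
print(rung_and(False, True))                          # False
print(forge_enabled(True, 1000.0, True, True, True))  # True
print(forge_enabled(True, 1000.0, True, True, False)) # False
```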
Simple do-it-yourself application: You could connect all your home lighting, along with motion sensors and switches to a PLC, and set up any number of different logical relationships. So a single switch could be "home/away" which could control a large number of lights throughout the house. A single "movie lighting" switch could turn off certain lights, turn others on, dim a few more, turn off the dishwasher, and set a timer to go back to normal in two hours in case you fall asleep.
I don't have one, but I think the cheapest models are probably under $100. They never crash, they can run for years, they're extremely reliable, easy to use, and cheap. If you can program a VCR, then you can program a PLC. Unfortunately, that rules it out as a product for the home market.
"What I cannot create, I do not understand."
...that we could arrange for a situation where the requirements are all fixed and locked down, and documented, before any coding begins? In industry jobs, I've never seen a project that wasn't having some marketing group force "critical" changes the whole time something was being written.
:)
:)
You get what you pay for, and take the time for. These days, most people and companies seem quite willing to settle for "bad, buggy programs now" rather than "better programs, later". Of course, without organization (also common), it's possible to wait and get nothing later, too. Process is expensive in terms of people involved and time, but it's a lot cheaper in the long run than the alternative.
Open-source projects actually follow this - every successful open project I've seen has a definite hierarchy of people managing patches and controlling what winds up in the latest sub-point build, and making key architectural decisions so nothing derails them. Oh - and there's no one who'll fire you if marketing's last-minute changes aren't rushed through.
I can almost hear the moans from the pizza-and-coke crowd when they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.
Typing code is not what the job is about (despite what people seem to think). We're in the business of doing cool things for people. The creativity and ideas that flow from the (very smart) people around me are what drive me.
Just sitting and typing code is a bit dull compared to human interaction...
"The reason I was speeding is.....
I think there are folks at NASA who are not satisfied with the SYSTEMS engineering done. When the SOFTWARE engineers had the old Apollo hex keypad (as an example) dictated to them as a system requirement, I would say that the software engineering that followed was still pretty impressive. The project was, and still is, an impressive job of software engineering.
captain america
The problem with this argument is that while many companies think that they can't afford to do it, what they really can't afford is NOT to do it. Software is becoming more complex - it's the nature of the beast. For the most part, design is not; we are all still using procedures that were brought into being at the 'dawn of the computer age', with the exception of higher-order languages and more focus on OO.
You are correct in that it may be expensive, THE FIRST TIME. This is called a 'learning curve' and the cost is amortized over the number of times you use this technique. You may also say that the process itself is expensive but that is incorrect, or at least only partially correct. The process allows errors to be caught EARLY, which reduces cost. Please don't tell me that you believe a code-compile-fix routine can catch these sorts of errors as early as a well thought out design.
Also, rigorous design allows for flexibility - this may sound contradictory but consider the use of design patterns. They are NOT things that can just be thrown into the code ad hoc; they require thought and intelligence. A good upfront design means the ability to use these tools. Consequently, use of these design patterns allows for a certain level of flexibility in satisfying the lower to medium level nasty customer requests, and certainly helps on the more egregious ones. Does a code now, look later approach allow this? (if you think so, I have this bridge I'd like to sell you ...)
In short, yes, using these techniques is expensive. But they also produce code that cuts development time (i.e., no stuck in debug/extra request phase for 2 years) and once people get used to the process, the extra cost/load is minimal.
I seem to remember seeing this article before, and since the only place I read anything interesting anymore is as a result of hearing about it here... ;)
We're all full up on Crazy here...
their process ensures it will be. The vast majority of software development is performed in an environment where individual "heroes" are the primary reason projects succeed. The Space Shuttle Onboard Software processes will seem to almost all of us to be "common sense", but how many of us work in a place where management mandates these things to ensure quality? Their environment is "ideal" because they have made it so. Unfortunately, many managers' (and too many developers', also) attitudes can be described as "get it done", and it shows!
They were rated CMM level 5 in 1988 - one of the first organizations anywhere rated at that level of software process maturity. Another good description of their processes (and how they created them) is in the book "The Capability Maturity Model - Guidelines for Improving the Software Process" (ISBN 0-201-54664-7) in Chapter 6, "A High-Maturity Example: Space Shuttle Onboard Software".
As far as making software error-free, a quote from the book will help illustrate the difference in attitude they have (it's talking about a graph). "These data include failures occurring during NASA's testing, during use on flight simulators, during flight, or during any use by other contractors. Any behavior of the software that deviates from the requirements in any way, however benign, constitutes a failure. Contrast this level of commitment with the cavalier attitude toward users in most warranties offered by vendors of personal computer software."
The best place to find more about the CMM is their web site at http://www.sei.cmu.edu/
I flew the F-15E for 4 years, and it was common to have to reset a system because of some sort of glitch. Whether the glitch was hardware or software based, I didn't really care. If a system stopped working reliably or failed outright, it was time to troubleshoot. That usually meant first a software reset, a hardware reset, and in the worst case (but still common) a complete power down/wait 30 seconds/power up cycle.
2-3 times per flight is more than I usually experienced, but I think I had to reset at least one system on 50% or more of my flights. That's quite a bit more than 1 every 500 hours. Some aircraft were better than others too... One jet required its radar to be reset every 15-20 minutes. That problem was eventually traced down to a wiring harness connector...
In addition, there were and still are known software problems in that aircraft. The known ones usually have some sort of workaround (if the heads up display freezes, cycle power on the display processor, stuff like that), but the occasional random crashes or glitches (like occasionally the plane will suddenly think it's flying 100,000 ft below the ground) have no known cause and the only fix is to reset something until the jet behaves itself again.
My last point is that the flight control software in the F-15E is designed to go offline if the aircraft exceeds certain parameters. In that case, the flight controls must be manually reset in one of four ways. There is a quick reset switch, a "hard" reset switch for pitch, roll, and yaw, we can cycle power for those systems, and worst case we can pull and reset the circuit breakers for the flight control system components.
The funny thing is, it works only because the rest of the design is very robust. Most systems have some sort of backup, and the plane flies just fine without any electrical power at all. Once the software problems are known, they're dealt with as simply one more environmental factor until they're fixed. The fix may take over a year, but they are usually fixed eventually.
Before every flight, Ted Keller, the senior technical manager of the on-board shuttle group, flies to Florida where he signs a document certifying that the software will not endanger the shuttle.
Is this supposed to be black magic or something? If something bad is bound to happen, it will happen regardless of how many "certificates" and such were signed.
Or maybe it's about transferring responsibility?
Maybe Mr. Keller could sign a certificate that aliens will contact us next wednesday?
They are going to use old Pentiums (no MMX) with Win95 on the new space station.
coz sure as heck, the kernel developers have lost the plot.
I think your problem here is that you still subscribe to the fallacy that "Code like Hell" programming is faster than doing things properly.
It isn't.
Many organisations are starting to find this out and are moving to proper professional engineering practices that improve reliability, increase schedule predictability and, more importantly, reduce costs.
A couple of hundred years ago people built houses & bridges the way we build software - work until it's done. These days we have architects and project managers that build houses faster, more reliably and ON BUDGET.
This is the way the wind's blowing. It's a lot less heroic but it's the future.
It's not just space shuttle code that needs extreme reliability. The embedded systems in civilian aircraft are not interrupt-driven because of the reliability issues associated with interrupt-driven code - interrupts make the software too hard to debug thoroughly (because there are so many combinations and timings of input signals to test), make faults difficult to replicate and have the potential to go wrong on a spurious set of input signals. This sort of problem doesn't really matter too much in a home or corporate computing environment, but it would be a major disaster if a plane carrying a few hundred people were to crash into a city with a population of a few million, just because of a software error. These things need 100.00 per cent reliability, so obviously software hacks are frowned upon.
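As a rough illustration of the alternative to interrupts described above, here is a toy polled loop (sometimes called a cyclic executive) in Python. It is only a sketch of the general idea, not how any real avionics system is written; the sensor function, the task names and the altitude numbers are all invented. Inputs are sampled once per frame at a known point and the tasks always run in the same order, so the same inputs give the same trace every time.

```python
from typing import Callable, Dict, List

Inputs = Dict[str, float]
Task = Callable[[int, Inputs], str]


def run_cyclic_executive(
    read_inputs: Callable[[int], Inputs], tasks: List[Task], frames: int
) -> List[str]:
    """Run every task once per frame, in a fixed order, on polled inputs."""
    log: List[str] = []
    for frame in range(frames):
        inputs = read_inputs(frame)  # sample all inputs at one known point
        for task in tasks:           # same order in every frame
            log.append(task(frame, inputs))
    return log


def fake_sensors(frame: int) -> Inputs:
    # Stand-in for polling hardware; the numbers are made up.
    return {"altitude_ft": 1000.0 + 10.0 * frame}


def altitude_task(frame: int, inputs: Inputs) -> str:
    return f"frame {frame}: altitude {inputs['altitude_ft']:.0f}"


def alarm_task(frame: int, inputs: Inputs) -> str:
    state = "LOW" if inputs["altitude_ft"] < 1010.0 else "ok"
    return f"frame {frame}: alarm {state}"


trace = run_cyclic_executive(fake_sensors, [altitude_task, alarm_task], frames=2)
for line in trace:
    print(line)
```

Because nothing can preempt a task halfway through, a fault seen once can be replayed by feeding the loop the same sequence of inputs.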
The obvious candidate would be Bill Joy's TCP/IP implementation. Everyone runs it:
1. BSD's always used it
2. SYS V incorporated it - thus it flowed to most commercial unixes
3. LINUX borrowed heavily from it (recall that Regents of the University of California boot message?)
4. If the TCP/IP fingerprint of WIN2000 is any indication, they borrowed it too.
And it works right every single time you use it. So, what process made it? A single genius. All the cool process in the world won't make up for the fact that the single requirement for great software is a great designer/programmer. The required process is simple - whatever that person requires to let their genius loose.
The only way to circumvent this requirement is to do what NASA does and spend probably literally hundreds of $ per line of code.
On another note, the group that I work in (Flight Design and Dynamics) may start looking into moving from our IBM/AIX platform to a Linux platform. Penguins in space! I guess that is a bit offtopic, but oh well.
Reliability obviously gets a big premium when crash is not a metaphor.
Hmm. Talk to the client until you fully understand the problem. What a concept! No doubt this will make some fast and bulletproof code. Now if only they can teach their engineers to convert units correctly....
Ever hear of Boo.com?
;-?
Ceci n'est pas une sig.
Have you ever programmed a half-way complex system yourself? Re-writing it from scratch is often the best thing that you can do; the more often, the better. In fact, there are software engineering models that officially choose to re-write their code often. This is called "prototype-based SE".
The reason is that while you write the code, you invariably notice that some decisions you made earlier were wrong, but they affected the design so deeply that changing it would be more work than rewriting it from scratch. The alternative is to live with the design flaws; most commercial projects do that because they don't have the time to re-write their code.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Well, the point is really this: There is a point beyond which making the software more stable is so much more work that it's simply not worth it. Where this point is depends, of course, largely on what the consequences of failure are. Obviously, if multi-million-dollar equipment is at stake, it is worth being extremely thorough.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Unreasonable deadlines and too few programmers are usually the reason for pulling all-nighters, it seems to me. Environments where that kind of thing isn't necessary can be found in the vicinity of banks and insurance companies, so look there if you want relaxed programming jobs.
"I love my job, but I hate talking to people like you" (Freddie Mercury)
In the flight test programs I was associated with, the software had to clear several hurdles before it got near an aircraft. After unit test and integration test, there were batch-mode tests on a hardware testbed, then man-in-the-loop testing in a $Million simulator. When the test pilots accepted the results of their sim time, a final review was held and flight testing could begin.
Envy my 5 digit Slashdot User ID!
We are the only ones with a compiler, because we wrote it ourselves.
Maybe you could release a free 'Light' version HAL/ER, High-level Assembly Language / Estes-Rocket for the rest of us.
Bullshit.
NASA didn't just have a solid process, they had MONEY. They BOUGHT that quality, by hiring an order of magnitude more testers than you'd find in the commercial world. By budgeting several years of development time rather than weeks or months. By reducing the number of lines of code that any one developer is responsible for.
There's a lot to learn from a highly structured development process like NASA's. But don't kid yourself that the quality they produced is simply because they 'had the right process' or had better management.
But then the Japanese came along with a radical new idea: if there are defective parts coming down the line, then we should figure out why they were created defectively in the first place and fix that. Then the number of defective parts at the end of the line would be lower, thus you would need *fewer* inspectors and *less* time at the end of the assembly line. (Ironically, this principle came from an American named W. Edwards Deming; unfortunately American companies were too successful during his lifetime for them to take him seriously :-) So the Japanese were able to build cheaper cars quicker than the Americans while actually having higher quality.
I think that's very analogous to the current argument. Under the current system of coding, you basically hack together something that sorta works, and then use sophisticated debuggers/development tools to figure out which parts are buggy. Using that system, it's true that higher quality requires more cost and time.
But I think the point of this article is that that is the wrong way to approach programming. First, figure out why defective code gets written in the first place (be it poor client specifications, poor management, poor documentation, whatever) and then fix those processes, and you'll turn out quality code without having to spend any more time or money!
As a practical example, I first learned C under a CompSci Ph.D. who was a quality fanatic. In order to teach me to code properly, he would give me projects and then not allow me to use a debugger. Nothing at all. Zilch. Nada. The only thing I was allowed was to place print statements within my program wherever I wanted to see what was going on. As a result, I spent *a lot* of my time planning my code out, and reviewing it over and over again before even compiling it, because I knew that if there were bugs in it, I couldn't just fire up a debugger and take a look.
And secondly, if there were bugs, I couldn't just trace through the entire program or create a watch list of every variable. I had to study the bug and understand it, look at the code and figure out where the bug most likely was, and then use selective print statements to look at the most suspicious stuff. That way, when I encountered bugs, I'd be forced to actually understand what the bug was and then analyze my code to figure out where that error most likely was.
If this sounds like a programming class from hell, believe me, it was incredible! I couldn't believe how much of my code worked the first time it compiled. And when there were bugs, I actually fixed the underlying flaw in the logic rather than just applying a temporary patch. What's more, since the rest of my program was well planned and documented, there were no "hidden" effects: if I found a bug, I knew exactly which parts of the program it affected, and perhaps more importantly, *how* it affected those parts. Thus they were very easy to fix.
Believe it or not, it took me less time to program this way than using debuggers, and the resulting code was much more stable and understood.
If you look at commercial software these days, it's not uncommon for the debugging period to take longer than the actual coding. In other words, there are more quality inspectors than there are assembly workers, and the time the code spends in inspection stations is longer than it spends being produced. It's tough to say that this is the "efficient" method of programming...
If you want to see where this is heading, just turn once again to the car industry: once American companies got their asses kicked by the Japanese, they adopted their techniques, and Surprise! Cars now come out of their factories with higher quality, in less time, and at less cost (adjusted for inflation and new features :-). Who would've believed it? :-)
No one wants to write buggy code...
Well, mister know-it-all...how do you go about getting really obnoxious amounts of money out of the customer?
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
At the other end of the scale I saw a talk a year ago by some US army guys. They were modelling dogfight situations with a very real possibility of the system being used in real combat scenarios. Thing was, they were doing all this with an AI system, which is about as far as you can get from a verifiable system. Should we be worried?
At first it seems too predictable
Duh. The software that these people write is responsible for *lives*. If I had to depend on the code of some "pizza and coke" programmer for my existence I would want the development process to be predictable too.
Most programming isn't sexy. Deal with it.
Formal methods are also incredibly complex for any nontrivial program.
.. I studied formal methods in my degree course too.. Z specs and stuff. I understood its purpose, to verify correctness of a system. I think even the professor teaching it admitted that for anything real the amount of proof required would be *immense*. The professor was a very 'academic' professor - a very smart guy - you couldn't help think though that the stuff that was taught was only ever going to be useful in academia. I think he despised the real world for not being easy to define :)
Absofrigginlutely
I do remember hearing though that there was a program somewhere to turn Z specs (formal specification method) into C code.. although I never did use it. Anyone else remember/hear of that or better yet actually use it on something?
The idea being that you do the proof and feed it through the 'thingo' and it churned out C code.
--
Delphis
One of my teachers works at their Independent Verification and Validation facility. Several others do or have worked for them on some level. They are VERY good programmers. Several of my school projects are based on things that NASA has been doing.
I find it funny to hear people talk about very BASIC things in computer science as being "cumbersome" and such. Like much of what was said about Ada.
One should strive to do good computer science, not whine about it. Fewer bugs and better performance should be a way of life. Good code should be the only kind that you make.
Otherwise you might as well point and click your programs together...
Eh...
The SEI's web site is at http://www.sei.cmu.edu/.
Even if you consider a project management approach, often you will wind up rewriting the code from scratch anyhow. From personal experience I can tell you that relying too heavily on user input in the proposal/design phase can cause a project to completely lose focus. Many rewrites are the result of "feature creep" associated with pandering to the user's every whim. The most successful projects I have seen started with a narrowly defined, strict set of goals. Even at that, the trend seems to be at least one major rewrite by the time the software reaches its third version. Code reuse is highly overrated.
That's because pizza-and-coke all-nighters are a direct byproduct of poor planning, either by the engineer implementing the code, the architect creating the design (if there even is such a person) or the person making the engineer's schedule. And the result is usually hastily written, incompletely tested software that is typical of most product offerings for use on the desktop.
The process of authoring mission critical, man rated software is so far removed from the ad hoc, informal, duct-tape-it-together approach that most programmers use that no direct comparison can be made. I've seen both ends of the software development spectrum and they each have their uses. You can't launch a shuttle with a bunch of last minute kernel patches and some stuff that was written the night before the launch date. But you can't compete in the commercial software marketplace with code that takes 2 or 3 years to specify, design, implement, test, and integrate, either.
Stand in awe of the people who have the skill and discipline to write software of this quality. Learn what you can from their process and try and use the lessons they've learned. Their stuff doesn't break, because when it does, people die. If O/S developers had that same attitude about their code, we'd never see blue screens of death, kernel panics, or any of the other flakiness we tolerate on our desktop machines.
Shut up and eat your vegetables!!!
__________________________________________
God did not appoint us to suffer wrath but to receive salvation through our Lord Jesus Christ --1Thes5:9
US Military test pilots aren't stupid people. Most of them have advanced degrees in aeronautics or aeronautical engineering -- at the insistence of the military or aerospace firm they work for.
I suspect that, upon seeing the "computer restart" button, the test pilot evaluating the aircraft would start asking a series of questions:
1. What is the failure rate of the computers; i.e., how often will that button have to be pressed.
2. What is the time elapsed between the computer failing and the computer operational, including the reaction time of the pilot or weapons officer? Assume that the pilot and weapons officer are already a) flying the plane, b) lining up on target, c) watching for SAM sites, ground fire, enemy aircraft, and d) coordinating with friendly aircraft.
3. How does the computer controlled, fly-by-wire system function during the timeframe covered in question 2? Will it fly steady (given that many modern fighter airframes are inherently unstable in flight, and rely on active computer control)? Will I have any control over the plane until it restarts?
4. If this happens in a dogfight, what are the chances of recovery and survival?
Or not... In truth, I suspect the first few questions would really be something like "You're kidding me, right? Do you think I'm crazy? Would you be willing to fly this deathtrap?"
Don't be silly. Formal methods prove that a formalized version of a program matches a formalized version of a specification. Very good for nice, clean, precise things like floating point algorithms.
Unfortunately, real programs are written in real computer languages from specifications written in real human language. They also have to interact with real operating systems running on real hardware. Don't forget the nice, messy asynchronous "real world" data.
Formal methods are also incredibly complex for any nontrivial program.
Your assignment for Monday is to prove the correctness of the Linux kernel.
Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
Every time I read a history of a programme and find a line "completely re-wrote the code", I begin having second thoughts about how really good the programme is.
There were several occasions last year where a co-worker and I ended up trashing pages and pages of code to re-write it with the same functionality, but modular, and it ended up being smaller in some cases.
My company used consultants who wrote terrible code. Take this example: there was a program that calculates the date x days ago. The consultant's program tried to calculate leap years and all of that itself. The program we replaced it with used system library calls to get the date, and then simply subtracted the proper number of seconds. Others were hardcoded scripts to run SQL on our database; we replaced those with a Perl script that took the SQL as a parameter.
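To make the "x days ago" example concrete, here is a minimal sketch of the library-based approach in Python. The original replacement was not written in Python as far as the comment says, and the function name `days_ago` and the choice of UTC are assumptions for illustration. The date library already knows about leap years, so the code only subtracts a number of seconds.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400

def days_ago(days, now=None):
    """Return the UTC datetime `days` days before `now` (default: current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Subtract a plain number of seconds; the library handles month lengths
    # and leap years when it turns the result back into a calendar date.
    return now - timedelta(seconds=days * SECONDS_PER_DAY)

if __name__ == "__main__":
    march_first = datetime(2000, 3, 1, tzinfo=timezone.utc)
    print(days_ago(1, march_first).date())   # 2000-02-29, since 2000 was a leap year
```

Working in UTC sidesteps daylight-saving shifts, where a local "day" is not always 86,400 seconds long.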
So there are times where a re-write is better than maintaining the code. I guess the biggest case in point is Mozilla versus Navigator. Basically I agree that if projects were planned and used software engineering principles we would most likely end up with good products. Granted, game programs seem to be done best when they're a hack... But how many times have you seen long-term maintenance of games?
"If you insist on using Windoze you're on your own."
Formally proving that an implementation satisfies a specification is possible, but NOT TRIVIAL.
This, coupled with the fact that so few developers can handle writing formal specifications (can you see the average Perl hacker writing a spec?), is why it's not "that simple".
Finally, as for your suggestion that everyone use VDM/Z or Larch/CLU, can you grab me a copy of MS Visual CLU? No? How about the CLU gcc front end? No? Well then how do you expect me to compile it? Yes, I know that compilers are available, but reliable ones with decent optimization passes? Even Barbara Liskov seems to have moved on from CLU...
Don't get me wrong, specification languages are definitely cool in the right places, but we've got a ways to go before they become palatable to the average human being.
arvind rulez
Here's NASA's own history on bugs in that software:
- So, despite the well-planned and well-manned verification effort, software bugs exist. Part of the reason is the complexity of the real-time system, and part is because, as one IBM manager said, "we didn't do it up front enough," the "it" being thinking through the program logic and verification schemes. Aware that effort expended at the early part of a project on quality would be much cheaper and simpler than trying to put quality in toward the end, IBM and NASA tried to do much more at the beginning of the Shuttle software development than in any previous effort, but it still was not enough to ensure perfection.
Read the NASA history. They had a 200-page known-bug list in 1983, although they did fix most of them during the long downtime after the Challenger explosion. The Shuttle's user interface is awful. The thing has hex keyboards! Some astronaut comments include
This project should not be held up as a great example of software engineering. Even NASA doesn't think it is.
Surely you are all familiar with the mantra, "There's never enough time to make it really right, but there's always time to fix it."
Frankly, the reason the Shuttle Group's software works right is that they plan before they start to code. Good planning prevents mistakes that have to be fixed later. Note also that unplanned changes normally = introduction of chaos. I've been programming since punched cards and all the good books on programming and system design warn against jumping into the coding before finishing the design. Yet time and again, I've been whacking out the code before the specs are really finished. (We say they are finished, but the number of changes that have to be made proves us to be liars.)
In my experience, and from what I've read, proper planning rarely extends the length of the project. The difference is that more time is spent on the planning, and *less* time on coding and a *lot* less time on debugging. Spend enough time on planning too, and you get rid of the bulk of last-minute changes from the client. If you find out what the client wants before you start coding, then you're a lot less likely to receive change requests when you're deep in the code.
It is, of course, a management issue. Managers are generally the ones who set the schedule. But the programming staff have a responsibility - if they really want to think of themselves as competent professionals - to fight against foolish deadlines and a rush to code.
We also don't follow good programming practices IMNSHO. I've just been reading "The Pragmatic Programmer" and I strongly suggest you all *run* out and get your own copy imeedjutment! This is a book I wish to The Maker I'd had to read back in college days. This book talks about good programming practices - which is something that is rarely discussed in any detail.
Which is why there is so much crappy code out there. (And I include my own, alas.)
Solidifying a contract like that works when the client actually knows what they want. More often they have absolutely no clue of what they want/need, and require the programmer to help them along that stage as well.
With this type of client (and I've dealt with many) taking the proper long stage of design and discussion doesn't work at all. The client immediately changes their tune after seeing initial results. Not so much to add features, but because the features they actually requested were not the ones they needed, or didn't work within their business practices.
Doug
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
>Sadly when you try and apply those standards to commercial quality code, it flops.
This is so true; I worked in an SEI level 3 group. The time I spent there was invaluable in learning to design, write and test code properly. Everywhere I go, there is *so much* resistance to implementing any kind of structured process. Why is there so much reluctance to implement these ideas?
'Course if they started writing space shuttle code like that, it would be "Goodbye World"...
All opinions are my own - until criticized
Somehow I think those comments would look more like this:
;Shuttle Waste Dump
;
;I dunno WHY this works, but it does!
--
The other side is crowded. The dead have nowhere to go.
Every time I read a history of a programme and find a line "completely re-wrote the code", I begin having second thoughts about how really good the programme is.
Minor correction: "completely re-wrote the code with new insights." Anyone who's done a half-complex software engineering project can tell you there are often subtle design decisions that will not become apparent until an attempt at a full-blown implementation. These design decisions, which seemed insignificant originally, might dictate the organizational structure of the program, and such a thing may not be cheaply altered afterwards.
Didn't the Linux TCP/IP stack undergo several rewrites?
megumi
Really? You might want a complete re-write for a couple of reasons:
With the ever-faster-growing complexity of programmes, it becomes more and more difficult for humans, even aided with computers, to keep track of the project. But if you teach everyone how the computer logic works, the programming would become only about writing the necessary simple code (ha! hackers, get this!).
I don't understand your point here. What do you mean by computer logic in this context?
There are ways of expressing solutions to problems (different programming languages, CASE tools and so on) which are well suited to particular problems, but there are no magic bullets.
Argh! It mentions multiple versions in the first paragraph, doesn't it! *bangs head against wall*
That'll teach me to read articles in advance!
Sorry!
The almighty process predicts that there is one bug in the system. This must be keeping the programmers from sleeping wondering where the hell it is...
Special Relativity: The person in the other queue thinks yours is moving faster.
This is starting to look like a pattern. I suggest that either timothy be moderated down, or, in the spirit of the article, that the Slashdot story quality assurance process be improved ;)
I doubt that it's in Ada; I think the software predates Ada.
As for something better: in a word, Eiffel.
Unfortunately, there's an old but true saying:
Eiffel is an amazingly clean, simple, and straightforward object-oriented language. It removes both the ability and the need for clever coding hacks, thereby shifting most of the development effort to software design rather than implementation. Most "programmers" aren't designers, though, and they scream bloody murder that they can't code something really cool like while (*to++ = *from++);
obOSS: A GPL'ed Eiffel compiler is available at http://www.loria.fr/projets/SmallEiffel/.
Actually, AFAIK, the main reason is that old 386s are tested, tested and, once more, tested for space use.
You're thinking of the PGSCs, the IBM ThinkPad laptops that are carried onboard to help the crew do various non-critical tasks, usually (always?) related to payload operations. These are indeed old 386-based machines, radiation-hardened but still susceptible to crashes related to bit errors in RAM. They run Windows 95, so obviously they aren't used for critical ops. (BTW, Linux has been run onboard too.)
The GPCs (General Purpose Computers) that run the flight software are IBM AP101-S computers, cousins to the IBM 360/370 architecture machines of the 1970s. The AP101 is a big-endian, 16-bit, roughly 4 MIPS machine that can address up to a megabyte (actually 2^19 16-bit halfwords) of memory. It has been extensively tested for space use, as you note, which is another reason NASA sticks with it. An earlier version of this computer, the AP101-B, flew earlier shuttle missions and has been used in military aircraft like the B-52.
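For anyone checking the arithmetic behind "a megabyte (actually 2^19 16-bit halfwords)", a quick sketch using only the two figures quoted above; the constant names are just labels for this example.

```python
HALFWORD_BITS = 16             # size of one addressable unit, from the comment above
ADDRESSABLE_HALFWORDS = 2 ** 19

# Convert halfwords to bytes: 16 bits is 2 bytes per halfword.
total_bytes = ADDRESSABLE_HALFWORDS * HALFWORD_BITS // 8

print(total_bytes)             # 1048576
print(total_bytes == 2 ** 20)  # True: 2^19 halfwords * 2 bytes = 2^20 bytes
```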
--Jim
I don't know about that.
The F-16's COG (ctr of gravity) is so far back on the fuselage it is an inherently unstable aircraft. So unstable, in fact, that w/o the computer adjusting 2 control surfaces on the underside of the fuselage 60 times/sec, it would pitch up & down continuously. That's just one reason why the F-16 is a piece of junk, IMHO.
The B-2 couldn't fly at all w/o computer assisted flight controls.
After reading the article it got me wondering what the QA process for kernel mods is. Is there a beat-the-hell-out-of-the-new-driver process that goes on, or is it a release into a beta and see if anyone has problems? I assume there aren't official QA testers, but are there any guidelines that before something is accepted into the kernel, it should at least be tested for X, Y, and Z?
This work uses the SEI CMM (capability maturity model) developed by SEI and used in many areas within the civilian and military software development community. Organizations are assigned levels with specific tasks, roles and processes defined to advance to the next level. All results that are compared to the CMM levels must be verified, quantified and repeatable. A good place to start if you are interested in how the military develops robust software is the Software Technology Support Center @ http://stsc.hill.af.mil/
From what I hear about the real world, some (but by no means all or even most) programmers look down on clients just because they don't know much about programming. They assume that just because they have a certain expertise over others that they somehow know more than them in general.
No, I look down on them because they don't know what they want, but they want me to do it anyway.
I'd be relieved if somebody came to me with clear, detailed specs for once. I usually get about three sentences, and, if I ask any questions, it's "uh, I dunno, what do you think?" I'M WRITING THE SOFTWARE FOR YOU!!! You decide! I'm not going to do your job, too! "Well, uh, we need it next week, and I don't have time." (This is where I rip his heart out of his ribcage as he stands there, or at least I'd like to.)
Um, sorry 'bout that. That just gets to me sometimes.
This software is the work of 260 women and men...
Commercial programs of equivalent complexity would have been written by 7 or 8 people.
http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
If more projects worked like that, there would also be a lot less software in the world. Say goodbye to whatever you're running to watch slashdot, you couldn't afford it. (You also couldn't afford the hardware to run it on, because faultless software is of little utility without faultless hardware.)
I would suggest that if every software project were SEI-5, there would be no internet and people would be doing papers on typewriters.
That question is answered in other replies. But: the whole essence of it all is that it is the process that is the crucial factor. The implementation language is not the deciding factor (although I will admit that some languages will let you create a bigger mess than others), although I suspect that the selection of the implementation language is part of the process.
MSN 8: Now Microsoft even has bugs in their ad campaigns.
My third programming job was my first experience with software engineering. I'd had 4 years of experience at two other jobs -- one where I wrote code for an InterLibrary Loan book lending database, and one where I worked on an e-commerce package. There was not a thing at either place that would qualify as a spec, and there was no process in place for engineering. I didn't know anyone who used specs. I assumed that this was something that was taught by computer science professors, but wasn't actually practiced by anyone.
:)
Then I got a job at the Waterford Institute. Their process probably wasn't as tight as the space shuttle's, but there WAS a process, and there were specs. Nice specs. Nearly pseudo-code.
We were programming educational activities for kids learning math. Activities were created by design teams consisting of an educator, an artist, a tech writer, and a programmer. The tech writer would document everything that went on at the meetings, and distill it into a spec. The design team would meet regularly over a period of several months, refining the spec until it was solid.
The spec described various states of the software. When a user did something, the state of the software changed, and did something accordingly. I'd never seen software described this way, but it made a big impression on me, and it made things easy to write and debug.
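A spec written as states and transitions maps almost directly onto a lookup table. The sketch below is only a guess at the general shape; the state and event names are invented for a made-up math activity and do not come from the actual Waterford specs.

```python
# Hypothetical states and events for a kids' math activity. Each entry reads:
# (current state, what the user did) -> next state.
TRANSITIONS = {
    ("show_problem", "answer_correct"): "praise",
    ("show_problem", "answer_wrong"): "hint",
    ("hint", "answer_correct"): "praise",
    ("hint", "answer_wrong"): "hint",
    ("praise", "next"): "show_problem",
}

def step(state, event):
    """Return the next state, or fail loudly if the spec has no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"spec defines no transition for {event!r} in state {state!r}")

def run(events, state="show_problem"):
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        state = step(state, event)
    return state

if __name__ == "__main__":
    print(run(["answer_wrong", "answer_correct", "next"]))   # show_problem
```

When the table is the spec, a bug report becomes "which state, which event?", which may be part of why this style felt easy to write and debug.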
('course, the platform we were writing on was Java, which kept changing, and in-house developers were writing our own object library, which kept changing too, so your code would work one day and not the next, so everything wasn't perfect. But hey. I was impressed with the specs.)
Tweet, tweet.
However, the worst severity in a Military FMECA is NOT "loss of life." It is "Mission Failure." Which makes sense. Losing a pilot (and aircraft) is bad, and very expensive. Losing a battle is rather worse. Fortunately for the pilots, a failure that kills the crew also tends to hamper their effectiveness in completing their missions.
To email, do the obvious.
The project that I thought you might be interested in is the development of a space shuttle flight computer emulator for linux described here.
I've seen that project description before, because I wrote it. I'm very familiar with the GPCE project because I'm the principal author of the C++ version. Unfortunately, none of our Dual partners in academia wanted to tackle the conversion of GPCE to little-endian architectures, so for now we can't run on Lintel systems.
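For readers wondering why endianness makes the port awkward, here is a small generic illustration in Python. It is not GPCE code, and `swap16` is a hypothetical helper name. The same 16-bit halfword is laid out in opposite byte orders on big-endian and little-endian machines, so an emulator that reads memory images directly has to swap bytes on every access.

```python
import struct

halfword = 0x1234

big = struct.pack(">H", halfword)      # big-endian layout: most significant byte first
little = struct.pack("<H", halfword)   # little-endian layout: least significant byte first

print(big.hex())      # 1234
print(little.hex())   # 3412

def swap16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

# Reading big-endian bytes as if they were little-endian gives the swapped value.
misread = struct.unpack("<H", big)[0]
print(hex(misread))   # 0x3412
```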
--Jim
I agree. Developers in the commercial world are under different constraints and requirements than the shuttle software crew. Commercial shops are motivated by $$$, not perfect code. If "good enough" will sell (and that it does!), then why bother with anything else? If no one cares about 5000 errors, then why spend $35mil (assuming the company has/can/wants to spend that much money on 1 application) writing phone books of specs?
There's also the matter of competition. The reason we have tales of hackers writing deep into the night to get a product out the door is because if they didn't, competition could very well spring up out of the blue and beat them to market.
There are lessons to be learned, however, even for open source developers like ourselves. About every project on, say, freshmeat, has a link to a source tarball on its web page. But how many of those projects have even a single design document on their sites? Why is that? Design/specification is not a development phase that can be skipped just because you want to get coding because it's fun. Proper documentation is vital before coding all but the most trivial of applications.
--
"And is the Tao in the DOS for a personal computer?"
python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
I think this was mentioned in the show "From the Earth to the Moon", but it illustrates how important getting perfection is in the space industry.
If, for example, there are 100,000 parts on the Space Shuttle, getting 99.9 percent accuracy means that 100 parts can break. Getting 99.99 percent means that 10 parts can break.
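To put numbers on that, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration, and it assumes "accuracy" means the fraction of parts that work, with every part counted equally.

```python
def expected_failures(part_count: int, reliability_percent: float) -> float:
    """Parts expected to fail if reliability_percent of all parts work."""
    return part_count * (100.0 - reliability_percent) / 100.0


# The two reliability levels from the comment above, for 100,000 parts.
for pct in (99.9, 99.99):
    failures = expected_failures(100_000, pct)
    print(f"{pct}% reliable -> about {failures:.0f} failed parts")
```

Each extra "nine" cuts the expected number of broken parts by a factor of ten, which is why the last fraction of a percent is where the effort goes.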
----------------------------------------- Well damn...so that's what that does...
I saw this article a while back linked from here. Incredibly cool stuff . . . the part about "blueprinting software" and "how we design software in the future" was especially cool. It makes one aspire to code to a higher standard.
That said, something I was curious about that the article didn't answer, and that I don't see mentioned here yet-- what language is all of this done in? Ada would be my guess, or is there something even better than that?
iSKUNK!
Not necessarily. A point that was repeatedly emphasized in the article was that these guys are not ego-driven. Most people who are geeky and educated have pretty huge egos as well; they like to put their signatures in the code. Did you notice that no programmer's name came up in the entire article, while a typical article on a project like Linux or Quake would make gods of Linus and Carmack? You can't have it both ways. And although everybody agrees some tightening up is required, this kind of rigidity, which is perfectly justified here, would be harakiri in a corporate environment.
A large part of the software in this world started as a hack in some university lab and was then improved upon till it came to a passably useful stage. That is why you have these EULAs which absolve the software makers from all responsibility in contrast with this group's software where they take full responsibility for anything going wrong.
And lastly, the late night coding sessions are not all that bad. I bet a large part of the kernel code for Linux and BSD was written that way. But you should have an independent review process that is responsible for catching the bugs. Peer review can do wonders for the code before the final version comes out.
FarHat
At the intersection of computation and biology.
Want to know what a Shuttle GPC looks like? Check out http://www.ksc.nasa.gov/mirrors/images/images/pao/STS39/10064134.htm.
--Jim
I did an MSc in Spacecraft Engineering and it looks to me like they've adopted many of the protocols for ensuring the reliability of hardware in spacecraft. However, the question to ask is how long a private company would stay solvent if they tried doing this. Everything has its place, including the pizza-munching Xers, if no lives are at stake.
I thought they were the only group to achieve SEI-Level 5. If not, then who else has? I'd love to go and correct one of my lecturers.
My Webcam
You could hold this up as an exemplar on how to write code when you have a stable, well specified requirement and lots of resources.
Most development teams don't live in that world and never will. Business users don't change their requirements because they are capricious. Their requirements change; they just do. That's life.
I'm not saying that many (any?) development shops get it right but you have to apply an appropriate process for your circumstances.
That said, a lot of what they do (team orientation, no-blame culture, focus on process improvement, focus on quality and fixing at source) will always help.
Tom
I think that rewriting is a good thing. Check out the following books:
Extreme Programming by Kent Beck
Refactoring by Martin Fowler
Both advocate the use of proper unit testing, which many people do not do, to make sure that any alterations to the structure of the code don't change the functionality.
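A minimal sketch of that idea in Python: pin the behaviour down with a test, then restructure, and rerun the same test. The function names and test cases are invented for illustration and are not taken from either book.

```python
def total_price_original(items):
    """Original version: explicit loop over (unit_price, quantity) pairs."""
    total = 0
    for unit_price, quantity in items:
        total = total + unit_price * quantity
    return total


def total_price_refactored(items):
    """Restructured version: same result, expressed as a single sum."""
    return sum(unit_price * quantity for unit_price, quantity in items)


def check_refactoring_preserves_behaviour():
    """Run both versions on the same inputs and require identical output."""
    cases = [[], [(5, 1)], [(2, 3), (10, 2)]]
    for items in cases:
        assert total_price_original(items) == total_price_refactored(items)
    return True


print(check_refactoring_preserves_behaviour())
```

The test says nothing about how the code is structured, only about what it returns, so it keeps passing across any change that leaves the functionality alone.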
Also worth a look is "Software Project Survival Guide" by Steve McConnell. It has a groovy questionnaire so that you can work out how doomed your project is.
These books seem to be the best. I found that the projects I have been involved in failed because myself and others didn't have enough knowledge of how projects go wrong and, just as importantly, of the processes that stop the failures.
I have just spent quite a bit of time in the last month reading such books and OO design books and was amazed how obvious it all was.
Note that 2 out of these 3 books mention the NASA programming method.
... where you focus on a working solution, instead of just quickly trying to get something to work and then relying on more money to make it work right.
I thought they were the only group to achieve SEI-Level 5. If not, then who else has? I'd love to go and correct one of my lecturers.
When the Capability Maturity Model for Software was published by the SEI there was only one ML-5 organization; at the time they were known as the IBM Onboard Shuttle group. Thankfully, times are changing.
According to the SEI's 1999 survey, 61 organizations reported a Maturity Level of 4 or 5. Of those, 40 were Level 4 groups and 21 were Level 5. The survey goes on to mention that as of 15-Feb-2000, some 71 organizations reported that they were Level 4 or 5. Those that gave their consent are listed in Appendix A.
I must disagree. The client is always right; the mess is management's, so do not blame it on the client. Before you ever write code for a client, you go through what you call the prototype phase: you work with the client and select all the features you want, you build a prototype and show it to the client, and you prioritize the features according to importance and whatnot. You throw away the features that can't make it due to cost, time and such... You talk to the client and explain how making a change can greatly impact the project. For example, an innocent change in the UI might mean rewriting hundreds of pages of documentation. After you have made everything clear with the client, put it down in writing and have them sign it. If the client comes back, remind him, show him the signed form, and send him out of your office. Now, sometimes clients may have a really valid reason to be back; if it is valid, compromise, so long as everyone understands that it will take longer. Notice that the only impact a client's demand for more features should have on a program is that it will take longer, not that the program will become a mess. The problem with today's world is lack of management: most programmers or product/technical leads are not people persons. They can grok code, but they can't manage and talk to people.
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
This timothy guy posted an article about the Glass Cockpit installed in the shuttle a little while ago - the comments included a link to this Fast Company article (very good, but I'm sure most of us read it.) It's also from Dec 96 - not very new.
Not as many as should have.
Every time I read a history of a programme and find a line "completely re-wrote the code", I begin having second thoughts about how good the programme really is.
With the ever-faster-growing complexity of programmes, it becomes more and more difficult for humans, even aided with computers, to keep track of the project. But if you teach everyone how the computer logic works, the programming would become only about writing the necessary simple code (ha! hackers, get this!).
Would the next generation of programmers write in a "logic language" instead of C++? Who knows, but it would IMHO make the programmes more robust and even better.
I can see many Dilbert-fans wondering if that is a bug or a feature.
1 Space Shuttle Endeavour
1 Launch Pad
1 Houston Mission Control Station
4 Astronauts
Shine on, you crazy diamond.
I've heard that fighter aircraft have inherently unreliable software with a very low restart time. Next to the trigger there's a hardware reset button. Typically the software will go down 2 or 3 times in a flight.
And you thought Windows was bad.
I had the time.
I had the patience.
Well this is cool. It proves that you can't write perfect software*. However you can come close.
If only everybody would do it this way, not just some cool company.
This probably even produces better software than the "open source" way. OpenBSD is the only open software project that comes close; it really is kind of sad. People need to relax to do it right. Down with stress!
Well, if you've met someone who works at some dot com (well, there are quite a lot of them here in Stockholm), they are always really, really stressed. That might impress the stock market but not really anyone else... That is the reason everybody talks about "When will the bubble burst?" and I can tell you this:
The "bubble" (which consists of overstressed people) will burst very soon. The more relaxed people will take it easy.
* Well you can, but Hello World! isn't really THAT complex.
It's called new wave but it's just the same.
First, I do think the process used for the FSW stuff is probably an excellent choice for that project. This is because of the following factors:
The author of this article seems to be saying that if all software was developed the same way, we'd be living in some sort of bug-free software utopia. The reality is that many software projects proceed under very different circumstances:
Under these circumstances, you had better have some "hot shot" (!= cowboy) people on your project, or you will have a failed project. The real challenge on most software projects is to write good software in a fairly short period of time. Unfortunately, there are a lot of bad programmers and s/w project managers, and what we end up with a lot of the time is shitty or mediocre software.
I thought the author's best point was that software development is unnecessarily crazy, and could benefit a lot from being done in a more relaxed atmosphere. Unfortunately, it seems beyond the ability of any one company to make this happen, given the need to compete in the marketplace. Basically, what would have to happen to make software better is for everyone to demand better software - to make quality a priority over speed. The markets currently are saying that speed is the bigger issue.
it looks like BoM (Bill of Materials) :)))
this field has been intentionally left blank
This software is bug-free. It is perfect, as perfect as human beings have achieved. Consider these stats : the last three versions of the program -- each 420,000 lines long-had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors.
How can they be sure it's bug free? If the last 14 versions had 20 errors, did they think it was bug free each time, only to find more bugs? At 500k lines of code you can't prove it all mathematically, and human checkers are... well, human.
One way to measure how many bugs your code has is to purposefully introduce a bug and tell people to find it. Then you count how many new bugs they found along with the bug you introduced and scale that by the lines of code you have. But this technique won't work if you only have 1 or 2 bugs that people are actively looking for in the first place. So, my question is: how can they be sure it is bug free?
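For what it's worth, the usual way to turn bug seeding into a number is a capture-recapture style estimate, sketched below in Python. This is the textbook estimator, not necessarily what the shuttle group does; the function name and the example figures are made up, and it assumes seeded bugs are about as easy to find as real ones.

```python
def estimate_real_defects(seeded_total, seeded_found, real_found):
    """Estimate the total number of real defects from a seeding experiment.

    If testers found seeded_found of the seeded_total planted bugs, assume
    they found the same fraction of the real bugs and scale up accordingly.
    """
    if seeded_found == 0:
        raise ValueError("no seeded bugs found, so there is nothing to scale by")
    return real_found * seeded_total / seeded_found


# Hypothetical run: 20 bugs planted, 15 of them found, plus 6 real bugs.
total = estimate_real_defects(seeded_total=20, seeded_found=15, real_found=6)
print(f"estimated real defects: {total:.1f}, still hiding: {total - 6:.1f}")
```

With only one planted bug the found fraction is either 0 or 1, so the estimate is useless, which is the weakness pointed out above.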
-- Virtual Windows Project
So people don't see lines like this in the code:
#Shuttle Waste Dump
#
#I dunno WHY this works, but it does!
Call on God, but row AWAY from the rocks!
I have witnessed the same thing myself: the process becomes an end in itself. People forget that what we're paid for is software; the process is only useful to the extent that it helps us to create better software.
I can almost hear the moans from the pizza-and-coke crowd when they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.
IMHO, software development is full of fun activities. What about analysis and design? In my experience, that's where the creativity really comes into play. Just talking to the customer, understanding the problem and making a working design is really difficult, and hence rewarding when you pull it off.
And what about the process itself? Software development is a young discipline, where individuals and small groups really can make an impact. Nobody really knows how to make good software. Maybe you'll be the one to find out? As the man says, in the shuttle software group, people use their creativity on improving the process.
And last, but not least, I bet those guys have a really good feeling when they talk to the customer after delivery. Not like some people I know, who just hide. ;)
If you can't see the fun of these other activities, maybe you shouldn't be working in this field...
A)bort, R)etry or S)elf-destruct?
From a programmer's perspective, there are two reasons to pull an all nighter.
1) you are in the "groove", where ideas and code flow naturally. This is very common among creative people, artists and musicians and such.
2) forced OT to meet a deadline.
The first is a very good thing, and the latter is where the bad reputation of all nighters comes from.
One way of increasing the reliability of software is to use n-version programming, whereby you implement several versions of the software, written by different people, and then create a voter system that constantly compares the data of each program and forwards the consensus one. Even if none of the programs agree, the voter 'knows' that something is amiss and can alert the pilot/engineer/whatever. I'm doing my PhD on this, and I know that NASA has implemented quite a few n-version systems, as well as the more tried and trusted multiple-redundant hardware. I heard somewhere that the space shuttle code costs the equivalent of $100,000 a line (feel free to tell me I'm wrong if you know the 'true' figure) so it might be worth considering. Certainly a number of prominent academics reckon that you can get a 45:1 improvement in a software system by implementing 3 channels as opposed to a single good system. Blah, anyway, that's my $.02 worth.
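A bare-bones sketch of such a voter in Python, to show the idea. It assumes each version returns a value that can be compared exactly; a real flight system would compare within tolerances and deal with timing, and none of this is taken from any NASA code. The function name is invented.

```python
from collections import Counter


def vote(outputs):
    """Return (value, True) if a strict majority of versions agree.

    Returns (None, False) when there is no majority, so the caller can
    raise an alarm instead of forwarding a doubtful value.
    """
    if not outputs:
        return None, False
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 > len(outputs):
        return value, True
    return None, False


# Three independently written versions; one of them disagrees.
print(vote([42, 42, 41]))
# No two versions agree, so the voter flags the disagreement.
print(vote([1, 2, 3]))
```

The voter itself is a single point of failure, so it has to be kept simple enough to check exhaustively.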
Let's see,
- half a million LOC (that's small)
- under development for 20 years
- new requirements are avoided at all cost
So it is a small, long-lived project with a nearly unlimited budget. No wonder they can afford to have such a process in place. But now realistically, how long does it take to set up such a project from scratch? How about having a customer who does not know what he wants? How about deadlines of less than 10 years from now?
I honestly believe that this way of delivering software is optimal for nothing else but long-lived, multi-billion-dollar projects. In any other case you'll end up with something that is delivered years too late, indeed matches the requirements of 10 years ago, and is close to useless.
Unfortunately many software companies are in a situation where they can't afford to wait for perfect software. Take mobile phones as an example. Typically these things become obsolete within half a year after introduction. The software process is what determines time to market. Speed is everything. If you can deliver the software one month earlier, you can sell the phone one month longer.
Of course testing, requirement specs and software designs are useful for any project, but it's usually not feasible to do them properly.
Jilles
They had a better project leader.
The company that I work for has a very similar environment. We have over 60 developers, and probably over 300 people working on the project. It's not as critical as a space shuttle, but we manage wireless phone networks. Processing over 1 billion calls a month for nearly 8 million subscribers requires the same sort of accuracy. If we screw up, our client loses money and customers.
The big difference is that we're organized, follow an excellent software development process, and we pipeline well. The design process involves several reworks of design documents, and code goes through several reviews before it is even tested.
Moral of the story... organize up front, fewer problems later.
If you want a pretty good (if a bit outdated) overview of NASA and the shuttle program (warts and all), see if you can find a copy of "What Do You Care What Other People Think?" by Richard Feynman. The second half of the book is about his involvement with the Challenger Accident Investigation. He also had very high praise for the shuttle software group, and adds a few comments/items that weren't covered in this article. Highly recommended!
How primitive! So they don't want an 'ice' or 'peach' flavored powermac to do their numbercrunching.
At the intersection of computation and biology.
"What I cannot create, I do not understand."
I'm working on a large NASA project now. I have determined that the purpose of this project is not to produce a working software system, but rather to produce a wall full of loose-leaf binders of incomprehensible documentation that no one will ever refer to again.
The process says we must have code reviews - great! But instead of being an analysis of the logic of my code, it turns into a check against the local code formatting standards - "You can't declare two variables with one declaration, use int a; int b; instead of int a,b;" (yes, that's an actual standard around here) instead of "Hey, if foo is true and bar is negative, you're going to dereference a garbage pointer here!"
The forms are observed, but the meaning is forgotten, like Christians going to church on Sunday then cutting people off and flipping them the bird on the drive home.
"Process" won't save us. Which doesn't mean that a certain amount of it can't help, but there is no silver bullet.
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
I happen to work just down the hall from the guys who maintain and upgrade the shuttle Flight Software (FSW), and I can tell you they have a rigorous design, inspection, and test sequence that they go through before they fly new or modified code. The story around here (which I have no reason to doubt) is that the FSW team was one of the first SEI level-5 certified shops in the nation.
I can also tell you that NASA avoids having to make unnecessary changes to the FSW. For example, the new "glass cockpit" recently discussed here on Slashdot: when these upgrades were designed, they chose to design the interface to the new display modules to exactly mimic the interface to the old instruments. In other words, they are true plug-and-play replacements; one significant reason for this was so the flight software didn't have to be modified.
Likewise, people often ask why the shuttle continues to use such antiquated General Purpose Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big reason is that new hardware would almost certainly require massive changes to the flight software. And rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if it ain't broke...
Huzzah! As I type, we just launched Atlantis. Go, baby, go!
--Jim
If everyone would simply use VDM/Z or Larch/CLU for all their development work, it would be much easier for us to prove our software is correct, and then all bugs would be a thing of the past.
It really is that simple. Don't these people remember what they were taught at college?
Notice that the article is similar to ISO (International Standards Org) standards: everything revolves around the 'process'. The result is determined by the process. I used to work for a company that had a documented process for everything... from software development right through to filling out your wage timesheet! I think the important thing to note is that it all depends on the 'culture' and type of the organization. If people accept this style of operation then it's great. For an organization that has to program software that directly deals with lives at stake, there must be a 'process' to ensure the s/w is written perfectly (and tested).
I have come across fellow workers who absolutely hate this type of practice... well, they are probably best suited for development of systems that are not life-critical.
"C is a great, if complicated language. It's simple, yet can get complicated very easily..."
It's complicated.
It's simple.
It's complicated again.
The article gets worse from there.
--
Peter
Some of my most successful programs (read: they actually worked, or thereabouts) came about because I was in a funny mood and decided to actually plan it out. From what I hear about the real world, some (but by no means all or even most) programmers look down on clients just because they don't know much about programming. They assume that just because they have a certain expertise over others, they somehow know more than them in general.
The good thing about the way software is written here is that the requirements are written down and sorted out before they even do the planning. How many programmers, groups, firms etc. can say that? I will admit, though, that a major problem is changing requirements, something that just doesn't happen in the same way for NASA. It might just be better if people decided to wait a bit before jumping in to the programming. They'll save themselves more time and money in the long run.
I blame the captain when the ship goes down, and I blame the managers (starting with the CEO) when coders are staying up late and projects are overbudget. It's fun to pull an all nighter and be a hero every now and then, but if you have good programmers, it's not their fault when things go south. In my experience, projects fall apart because managers didn't allow enough time for specification and didn't stick to the original specs.
On a separate point, the error rate given (99.9% of all errors are caught) is statistically bogus. First off, they mean 99.9% of the errors that were caught or manifested were found. They don't know how many errors that haven't manifested themselves are still there. It is possible to predict such numbers given bug rates and other data, but that is a prediction, not an actual accuracy rate. The real issue I have with the number is that they are saying that the first 80% are caught before QA gets the code. I call that "making my program work." If I counted every problem I fixed before I handed over my code, then my rate would be pretty good too.
That said, the shuttle group is awesome. Special thanks to them for enabling the most important endeavor of this century.
magic
-ryan
"Any way you look at it, all the information that a person accumulates in a lifetime is just a drop in the bucket."
I never quite understand why it is an act of macho bravado to work all night and live off pizza. It indicates two things 1) A badly run project and 2) poor maintainability in the code.
In one of my previous incarnations I worked on display systems for Air Traffic Control, where the quality level was also very high, where the performance requirements were exacting and the specifications precise.
Some would think that this means simple and boring... Of course not. Having to display a track from reception at the Radar to the display in 1/10th of a second isn't easy by any stretch of the imagination, and to do it so it works 100% of the time means you have to understand the problem properly rather than coding and patching.
If only more projects worked like that, there would be a lot fewer bugs in the world.
An Eye for an Eye will make the whole world blind - Gandhi