Murphy's Law Rules NASA
3x37 writes "James Oberg, former long-time NASA operations employee, now journalist, wrote an MSNBC article about the reality of Murphy's Law at NASA. Interesting that the incident that sparked Murphy's Law over 50 years ago had a cause nearly identical to that of the Genesis probe failure. The conclusion: Human error is an inevitable input to any complex endeavor. Either you manage and design around it or you fail. NASA management still often chooses the latter."
While it's always possible to make a mistake, having people double-check a project from the ground up will almost always find the problems. NASA's current difficulties arise from scattered teams that each check only their own parts, rather than fully qualified teams that go over the entire vehicle. The fact that the whole thing is usually designed by committee, in several pieces, and then assembled at the last minute probably helps facilitate error. The Saturn V rockets and other technology we used to land on the moon had the capability of being far less reliable than today's technology, but we still managed to use them for years without error.
It's actually more cost effective to allow for failures. You build the same sat 5 times and if 4 fail in a cheaper launch situation, you still save money.
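To put rough numbers on that trade-off, here is a minimal sketch. Every figure in it is made up for illustration, and it assumes each satellite fails independently of the others, which real programs cannot take for granted (a shared design flaw can take out all five at once).

```python
def p_at_least_one_success(n_units: int, p_fail_each: float) -> float:
    """Chance that at least one of n units works, ASSUMING independent failures."""
    return 1.0 - p_fail_each ** n_units


def program_cost(n_units: int, unit_cost: float, launch_cost: float) -> float:
    """Total cost when every unit is built and launched separately."""
    return n_units * (unit_cost + launch_cost)


if __name__ == "__main__":
    # Hypothetical numbers (arbitrary cost units), not taken from any real program.
    cheap = {"n_units": 5, "unit_cost": 20.0, "launch_cost": 10.0, "p_fail_each": 0.3}
    premium = {"n_units": 1, "unit_cost": 200.0, "launch_cost": 60.0, "p_fail_each": 0.05}

    for name, plan in (("five cheap sats", cheap), ("one premium sat", premium)):
        cost = program_cost(plan["n_units"], plan["unit_cost"], plan["launch_cost"])
        p_ok = p_at_least_one_success(plan["n_units"], plan["p_fail_each"])
        print(f"{name}: cost={cost:.0f}, P(at least one works)={p_ok:.5f}")
```

Whether the cheap plan actually wins depends entirely on the real costs and failure rates, and on whether one surviving satellite is enough for the mission.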
From this article:
"Swales engineers worked closely with Space Sciences Laboratory engineers and scientists to define a robust and cost-effective plan to build five satellites in a short period time."
The fact that human error isn't compensated for is the true human error that needs compensation.
I think I just sprained my brain thinking up that one.
Do really dense people warp space more than others?
The problem with errors is that detecting all errors all the time is absolutely impossible. Think back to your intro CS theory class and to Turing recognizability. Think halting problem. Now, reduce the problem of finding all errors to the halting problem:
if (my_design_contains_any_errors) while (1);
else exit(0);
Feed this into a halting checker that is supposed to return an answer for every input and see what happens. You can't, because we know it is impossible for such a checker to always return an answer. QED: errors are unavoidable. No need to sniff derisively in the direction of NASA's "middle management". Let's see if YOU can do a better job!
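For anyone who skipped that class, the textbook diagonal argument behind the halting problem can be sketched as below. This is an illustration, not a proof: the names `make_contrarian` and `refutes` are invented here, and the "loop forever" branch only returns a label instead of actually looping so that the sketch can be run.

```python
def make_contrarian(claimed_halts):
    """Build a program that does the opposite of whatever the checker predicts for it."""
    def contrarian():
        if claimed_halts(contrarian):
            # A real contrarian would spin in `while True: pass` here;
            # we return a label instead so the demo itself terminates.
            return "loops"
        return "halts"
    return contrarian


def refutes(claimed_halts) -> bool:
    """Return True if the checker's prediction disagrees with the program's behaviour."""
    program = make_contrarian(claimed_halts)
    predicted_to_halt = claimed_halts(program)
    actually_halts = program() == "halts"
    return predicted_to_halt != actually_halts


if __name__ == "__main__":
    # Two trivial candidate checkers; both are wrong about their own contrarian.
    print(refutes(lambda program: True))
    print(refutes(lambda program: False))
```

Any deterministic checker you plug in gets the same treatment, because the contrarian program is built from the checker's own answer.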
Here are some of the highlights:
However, with that being said, I really do not believe engineers are the problem at NASA. Bureaucracy is the enemy at NASA. NASA needs a complete top-to-bottom overhaul.
What large corporations have been doing is Soviet style central planning. What happens is that they get stuck with mediocre or sucky software that they cannot replace. Eventually, a few smaller companies start up that manage to have good software (out of many that fail in part because of sucky software) which gives them a competitive advantage. These either get bought up by or grow into ossified bureaucratic behemoths with no internal competition.
Sometime a corporation is going to become the Bazaar within, instead of the Cathedral (Cathedral & the Bazaar) and they'll maintain a long term competitive advantage by having internal competition.
I'm not holding my breath, however.
Scaled Composites, for example, has demonstrated a suborbital craft capable of barely reaching space for a cost of around $25 million. In comparison, NASA developed and flew three X-15 prototypes with similar capabilities for a cost of $300 million in 1960s dollars (which incidentally was considered a cheap program).
With the small difference that Scaled Composites is benefitting from 30 years of technology advancements. I don't think that an equivalent company of the 60s could build three SpaceShipOnes for $300 million.
"F1 race cars, Racing Sailboats, Nuclear Reactors - NO design is failsafe, and NO design is foolproof."
Not true: there are failsafe nuclear reactor designs that even a genius couldn't manage to melt down, let alone a fool... you just have to design them with safety guaranteed by the laws of physics, not the control systems. General Atomics built a lot of them decades ago, and the Chinese are developing modern versions today.
Good design prevents a heck of a lot of problems. If nothing else, you'd have thought that by now engineers would realise that if you design something so it can be fitted backwards, sooner or later it will be.
Hah! Engineers are the most intelligent bunch of idiots you'll ever find. The problem with engineers is that often their own cleverness and/or familiarity with the item they're designing blinds them to the viewpoint of someone who's "not clever" or totally new to the item. With (for example) the classic non-reversible, yet perversely symmetrical accelerometers, it probably never occurred to the engineer designing them that someone could "not know" which end goes up. Sometimes it looks like just plain stupid engineering, like with a particular telephone PBX control system I work with. It has two expansion slots, Slot 1 and Slot 2. When you want to add only one expansion card, where do you put it? Slot 1? No, that's too obvious. You put it in Slot 2. If you put a second card in later, that goes in Slot 1. At first I thought it was just an error in labeling the slots on the cabinet, but then I noticed that the circuit board itself is marked the same way! I'm sure there's a perfectly rational reason for it that makes sense only to the engineers who designed the system.
If a job's not worth doing, it's not worth doing right.
The jetliner to which you refer, I think, is the Gimli Glider which, through a forehead-slapping number of independent goofs, ambiguities, and misunderstandings made by a frighteningly large number of people, ran out of fuel over Canada in 1983.
Strike the word complex from the above quotation.
As a software developer who's responsible for developing protocols for various tasks, I've learned that any system needs to be robust against failure and should also fail safe. All too many times I've seen people come up with systems that function well when every part works exactly as it should, but blow up in terrible ways when a single mistake is made. For example, consider using the bronze-gold-silver way of doing revision control versus a real revision control system like CVS et al. The former system works only so long as people copy the proper file from one area to another every time, for as long as the system's in use. I've witnessed developers completely trash a production environment by accidentally copying old files into the gold area.
Mistakes are going to happen and processes won't be followed 100% all of the time. The key is to design systems that expect this to occur and provide ways of dealing with the failure.
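As a rough illustration of "expect the mistake", here is a minimal sketch of a promotion step that refuses to copy an older file over a newer one and keeps a backup of whatever it replaces. It is not how any particular shop or revision control system works: the `promote` function, the `manifest.json` file, and the integer version numbers are all assumptions made up for this example.

```python
import json
import shutil
from pathlib import Path


class StalePromotionError(Exception):
    """Raised when a promotion would replace a file with an older or equal version."""


def promote(src: Path, gold_dir: Path, version: int) -> Path:
    """Copy src into gold_dir only if `version` is newer than what is already there."""
    gold_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = gold_dir / "manifest.json"  # hypothetical bookkeeping file
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}

    current = manifest.get(src.name, -1)
    if version <= current:
        # Fail safe: refuse and leave the gold area untouched.
        raise StalePromotionError(
            f"{src.name}: version {version} is not newer than gold version {current}"
        )

    dest = gold_dir / src.name
    if dest.exists():
        # Keep the previous copy so a bad promotion can be undone by hand.
        shutil.copy2(dest, gold_dir / f"{src.name}.v{current}.bak")

    shutil.copy2(src, dest)
    manifest[src.name] = version
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return dest
```

The point is not the bookkeeping format but the default: when someone grabs the wrong file, the process stops and says so instead of quietly trashing production.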