Murphy's Law Rules NASA
3x37 writes "James Oberg, former long-time NASA operations employee, now journalist, wrote an MSNBC article about the reality of Murphy's Law at NASA. Interesting that the incident that sparked Murphy's Law over 50 years ago had nearly the same cause as the Genesis probe failure. The conclusion: Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail. NASA management still often chooses the latter."
Someday all decisions will be made by machines. We'll just sit back while they do all the work. Then, no more human error.
My sig would have been a lot cooler if
While it's always possible to make a mistake, having people double-check a project from the ground up will almost always find the problems. NASA's current difficulties arise from scattered teams that each check only their own parts, rather than fully qualified teams that go over the entire vehicle. The fact that the whole thing is usually designed by committee, in several pieces, and then assembled at the last minute probably helps facilitate error. The Saturn V rockets and other technology we used to land on the moon had the capability of being far less reliable than today's technology, but we still managed to use them for years without error.
It's actually more cost effective to allow for failures. You build the same sat 5 times and if 4 fail in a cheaper launch situation, you still save money.
From this article:
"Swales engineers worked closely with Space Sciences Laboratory engineers and scientists to define a robust and cost-effective plan to build five satellites in a short period time."
``Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail.''
This is a very good point, and I wish more people would realize it.
For software development, the application is: Just because you can write 200 lines of correct code does not mean you can write 2 * 200 lines of correct code. Always have someone verify your code (not yourself, because you read over your errors without noticing them).
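A rough back-of-the-envelope sketch of why doubling the code more than doubles the risk. The per-line figure of 0.999 and the assumption that lines fail independently are illustrative guesses, not measurements:

```python
def chance_all_correct(lines: int, per_line_ok: float) -> float:
    """Probability that every line is correct, assuming independent lines."""
    return per_line_ok ** lines


# Assumed for illustration only: each line is right 99.9% of the time.
per_line_ok = 0.999

p_200 = chance_all_correct(200, per_line_ok)  # roughly 0.82
p_400 = chance_all_correct(400, per_line_ok)  # roughly 0.67

print(f"200 lines all correct: {p_200:.2f}")
print(f"400 lines all correct: {p_400:.2f}")
```

Under these assumptions the chance of a clean 400 lines is the square of the chance of a clean 200 lines, which is one more argument for having a second pair of eyes on the code.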
Please correct me if I got my facts wrong.
That's right, blame it all on the Irish. After all, it's not like anyone else ever screwed up...
See what I've been reading.
>>Either you manage and design around it or fail. >>NASA management still often chooses the latter.
This is hindsight at its best, and is the classic comment by bureaucrats who have no concept of what cutting edge design is about. F1 race cars, Racing Sailboats, Nuclear Reactors - NO design is failsafe, and NO design is foolproof. Especially a one-off design that isn't mass produced. Even mass produced designs have errors, like in the Auto Industry. It is a simple fact of life that engineers and managers balance Cost and Safety constantly.
What you SHOULD be comparing this against is other space agencies that launch a similar number of missions and satellites - i.e. other real world examples.
Expecting perfection is not realistic.
Why be different than any other management?
Proud neuron in the Slashdot hivemind since 2002.
Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail. NASA management still often chooses the latter.
There's a contradiction in that above statement..
.. but I can't think what it is exactly. Something along the lines of: a human manages and designs the error handling, don't they?
That said, there's nothing wrong with building in redundancy and failsafes.
In space probes redundancy comes at the cost of number of unique mission goals and financial cost.
Sometimes you just have to eat the failure; that's what insurance is for. We in the public shouldn't always expect NASA to have 100% failure-free (non-human) missions and then exact harsh punishment on them, which invariably gets passed down to engineers and not management decision makers.
With the current attitude, the NASA of old would have been shut down in the first couple of years for wasting taxpayer money. Luckily there was competition with the Soviets.
The fact that human error isn't compensated for is the true human error that needs compensation.
I think I just sprained my brain thinking up that one.
Do really dense people warp space more than others?
All moderately-complex projects have to be built around:
1. change
2. error
Sig for sale or rent. One previous user. Inquire within.
The problem with errors is that detecting all errors all the time is absolutely impossible. Think back to your intro theory cs class and to Turing Recognizability. Think halting problem. Now, reduce the problem of finding all errors to the halting problem:
if (my_design_contains_any_errors) while(1);
else exit;
Feed this into a program that halts on all input and see what happens. You can't, because we know it is impossible for it to always return an answer. QED: errors are unavoidable. No need to sniff derisively in the direction of NASA's "middle management". Let's see if YOU can do a better job!
Since scientists have defined Murphy's Law, no-one should have any excuses for letting anything that can, go wrong!
If you compare the advances to Science and Knowledge due to mistakes rather than deliberate acts, it might come out that everything is a mistake.
Recently I took a class on AI (insemination, not intelligence) and apparently the two biggest breakthroughs by Dr. Polge in preserving semen were due to mistakes. First, his lab mislabeled glycerol as fructose and they were able to find a good medium for suspension. Secondly, he blew off finishing freezing semen to go get a few pints and didn't make it back to the lab until the next day, thus discovering that it was actually better to not freeze the stuff right away.
Mistakes are some of the best parts of science and life in general. It's best to try to make more mistakes (i.e. take risks) than it is to try and always be right. (unless you are obsessive compulsive).
It's important not to lose sight of the harshness of the environment these systems are designed to operate in. As a simple example, striking a match is an easy task to perform, yet people trapped in cold remote areas have died because they were under too much stress to light a fire even though they had the tools. It's also a question of consequences when something goes wrong, and space is not very forgiving.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
I think the biggest difficulty surrounding large organizations is the lack of communication tools linking the right engineers together. It seems unfathomable that some of these mistakes were able to propagate throughout the entire engineering process and nobody caught them.
Unless you consider the fact that often in large organizations, the left hand typically has no clue what the right hand is doing. I work at Lockheed Martin, and typically I'm involved in situations where one group makes an improvement that then none of the other groups know about, changes/decisions are poorly documented (if at all) so nobody knows where the process is going, people making poor decisions due to lack of proper procedures from management about what to do, teams not being co-located, poor information about which people have the necessary knowledge to solve a particular problem, or any number of things that confuses the engineering process, to the detriment of the product. Most of these situations are caused by a lack of communication throughout the organization as a whole.
This is a serious problem, and it needs to be acknowledged by the people in a position to make a difference.
"Either you manage and design around it or fail. NASA management still often chooses the latter."
I find this remark very unfair. There is a really nasty, snide attitude to it, like "we are perfect - why can't you be."
Come on guys, NASA is trying to do some really difficult and groundbreaking stuff here. Cut them some slack.
I love how journalists and others like to sit back and criticize these engineers' efforts. They are human, and they will do stupid things. Having been trained as a mechanical engineer (although I mostly do software engineering now), I have some idea of how many calculations have to be made to design even one aspect of a project. I couldn't imagine the complexity of such a system, trying to account for every scenario, making sure algorithms and processes work as planned for ONE mission. No second chances. That we have individuals willing to dedicate the mental effort to this cause at all is worthy of praise. These people have pride and passion in what they do, and I'm sure they will continue to do their best.
For anyone wanting to yack about poor performance... put your money where your mouth is. I just get sick of all the constant nagging.
this is my sig, be amazed.
Very comforting to know how easy it is to wire the safeties on nuclear weapons up backwards.
After all this post has GENESIS and outer space in it.
from the article:
"After all, these switches were reportedly developed as a nuclear warhead safety device, so one could just assume that they were properly wired."
Nice to know those safety devices are foolproof.
-Styopa
Maybe we should pass the hat and send every NASA manager a copy of _Systemantics_, for their enlightenment. (Likely the scientists and engineers already have their own copies.)
I guess I have passed the threshold where I even think of these jokes. That actually was funny and I didn't even realize it. er, my mistake...
It's not too difficult to become immune to it talking about semen all day. We even have post it notes and pens with pictures of semen on them.
I guess that is the fate of working in AI. On the upside, it must have an effect on overall fertility. Most of the people in this company seem to have 3 kids minimum!
As the article quotes Murphy's Law: "Every component that can be installed backward, eventually will be."
However, with that being said I really do not believe Engineers are the problem at NASA. Bureaucracy is the enemy at NASA. NASA needs a complete top to bottom overhaul.
That's a very popular cliche. The fact is, with NASA's shrinking budgets, they don't have the resources to design around potential failures. There's old school NASA that designed the Cassini probe, which has redundant systems and is properly designed and tested, and there's new school NASA that makes the cheap Mars probes. Just looking at the Mars probes you'll see why they have moved to this method. If you can make five fault-intolerant probes for the same cost as one fault-tolerant probe, and the odds are that only two of the five work, then it's a better idea to build the five crappy probes, as you'll probably get twice the science benefit. The problem comes once you start throwing in human lives. It's okay if an unmanned probe crashes, it's only things, but if you apply the same logic to manned missions.. you get the Columbia accident. It's not that NASA intentionally overlooked the problems because they expect people to die, it's just that the methodology from the unmanned flights has crept into their minds. At least that's my non-expert opinion.
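For anyone who wants to play with the numbers, here is a minimal sketch of that trade-off. The 40% success rate comes from "two of the five work"; the 95% figure for the fault-tolerant probe is purely an assumption for illustration, and the probes are treated as independent:

```python
def expected_successes(probes: int, success_rate: float) -> float:
    """Average number of working probes, assuming independent outcomes."""
    return probes * success_rate


def chance_at_least_one(probes: int, success_rate: float) -> float:
    """Probability that at least one probe works."""
    return 1.0 - (1.0 - success_rate) ** probes


# Five cheap probes at 40% each versus one robust probe at an assumed 95%.
cheap_expected = expected_successes(5, 0.40)
robust_expected = expected_successes(1, 0.95)

cheap_any = chance_at_least_one(5, 0.40)
robust_any = chance_at_least_one(1, 0.95)

print(f"expected working probes: cheap={cheap_expected:.2f}, robust={robust_expected:.2f}")
print(f"chance of any success:   cheap={cheap_any:.2f}, robust={robust_any:.2f}")
```

With these made-up figures the cheap fleet returns about twice the expected science, but it is slightly more likely to return nothing at all, which is exactly the part that stops being acceptable once people are on board.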
But I guess it sort of applies to your software analogy as well. There have been a few companies who have discovered that it's cheaper to have paying customers find the flaws in their software, rather than do any kind of formalized testing before release.
Well.. maybe. Or Maybe not. But Definitely not sort of.
I'm not sure I buy that completely. While it certainly would help to have a single SME go over the entire vehicle, I doubt such a person could exist and complete the checks in a reasonable amount of time. The guy who checks the computer code is probably not going to be an expert in metal fatigue, nor electrical engineering. Even if you could find some sort of uber-genius who had expert knowledge of every system, he or she would have to work serially. If they started at component "1" of 654224166 and went down the line in order, the checks they started with would be out of date by the time they finished.
The bottom line is that if you want someone to do something for you, make sure they UNDERSTAND what they are doing and why. If someone installs a fitting backwards for you, it's your fault because the installer clearly didn't understand what was required.
This is bullshit. Sometimes people can't be made to understand something, and they are in a position above you so you can't do anything about it. For example, I once found a big problem with some in-house software I was helping develop. I couldn't fix that part because the code wasn't even accessible to me, and I didn't know how to program in that language. So I told the manager - "hey, this is a big thing and when X happens, we're gonna get screwed". But he ignored me. So I sent out a memo detailing the problems. I got ignored. I told the guy who wrote the thing to fix it, got ignored. Finally they told me to shut up or be fired. After a while, everything got fucked up because of the problem. Who gets the blame? Me - and they said it was because I didn't make them understand. There is no cure for willful ignorance.
I submitted this story last night, and it didn't get posted.
Here's the real reason for NASA and their errors, as quoted by Gordon Cooper, a former astronaut.
"Well, you're sitting on top of this rocket, about to be flung into the most hostile environment known to man, and you keep thinking, 'Everything here was supplied by the lowest bidder.'"
Live forever, or die trying.
What large corporations have been doing is Soviet style central planning. What happens is that they get stuck with mediocre or sucky software that they cannot replace. Eventually, a few smaller companies start up that manage to have good software (out of many that fail in part because of sucky software) which gives them a competitive advantage. These either get bought up by or grow into ossified bureaucratic behemoths with no internal competition.
Sometime a corporation is going to become the Bazaar within, instead of the Cathedral (Cathedral & the Bazaar) and they'll maintain a long term competitive advantage by having internal competition.
I'm not holding my breath, however.
NASA does test everything. He didn't mention in the article, but I would be almost certain that the accelerometers were tested, and passed the tests; but that the tests themselves were improper.
There might always be errors, which you can reduce with many checks. The key is to have the checks done by someone who has an eye for potential problems. There is a particular skill set/personality that can foresee unknown problems better than, say, an engineer who is single-minded and focused. You can get a hundred experts to check the same work, but often it's the one guy who says "why is that wheel upside down" who reveals a completely unanticipated problem.
It is a difficult thing to design something to face failures. It requires a mind set vastly different than that of most "builders of things." Those folks tend to think in the positive: my creation does this, and this, and this, and ... This is true whether the thing being built is a program, a car, or a team of people.
If you want to see this in action, find your favorite developer and ask the following: "What does your program do, and how does it do that?" Prepare for a long response. :-)
Then ask: "How does it break?" You will most likely get a blank look. You may get a list of things the program doesn't do (missing or removed features), or possibly a list of known bugs, but you will almost never get an answer detailing the failure modes of the program itself. That is, they will not be able to tell you what happens when various assumptions are wildly wrong.
Answering those sorts of questions requires thinking in the negative (not necessarily negative thinking), which is an entirely different mode of thought. It's also much less pleasant. After all, considering the destruction of the beautiful thing you've built is not a psychologically easy task.
And the subsequent controversies...
is here.
(This paper won a prestigious 2003 Ig Nobel award for engineering.)
This is my SIG. There are many like it, but this one is mine.
Summary of The fastest man on Earth:
George Nichols: "The Law's namesake, was Capt. Ed Murphy Jr., a development engineer... Frustrated with a strap transducer which was malfunctioning due to an error in wiring the strain gauge bridges caused him to remark-- 'if there is any way to do it wrong, he will'-- referring to the technician who had wired the bridges. I assigned Murphy's Law to the statement and the associated variations..."
David Hill: "Murphy was kind of miffed off. And that gave rise to his observation: 'If there's any way they can do it wrong, they will.' I kind of chuckled and said, that's the way it goes. Nothing more could be done really."
John Paul Stapp: "we do all of our work in consideration of Murphy's Law. [defined as] the idea that you had to think through all possibilities before doing a test."
Dr. Dana Kilanowski: "at the time I believe Stapp said something like, 'If anything can go wrong he'll do it.' A couple days later there was a press conference in Los Angeles and Stapp said something like, 'it was Murphy's Law -- if anything can go wrong, it will go wrong.' [...] I have heard that Murphy claimed he invented Murphy's Law, but Stapp is the one noted for his witticisms, his haikus, and his plays on words."
Ed Murphy: "I didn't tell them that they had positively to orient them in only one direction. So I guess about that time I said, 'Well, I really have made a terrible mistake here, I didn't cover every possibility.' And about that time, Major Stapp says, 'Well, that's a good candidate for Murphy's Law'. I thought he was going to court martial me, but that's all he said. [Stapp reeled off a host of other Laws, and said] 'from now on we're going to have things done according to Murphy's Law'."
Chuck Yeager: "Look, what you're getting into here is like a Pandora's Box. Goddamn it, that's the same kind of crap...you get out of guys who were not involved and came in many years after."
And in the end it wasn't as extreme a failure as Genesis:
According to Nichols the failure was only a momentary setback --"the strap information wasn't that important anyway," he says -- and regardless good data had been collected from other instruments. The Northrop team rewired the gauges, calibrated them, and did another test. This time Murphy's transducers worked perfectly, producing useable data. And from that point forward, Nichols notes, "we used them straight on" because they were a good addition to the telemetry package. But Murphy wasn't around to witness his devices' success. He'd returned to Wright Field and never visited the Gee Whiz track ever again.
...it's because humans have this weird deal with society. We wind up with the greediest, lamest most megalomaniacal people for governmental/corporate "leaders". 999 out of a thousand are this way, it just happens, we notice it.
These people are quite *insane*. They may be brilliant, but still bonkers. They have the most power and money of everyone on the planet. They hire the smartest people they can find, and research advanced weaponry. All governments spend a huge amount of time and money and resources on this. They hire the smartest scientists and engineers they can find for this task. Then they hire the people who psychologically and intellectually are the most prone to use these devices that the scientists and engineers create. These people are given more power than "ordinary" citizens; they are tasked with killing people and breaking people's things, using these advanced machines. This weaponry, consisting of mechanical machines augmented with electrical and chemical advances, is *exactly* designed to "harm humans" and they DO harm humans with this machinery. Happens every day around the globe, by the thousands. Literally thousands of humans a day are killed, and many more horribly mutilated and injured. And the way the system has evolved, it is rigged to always have the megalomaniacs wind up "in charge", and all populations have a certain percentage of "ask no questions" order followers.
So, stuff happens, evil wicked nasty horrible screaming stuff. This leads to this "fear", which isn't in the least bit an irrational fear for anyone sane to have. It's because it's reality.
Lately, we can read that they want to automate and robotize this even further, and to take these machines as far as they can push it with near unlimited budgets and millions of man hours of advanced research. It is not a "tin foil hat" phenomenon for folks to notice that. We also have a verifiable past track record to show that yes indeed, these megalomaniacs' tame scientists and engineers and order followers screw up, and we get what is called "unintended consequences" and "collateral damage", as if the intended consequences and planned-for damages aren't bad enough. So we as ordinary humans all around the globe, who really do not have a beef with joe over there, all get to have these "benefits", and we notice that we don't want those sorts of benefits, but there's not a thing we can do about it, because this advancing technology system is rigged in favor of those who like and enjoy and profit from doing harm.
You see, we DO have a lot of at least technically "competent people programming the machines"; the problem is, they ARE designed to harm. And it's set up to be self-perpetuating/advancing and is based from the git-go on forced wealth transference, i.e., "theft", and it goes downhill from there into ever worse things..
The Origin of Murphy's Law.
Couldn't Murph just be an extrapolation of the fact that humans tend to remember the bad, especially the really bad, over success?
-----
The jet liner to which you refer, I think, is the Gimli glider which, through a forehead-slapping number of independent goofs, ambiguities, and misunderstandings made by a frighteningly large number of people, ran out of fuel over Canada in 1983.
Forget Murphy's Law, why can't Moore's Law win out at NASA?! I want to see the time it takes to get to Mars HALVE every ten years... Hmmm, I guess then Murphy's Law would be second in this case though, so it might be extremely dangerous for the crew.
...and it should be known by now
genetic algorithms
`nuff said...
Note the "law" doesn't just torture NASA exclusively, it just rears its head very visibly in their case.
Strike the word complex from the above quotation.
As a software developer that's responsible for developing protocols for various tasks, I've learned that any system needs to be robust against failure and should also fail safe. All too many times I've seen people come up with systems that function well when every part works exactly as it should, but blow up in terrible ways when a single mistake is made. For example, consider using the bronze-gold-silver way of doing revision control versus a real revision control system like CVS et al. The former system works only so long as people copy the proper file from one area to another every time, for as long as the system's in use. I've witnessed developers completely trash a production environment by accidentally copying old files into the gold area.
Mistakes are going to happen and processes won't be followed 100% all of the time. The key is to design systems that expect this to occur and provide ways of dealing with the failure.
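As a rough illustration of "expect the mistake", here is a sketch of a copy step that fails safe instead of silently trashing the gold area. The `promote` helper and `StalePromotionError` are made-up names, and comparing modification times is only an assumed stand-in for the history a real revision control system keeps:

```python
import os
import shutil


class StalePromotionError(Exception):
    """Raised when the source file is older than the file it would replace."""


def promote(src: str, dst: str) -> None:
    """Copy src over dst, but refuse to replace a newer file with an older one.

    Hypothetical helper for illustration; timestamps are a crude proxy
    for real version history.
    """
    dst_exists = os.path.exists(dst)
    if dst_exists and os.path.getmtime(src) < os.path.getmtime(dst):
        raise StalePromotionError(f"{src} is older than {dst}; refusing to overwrite")
    if dst_exists:
        # Keep the previous version so a bad promotion can be undone.
        shutil.copy2(dst, dst + ".bak")
    # copy2 preserves the source's modification time on the copy.
    shutil.copy2(src, dst)
```

The point is not the timestamp check itself but the shape: the common mistake (copying the wrong way) is detected and stopped, and even a successful copy leaves a way back.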
Murphy's Law isn't actually "If anything can go wrong, it will go wrong"; that is Finagle's Law of Dynamic Negatives.
Murphy originally said "If there's more than one way to do a job, and one of those ways will result in disaster, then somebody will do it that way."
It's easier to fight for one's principles than to live up to them.
Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail.
There are times when the only choice is to use C because of certain very tight constraints, but C and C++ are Murphy magnets. You know there's something wrong when you have not just an entire book but a whole SERIES of books devoted to hidden traps that are entirely due to the design of the language itself.
Using them when you don't have to on the grounds that you're so smart that you never fall into those traps is the sort of fundamental foolishness that makes the Murphy gods chuckle with glee.
"Those who have never entered upon scientific pursuits know not a tithe of the poetry by which they are surrounded."
Check out The Columbia Tragedy: System Level Issues for Engineering, a free video from MIT World. In this video Sheila Widnall, MIT Institute Professor (Engineering Systems Division, Aeronautics), talks about her work on the investigation board. She talks about NASA's "culture of invincibility", where well-intentioned people became "desensitized to deviations from the norm". That is, not only does something often go wrong, as Murphy's Law says, but people learn to ignore it until disaster strikes.
As Sheila noted, if an engine fell off the shuttle, people tended to notice, act and do something about it, but when it was something small, like foam, they ignored the issue, even though it was clear that this anomaly warranted investigation, testing, etc.
Interesting presentation that shows NASA not only sometimes ignores the potential for human error, but also ignores the actual errors when they happen.
Future Wiki -- If you don't think about the future, you cannot have one.
- Manager issues a stupid order.
- Subordinates obey order out of fear.
- Manager gains confidence that stupidity is a valid method.
- Stupidity gains an increasing foothold until a catastrophe occurs.
OR
- Manager cuts another corner or cost.
- Nothing immediately bad happens as a result.
- Manager gains confidence that cutting is a valid method.
- The cuttings increase until a catastrophe occurs.
Managers are among the most moronic of the "educated" Western class.
[You have a stable society when some nut guns down a schoolyard and the law doesn't change.]
And supposedly God created humans so we should all blame him. Your Windows XP crashed? Act of God.
Actually that's why I don't believe in the Christian God. God is infallible but God created people. hmm...
Not specifically related, but you tend to see the same problems in medicine.
You have several specialized departments working with each other, each with distinct and massive protocols that don't mesh well.
Within certain regards, it almost seems the system is designed to fail by not designing for failure. Each department wants to give the appearance of toeing the company line even as internal systems fail. This reduces error reporting and functional problem solving because "it isn't SOP". Everyone is more concerned with placing blame than solving the problem.
And when the problems become so massive you can't ignore them, a new policy is issued, adding to the multi-volume sets of policies; which are internally conflicting, and effectively enforces mediocrity from all departments.
Add to this a shortage of people, people working under stressed conditions when lives are at stake, and you understand how a multitude of problems go undiagnosed simply because they are unknown. How do you know you are spreading nosocomial infections unless you check for them? And from what department are you going to pull the people to check, and how can you isolate the problem when each department is invested in covering their ass?
I haven't read "The Underground Text of Systems Lore", but I imagine a lot of the same observations would apply (I actually looked at it as applying chaos theory to system management in the ER, but the same ideas are expressed in "The Cluetrain Manifesto", and other publications).
Highly organized systems ultimately catabolize themselves unless they can restructure easily and effectively. NASA has not, medicine has not, and most political systems have not.
Something to think about when you're in the OR at 2AM.
One of the biggest problems that NASA faces is its control by politically appointed bean counters, instead of real engineers. While I was subcontracting to NASA (in another lifetime), I saw tens of millions of dollars spent upon teleconferencing equipment, while engineering emails that raised questions (prior to launch) about insulating tiles on the SST went unanswered. The rest is, unfortunately, history.
After seeing this, wouldn't you say the same?
sig not found