Richard Feynman, the Challenger, and Engineering
An anonymous reader writes "When Richard Feynman investigated the Challenger disaster as a member of the Rogers Commission, he issued a scathing report containing brilliant, insightful commentary on the nature of engineering. This short essay relates Feynman's commentary to modern software development."
http://duartes.org.nyud.net/gustavo/blog/post/2008/02/20/Richard-Feynman-Challenger-Disaster-Software-Engineering.aspx As a side note, could someone make a Greasemonkey script to make all links from /. run through Coral? It just makes sense.
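For what it's worth, the link-rewriting part of such a script is tiny. Here is a minimal sketch in Python (not an actual Greasemonkey userscript, which would be JavaScript), assuming that running a link through Coral just means appending .nyud.net to the hostname, as in the URL above. The function name coralize, and the choice to leave non-http links and links with an explicit port or login untouched, are my own assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the Coral suffix is the one visible in the mirrored URL above.
CORAL_SUFFIX = ".nyud.net"


def coralize(url: str) -> str:
    """Return url with the Coral suffix appended to its hostname.

    Hypothetical helper: links that are not plain http, that carry an
    explicit port or login, or that are already coralized are returned
    unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname  # lowercased hostname, or None if absent
    if parts.scheme != "http" or not host:
        return url
    try:
        has_port = parts.port is not None
    except ValueError:
        # Malformed port: leave the link alone rather than guess.
        return url
    if has_port or parts.username is not None:
        return url
    if host.endswith(CORAL_SUFFIX):
        return url
    # Swap in the new host; path, query, and fragment are kept as-is.
    return urlunsplit(parts._replace(netloc=host + CORAL_SUFFIX))


if __name__ == "__main__":
    print(coralize("http://duartes.org/gustavo/blog/"))
```

A real userscript would apply the same rewrite to every link on the page; only the string handling is shown here.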
Nothing great was ever achieved without enthusiasm
The Kansas City Hyatt Regency walkway collapse was an engineering problem. The contractor asked to take a shortcut (instead of threading a nut up a three-story threaded rod, they asked to cut the rod and offset it several inches), and the engineers rubber-stamped it without checking what the ramifications would be. The engineering was not originally flawed, but it became flawed when they approved the change order.
I called it a mighty Sperm Whale, she called it Finding Nemo.
There is a point you miss there, I think. It is the top-to-bottom design philosophy vs. the bottom-to-top one. The first sets the objectives first, then designs every part so that it fulfills the general objective. The latter focuses on designing simple elements and assembling them into more complex elements with defined capacities and known weaknesses.
This article states that the second approach is inherently better than the top-to-bottom approach. This is clearly an engineering problem. I am not sure I agree with the conclusions, and I acknowledge that most of the Challenger disaster was due to unwelcome pressure, but I don't think you can dismiss the whole issue as not concerning engineering.
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
I agree, and tried not to summarize at all. Mostly I just tried to link what Feynman said to software, rather than make a fool of myself paraphrasing him. That's also why the entry is really short, and basically tells people to go read the source :)
cheers.
Apparently you've never taken engineering ethics. It was the first class I had to take as a general engineering major. Needless to say, I changed majors, but I still got a hell of a lot out of that ethics class. The parent was right. These were all cases of cutting corners, either in terms of cost or time. Managers wanted it done quickly and cheaply, whether that meant mixing concrete improperly, buying sub-par materials, or just ignoring what the engineers were telling them. It always came down to about 95% managerial error and the rest engineering error.
Absolute power corrupts absolutely. indymedia
http://www.networkmirror.com/LBKPk3ml3LEozZTj/duartes.org/gustavo/blog/post/2008/02/20/Richard-Feynman-Challenger-Disaster-Software-Engineering.aspx.html
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
I don't have my copy of Visual Explanations handy, but I've read it and I was at a talk Tufte gave on this subject, and my recollection of it is rather different. Without directly criticizing Feynman, Tufte actually comes up with a significantly superior analysis of the root cause of the disaster. Feynman spread the blame around many places, finding bad science, bad engineering, inaccurate statistics, poor procedures and documentation, politics influencing design, and most importantly and famously, a disconnect between management and engineering leading to overconfidence. Everything he found is right. But Tufte took the analysis one step further and came up with a completely convincing "one point where it all went wrong." That point was the inability of the booster rocket contractor's team to effectively present information.
The day before the Challenger's final launch, the team that designed and manufactured the booster rockets called Mission Control and said that they thought the launch should be aborted because an O-ring on a booster would be likely to give out due to cold and cause the Challenger to explode. This team was not previously known for being overly cautious; in the previous history of the shuttle program, they had never before recommended aborting a mission. The next day, the Challenger launched and the booster rocket blew up exactly the way the team that made it said it would.
This seems like an inconceivable oversight on the part of Mission Control. When the team that designed the rocket told them it was going to blow up, how could they possibly go ahead and launch? The hubris, the pride, the thick-headed showmanship.
Well, Tufte dug into this and found out exactly what happened. Mission Control told the rocket team to prepare a presentation about why they thought it would go wrong. The team did so and presented that to Mission Control. Tufte interviewed many people about the specifics of that meeting and actually managed to reassemble the original slides shown during the talk. And anyone viewing the information presented by the booster rocket team to Mission Control will have trouble faulting Mission Control, because the presentation was absolutely incomprehensible.
The booster rocket team's argument was supposed to be that for each previous launch, the amount of subsequent damage found in the O-rings was inversely proportional to the temperature at launch. They had all the data. They were all scientists and engineers. Tufte used their data to construct a graph of O-ring damage vs. launch temperature. Showing that graph and the weather forecast for the launch day to anyone in charge would have gotten the mission cancelled in a second. But the team that was there to argue that low temperatures correlated with O-ring damage never presented a single intelligible piece of data demonstrating that, even though they had all that data with them. Instead, they showed a chart of O-ring damage vs. launch date, and another chart several pages later with temperature vs. launch date.
I've read Adventures of a Curious Character and have the utmost respect for Feynman. Every problem Feynman outlined in his analysis was a real problem that NASA should fix. But none of it really pinpointed the exact cause of the disaster. Feynman mostly chalks the failure to postpone the launch up to management's disconnect from engineering: their mistakes and lack of understanding led them to overestimate the safety of the shuttle. This puts the blame in the wrong place. The managers were nowhere near so overconfident that, when the engineers who designed the part that failed knew it would probably fail in exactly that way and tried to halt the launch, they'd just brush them aside and go ahead with it. They listened carefully. The engineers had data that would have made a great case, but it was presented so incompetently that no one at that meeting would have thought they had a case at all. They simply appeared to be overly cautious, because they did not present any data demonstrating their point.
Can anyone tell me how to set my sig on Slashdot?
If you like Feynman, here are some of his lectures: http://vega.org.uk/video/subseries/8
Marcus Ranum has an interesting talk (MP3) in which he discusses Feynman's Challenger commentary at some length in the context of designing reliable/secure software systems.
The talk gets off to a bit of a rough start (see Ranum's comment below), but contains much insight and makes a lot of sense before long. Highly recommended for those in the software development field, where the approach is often 'throw it together, then poke at it and patch it until it stops obviously breaking'; the rigour Feynman & Ranum describe may be overkill for some systems, but exposure to this other approach can help make most of us better developers. I found it helpful, anyway—your mileage may vary.