Richard Feynman, the Challenger, and Engineering
An anonymous reader writes "When Richard Feynman investigated the Challenger disaster as a member of the Rogers Commission, he issued a scathing report containing brilliant, insightful commentary on the nature of engineering. This short essay relates Feynman's commentary to modern software development."
The problem with the shuttle disaster (both of them, really) is external pressures that are not in any way scientific. The pressure from your manager at Morton Thiokol to perform better, faster and cheaper. The pressure from the government to beat those damned ruskies into space at all costs.
So this is really a case of engineering ethics: when do you push back? As a software developer, I never push back. Me: "There's a bug that happens once every 1,000 uses of this web survey but it would take me a week to pin it down and fix it." My Boss: "Screw it--the user will blame that on the intarweb, just keep moving forward." But could I in good conscience say the same thing about a shuttle with people's lives at stake? No, I could not.
So when an engineer at Morton Thiokol said that they hadn't tested the O-rings at the temperature forecast for that fateful day, and that information was either not relayed or was lost on its way up to the people at NASA who were about to launch, it wasn't a failure of engineering; it was a failure of ethics. External forces had mutated engineering into a liability, not an asset.
And there's a whole slew of them I studied in college:
* Space Shuttle Columbia disaster (2003)
* Space Shuttle Challenger disaster (1986)
* Chernobyl disaster (1986)
* Bhopal disaster (1984)
* Kansas City Hyatt Regency walkway collapse (1981)
* Love Canal (1980), Lois Gibbs
* Three Mile Island accident (1979)
* Citigroup Center (1978), William LeMessurier
* Ford Pinto safety problems (1970s)
* Minamata disease (1908-1973)
* Chevrolet Corvair safety problems (1960s), Ralph Nader, and Unsafe at Any Speed
* Boston molasses disaster (1919)
* Quebec Bridge collapse (1907), Theodore Cooper
* Johnstown Flood (1889), South Fork Fishing and Hunting Club
* Tay Bridge Disaster (1879), Thomas Bouch, William Henry Barlow, and William Yolland
* Ashtabula River Railroad Disaster (1876), Amasa Stone
So I agree with Feynman's comments in relation to engineering and the further comments on software development. But I don't find them to be a fault in the nature of engineering, just a fault in our ethics. What do capitalism and competitiveness drive us to do? Cut corners, often.
My work here is dung.
For a second there I thought I read "Rogers Communications" and "brilliant" and "engineering" in the same sentence. I thought I had been kicked to an alternate universe where I wouldn't be able to escape. I am glad to be back.
[alk]
To be fair, the Challenger disaster actually preceded NASA's slogan and procurement policy of "faster, better, cheaper" by a bit. More to the point, Feynman's article should be a cautionary tale to ANYONE in an engineering field. It isn't a matter of one field being subject to unscientific pressures and another field being immune. No technology or industry is immune from the pressures and problems that caused the Challenger disaster. Anyone who claims to be so well adapted to safety concerns that they need not spend lots of time and effort on them is foolish. The nuclear industry still has to practice strong QC on parts, procedures and maintenance and CONTINUE that practice. Same with commercial aviation, acute medical care, etc. Constant vigilance is rewarded only with another uneventful day. That is the fundamental problem. Vigilance is expensive and time consuming. These are not pressures from the profit motive. They apply to government as well as civilian ventures.
(I will refrain from a four-step Profit post). Standard technique: latch on to an essay by a brilliant and insightful person. Extend the insights of that person slightly into a different field with usual compare-and-contrast, brand-extension writing techniques. Claim that resulting essay (and self) are as insightful as the original essayist.
It doesn't work 99.994% of the time, generally because very few people are as insightful as the original brilliant person.
sPh
The blog post makes a nice contribution by linking to Feynman's original thoughts (for example, here: http://www.ranum.com/security/computer_security/editorials/dumb/feynman.html ), ones I haven't read for a long time (and was happy to be reminded of). However, the author makes the mistake of thinking that the original thoughts need to be interpreted and summarized for the reader. Feynman's words by themselves are simple to understand, are concise, and contain just the tone for which geeks go gaga. Anyone interested in the subject will be able to make his or her own judgements about the engineering and politics involved in the Shuttle development, engineering in general, and the extensions to software development.
Offtopic, but I highly recommend Surely You're Joking, Mr. Feynman, the memoir he told to Ralph Leighton. It's got some great stories in it, like when he surreptitiously went around picking locks at Los Alamos or his personal recollections of the Trinity nuclear test.
The biggest problem is that most software developers are NOT chartered professional software engineers, so they have no personal, professional or legal responsibility for their work. That is why IT is full of cowboys and trust is nearly nonexistent. Software engineering must become a chartered-only profession, so that people who are not chartered are not allowed to practice.
To qualify as a Professional Engineer we should place good practice above short-term gains. Professional Engineers should be truthful and objective and have no tolerance for deception or corruption. Professional Engineers only work in areas where they are competent. Professional Engineers build their reputation on merit, their skills through continual learning, and the skills of their charges through ongoing mentoring.
We wouldn't have to put up with the shoddy work of cowboys, because they wouldn't be allowed to practice. We wouldn't have to put up with orders that counteract professional ethics or good practice, because legal responsibility trumps commercial pressures. The professional wouldn't be undermined by fast-to-market but poor-quality work. We could place trust in third-party tools, software & services, and we would not have to put up with EULAs that disavow responsibility for damage.
I've been in software quality and testing for 14 years. I've worked at very large corporations as well as startups. There is a WIDE gap in software development process in our industry. Many people like to call themselves software engineers when they are developers. There is a huge difference. Engineering is a discipline that follows well-defined rules, and it usually takes time. But I think the very important thing to point out is that some software requires engineering - other software does not. If I go into a startup company that is trying to develop a blog/wiki site and try to implement a NASA-like software development methodology, they will fail. Likewise, software to control a heart monitor should be engineered and closely controlled. Sometimes quality and perfection are the goal, other times it might be time-to-market that is critical. You have to fit the process to your business. A bridge is a bridge, and they should all be engineered pretty much in the same way. You can't say the same thing about software.
I think that this is a very key point to software development. I have seen companies who spent entirely too much time and money trying to eliminate all defects from their software when it wasn't the critical part of their business. Yes, we should always strive to eliminate defects, but you can't get them all. You have to know when to pick your battles, and when to accept the risks. If we're talking about life-or-death software, or security, or other very critical things - you need to focus on those.
There's a grid I have seen used that is a great tool when doing projects.
Schedule, Cost, Quality, Scope.
Of the four, 1 can be optimized, 1 is a constraint, and the other 2 you have to accept. Period. It is a more useful version of the "fast, good, cheap - pick two" rule.
My beliefs do not require that you agree with them.
I work in the aerospace industry, specifically an airline, as a manager of an Engineering subgroup. (if "manage" is what you call what I do)
One of the first things I have a new hire do is read Feynman's appendix to the Challenger Report. The point is primarily to instill a respect for dealing with data, not desires or pressures, and to (re)inforce the concept that "it worked last time" does NOT make it right or safe to do the same thing again.
The pressure / desire from above or parallel organizations within the company is constant, and usually precipitated by the latest operational interruption. All too frequently the refrain is along the lines of "but last time you authored a deviation, this is only a little bit more". When I feel the pressure is starting to cause situational ethics creep, I pull out Feynman's appendix, and read it myself, or have the affected person on my staff read it.
It is amazing how effective it is in restoring sanity, and a healthy respect for the ability of the hardware to kill you (and / or your customers).
Richard Feynman gave many things to this world, and especially certain segments of it. It's my opinion however that one of his best and most unsung gifts was the Challenger Report Appendix. It should be required reading for ANYONE who will ever touch or direct action on hardware that could even remotely present a potential for injury or death.
The message was not rocket science, but as the Columbia accident proved, the rocket scientists still can't get it right.
Never ascribe to malice or conspiracy that which can be adequately explained by ignorance or stupidity.