Slashdot Mirror


Failure Is Always an Option

Logic Bomb writes "The New York Times has a short but elegant op-ed regarding the different perspectives of engineers and managers and the role that plays in accidents like the space shuttle Columbia disaster. It's the sort of article you'll nod all the way through, then print and leave anonymously on your supervisor's desk. Any tech managers in the Slashdot crowd might have some interesting comments on how the right balance is struck." Henry Petroski has written several good books on engineering and failure.

18 of 479 comments

  1. In software terms by (54)T-Dub · · Score: 5, Interesting
    In the case of Columbia, engineers who worried about damage that the spacecraft may have suffered during launch were ineffective in getting it properly inspected before reentry.

    In the case of my last software project, engineers who worried about bugs that the software may have suffered during design were ineffective in getting it properly inspected before launch.
    When engineers and managers clashed over the 1986 Challenger launch, the managers pulled rank.
    .....
    The Columbia Accident Investigation Board has recommended that NASA establish an independent Technical Engineering Authority. This would put responsibility for technical matters where it rightly belongs -- with the engineers who, because they know how the space shuttle was designed, also know best how it can fail.
    "No boss, I have no idea where that article printed out 15 times and strewn across your office came from........ It looks like a good article though."
    --

    "I can not bring myself to believe that if knowledge presents danger, the solution is ignorance" - Isaac Asimov
  2. NASA's Vietnam (From today's Wall Street Journal) by Anonymous Coward · · Score: 5, Interesting

    By Homer Hickam

    When I go to the Cape and watch the Shuttle being launched, I still get a lump in my throat watching it soar. Even though I no longer work for NASA, its thunder affirms my dreams for spaceflight. Still, when I put emotion aside, I can't ignore my engineering training. That training and my knowledge as a 20-year veteran of the space agency (and also a Vietnam vet) have led me to conclude that the Space Shuttle is NASA's Vietnam. A generation of engineers and managers have exhausted themselves trying to make it work and they just can't. Why not? Because the Shuttle's engineering design, just as Vietnam's political design, is inherently flawed.

    Much has been made of the report produced by the Columbia Accident Investigation Board (CAIB). I've read newspaper articles that called it "scathing." Hardly. Its tepid recommendations probably had Shuttle managers who made poor decisions dancing with relief. It gave them a pass by proclaiming "culture" made them do it.

    I don't believe there's a NASA culture. There is, however, a Shuttle cult. It is practiced like a religion by space policy makers who simply cannot imagine an American space agency without the Shuttle. Well, I can, and it's a space agency which can actually fly people and cargoes into orbit without everybody involved being terrified of imminent destruction every time there's lift-off. With some reservations, written in the politest language, the CAIB recommended to keep Shuttles flying but with more inspections, more bureaucracy (an outside safety agency), and more money. But piling on more inspections, people and dollars won't make the Shuttle safer. Neither will the safety sensitivity training that will probably be dumped on top of the overworked, disillusioned NASA engineers. My God, they've already dedicated their very souls to keep the Shuttle flying safely! The truth is, no amount of arm-waving about "culture" can fix a flawed design.

    Take a look at the Shuttle stack and what do you see? A fragile spaceplane sitting on the back of a huge propellant tank between two massive solid rocket boosters. The Shuttle has to sit right in the middle of all the turmoil of launch because we once believed it would be cheaper to bring back those engines and rebuild them than to build new ones. That has not proved to be the case -- far from it -- but it has left us with a crew sitting in the most vulnerable position possible in terms of design. Simply put, had that spaceplane been on top of the stack, the destruction of Challenger and Columbia wouldn't have occurred. The CAIB ignored this flawed design and that makes their conclusions suspect: no amount of inspections or condemning another NASA generation to worry over this thing will solve it.

    So let's get practical. We can't just shut the thing down. We need the Shuttle to finish the space station and also to keep the Russians and Chinese from dominating space. I'm not willing to see that occur while we dither. Human spaceflight is important to this country. But the Shuttle is as safe as you're going to get with what's in place today. Let's put some tough engineers in charge, fly it 10 more times over the next four years with hand-picked crews to finish the space station and meet our international obligations. Then close the program and replace it with expendable launchers and a shiny new spaceplane. And, this time, put it on top.

  3. Fail? by Matrix272 · · Score: 5, Insightful

    Was it Thomas Edison that said, "I haven't failed. I just found 10,000 ways that didn't work."?

    --
    "It's better to have a gun and not need it than need a gun and not have it." ~ Christian Slater, True Romance
    1. Re:Fail? by gujo-odori · · Score: 5, Insightful

      Say what?!

      If engineers find problems in a project, you think the answer is to "innovate our management skills?" Is that really even English?

      Don't look now, but Dilbert and the Way of the Weasel is making fun of PHBs; it's not a management blueprint. Well, actually, it is a rather good guide to managing successfully, but the key is to read what's in the book and *not do* those things.

      If engineers on a project, whether it's hardware, software, or something else, come to management and say "We have found problems with this project that will have a negative impact on its quality or possibly cause it to fail" the answer is not to sweep those concerns under the rug or blow them off. The answer is "OK, what do you need to fix those problems so that this project will succeed and reach its full potential?" When they tell you, obviously, there may be a cost/benefit tradeoff between some of the items, but basically you have to send them out to fix the problems so that the project will succeed.

      If engineers tell you "You can't do X for Y amount of money, it's just not possible," you should listen to them. Knowing what can be done, and for what price it can be done, is their job.

      If the engineering team comes to you and says "This project is so broken that it can't succeed, the best thing we can do is scrap it and do a total redesign," then you had better listen good. They are probably right, and the ass they save will be your own. The money sunk into the project is gone; don't make it worse by throwing good money after bad.

      Being committed to quality and excellence in a project is not "old thinking and old views that hold us back." It is what makes projects successful. That's why my company has a successful product, is growing fast, and is making money. Is yours?

      Aside to those who modded the parent Insightful: I never believed it before, but now I'm convinced that (some of) the mods really are on crack.

  4. Safety always has a price by shoppa · · Score: 5, Insightful
    The NY Times editorial has a good perspective on the manager vs engineer battle, but in the end we will never have a perfectly safe mode of travel (on or off earth) because Safety Costs Money.

    Now that money may be in the form of lower gas mileage in a car, or in the form of hundreds of unmanned test flights before putting a human in, or obscene safety margins.

    But to pretend that anything is ever perfectly safe is to ignore the fundamental economic issue that at some point you have to stop putting money into safety concerns and just fly the damn thing.

  5. This is annoying. by Prince_Ali · · Score: 5, Insightful

    On a project the size of the space shuttle, thousands of safety concerns will be brought up. Not every one of them can be fully investigated. They have to pick and choose based on what is most urgent. Yes, there will be accidents, but otherwise the shuttle would never get off the ground. Hindsight is twenty-twenty, and you can say they should have investigated further all you want, but the fact is that there were many other concerns that seemed just as urgent, and some that seemed even more so.

    1. Re:This is annoying. by Niles_Stonne · · Score: 5, Interesting

      Did you read the Investigation report?

      Hindsight is 20/20, but that doesn't mean that we should wear blinders when looking towards the future!

      The Management team _actively_ canceled requests for information pertaining to the impact. See page 153 of the PDF.

      The management team also didn't follow their own procedures, they didn't meet every day (they were supposed to).

      I was impressed by the engineers at Boeing (I think that was the company) who elected to research the impact and footage of it over the weekend even when management told them not to.

      Read the report. Section 6.3 (DECISION-MAKING DURING THE FLIGHT OF STS-107) is extremely interesting and points out eight separate missed opportunities to find out more information about the problem.

      There were also some engineering related issues - the engineers using test software that wasn't designed to analyze an impact nearly that large, and other issues - but it really comes down to a lack of the management team accepting that there could be a real, out-of-family problem on the mission.

      --
      Sticks and Stones may break my bones, but copyright will always protect me.
  6. Full Text by zippity8 · · Score: 5, Informative

    Failure Is Always an Option
    By HENRY PETROSKI

    DURHAM, N.C. -- Scientists seek to understand what is, the aerospace pioneer Theodore von Karman is supposed to have said, while engineers seek to create what never was. The space shuttle was designed, at least in part, to broaden our knowledge of the universe. To scientists the vehicle was a tool; to engineers it was their creation.

    With the release of the report of the Columbia Accident Investigation Board, there is a new focus on the "culture" of NASA. Engineers have played a prominent but not a controlling role in that culture, both in the design of the shuttle and in the planning of its missions. When the report speaks of NASA's "broken safety culture," the particular failure it cites is "a consistent lack of concern" that Columbia may have been damaged by debris at takeoff. But perhaps NASA can be better understood by examining the culture that arises from the inevitable -- and healthy -- tension among scientists, managers and engineers.

    A common misconception about how things such as space shuttles come to be is that engineers simply apply the theories and equations of science. But this cannot be done until the new thing-to-be is conceived in the engineer's mind's eye. Rather than following from science, engineered things lead it. The steam engine was developed before thermodynamics, and flying machines before aerodynamics. The sciences were invented to explain the accomplishments -- and to analyze their shortcomings.

    The design of any device, machine or system is fraught with failure. Indeed, the way engineers achieve success in their designs is by imagining how they might fail. If gases escaping from a booster rocket can lower efficiency or cause damage, then O-ring seals are added. If the friction of re-entry can melt a spacecraft, then a heat shield is devised.

    Much of design is thus defensive engineering: containing, shielding and fending off anticipated problems on the drawing board and computer screen so that they cannot bring down the design when it flies. Obviously, total success can only come if every possible mode of failure is identified and defended against.

    Engineering is also very much about numbers. O-rings must be sized; the thickness of heat shields specified; the weight of insulation calculated. Often, the numbers work at cross purposes, as when increasing shield material decreases available payload. Engineering design is ultimately the art of compromise.

    What results from the design process is a thing that has unique characteristics. It can withstand the conditions for which it was designed as long as it maintains its integrity. There is usually some leeway allowed, for engineers know that operating conditions cannot be predicted with absolute certainty. Until it fails, how far beyond design conditions a system can be pushed is never fully known.

    But engineers do know that nothing is perfect, including themselves. As careful and extensive as their calculations might be, engineers know that they can err -- and that things can behave differently out of the laboratory. On the space shuttles, O-rings got scorched, heat tiles fell off, foam insulation broke free. To engineers, these unexpected events were incontrovertible evidence that they did not fully understand the machine.

    Engineers do not feel comfortable with things they do not understand. It is at this point that they begin to act more like scientists. In the case of the scorched O-rings, the engineers studied burn patterns. They looked for a correlation between damage and temperature, and they warned about launching when the temperature was outside the bounds of their experience and scientific study.

    If engineers are pessimists, managers are optimists about technology. Successful, albeit flawed missions indicated to them not a weak but a robust machine. When engineers and managers clashed over the 1986 Challenger launch, the managers pulled rank. In the case of Columbia, engineers who worried about damage that the spacecraft may have suffered during launch were ineffective in getting it properly inspected before reentry.

  7. Re:NASA's Vietnam (From today's Wall Street Journa by (54)T-Dub · · Score: 5, Interesting

    The problem is that people are afraid that if the shuttle stops flying, space exploration will stop. Public support will wane and funding will slow. I happen to disagree but there are many in the space program who do not.

    --

    "I can not bring myself to believe that if knowledge presents danger, the solution is ignorance" - Isaac Asimov
  8. You still can't prove a negative. by rdewald · · Score: 5, Insightful

    I have spent the last few days reading the entire CAIB report and I have to agree that Mr. Petroski is right on target with his observations.

    Simply put, the problem was that the engineers concerned with the safe re-entry of the orbiter after the foam strike were put in the position of having to prove a negative. Management wouldn't pay attention to them until they could prove that the strike was *not* safe.

    They couldn't prove or disprove the notion that the foam strike had caused critical damage until they got the images, but they couldn't get the images without first proving they needed them to assure the safety of the re-entry.

    There had been a number of previous foam strikes, many of them involving this same piece of foam (the left bipod ramp), and all of those shuttles had landed okay, so management believed that this foam strike was similarly okay just because they had gotten away with it so far.

    No science. No analysis. Just an assumption that if they had gotten away with ignoring this problem so far, they could continue to ignore it. The schedule was king, not safety.

    Engineers know well that "getting away with it" is not evidence of reliability. Managers, at least in my experience, tend to be proportionately successful in their careers to the extent that they can spin "getting away with it" into a career advancement tool.

    This is really why the orbiter was lost. This is really why the astronauts died.

    Denial is deadly.

    --
    The best way to do is to be.
  9. Re:NASA's Vietnam (From today's Wall Street Journa by Guano_Jim · · Score: 5, Insightful

    This is the same Homer Hickam about whom October Sky was made, I'm assuming?

    It would be nice if more people listened to engineers instead of politicians when it came to science projects, wouldn't it?

  10. Love the title bar... by SillySlashdotName · · Score: 5, Funny

    I opened this at work, and the title bar reads:

    "Failure is always an option - Microsoft Internet Explorer"

    Gotta love it!

    --
    Acts of massive stupidity are almost never covered by warranty. --me.
  11. Old story same sad ending by Crashmarik · · Score: 5, Insightful

    This is always the case; it has been for a very long time. The problem is not NASA's culture so much as the culture of the society around NASA.

    The article misses the big points. When the Challenger blew up, blame was apportioned to the engineers that built it, not the congressmen who insisted the engines be built in Utah. When software is shipped before it's ready, blame goes to the programmers that were working 90-hour weeks, not the salespeople that promised the customer whatever they wanted to hear. When a heart valve fails, blame goes to the inventors that made a device that saved lives, not the insurance companies that wouldn't pay for a proper solution.

    Yes, managers are willing to take risks; it's rare they ever have to pay the price for failure.

  12. Management's decision not to image by Animats · · Score: 5, Interesting
    The NASA manager who stopped the USAF from imaging the shuttle for damage, Linda Ham, is apparently still on the NASA payroll, although she's been shipped out of Houston.

    It's worth thinking about what would have happened if the damaged Shuttle had been imaged by USAF ground cameras, and it became clear that re-entry was going to be a disaster. The shuttle and crew would have been stuck in orbit, with worldwide publicity, while NASA tried to come up with a fix. They probably wouldn't have succeeded. On-orbit rescue using Atlantis has been discussed as marginally possible, and on-orbit patching has been suggested, but most likely, they wouldn't have worked.

    Think of the PR fallout. Seven astronauts stuck in orbit for most of a month, with constant TV coverage, followed by their deaths on worldwide TV. That would have been career-ending for most of NASA's top management. Letting them crash saved the jobs of top people at NASA.

    Worst case, a rushed launch of Atlantis could have resulted in losing two shuttles. That would have ended the Shuttle program.

    1. Re:Management's decision not to image by jeffy124 · · Score: 5, Insightful

      No, I beg to differ.

      Assume NASA did attempt to evaluate the damage and it revealed the Columbia to be a death trap. Yeah, there would have been media coverage had it become necessary to send up a repair crew or something.

      But there would be an Apollo 13 type effort. Atlantis could go up with a minimal crew and pick up the Columbia crew. Maybe do it in two flights. Leave the Columbia in space until repair becomes possible. Not possible? They'd find a way.

      Or, engineer a solution on the ground and figure out a way to get that solution up into space and installed. Again, an Atlantis crew would head up with the necessary materials and perhaps be the ones to do the repair job. Sounds like the Hubble, doesn't it? Also impossible? They'd find a way.

      Engineers are quite capable of great things, and you seem to be underestimating the potential of great thinkers. When JFK made his "before this decade is out" challenge, everyone at NASA thought "No way! You've got to be kidding." But then the people who would do it got thinking of ways they could and they came through.

      --
      The One Rule Of Chess You'll Ever Need: Don't play someone who carries a kit in their bookbag.
  13. Managers take all the credit too! by Newer+Guy · · Score: 5, Interesting

    I've been involved in engineering literally all my life. My dad was an engineer and as a small child I remember going to work with my dad and being in awe of all the stuff he had to 'play' with. I never wanted to be anything else! Unfortunately, in the scheme of things we are the workers, the ones who toil without credit. The managers take all of that. In the 1980's as a contract engineer I built a Boston FM radio station from scratch (WFNX), yet they didn't even see fit to invite me to its sign on party! When I asked why, I was told: "You were paid well for your work, isn't that enough?". They actually believed they paid me too much to make their property worth many millions more than it was before. Needless to say, from that time forward, I did only precisely what they paid me to do (and what they asked me to do), nothing more. Part of the problem is we ALLOW ourselves to be treated in this way! The plumber, electrician or auto mechanic don't. Why do we? I think one answer is UNION. They realize there is respect and safety in numbers. Are we too good, too elite to do the same?

  14. Pragmatic vs. perfect safety by coyote-san · · Score: 5, Insightful

    NASA isn't getting criticized because it doesn't have perfect safety, it's getting nailed because it has TWICE ignored clear evidence of significant problems and failed to perform even cursory investigations until after the loss of an orbiter and crew.

    There was clear evidence of problems with the O-rings before the Challenger was lost. NASA had somebody produce some really cryptic plots, but nobody bothered to really investigate whether the cooler weather on some of these launches might have an influence. It takes a real genius to reduce this to dipping an o-ring into a glass of ice water, but any competent investigator should have been able to reduce the data to plots of damage vs. various independent variables such as temperature at launch or overnight lows.

    With Columbia, the arrogance of management is far more stunning. It KNEW that the insulation had flaked off, it KNEW that the insulation had caused surface damage in the past, and it KNEW that some areas on the leading edge of the wing are much more vulnerable to damage than others because of access points. It could have test fired foam at wing mockups at any time, just to have hard proof instead of just hunches that the foam could never cause significant damage to an orbiter... yet it did nothing.

    This testing is expensive, of course, but it's really not that much when compared to the cost of a normal launch (isn't that approaching a billion dollars per launch now?), or the various costs associated with the loss of an orbiter and crew. It's akin to failing to spend $10 to check something on your car even though you knew that a mistake would mean that the car would erupt into a fireball and kill everyone inside if you're wrong.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  15. Tufte's commentary is apropos by YetAnotherName · · Score: 5, Interesting

    When engineers and managers clashed over the 1986 Challenger launch, the managers pulled rank.

    What a dark, yet utterly true statement. Do the NASA and contracting company managers sleep well today knowing that in 1986 their decisions cost lives?

    Edward Tufte, author of some amazing books on information display, wrote about the Challenger disaster in Visual Explanations. Looking at the materials prepared by engineers, he saw that they had correctly correlated temperature with O-ring failure. Yet their materials, hastily prepared during the 11th hour, failed to convince managers to abort the launch. Tufte shows a design of a simple graph that shows temperature on the abscissa and burn-through on the ordinate, and any manager could draw a line through the points and extrapolate out to the bitter cold Florida day that cost the shuttle.

    Having my own share of bad managers, I have to wonder, would it have made any difference?