Slashdot Mirror


Ten Technology Disasters

Ant writes "What do a 17th-century Swedish warship, an opulent Chicago theater and a Kansas City hotel "skyway" have in common? All met catastrophic ends and they have important lessons to teach today's innovators."

19 of 327 comments

  1. Concorde? by reaper20 · · Score: 4, Interesting

    It took just one more little mishap to make a disaster: a titanium "wear strip" fell off a Continental DC-10 in the path of an Air France Concorde leaving Paris. When the Concorde's tire hit the strip, a chunk of rubber tore off and smashed into the wing, punching a 600-square-centimeter hole in its skin and causing fuel to leak and ignite.

    Disclaimer: I know nothing about airplane safety or testing, but this one set off my common sense alarm.

    So, the tires on Concordes need to be changed a lot. A chunk of titanium breaks off of another plane and hits a tire on a Concorde, causing the accident. Anyone else think, "Well gee, I don't think any kind of tire is designed to withstand titanium chunks slamming into it"? Considering the condition of some of the commercial jets I've flown in, I'll take my chances with the Concorde. I'm sure there is more to it than just this; I thought it odd, though.

    Though not a "disaster" per se - the Navy's dead Windows NT ship is tops for the funniest in my book.

  2. Carry through is important! by FortranDragon · · Score: 4, Interesting

    I live near KC and I remember when the skywalks collapsed. As the story unfolded after the tragedy, it became readily apparent that everyone just assumed everyone else was doing what they thought they should be doing or that their shortcuts were fine with everyone else. :-( Communication and checking up on how things are actually progressing versus the plans can be a real matter of life or death.

    Next time as a programmer you bitch about checking up on QA (assuming you are lucky to have a QA department) or on the users, just remember that your mistakes very rarely kill people. You've got it _easy_.

    Also, on a side note, the local KC TV news organizations try hard to prevent people from getting to their archives of what happened. They don't want to present Kansas City in a "bad light". This is also very stupid. If we can't easily learn from our mistakes we are going to make more of them. 'Protecting' KC's reputation just makes Kansas Citians look more retarded than the screwup that was Hyatt Regency Skywalks. :sigh: Yeah, mistakes were made, so let's own up to them and learn something so we don't do it again.

    --
    "All the darkness in the world can not quench the light of one small candle."
    1. Re:Carry through is important! by K8Fan · · Score: 4, Informative
      I live near KC and I remember when the skywalks collapsed. As the story unfolded after the tragedy, it became readily apparent that everyone just assumed everyone else was doing what they thought they should be doing or that their shortcuts were fine with everyone else. :-( Communication and checking up on how things are actually progressing versus the plans can be a real matter of life or death.

      I lived in KC at the time, and I recall that there were more screw-ups than this short summary mentioned. The metal fabricator also changed the design of the beams. As designed, they were to be made of two "U" shaped channels welded together with a seam on the left and right sides of the beam. They didn't have those bits in stock, so they used two shallower "U" shaped pieces and welded them together at the top and bottom of the beam...and then drilled the holes for the threaded rod right through the welds!

      Everyone involved was criminally culpable...and (to my knowledge) went to prison.

      Also, on a side note, the local KC TV news organizations try hard to prevent people from getting to their archives of what happened.

      A good friend of mine was the first emergency physician on the scene at the Hyatt and performed the triage. He was recently interviewed by the BBC for a documentary about the Hyatt. They supplied footage to the BBC, but no...they don't have any reason to supply footage to random people.

      --
      "How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
    2. Re:Carry through is important! by K8Fan · · Score: 4, Informative

      Teach me to actually re-read the thing when I preview it. What I meant to say was:

      Everyone involved was criminally culpable...and (to my knowledge) *NOBODY* went to prison.

      --
      "How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
  3. What about Banqiao and Shimantan dams by btempleton · · Score: 5, Informative

    A story that claims to be reporting on the greatest tech disasters, in particular the lesser known ones, and it fails to mention Banqiao and Shimantan in 1975?

    I mean, not only was this the greatest technological disaster in human history with 80,000 to 230,000 dead depending on whose numbers you believe, but it also is sufficiently unknown that the author of an article on disasters doesn't appear to know of it!

    --
    Has it been over a year since you last donated to the Electronic Frontier Foundation
    1. Re:What about Banqiao and Shimantan dams by nels_tomlinson · · Score: 5, Informative
      A story that claims to be reporting on the greatest tech disasters, in particular the lesser known ones, and it fails to mention Banqiao and Shimantan in 1975?

      Since the original post mentioned this as if we should be familiar with it, here're the details: A big dam in China failed, in large part because the Communist ideologues over-ruled the hydrologists. Many thousands died, but of course that's all right because the houses of the Party cadre were built on high ground. Click on that link for the fine print.

  4. RISKS - assessment community by DaveWood · · Score: 5, Informative

    No discussion of the topic could be complete without mentioning RISKS. The RISKS Digest has been discussing risk factors associated with technology and engineering (and to some extent generally) on the internet since 1986.

    Every engineer should spend time reading there. Any _good_ engineer should subscribe.

    -David

  5. When the corporation goes unregulated... by stefanlasiewski · · Score: 5, Insightful

    This is what happens when you have a system that allows the corporation to run amuck.

    The lowest bidder cannot be trusted to create products that are safe.

    In these cases, it is good to still have some government oversight.

    --
    "Can of worms? The can is open... the worms are everywhere."
  6. Re:Well, I read it, and I can't see any patterns.. by Registered Coward v2 · · Score: 5, Informative
    So does anybody know of a good reference work out there which actually has some worthwhile analysis on stuff like this? Didn't Feynman write something up after Challenger?


    Yes, it appeared as an appendix to the Rogers Report. He also discussed it in his autobiography, either "Surely You're Joking..." or "What Do You Care...", I can't remember which. The appendix is a good read, and can be found here:
    http://www.ralentz.com/old/space/feynman-report.html
    or any of a number of other googleable links.

    --
    I'm a consultant - I convert gibberish into cash-flow.
  7. Forget Ye Not the Therac-25 by ewhac · · Score: 5, Informative

    Even if you never get near embedded systems of this type, you can't call yourself a responsible software engineer until you read and learn from An Investigation of the Therac-25 Accidents.

    Executive Summary: Company introduces next-generation radiation therapy machine, replacing hardware-based overdosage safety interlocks with software-based mechanisms. Software fails. People are killed.

    Schwab

  8. Navy's Dead ship by reflexreaction · · Score: 5, Informative
    An article on the NT problem is available here.

    From the article
    The Yorktown lost control of its propulsion system because its computers were unable to divide by the number zero, the memo said. The Yorktown's Standard Monitoring Control System administrator entered zero into the data field for the Remote Data Base Manager program. That caused the database to overflow and crash all LAN consoles and miniature remote terminal units, the memo said.
    And a little bit later in the article
    "If you understand computers, you know that a computer normally is immune to the character of the data it processes," he wrote in the June U.S. Naval Institute's Proceedings Magazine. "Your $2.95 calculator, for example, gives you a zero when you try to divide a number by zero, and does not stop executing the next set of instructions. It seems that the computers on the Yorktown were not designed to tolerate such a simple failure."

    GO ARMY!!!!!!!
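
    A minimal sketch of the two defenses the quoted passage hints at: reject a bad entry at the data field, and make the division itself tolerate a zero. The function names and the choice of NaN as a fallback are assumptions for illustration only; nothing here reflects the actual Yorktown software.

```python
import math


def validate_field(raw: str) -> float:
    """Reject a zero (or non-numeric) entry before it reaches any calculation."""
    value = float(raw)  # raises ValueError on non-numeric input
    if value == 0:
        raise ValueError("zero is not a valid value for this field")
    return value


def safe_ratio(numerator: float, denominator: float,
               fallback: float = math.nan) -> float:
    """Return numerator / denominator, or the fallback when the denominator is zero."""
    if denominator == 0:
        return fallback
    return numerator / denominator


if __name__ == "__main__":
    print(safe_ratio(6.0, 3.0))
    print(safe_ratio(1.0, 0.0))
    try:
        validate_field("0")
    except ValueError as exc:
        print("rejected:", exc)
```

    Either check alone would keep a single bad entry from propagating; doing both means a mistake in one layer is still caught by the other.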
    --

    We had to destroy the sig to save the sig.
  9. or Halifax. by s20451 · · Score: 5, Interesting

    In 1917 a collision between two ships in Halifax harbor -- one carrying close to 3000 tons of high explosive -- resulted in an explosion which levelled much of the city and killed 2000 people, in what was one of the largest non-nuclear manmade explosions in history.

    --
    Toronto-area transit rider? Rate your ride.
  10. No Common Thread...but... by efuseekay · · Score: 5, Interesting

    every engineer has their own stories of how they SNAFU-ed. I have mine (one of the reasons why I wuss-ed out and now do theoretical physics instead :)).

    Usually, the problem is :

    (a) Pushing Envelope without prior analysis (Vasa)
    (b) Not exercising Due Diligence in design (Tacoma Narrows)
    (c) Failure of communication between departments (Mars Climate Orbiter : remember the units SNAFU?)
    (d) Insufficient redundancy design (Iroquois Fire)
    (e) Failure to recognize likely failure modes (Concorde, Titanic)

    and others of course.
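
    For (c), one common defense is to make the unit part of the type, so a raw number cannot cross a department boundary unlabeled. The sketch below assumes the widely reported form of the Mars Climate Orbiter mix-up (pound-force seconds on one side, newton seconds on the other); the Impulse class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

LBF_TO_NEWTON = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class Impulse:
    """An impulse that is always stored in newton seconds internally."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary, once, so nothing downstream has to guess.
        return cls(value * LBF_TO_NEWTON)


def total_impulse(burns: Iterable[Impulse]) -> Impulse:
    """Sum impulses; callers cannot pass bare floats by accident without noticing."""
    return Impulse(sum(burn.newton_seconds for burn in burns))


if __name__ == "__main__":
    burns = [Impulse.from_newton_seconds(1.0),
             Impulse.from_pound_force_seconds(1.0)]
    print(total_impulse(burns))
```

    The point is less the arithmetic than the interface: each team has to say which unit it is handing over.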

    I once fucked up an expensive spacecraft component because of (c). I worked on the mechanical design of the component housing; some electronics guy worked on the electronics detector sitting inside my housing. We had an innovative design whereby some of my mechanical supports were designed to keep some of his electronics ICs in place without the PCB board. The SNAFU : both of us thought the other was supposed to apply anti-vibration gel (layman's term here, we call it RTD...).

    So the part was fab-ed, electronics put in, and the whole thing was sent to a vibration table for testing..

    Result : a loose IC, clanking around the housing for 2 minutes at about 600Hz. The whole thing was toast.

    --
    Mode (3) smart-aleck mode. Press * to return to main menu.
    1. Re:No Common Thread...but... by statusbar · · Score: 4, Insightful

      Those are all good points.

      Another problem I have seen was where TWO different bugs mostly functionally cancelled each other out causing new intermittent problems.

      I made a realization regarding strict-type checking languages versus dynamic typed languages.

      Typically, people who are used to Java and C++ complain about languages like Python, saying that the compiler should catch static type problems at compile time and that languages that do not do this are inherently unsafe.

      Then I realized that ALL of these people must not be running any real tests on their code! If they were running real tests on all their code (every line must be executed in the tests), then these dynamic typing errors would be easily caught! Those would be the easiest bugs to find.

      Too often I have seen C and C++ coders compile their project.... No errors! Ship it! :-)
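
      A small sketch of that claim in Python: the type error below sits on a branch that only runs for non-zero input, so it goes unnoticed until a test actually executes that line. The functions and the tiny test runner are made up for illustration.

```python
def describe(count):
    """Buggy version: the second branch concatenates str and int."""
    if count == 0:
        return "no items"
    return "items: " + count  # raises TypeError when count is an int


def describe_fixed(count):
    """Corrected version: convert explicitly before concatenating."""
    if count == 0:
        return "no items"
    return "items: " + str(count)


def run_tests(func):
    """Execute every branch of func and return a list of failure messages."""
    failures = []
    for arg, expected in [(0, "no items"), (3, "items: 3")]:
        try:
            result = func(arg)
        except TypeError as exc:
            failures.append(f"{func.__name__}({arg!r}) raised TypeError: {exc}")
            continue
        if result != expected:
            failures.append(f"{func.__name__}({arg!r}) returned {result!r}")
    return failures


if __name__ == "__main__":
    print(run_tests(describe))
    print(run_tests(describe_fixed))
```

      A test suite that only ever called describe(0) would pass on the buggy version, which is why the "every line must be executed" condition matters.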

      Another issue I have been thinking about is the relationship between code reuse and unexpected behaviours. Code reuse (and object class reuse) is fine as long as all of the functionality and limitations of the object/code are known.

      However, for more complex class hierarchies I have seen people say "I'll just inherit from this class publicly and change the public interface to match what I need for this project." And then they are surprised when other pre-written code interacts funny with it. I'm not saying object-oriented is bad; I'm saying it is so common for programmers to break the basic concepts of OOP.
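
      A hypothetical illustration of that failure mode: the subclass below inherits publicly but quietly changes what append() promises, and code written against the base class starts misbehaving. All class and function names are invented for the example.

```python
class Buffer:
    """Base contract: append() always stores the item, so len() grows by one."""

    def __init__(self):
        self._items = []

    def append(self, item):
        self._items.append(item)

    def __len__(self):
        return len(self._items)


class DedupBuffer(Buffer):
    """Inherits publicly but changes the contract: duplicates are silently dropped."""

    def append(self, item):
        if item not in self._items:
            super().append(item)


def log_events(buffer, events):
    """Pre-written code that relies on the base contract.

    Returns True if every event was stored, False otherwise.
    """
    before = len(buffer)
    for event in events:
        buffer.append(event)
    return len(buffer) - before == len(events)


if __name__ == "__main__":
    events = ["open", "open", "close"]
    print(log_events(Buffer(), events))
    print(log_events(DedupBuffer(), events))
```

      Nothing here is a syntax or type error; the breakage only shows up when the old code meets the new subclass.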

      I had one manager who was adamant that for any medium sized project there ought to be NO time spent on making the code re-usable. Every line of code should be directly related to specific aspects of the customer's requirements/specification document. At first I thought he was crazy.

      But after I saw some projects expand into massive class hierarchies just for the sake of the illusion of increasing the reusability of the code in other projects, I am starting to side with him a bit more.

      Extreme Programming has at least some very good points about it, e.g. don't add features until you know you need them. Otherwise they probably won't be tested properly and won't be a good match for the new use. You can't predict every environment that the code may be reused in. It is harder to do than it sounds.

      So for high reliability systems I think one should have simple non abstracted code that can be measured, prodded, and always predictable. Then you can fashion your unit tests accordingly.

      --jeff++

      P.S.: scary thought/rant for today: How much C++ code do you see that is striving to be exception safe so that memory full errors will be caught properly? How many C++ coders understand that the default Linux kernels and libraries will almost NEVER cause malloc() to return 0 and will almost NEVER cause operator new() to throw? Only virtual memory space is allocated. Real memory pages are only allocated as they are being used. Once all physical and swap pages are used, blammo goes your app (and possibly other apps on your system). In semi-critical systems, this is a real problem that is often overlooked.

      Where is the real problem in this case? Part of the problem is that the C++ environment running on the default Linux kernel does not conform to the standard.

      The other part of the problem is that it is little known. If it were commonly known, people would be able to design around it (or change the kernel options). So people rely on what the documentation says, instead of properly testing the software limits.
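
      On Linux the relevant kernel option is exposed as /proc/sys/vm/overcommit_memory (0, 1 or 2). A rough sketch for checking which policy a machine is running, written in Python rather than C++ for brevity; the helper names and the wording of the descriptions are my own, and the file simply won't exist on other systems.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# Short paraphrases of the three documented modes.
MODES = {
    0: "heuristic overcommit: most allocations succeed up front",
    1: "always overcommit: allocations are never refused up front",
    2: "strict accounting: allocations can fail at the commit limit",
}


def describe_overcommit(raw: str) -> str:
    """Translate the raw file contents into a human-readable description."""
    mode = int(raw.strip())
    return MODES.get(mode, f"unknown mode {mode}")


def current_overcommit():
    """Describe this machine's setting, or return None if it cannot be read."""
    try:
        return describe_overcommit(OVERCOMMIT_PATH.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(current_overcommit())
```

      Under mode 2 an allocation can actually be refused, so the error paths people wrote for malloc() returning 0 or new throwing finally get exercised.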

      --
      ipv6 is my vpn
  11. Re:disasters course by Tumbleweed · · Score: 4, Funny

    > The engineering undergraduate program at Queens University actually
    > has a disasters course as one of the non-technical electives.
    ...
    > Supposedly this engenders a greater sense of responsibility into the
    > engineers to be.

    Perhaps, then, this should be a required class instead of an elective one. *shrug*

  12. funny? by passion · · Score: 5, Interesting

    the Navy's dead Windows NT ship is tops for the funniest in my book.

    Many psychologists have suggested that the emotion of humor has evolved as expressing relief from danger.

    I find it truly frightening.

    --
    - passion
  13. Eng. on board and Devel. say it was not NT by AHumbleOpinion · · Score: 4, Insightful

    http://www.sciam.com/1998/1198issue/1198techbus2.html

    "Others insist that NT was not the culprit. According to Lieutenant Commander Roderick Fraser, who was the chief engineer on board the ship at the time of the incident, the fault was with certain applications that were developed by CAE Electronics in Leesburg, Va. As Harvey McKelvey, former director of navy programs for CAE, admits, "If you want to put a stick in anybody's eye, it should be in ours." But McKelvey adds that the crash would not have happened if the navy had been using a production version of the CAE software, which he asserts has safeguards to prevent the type of failure that occurred."

  14. Re:Three Mile Island, Chernobyl. Is Tennessee next by Melantha_Bacchae · · Score: 5, Interesting

    Phrogger wrote:

    > First Three Mile Island. Then Chernobyl. Is Tennessee next?

    Sorry, Tennessee would have to get in line. One of the most spectacular examples of stupidity causing a nuclear accident was at a plant in Tokai-mura on September 30th 1999, and it is the greatest nuclear plant accident in Japan's history. Basically, they dumped all the safety precautions and mixed themselves up a batch of acidic nuclear soup in a big steel bucket and stirred. Instant hot fission! You can read the World Nuclear Association's writeup here (it has a nifty table of different levels of nuclear catastrophe that is a must read):

    http://www.world-nuclear.org/info/inf37print.htm

    The interesting thing is, Toho was filming on location at the Tokai plants for a Godzilla attack in the then upcoming "Godzilla 2000 Millennium". They were probably done with filming by the time the accident actually occurred. In December 1999, the movie opened, with Godzilla heading over to attack the plants.

    This wasn't the first one of Toho's monster movies to "come true", only one in a long history. Here are two other famous ones:

    "Gojira" 1984: the Russians have a nuclear accident in the movie (in the original Japanese version, US version makes it a deliberate act). In 1986, the Russians had a real accident: Chernobyl.

    "Mosura 3: King Ghidora Raisu" 1998: the King of Terror (King Ghidora) begins his attack on Tokyo by flying through the twin towers of a skyscraper. Office workers flee while talking on cell phones. The US version ... well there was no US version, except the real life one on September 11th, 2001. Tristar, why was "Rebirth of Mothra 3" never released so we could have been warned as Mothra clearly intended?

    Sonora: "New Godzilla reading. He's moving inward toward Tokai."
    Shinoda: "The nuclear plants, I knew it."
    Sonora: "Afraid so."
    Yuki: "Well, that's just lovely. Another Chernobyl."
    "Godzilla 2000" (US version dialog)

  15. Re:What about Texas City? by Gordonjcp · · Score: 4, Insightful

    A couple of things about the article:

    Firstly, that's not really what "heterodyne" means. Heterodyning is when you mix two signals to produce another at a different frequency. This is how pretty much all radio receivers work (yes, I know there are other ways. Go in a shop and buy a commercial super-regen radio, and I'll change that sentence). It's not a "glitch", it's more a constant physical property.
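
    To make the mixing point concrete, here is a small numerical check of the identity behind it: multiplying two cosines gives components at the sum and difference frequencies. The frequencies in the usage lines are arbitrary example values, not anything taken from the article.

```python
import math


def mixer_output(f1: float, f2: float, t: float) -> float:
    """Ideal mixer: the product of two cosine signals at time t."""
    return math.cos(2 * math.pi * f1 * t) * math.cos(2 * math.pi * f2 * t)


def sum_and_difference(f1: float, f2: float, t: float) -> float:
    """The same signal written as components at (f1 - f2) and (f1 + f2)."""
    return 0.5 * (math.cos(2 * math.pi * (f1 - f2) * t)
                  + math.cos(2 * math.pi * (f1 + f2) * t))


def max_mismatch(f1: float, f2: float,
                 samples: int = 1000, duration: float = 1e-3) -> float:
    """Largest difference between the two forms over evenly spaced sample times."""
    times = (i * duration / samples for i in range(samples))
    return max(abs(mixer_output(f1, f2, t) - sum_and_difference(f1, f2, t))
               for t in times)


if __name__ == "__main__":
    # Example: a 1 MHz signal mixed with a 455 kHz one gives 545 kHz and 1.455 MHz.
    print(max_mismatch(1.0e6, 455.0e3))
```

    The mismatch is only floating-point rounding, which is the sense in which heterodyning is a property of the math and not a glitch.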

    Also, the problem was not directly caused by the radio equipment, but by what was said. Yup, it's an unpopular view to take, but it was just plain human error. No blaming the machines here. Why? Well, it goes like this...

    The day of the accident, there was very heavy fog around Tenerife. Visibility was extremely poor, and it was impossible to see the opposite end of the runway. Another factor was that normally, you only fly off from one end of the runway, depending on wind direction. If the surface winds are calm, it's the tower's call as to which runway is in use (denoted by the heading you're facing when taking off, in 10-degree steps, ie. Runway 25/Runway 07). *Both* runways were in use, so aircraft could line up at both holding points, to help reduce queueing.

    Now, the Pan-Am pilot was first out, so lined up at the takeoff point, and began his takeoff run. There was some confusion about whether or not the KLM aircraft was to taxi from the hold to the takeoff point, due to both the controller and the Dutch pilot having English as a second language. This wouldn't have been a problem for the most part, because even if the KLM had been at the takeoff point, the Pan-Am would have cleared it with plenty of room, even though it shouldn't have been on the runway.

    The key is in what the Dutch pilot said - "We are now at takeoff". This is indeed a common phrase, generally meaning that the aircraft is sitting at the takeoff point and awaiting clearance. However, in Dutch, the prefix "at-" is equivalent to the English "-ing" suffix - the pilot had just effectively said "I am now taking off". It's an easy mistake to make if you speak more than one language. Even a language you don't often use creeps into things you say in your first language. Just watch it doesn't have consequences this serious!