Slashdot Mirror


Examples of Programming Gone Wrong?

LightForce3 asks: "I'm a beginning CS student, and in my studies I've come across examples of programmer error causing very large problems, such as the Ariane 5 failure and the Therac-25 accidents, often as tales of caution to beginner programmers such as myself. My (morbid?) curiosity has been piqued, and I'm looking for other examples of programmer error leading to serious problems. After all, it is better to learn from the mistakes of others than from your own, right? ;) What programming-related accidents, incidents, and failures, both well-known and obscure, do Slashdot readers know about, and are there any good resources for researching these?"

23 of 626 comments

  1. The book "Fatal Defect" by spanky555 · Score: 5, Informative

    This book is devoted to just that. It's what you're looking for... go get it and read it.

  2. Mars Orbiter Lost Over Metric Conversion by Kircle · Score: 4, Informative

    http://slashdot.org/articles/99/09/30/1437217.shtml
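    The Mars Climate Orbiter's ground software reported thruster impulse in pound-force seconds where newton-seconds were expected. One common defense against this class of bug, shown here as a minimal C sketch with hypothetical names (`newton_seconds`, `pound_force_seconds`, `lbf_s_to_n_s`, `delta_v`), is to give each unit its own struct type. Mixing them then becomes a compile-time type error instead of a silent error of a factor of about 4.45.

```c
#include <assert.h>

/* Hypothetical unit types: a bare double carries no unit, a struct does. */
typedef struct { double value; } newton_seconds;       /* SI impulse */
typedef struct { double value; } pound_force_seconds;  /* US customary impulse */

/* 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N, exactly. */
#define N_S_PER_LBF_S 4.4482216152605

/* The only sanctioned way to turn lbf*s into N*s. */
static newton_seconds lbf_s_to_n_s(pound_force_seconds impulse)
{
    newton_seconds converted = { impulse.value * N_S_PER_LBF_S };
    return converted;
}

/* Velocity change (m/s) from an impulse; accepts SI impulse only.
   Passing a pound_force_seconds value here does not compile. */
static double delta_v(newton_seconds impulse, double mass_kg)
{
    assert(mass_kg > 0.0);
    return impulse.value / mass_kg;
}
```

    The conversion lives in exactly one function, so a reviewer only has to check the factor once.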

    --

    -- Kircle

  3. Re:Challenger by agentZ · Score: 5, Informative

    What happened to Challenger wasn't a programming mistake, but rather a case of not following policy. The solid rocket boosters were never designed to operate in cold temperatures. The result of working outside of design specs was catastrophic failure, yes, but that wasn't the result of a programming error.

  4. How about the AT&T Switch failure in NY? by Bolen · Score: 5, Informative

    A Central Office (CO) switch is basically a mainframe-class computer programmed in assembler. A few years back, a newly installed switch failed due to a bug in the code, causing a cascading failure of the phone system for a few hours.

  5. Re:Challenger by Pyromage · Score: 5, Informative

    Incorrect: this was not a programming issue, nor was it a software issue at all. The problem was the O-ring seals in the SRBs (Solid Rocket Boosters). The manufacturer stated that they should not be operated below 53 degrees Fahrenheit, and NASA overrode the recommendation and launched anyway. The expected happened.

    NASA hasn't ever had a hardware problem. Or a software problem. Ever. Every problem can be directly tied to one specific person being a fscking moron. The closest you could come is that Mars probe that crashed because of mismatched units. And that was just poor communication among the software guys.

  6. RISKS Digest by BinBoy · Score: 4, Informative

    The RISKS Digest is a mailing list and Usenet newsgroup that describes all kinds of situations where technology has gone wrong. Many of the stories involve programming errors.

    Google's RISKS Archive

  7. Failures by Jordan+Graf · Score: 4, Informative

    MIT runs a class called 6.033: Computer Systems Engineering. Its lecture notes contain a list of projects that had great sums of money spent on them only to be abandoned, and the reading list has a bunch of papers that discuss the "big splash" failures like the Therac-25.

  8. Re:One Word by Helter · Score: 5, Informative

    Come on now, that's the lazy way!

    How about citing an actual example of Windows code bugs causing big problems? I'll go first. The USS Yorktown had to be towed back to harbor when the NT system that was automating most of the ship crashed.

  9. Re:the harrr-rrrrror by s20451 · Score: 5, Informative

    US shooting down Airbus 320

    You're referring to the destruction of Iran Air flight 655 by the USS Vincennes near the Strait of Hormuz on July 3, 1988. For one thing, it was an Airbus A300 (bigger and older than an A320). The failure there was mostly in human decision making, not in the AEGIS radar system, which faithfully reported that the airliner was travelling at 450 knots on a steady bearing towards Vincennes, roughly four miles outside the commercial air corridor, and not broadcasting IFF information (which of course it wouldn't, as a foreign civilian airliner). It was the officers of Vincennes who interpreted this information as a threat, misidentified the target as an Iranian F-14, and destroyed it.

    --
    Toronto-area transit rider? Rate your ride.
  10. Re:already.. by gmajor · Score: 5, Informative

    http://slashdot.org/articles/02/05/02/1525210.shtml?tid=128 - Debug your code, or else!

    Using Google's search engine provides better results for slashdot.org than Slashdot's own search engine :-)

  11. Re:That's kind of silly by pongo000 · Score: 4, Informative

    Wouldn't setting it to something like 0 be better?

    In most areas of the world (unless you're flying over the Dead Sea, or Death Valley, or New Orleans), if your altimeter reads 0, you're probably already dead. Altimeters used for navigation read MSL (height above mean sea level), not AGL (height above ground). There are radar altimeters that read in AGL, but these are used for close-to-ground maneuvers like landing.

  12. not true by PissedOffGuy · Score: 4, Informative

    The database they were using faulted on a divide by zero. Nothing to do with NT.

    1. Re:not true by PissedOffGuy · Score: 5, Informative

      What are you talking about? The Navy inquiry found no fault in NT. Here, you try this: write a program that divides by zero and run it on NT. As with any other good OS, the program shuts down and the OS keeps going. User-mode code cannot cause a blue screen, which makes sense.

      In the Navy's case, the crashed program was enough to call the computers "down", and that makes sense too. The only thing that doesn't make sense is blaming the OS for an application problem.

  13. How is an app the fault of NT? by deadsquid · Score: 5, Informative

    Much as I dislike NT, especially in critical environments, this problem had nothing to do with NT. It had everything to do with bad coding.

    As we all know, information systems are only as smart as people make them. In the case of the USS Yorktown, an admin/operator entered data which caused a divide-by-zero condition in the application. Because the application had no exception handling for a divide-by-zero condition, it died.

    You can't blame the OS for this. The application should have had exception handling built into it in a couple of places. It probably should have checked any new entries before committing them, to ensure the new data would not introduce such a condition, and the app itself should have had appropriate error handling to prevent a panic/dump when a divide-by-zero condition was encountered.

    If the app was coded by the same people on another platform, the end result would have been the same.

    --
    Idiot, n. A member of a large and powerful tribe whose influence in human affairs has always been dominant
  14. It was a bad break in C code by hayne · Score: 5, Informative

    Actually, the switching code was in C, and the crash was due to a programmer's apparent misunderstanding of the 'break' statement. See full details at: http://www.csc.calpoly.edu/~jdalbey/SWE/Papers/att_collapse.html
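    The pattern at issue is a `break` inside an `if` inside a `case`. In C, `break` exits the innermost enclosing `switch` or loop, never an `if`, so the statements after the `if`/`else` are skipped. Below is a minimal sketch of that control flow; the names (`handle_message`, `MSG_STATUS`, `MSG_OTHER`) are hypothetical, and a bitmask stands in for the real processing.

```c
#include <assert.h>

/* Hypothetical message kinds; not the actual switching-system code. */
enum msg_kind { MSG_STATUS = 1, MSG_OTHER = 2 };

/* Returns a bitmask recording which statements ran:
   1 = if branch, 2 = else branch,
   4 = statements after the if/else (still inside the case),
   8 = statements after the switch. */
static unsigned handle_message(enum msg_kind kind, int condition)
{
    assert(kind == MSG_STATUS || kind == MSG_OTHER);
    unsigned ran = 0;
    switch (kind) {
    case MSG_STATUS:
        if (condition) {
            ran |= 1u;
            break;   /* meant to leave the if; actually leaves the switch */
        } else {
            ran |= 2u;
        }
        ran |= 4u;   /* silently skipped whenever condition is true */
        break;
    default:
        break;
    }
    ran |= 8u;
    return ran;
}
```

    With `condition` true the function returns `9` (bits 1 and 8), so the work tagged `4` never happens; with `condition` false it returns `14`.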

  15. Good site by dumboy · Score: 4, Informative

    Check out this site:

    http://wwwzenger.informatik.tu-muenchen.de/persons/huckle/bugse.html

  16. One of the best resources I've found by PghFox · Score: 5, Informative

    The Pragmatic Programmer: From Journeyman to Master is one of the best resources I've found for avoiding common programming mistakes. It details many of the common errors we make as software developers and describes strategies for overcoming them. Having been in the field for close to two decades, I've found this book immensely valuable and highly recommend it.

    Some of the tips, which may appear obvious to some of us, include:
    • Always Aim for Simplicity, Clarity and Generality
    • Treat all of your code as if you're going to release it
    • Keep subroutines small; break up code as you go
    • Document as you go, not after the fact
    • Write tests as you go, not after the fact
    • Fix bugs immediately; do not delay fixing them
    • Do not duplicate any code, anywhere
    • Separate form and functionality
    • Subroutines should do one thing and do it well
    • Make your work easy to reuse
    --
    --- Fox
  17. Re:Insidious bug from the wayback machine by rufusdufus · Score: 4, Informative

    Perhaps I should point out the bug: the comment "//check for \" ends with a preprocessor line-continuation character (\), which effectively appends the next line onto the current line. Thus the code "slashfound=1" is effectively commented out, and the next statement (++index) only executes if c == '\\'.

  18. Re:Why this cant be right... by bongholio · Score: 4, Informative

    You're all sorta right. Here is one of my favorite aviation pages; it'll tell you more than you ever wanted to know about airplane physics (from a pilot's point of view). Chapter 1 covers these altitude/speed/power concepts...

  19. Re:That is NOTHING -- 10,000 died in Bhopal, India by jgaynor · Score: 4, Informative

    A "large quantity of water" entered the storage tank because an employee who had just been fired dropped a hose into it out of spite (he didn't know what would happen; he just wanted to ruin something). Yes, the safety precautions were sub-par, but when someone with legitimate access wants to destroy something, it's pretty hard to prevent.

    And yes, this has nothing to do with programming error :).

  20. Re:A Great Story by florescent_beige · Score: 4, Informative

    Speaking of aviation: this SAAB Gripen crash was attributed to the coding of the control laws in the flight control computer. So was this one. And this F-22. And let's all remember the Apollo 11 incident.

    --
    Equine Mammals Are Considerably Smaller
  21. Computer-Related Risks by Peter G. Neumann by Malic · Score: 4, Informative

    I think I've recommended this book several times on Slashdot. Simply put, it's THE collection of computing-related horror stories.

    http://www.amazon.com/exec/obidos/tg/detail/-/020155805X/qid=1035769692/sr=8-13/ref=sr_8_13/104-4078673-1863905?v=glance&n=507846

    --
    I swear by MacOS X. Although I used to swear *at* MacOS 9...