Slashdot Mirror


Examples of Programming Gone Wrong?

LightForce3 asks: "I'm a beginning CS student, and in my studies I've come across examples of programmer error causing very large problems, such as the Ariane 5 failure and the Therac-25 accidents, often as tales of caution to beginner programmers such as myself. My (morbid?) curiosity has been piqued, and I'm looking for other examples of programmer error leading to serious problems. After all, it is better to learn from the mistakes of others than from your own, right? ;) What programming-related accidents, incidents, and failures, both well-known and obscure, do Slashdot readers know about, and are there any good resources for researching these?"

11 of 626 comments (clear)

  1. RTM Worm by rwash · · Score: 5, Interesting

    In the 80's, Robert T Morris accidentally released a worm that exploited problems in sendmail and other common internet daemons that took down most of what was the internet at that time. This was expecially bad since about half of it was military.

    1. Re:RTM Worm by ProfessorPuke · · Score: 5, Interesting

      "accidentally released" is wrong, or prehaps whitewashing by RTM's friends. The release was fully intentional. What was accidental about it is that he hadn't realized that in addition to infecting virtually every UNIX system it found, it would also DOS them. The worm constantly tried to infect every available system, meaning that a system which was vulnerable would recieve many, MANY copies of the worm, exhausting its processing power.

      RTM had been aware of the possiblilty, and implemented a fix- but he did it wrong. He'd created code so that a new worm, when first arriving at a host, could check if a previous instance of the worm had been there. If so, it could abort its infection process.

      However, he was afraid that this would make vaccinating machines too easy (by sysops faking the "already infected" flag), so he created a 12.5% random chance that an incoming worm would ignored the fact that a machine was already compromised and infect it again. That probability had NO rational basis behind it, (in fact the whole idea of using randomizing like this is flawed), and served to postpone the shutdown of the internet by at most an hour.

      This was an especially bad blunder because it set a frightening example of what hackers could do. If RTM had used a 100% chance of non-reinfection, (and played his cards right from then on), he'd have been hailed as an innovative security analyst who'd prevented security-compromising violations of the Pentagon's systems. Instead he was tossed in prison for years.

  2. Why, the world's favorite mail client, by Scarblac · · Score: 5, Interesting

    Outlook!

    Built with the idea that code in attachments should be executable, often automatically. Also full of exploitable bugs, to get even more stuff running automatically, regardless of who who sent it. Responsible for a huge amount of damage by all sorts of worms, trojans, etc.

    Someone, somewhere got the idea that email would look better with html; and if it got html, it should get scripting too, that's consistent with web pages! And it's cool if attachments (like pictures) can be opened in their appropriate program automatically - let's run any executables then, that's consistent!

    This is oversimplified, but I really feel that this is a case of stupid consistency that caused multi-billion dollar damage. Email should never be executed by the mail client.

    --
    I believe posters are recognized by their sig. So I made one.
  3. That was an easy setup by Ghoser777 · · Score: 5, Interesting
    A clear example is: [insert random microsoft product].

    Oh wait... -1 Redundant

    Here's a good site though with tons of examples.

    My favorite would be the infamous time when NASA did half its calculation in metric and the rest in SI. ;)

    F-bacher

    --
    James Tiberius Kirk: "Spock, the women on your planet are logical. No other planet in the galaxy can make that claim."
  4. A Great Story by puppetman · · Score: 5, Interesting

    that was told to my class about the altitude of fighter jets.

    A company was hired to rewrite the code that was used on one of the models of fighter jets, and they offered to fix an unusual bug.

    The details are: apparently they had two altimeters - one was barometric, and the other I don't remember.

    Anyway, the programmer was coding along, and was writing code to determine what would happen if the altimeters stopped functioning.

    He came to the case where they both weren't working, and couldn't figure out what to do, so called one of the pilots that was acting as an information source for the developers, and asked him what altitude they normally flew at, and he answered, "12,000 feet" or something similar.

    So the programmer wrote,

    if altimeter1 not working
    {
    if altimeter2 not working
    {
    set height = 12000;
    }
    }

    Stupid, but this code could not be changed. The pilots had the following rule deeply ingrained: if the altitude stays at 12,000 for more than a few seconds, pull up, as your altimeters aren't working.

  5. Train collision by Shaddup · · Score: 5, Interesting

    A company I once worked for (as an intern) was in the business of what's called "train control" software. Briefly, it's the software that dispatchers use to monitor the status of the switches, the position of all the trains being tracked by the system, etc. One of the features of the system is to provide early-warning of potential collisions. Well, the system is quite reliable (having been in service, in one form or another, since the 70's). However, there have been some accidents.

    Once such accident, in Mexico, was caused by an unexpected combination of several simultaneous failures. One day, for some reason, one of the servers needed to be reset. At the same time, two freight trains were stopped at a switch, in the process of what's called a "pass," where one train turns off onto a side track to let the other train pass by on the main track. Long story short, the status bits of the switch got lost during the server reset (there is a provision for restoring track states when the backup servers take over, but it didn't work for some reason). After asking if the track was clear, the driver for train1 recieved a green light from the dispatch office. The dispatcher, not knowing that train2 hadn't cleared the switch yet, figured everything was ok. The trains collided at very low speed, and not head-on, but nonetheless the collision cost the rail line several million in equipment and downtime. No one was hurt.

    The lesson: When writing bullet-proof software, check every possible condition! More extensive field testing would have caught the failover bug.

  6. Non-life threatening, but interesting bug... by Anonymous Coward · · Score: 5, Interesting

    I'm an AC for a reason...

    Let's just say that two years ago a very large international shipping company suffered two days of worldwide failure in the package routings printed on labels. The bug was caused by an incorrectly placed paren in an index offset calculation, leading to truncation of an intermediate result (to a 16 bit unsigned int, when it should have been 32). The bug sat dormant for five years because the result matrix it was indexing into was smaller than 64kbytes. As soon as it grew over that size - boom! What a way to wake up at 2am when the Asian-Pacific region starts calling...

    I didn't make it, but I was definitely involved with the fix. After that we did some very thorough auditing on all of the routing code - and fortunately didn't find any other surprises lurking.

  7. Airbus by That_Dan_Guy · · Score: 5, Interesting

    This isn't really a programming error, but a user training error.

    In the Airbus if the pilot tries to correct (use the flight controls) while the computer is engaged the computer will correct the pilot's correction. Unlike in a car with cruise control where if you hit the breaks it just cuts the cruise control. Many China Airlines planes have crashed due to poor pilot training in this regard. They weren't trained well enough to shut off the computer control before taking control of the plane.

    I'm also sure someone can be a little more detailed than this, but it is, IMO, at least a design error that has caused hundreds of deaths.

    As a side note, my Software Engineer professor refused to ever fly on a fly by wire plane, and was opposed to SDI simply because he didn't beleive that either had been or ever would be debugged properly. (if there is one error in every 10,000 lines of code, and it has 3 or 4 million Lines of Code, how many errors is that? His answer: too many to trust)

  8. OT: Scuds and Patriot missile defenses by GuyMannDude · · Score: 5, Interesting
    People keep pointint to the floating point error as the cause for why the Patriot system at that time (the PAC-2) let that scud go through. But as I've already pointed out in an earlier post, the PAC-2 did a crappy job (far worse than is generally known) intercepting scuds not because of coding errors but because the problem of hitting an erratically moving missile was so difficult. I think it's important to get the word out as we approach a new war with Iraq and consider a national missile defense shield. This recent article briefly discusses Israel's own attempts at missile defense because they don't trust the PAC-2 (for good reason) and it's questionable whether the US is going to give them some PAC-3 batteries.

    Bottom line: that stuff about the floating point error in the PAC-2 system looks neat on paper but it's not at all clear that the faulty calculation was responsible for the loss of life.

    GMD

  9. Re:That's kind of silly by pete-classic · · Score: 4, Interesting
    From a text I am currently working on:


    The compiler requires you to declare variables, but does not require you to initialize them. Does that mean you can get away with leaving them uninitialized? Well, you might program your entire life without coming upon a reason not to. If you don't initialize them, however, you will almost certainly run into a very difficult bug, probably sooner than later. Using an uninitialized variable is perfectly valid syntax, but is always a logic error. The compiler won't complain, but you will get wild, unpredictable, and wrong results. In the worst case, you might get believable, but wrong results. This leads us to what to use as an initializer. Most people use zero. Using an "obviously wrong" value may be more useful. Often a maximal value (such as int students="65536") is more obviously wrong. [Emphasis added for this post.]


    This isn't variable initialization, but the principal replies. Data that you know are junk should look like junk! Trying to "fake it" or make it "look good" is exactly the wrong thing to do.

    -Peter
  10. F-16 AOA and WOW by taaminator · · Score: 4, Interesting

    "Flight instruments don't lie"

    First, BEFORE YOU LEAVE THE GROUND, pilots are taught that instruments don't lie. Specifically, when the human inner ear is placed in flight, things go wrong (the inner ear canals are static, not dynamic, devices; the fluid has no dampening or rate sensors). When there is no external reference, the inner ear canals adjust to the eye's visual presentation. It's called the 'leans.' Bad joo-joo. Many a perfectly good aircraft has been flown into the ground because the pilot believed his ears and eyes and not his instruments.

    Second, IN FLIGHT, angle-of-attack (AOA) is a spectacular indicator of where your airfoil exists within (or outside) the flight envelope for your aircraft. Inside the flight envelope, you can seek best range (mpg) or best endurance (loiter) or best climb.

    In most aircraft, the angle-of-attack indicator is a manual instrument (on the skin is a sensor which looks like a big euro-style handle and it runs to an indicator in the cockpit).

    Many pilots are correctly taught to 'fly' the angle-of-attack.

    Third, ON THE GROUND, when you land, you use the aircraft shape as an airbrake. You hold the aircraft nose off the ground as long as possible to create drag.

    Fourth, ON THE GROUND, when you land, you do not want to hold the aircraft nose too far off the ground or the tail will scrape the runway and your fitness report will reflect and you'll be the butt of bad jokes at Snopes for eternity.

    The AOA is used to assist in the performance of aerodynmic braking. The aircraft performance manual publishes the tried and true range of AOAs for aerodynamic braking. [It also indicates when too much AOA will ding the aircraft.]

    Aerodynamic braking is part art and part science and requires accurate instruments.

    Enter the F-16 ... it has an electronic AOA.

    F-16 pilots were taught to fly the flight direction indicators to land.

    However, many old and new pilots fell back on the old AOA once the wheels touched the ground to do aerodynamic braking.

    Suddenly, F-16 tails were scraping along the runway at an alarming (and expensive) rate.

    [As an aside, the problem was probably ignored until a senior officer ground off a few inches of aluminum THEN there was a problem.]

    The programmers who wrote the AOA routines were rightly told that the AOA is used in flight. So, when the AOA detected that the aircraft had placed weight on the wheels (weight-on-wheels - WOW), it was programmed to quit working. Unfortunately, it kept the last AOA reading ... no matter what the real AOA was.

    Pilot flies, pilot lands, pilot believes instruments, pilot scrapes multi-million dollar aircraft's tail along runway.

    The programming solution was simple: when there was WOW, fade the AOA.

    This was another case when contracts pit spec wording against spec intent against functional application and understanding of how it's supposed to work ... Fortunately, it was expensive and not lethal.

    "Why did they call you 'sparky' and why are you driving school buses in North Topeka?"