Slashdot Mirror


History's Worst Software Bugs

bharatm writes "Wired has an article on the 10 worst software bugs. From the article 'Coding errors have sparked explosions, crippled interplanetary probes -- even killed people. Here's our pick for the 10 worst bugs ever, but the judging wasn't easy.'"

34 of 645 comments

  1. The meat of the article... by cytoman · · Score: 4, Informative

    July 28, 1962 -- Mariner I space probe. A bug in the flight software for the Mariner 1 causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.

    1982 -- Soviet gas pipeline. Operatives working for the U.S. Central Intelligence Agency allegedly plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline. The Soviets had obtained the system as part of a wide-ranging effort to covertly purchase or steal sensitive U.S. technology. The CIA reportedly found out about the program and decided to make it backfire with equipment that would pass Soviet inspection and then fail once in operation. The resulting event is reportedly the largest non-nuclear explosion in the planet's history.

    1985-1987 -- Therac-25 medical accelerator. A radiation therapy device malfunctions and delivers lethal radiation doses at several medical facilities. Based upon a previous design, the Therac-25 was an "improved" therapy system that could deliver two different kinds of radiation: either a low-power electron beam (beta particles) or X-rays. The Therac-25's X-rays were generated by smashing high-power electrons into a metal target positioned between the electron gun and the patient. A second "improvement" was the replacement of the older Therac-20's electromechanical safety interlocks with software control, a decision made because software was perceived to be more reliable.

    What engineers didn't know was that both the 20 and the 25 were built upon an operating system that had been kludged together by a programmer with no formal training. Because of a subtle bug called a "race condition," a quick-fingered typist could accidentally configure the Therac-25 so the electron beam would fire in high-power mode but with the metal X-ray target out of position. At least five patients die; others are seriously injured.

    1988 -- Buffer overflow in Berkeley Unix finger daemon. The first internet worm (the so-called Morris Worm) infects between 2,000 and 6,000 computers in less than a day by taking advantage of a buffer overflow. The specific code is a function in the standard input/output library routine called gets() designed to get a line of text over the network. Unfortunately, gets() has no provision to limit its input, and an overly large input allows the worm to take over any machine to which it can connect.

    Programmers respond by attempting to stamp out the gets() function in working code, but they refuse to remove it from the C programming language's standard input/output library, where it remains to this day.

    1988-1996 -- Kerberos Random Number Generator. The authors of the Kerberos security system neglect to properly "seed" the program's random number generator with a truly random seed. As a result, for eight years it is possible to trivially break into any computer that relies on Kerberos for authentication. It is unknown if this bug was ever actually exploited.
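To make the seeding problem concrete, here is a minimal sketch, not the actual Kerberos code. It assumes a generator seeded from something guessable such as a timestamp; `make_key` and `recover_seed` are hypothetical names, and Python's `random` module stands in for whatever generator was really used.

```python
import random


def make_key(seed: int) -> int:
    """Derive a 56-bit 'key' from a PRNG seeded with a guessable value."""
    rng = random.Random(seed)      # same seed -> same output, every time
    return rng.getrandbits(56)


def recover_seed(key: int, earliest: int, latest: int):
    """Try every seed in a plausible window until one reproduces the key."""
    for candidate in range(earliest, latest + 1):
        if make_key(candidate) == key:
            return candidate
    return None


# If the seed is a timestamp known to within ~1000 seconds, the attacker
# only has to try about a thousand candidates.
victim_key = make_key(1_000_000_123)
guessed = recover_seed(victim_key, 1_000_000_000, 1_000_001_000)
```

The point of the sketch is that the key space is only as large as the seed space, however many bits the key itself has.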

    January 15, 1990 -- ATT Network Outage. A bug in a new release of the software that controls ATT's #4ESS long distance switches causes these mammoth computers to crash when they receive a specif

    1. Re:The meat of the article... by endersdouble · · Score: 2, Informative

      Not to mention anyone at Tunguska.

  2. Whatever happened to the US Navy? by Lead+Butthead · · Score: 3, Informative

    Something about their latest toy... ahm, ship that had to be towed back to port because the Windows NT they used to run everything on the ship kept blue-screening.

    --
    ELOI, ELOI, LAMA SABACHTHANI!?
    1. Re:Whatever happened to the US Navy? by Anonymous Coward · · Score: 1, Informative

      Incorrect on a couple of points:

      First, the ship did not need to be towed back into port, though it did sit dead in the water for a bit (if I recall correctly). The story was bandied about the UNIX support group I was part of at one of the shipyards that built Aegis class cruisers. The ship this happened to was an Aegis.

      Second, the problem was in the software running on top of Windows; it failed to properly bounds check operator input and allowed a division by zero error, which in turn caused a cascade failure of almost all the critical systems on the ship.
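A minimal sketch of the kind of check described above, assuming a hypothetical calculation that divides by an operator-entered valve setting. The function names and the accepted range are made up for illustration and are not taken from the ship's software.

```python
def parse_valve_setting(raw: str, lo: float = 0.0, hi: float = 100.0) -> float:
    """Parse operator input and reject anything outside [lo, hi]."""
    value = float(raw)  # raises ValueError on non-numeric input
    if not (lo <= value <= hi):
        raise ValueError(f"valve setting {value} outside [{lo}, {hi}]")
    return value


def flow_per_unit(total_flow: float, valve_setting: float):
    """Return total_flow / valve_setting, or None when the valve is closed.

    Zero is a legitimate input (a closed valve), so it is handled
    explicitly instead of being allowed to reach the division.
    """
    if valve_setting == 0.0:
        return None
    return total_flow / valve_setting
```

With this shape, `flow_per_unit(10.0, parse_valve_setting("0"))` returns `None` rather than raising, and the caller decides what a closed valve means.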

      The Navy still decided to switch to Windows instead of HP/UX for the fleet. :P

    2. Re:Whatever happened to the US Navy? by Phanatic1a · · Score: 4, Informative

      There are no "Aegis class" cruisers. Aegis is a ship combat system, specifically an AN/SPY-1 radar system, a computer based command-and-control system, and one of a number of missile systems either in current-tech VLS cells or older cylindrical magazines and launchers.

      The Aegis system can be found on Ticonderoga-class cruisers and Arleigh Burke-class destroyers in the USN, Kongo-class destroyers in the IJN, and some Spanish frigate whose designation I forget.

      The ship we're talking about is the USS Yorktown, CG-48, and the problem was pretty much as you describe. A user input an erroneous zero value for some quantity (fuel pressure, I think), and the system ate itself and took the engines offline.

      The Yorktown was decommissioned last year. Shame that the practice of using Windows in ship-critical systems wasn't.

  3. Re:Moth. by BridgeBum · · Score: 4, Informative

    The term predates computers. In the original usage, any sort of mechanical device or system could have bugs.

    http://www.silicon.com/software/webservices/0,39024657,10005407,00.htm

    --
    My UID is the product of 2 primes.
  4. Re:Microsoft's striking absence by Shaper_pmp · · Score: 2, Informative

    Jesus Christ. I'd somehow never heard of this bug, and I've been developing for Windows machines for years.

    How on earth was such a basic and low-level bug ignored for so long? It doesn't seem like rocket science to fix it with a small bounds-checking if statement!

    --
    Everything in moderation, including moderation itself
  5. Re:Microsoft's striking absence by mattyohe · · Score: 3, Informative

    Do people just open an article, do a Ctrl+F and type microsoft to find something 'juicy'? If you had RTFA you would have seen that the 'Ping Of Death' was mentioned, which did impact Windows machines.

    --
    - what is the definition of simultanagnosia?! I've been meaning to look it up!
  6. Re:This bug reminds me of a Dilbert comic by Yahweh+Doesn't+Exist · · Score: 3, Informative

    if you subscribe (I don't any more) you can probably find it by searching for "random".

    I think the last line is actually something like
    Dilbert: That isn't very random though.
    Some guy: That's the trouble with randomness - you can never tell.

  7. Re:omg by PhilHibbs · · Score: 4, Informative

    Its intent was not to cause terror, but to inflict economic damage. I heard about a similar incident where a Japanese shipbuilder was stealing blueprints from a UK shipyard tendering for a contract and undercutting them. The UK shipbuilder deliberately designed a ship that would capsize on launch, which the Japanese duly stole, built, and launched. I don't know if anyone was killed, but ethically it's a tricky one.

  8. Airbus Crash by CruddyBuddy · · Score: 5, Informative
    Here is video of an Airbus crashing into the trees because the autopilot didn't like the landing conditions. IIRC, the pilot's pull-up was ignored because the flight conditions weren't optimum despite an obvious life-threatening situation. If this isn't a software bug, what would you call it? (Maybe the software considered crash modes and this configuration allowed the black box to survive intact.)

    http://www.alexisparkinn.com/photogallery/Videos/Airbus320_trees.mpg/

    (Let the slashdotting begin! (poor servers))

    All things considered, I don't know if the pilots survived.

    --
    ----------
    Any problem can be made unsolvable if there are enough meetings made to discuss it.
    1. Re:Airbus Crash by be-fan · · Score: 5, Informative

      I actually know why this happened. We learned about it in our flight dynamics class. The problem was the result of a mismatch between what the pilot thought the airplane was doing and what it was actually doing. The A320 had software that prevented the pilot from stalling the airplane during flight. However, the protection only kicked in above 90', because the software assumed that if you were below that, you wanted to land (which involves a stall right at touchdown). The pilot was trying to do a flyby, and was supposed to be above 100', but for whatever reason he came in at around 30'. Now, the reasons he didn't pull up and ramp up the engines are debatable, but the equitable explanations suggest that he assumed that the airplane's stall protection would kick in, while the airplane had disabled it because it thought it was about to land.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:Airbus Crash by Tim+Browse · · Score: 3, Informative
      Just to add yet another explanation, when I worked for Rediffusion (UK flight simulator manufacturer), this air show crash was discussed during our induction. If I remember correctly, the pilot spun down the engines to lower the aircraft, and then tried to power them up again to lift the aircraft out of the descent and fly over the trees. The pilot claimed the system over-rode his desire to power up the engines, causing the crash. (I believe he had already over-ridden some safety mechanism to allow him to perform this descent in the first place.)

      However the actual problem was that airliner engines aren't like some awesome fighter jet with afterburner. They take time to spin up - from examining the black box, they determined that at the point the pilot wanted to ascend, even if the engines spun up at the maximum rate, it was still nowhere near enough to pull the plane out of the descent. Hence, pilot error.

  9. MOD PARENT BACK DOWN IT WAS A SOFTWARE BUG by gorim · · Score: 4, Informative

    Because it was actually implemented as microcode and stored into the CPU, whether as mask rom or some other means of storing, but it was indeed software either way you look at it.

  10. Re:Microsoft's striking absence by varmittang · · Score: 4, Informative

    Remember when the LA air traffic control tower crashed due to a bug in MS software after 49 days? I would think that this would make it up there. http://www.itgarage.com/node/459
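The 49-day figure matches the point at which an unsigned 32-bit count of milliseconds wraps back to zero. Below is a minimal sketch of that arithmetic, assuming (the comment above does not say so) that a millisecond tick counter of this kind was involved; `tick_after` and `elapsed` are illustrative names.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
WRAP = 2 ** 32  # an unsigned 32-bit counter holds 0 .. 2**32 - 1


def tick_after(ms_since_boot: int) -> int:
    """Value a 32-bit millisecond counter would show after the given uptime."""
    return ms_since_boot % WRAP


def elapsed(start_tick: int, now_tick: int) -> int:
    """Wrap-safe elapsed time: subtract modulo 2**32 instead of comparing."""
    return (now_tick - start_tick) % WRAP


# How long until the counter wraps: a little under 50 days.
days_to_wrap = WRAP / MS_PER_DAY
```

Code that compares raw tick values (`now > deadline`) breaks at the wrap; code that only ever looks at modular differences, as in `elapsed`, does not, provided the interval is shorter than one wrap.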

    --
    -----BEGIN PGP SIGNATURE-----
    12345
    -----END PGP SIGNATURE-----
  11. Re:Whoops forgot to hit preview by CastrTroy · · Score: 3, Informative

    There are software engineers in Canada now. They can legally sign off on a software project. The problem is that you don't want every one of your programmers to be a licensed software engineer, all signing off on their own code. It would be too expensive to hire that many engineers, and managing all the signatures for all the code when different people work on the same piece of code would be a nightmare. Basically you'd have to have one engineer, or a team thereof, overseeing the entire project to be sure that proper methods are being followed to ensure that there aren't any bugs. What you're asking for is more like saying that everyone who is building a bridge should be licensed, and that they should all have to sign off on every rivet they put in.

    The problem is that most companies producing software do not want to pay for an engineer to oversee their project. Also, the way most software operations are run, you wouldn't see an engineer signing off on the projects. The engineer would force things to be much more thoroughly tested to ensure that they were actually worthy of being signed off on. There is lots of this kind of software being built for planes and other situations where it really matters if there are bugs. I don't think this kind of situation will ever happen with off-the-shelf software. For one thing, software would cost too much, and most people aren't willing to pay $2000 to run an operating system on their home computer; for another, most engineers wouldn't sign off on a system when they didn't know the computer their software would run on. There are too many variables on a home computer to be able to guarantee, at that level, that your software will operate completely as expected.

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  12. Re:This bug reminds me of a Dilbert comic by Lucan_UK · · Score: 5, Informative

    Here is the Dilbert Strip... Enjoy
    http://www.geocities.com/raptorred42/Dilbert0001.jpg

    --
    why?
  13. Re:Whoops forgot to hit preview by Balthisar · · Score: 3, Informative
    You are not even allowed to call yourself an engineer without getting that license. That person is actually held legally responsible for the projects he signs off on.

    Actually, you're confusing the title "P.E." (professional engineer) with the generally accepted term "engineer." One (the P.E.) is a licensed engineer, and others are used traditionally and arbitrarily with no legal recourse. For example, I and my co-workers are bona fide engineers, and most of us have engineering or engineering technology degrees. None of what we do requires a P.E. to sign off on anything, although there are other aspects of our business (and many other businesses) that do require a P.E.

    Of course, there are all kinds of "engineers" that have that title but don't truly merit it -- customer service engineer; field service engineer; applications engineer; and so on. Most of these don't hold engineering degrees. For many of them, I don't begrudge them their title, either. But we also know that they're not P.E.'s.

    --
    --Jim (me)
  14. Re:Intel FP divide is -not- a software bug by ameline · · Score: 4, Informative

    That is correct -- Modern processors perform divides by having a reciprocal estimate lookup table.
    This table produces an estimate with 12 or so good bits of precision. Iterative refinement (typically microcoded) then produces the rest of the bits. After that the reciprocal is multiplied in, and you get the result.

    More recently this has been somewhat exposed, as almost all modern processors have a reciprocal estimate instruction which executes in a single cycle. This is very useful if, for example, you want to normalize a bunch of normal vectors before passing them into the graphics pipeline. 12 bits is almost always enough for this purpose, and the reciprocal sqrt instruction is very much your friend here. So something that was dominated by the ~60 cycles of 1.0f/sqrt(sum_of_squares) becomes 1 cycle. Total speedup is about 10x -- and it's vectorizable -- the SSE unit will do a vector rsqrte.
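The estimate-then-refine scheme described above can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular CPU does it: `reciprocal_estimate` fakes the lookup table by rounding the true reciprocal to about 12 bits, and `refine` applies the standard Newton-Raphson step for 1/d. It assumes d is finite and non-zero.

```python
import math


def reciprocal_estimate(d: float, bits: int = 12) -> float:
    """Stand-in for a hardware table lookup: 1/d good to roughly `bits` bits."""
    mantissa, exponent = math.frexp(d)            # d == mantissa * 2**exponent
    scale = 1 << bits
    coarse = round((1.0 / mantissa) * scale) / scale  # throw away the low bits
    return math.ldexp(coarse, -exponent)


def refine(d: float, x: float, steps: int = 2) -> float:
    """Newton-Raphson for 1/d: each step roughly doubles the good bits."""
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return x


def divide(n: float, d: float) -> float:
    """n / d computed as n * (refined reciprocal of d)."""
    return n * refine(d, reciprocal_estimate(d))
```

Starting from about 12 good bits, two refinement steps reach roughly double precision, which is why a bad table entry matters: the refinement converges from whatever the table hands it.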

    My understanding of the pentium fdiv bug is that a section of the reciprocal estimate table had bad data in it.

    This, in my opinion, counts as software, as would the microcode. If the bug had been in the multiplier, adder, or logic circuitry of the lookup table, then it would count as hardware.

    Many, if not all, of the complex ciscy instructions are implemented in microcode -- so I believe that a bug in them would count as a software bug.

    --
    Ian Ameline
  15. Re:Predictions are hard by Jason+Ford · · Score: 5, Informative
    Several recent studies lend support to this observation. From an article at the American Psychological Association:

    We've all seen it: the employee who's convinced she's doing a great job and gets a mediocre performance appraisal, or the student who's sure he's aced an exam and winds up with a D.

    The tendency that people have to overrate their abilities fascinates Cornell University social psychologist David Dunning, PhD. "People overestimate themselves," he says, "but more than that, they really seem to believe it. I've been trying to figure out where that certainty of belief comes from."

    Dunning is doing that through a series of manipulated studies, mostly with students at Cornell. He's finding that the least competent performers inflate their abilities the most; that the reason for the overinflation seems to be ignorance, not arrogance; and that chronic self-beliefs, however inaccurate, underlie both people's over and underestimations of how well they're doing.

    --
    I did not become a vegetarian for my health, I did it for the health of the chickens. --Isaac Bashevis Singer
  16. Re:Worst _software_ bugs, huh ? by squiggleslash · · Score: 2, Informative

    The bug was about missing data in a lookup table. Intel said the problem was caused by a bug in the script designed to populate that table when the CPU was being designed, though legend has it that someone erroneously "proved" that the data wasn't needed. So, I guess, either way you look at it, be it the script, the table, or the alleged logic flaw, it's a software, not a hardware bug (or at least, it's a bug caused by software.)

    --
    You are not alone. This is not normal. None of this is normal.
  17. Goose and gander by A+nonymous+Coward · · Score: 1, Informative

    [the USSR's] murderous, oppressive grip on Eastern Europe and attempts at foisting their cheerful utopia on South America and Africa

    As opposed to the US's murderous, oppressive grip on third world countries generally and attempts at foisting their cheerful utopia on the rest of the world.

    It's fair to say the US's grip wasn't as thorough, but it sure was oppressive, and it encompassed more of the world than the USSR for more years. How many legitimate governments did the US overthrow because they didn't like them?

    Terrorism is terrorism. Justifying the largest non-nuclear explosion in the name of fighting terrorism belongs in George Orwell's literature.

    1. Re:Goose and gander by Jherek+Carnelian · · Score: 3, Informative

      Chile
      Colin Powell's statement: "With respect to your earlier comments about Chile in the 1970s and what happened with Mr. Allende, it is not a part of American history that we're proud of."

      Iran

      Guatemala

      Greece

      There's lots more where those came from -- all democratically elected too. I hope you survive the cognitive dissonance.

  18. Not a bug by stlhawkeye · · Score: 4, Informative
    In a series of accidents, therapy planning software created by Multidata Systems International, a U.S. firm, miscalculates the proper dosage of radiation for patients undergoing radiation therapy.

    I used to work with the lead programmer on this software package from Multidata. We worked together at two different companies for a total of about four years.

    Multidata's software allows a radiation therapist to draw on a computer screen the placement of metal shields called "blocks" designed to protect healthy tissue from the radiation. But the software will only allow technicians to use four shielding blocks, and the Panamanian doctors wish to use five.

    This is also made very clear in the documentation. This isn't a bug at all; the dosimetrists misused the software.

    The doctors discover that they can trick the software by drawing all five blocks as a single large block with a hole in the middle. What the doctors don't realize is that the Multidata software gives different answers in this configuration depending on how the hole is drawn: draw it in one direction and the correct dose is calculated, draw in another direction and the software recommends twice the necessary exposure.

    Exactly. They tried to create a feature that the software did not support, and they did so in a manner that broke the software.

    At least eight patients die, while another 20 receive overdoses likely to cause significant health problems. The physicians, who were legally required to double-check the computer's calculations by hand, are indicted for murder.

    It's not a software bug, it's a user error. This isn't a bug any more than it's a "bug" that your Linux box stops working properly if you do sudo rm -rf /. The users of the product knew better.

    To be fair, Multidata was not a great shop from a procedural standpoint - the guy who ran it was insane, but the software was rock solid. I actually worked with a number of former Multidata employees who jumped ship and went to a rival shop that builds similar software, and they were all fairly competent and intelligent.

    --
    "I have never won a debate with an ignorant person." -Ali ibn Abi Talib
  19. Re:omg by greginnj · · Score: 2, Informative
    And I suppose if I had a "broken" gun in my basement and you broke in and stole it, then tried to use it and injured yourself, you could sue me right?
    ...believe it or not, this is essentially true. A lawyer friend of mine (in NJ) tells me that if you booby-trap your house against thieves, a thief breaks in, and is injured, he can sue you and has some chance of winning. I forget what the actual liability is (it's not 'unsafe working conditions' or something urban-legend-sounding like that), but there are grounds for a suit.
    --
    Read the best of all of Slash: seenonslash.com
  20. Well, they're ok, but not quite the worst by douthat · · Score: 5, Informative

    I think the two worst computer bugs of all time are the two that quite possibly could have wiped us all out. More information here.

    (Copied from the article:)
            * November 9, 1979, when the US made emergency retaliation preparations after NORAD saw on-screen indications that a full-scale Soviet attack had been launched. No attempt was made to use the "red telephone" hotline to clarify the situation with the USSR and it was not until early-warning radar systems confirmed no such launch had taken place that NORAD realised that a computer system test had caused the display errors. A Senator at NORAD at the time described an atmosphere of absolute panic. A GAO investigation led to the construction of an off-site test facility, to prevent similar mistakes subsequently. A fictionalized version of this incident was filmed as the movie WarGames, in which the test system is inadvertently triggered by a teenage hacker believing himself to be playing a video game.

            * September 26, 1983, when Soviet military officer Stanislav Petrov refused to launch ICBMs, despite computer indications that the US had already launched.

            If it weren't for two humans who said "fuck what the computer says!", we might be in a very different place right now.

    --
    She loves me: 09F911029D74E35BD84156C5635688C0 She loves me not: 09F911029D74E35BD84156C5635688BF ...
  21. Probably BS by hughk · · Score: 2, Informative

    I looked at this a while back because many millennia ago, I worked at the company that produced the telemetry/control system for the Trans-Sib pipeline. It was a specialised outfit based in Warwickshire, UK. It is very doubtful that their systems could have been nobbled by anyone. The network was closed, based on an X.25ish HDLC and the software was blown on to UV erasable EPROMs. The CIA may have modified the s/w at the pump stations, but again it is doubtful.

    --
    See my journal, I write things there
  22. Mars Climate Orbiter -- English/metric SNAFU by alumshubby · · Score: 2, Informative

    Speaking of NASA foulups, remember this one? "(CNN) -- NASA lost a $125 million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency's team used the more conventional metric system for a key spacecraft operation, according to a review finding released Thursday."
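A minimal sketch of one way to keep this kind of mix-up from slipping through an interface, assuming the quantity being exchanged is an impulse (pound-force seconds on one side, newton-seconds on the other). The `Impulse` type and its constructors are hypothetical, not anything from the actual flight software.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in newton-seconds, whatever the source unit."""
    newton_seconds: float

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        """Build from pound-force seconds, converting on the way in."""
        return cls(value * LBF_TO_NEWTON)

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        """Build from newton-seconds; no conversion needed."""
        return cls(value)


# The same bare number means very different things depending on the unit.
reported = 10.0
as_metric = Impulse.from_n_s(reported)
as_english = Impulse.from_lbf_s(reported)
error_factor = as_english.newton_seconds / as_metric.newton_seconds
```

A bare float passed between two teams carries no unit; forcing callers through a named constructor makes each side state what it is handing over.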

    --
    "How many light bulbs does it take to change a person?" --BMcC-->
  23. Not that "bug found in relay" story again by Anonymous Coward · · Score: 2, Informative
    With that recall, the Prius joined the ranks of the buggy computer -- a club that began in 1947 when engineers found a moth in Panel F, Relay #70 of the Harvard Mark 1 system. The computer was running a test of its multiplier and adder when the engineers noticed something was wrong. The moth was trapped, removed and taped into the computer's logbook with the words: "first actual case of a bug being found."

    I hate to be pedantic (well no, I love it), but according to the Jargon file's entry on "bug":

    Indeed, the use of bug to mean an industrial defect was already established in Thomas Edison's time, and a more specific and rather modern use can be found in an electrical handbook from 1896 (Hawkin's New Catechism of Electricity, Theo. Audel & Co.) which says: "The term 'bug' is used to a limited extent to designate any fault or trouble in the connections or working of electric apparatus." It further notes that the term is "said to have originated in quadruplex telegraphy and have been transferred to all electric apparatus." ...
    Actually, use of bug in the general sense of a disruptive event goes back to Shakespeare! (Henry VI, part III - Act V, Scene II: King Edward: "So, lie thou there. Die thou; and die our fear; For Warwick was a bug that fear'd us all.") In the first edition of Samuel Johnson's dictionary one meaning of bug is "A frightful object; a walking spectre"; this is traced to 'bugbear', a Welsh term for a variety of mythological monster which (to complete the circle) has recently been reintroduced into the popular lexicon through fantasy role-playing games.

    But then again, why expect more from Wired?

  24. Re:And don't forget your roots... by mrisaacs · · Score: 3, Informative

    Actually engineers existed long before the steam engine. The title of Civil Engineer was created to differentiate the practitioners from the Military Engineers - the most common and probably the oldest usage of the title of engineer.

    --
    ...carrier dead.....
  25. From the quoted article... by Anonymous Coward · · Score: 2, Informative

    Some outside observers, however, said they are not convinced NT is blameless.

    "It still boggles the mind that any divide by zero error on NT would cause a system to crash, let alone" 27 end-user terminals, said Gil Young, corporate network engineer for a systems integration firm in Orlando, Fla. "I don't care what operating system, computer or application I'm using, I should be able to type in a zero and expect the computer not to crash, especially if that zero is to represent a closed valve."

  26. Re:Same old tiresome error: "BUG" was old then by ScottForbes · · Score: 2, Informative
    I hate to interrupt your rant, but... the Wired article doesn't say that the term "bug" originated in 1947. It merely notes that the first widely known "buggy computer" was the Harvard Mark I:
    With that recall, the Prius joined the ranks of the buggy computer -- a club that began in 1947 when engineers found a moth in Panel F, Relay #70 of the Harvard Mark 1 system. The computer was running a test of its multiplier and adder when the engineers noticed something was wrong. The moth was trapped, removed and taped into the computer's logbook with the words: "first actual case of a bug being found."
    Unless you demand that anyone retelling the 1947 anecdote immediately prove their street cred -- "Of course the term 'bug' did not originate with this incident, blah blah blah, I mention this to prove that I'm smarter than you are" -- then the Harvard Mark I's moth is the earliest example of a computer glitch that the public might have heard about. Since the rest of the article is about other bugs the public might have heard about, and since the article repeated Hopper's exact words about finding an "actual bug" (which, as you note, implies that they'd been calling them "bugs" long before they found a genuine moth), how about you easing up a little and giving writer Simson Garfinkel some slack?
  27. Re:You get what you pay for NONSENSE by twiddlingbits · · Score: 2, Informative

    You missed one...

    Item 0: Requirements reviews by peers and independents. If you don't have good requirements you obviously don't know things well enough to be building them. Sure you can catch some requirements issues in 1 and 2 but the longer you wait the costlier it is to fix.

    A MSCS is NOT a Software Engineering Degree, so why WOULD you take courses in SE? I'd say that CS and SE are two different professions. There are places to get a MS SE (Texas Tech comes to mind) if you are interested.

  28. Re:Mangement problems by TFloore · · Score: 2, Informative

    The phone network bug was a misplaced { character in a nested if-else construct.

    Is that what it was? I thought I'd heard that the AT&T outage was from a missing break; in a switch-case statement.

    I found that more believable, because a missing { would cause a compiler error, where a missing break; is a valid way to purposely fall into the next case.

    Though, really, I suspect both of us are just repeating rumors we heard.

    --
    This is my sig. There are many like it but this one is... Oops. Frank, I've got your sig again! Where's mine?