History's Worst Software Bugs
bharatm writes "Wired has an article on the 10 worst software bugs. From the article: 'Coding errors have sparked explosions, crippled interplanetary probes -- even killed people. Here's our pick for the 10 worst bugs ever, but the judging wasn't easy.'"
I wouldn't say they are the 10 worst bugs ever... more like the 10 most widely known, media-announced bugs. Okay, I have no examples of any others, but I'm sure there must be worse bugs out there...
Can anyone think of any others?
why?
The fact the site appears to be buggered... :|
Not everyone with a video camera can go out and film a network TV series. Does that mean we should require them to become licensed before they can operate their cameras?
No.
There will always be a difference between professional and amateur grade. You'll never need a license to run a compiler.
1995/1996 -- The Ping of Death. A lack of sanity checks and error handling in the IP fragmentation reassembly code makes it possible to crash a wide variety of operating systems by sending a malformed "ping" packet from anywhere on the internet. Most obviously affected are computers running Windows, which lock up and display the so-called "blue screen of death" when they receive these packets. But the attack also affects many Macintosh and Unix systems as well.
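The missing sanity check can be sketched roughly as follows. An IPv4 datagram can be at most 65,535 bytes because its total-length field is 16 bits wide, and the 13-bit fragment offset is counted in 8-byte units, so a late fragment with a large payload can point past that limit. The function name and the default 20-byte header are assumptions made for illustration, not code from any real IP stack.

```python
MAX_IP_DATAGRAM = 65535  # largest size the 16-bit IPv4 total-length field can express


def fragment_fits(offset_units: int, payload_len: int, header_len: int = 20) -> bool:
    """Return True if a fragment stays inside a legal reassembled datagram.

    offset_units is the 13-bit fragment offset field, counted in 8-byte units.
    header_len defaults to 20 bytes (an IPv4 header without options).
    """
    if not 0 <= offset_units <= 0x1FFF:
        return False  # offset does not fit in 13 bits
    if payload_len < 0:
        return False
    end_of_fragment = header_len + offset_units * 8 + payload_len
    return end_of_fragment <= MAX_IP_DATAGRAM
```

A reassembly routine that ran a check like this before copying each fragment into its buffer would drop the oversized fragment instead of writing past the end of the buffer.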
===
WinNuke made it...
MoM++ - A Classic Expanded - [Master of Magic 1.5]
http://mompp.sourceforge.net/
The last one on the list is this
... to me that sounds like a user not using the software correctly..
"Multidata's software allows a radiation therapist to draw on a computer screen the placement of metal shields called "blocks" designed to protect healthy tissue from the radiation. But the software will only allow technicians to use four shielding blocks, and the Panamanian doctors wish to use five.
The doctors discover that they can trick the software by drawing all five blocks as a single large block with a hole in the middle. What the doctors don't realize is that the Multidata software gives different answers in this configuration depending on how the hole is drawn: draw it in one direction and the correct dose is calculated, draw in another direction and the software recommends twice the necessary exposure.
At least eight patients die, while another 20 receive overdoses likely to cause significant health problems. The physicians, who were legally required to double-check the computer's calculations by hand, are indicted for murder."
why?
Why do they have the Intel Pentium floating point divide error listed as a bug? That was a hardware design error in the circuit, it was not a software bug. Of course it caused software to behave unexpectedly, but still I'm surprised that Wired put that one in there.
Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
Consider how much software is written by people with five years or less of professional experience, on short schedules, with no time allocated for continuing education. If software projects weren't always rush jobs, and on relative shoestring budgets, the quality would be better. If continuing education for programmers was a priority, quality would be better. If a couple of decades of experience was properly appreciated, quality would be better.
Wonderful article. Twenty years ago I believed that writing software would soon become a licensed profession. (Need a license to own a compiler, for instance.) I thought that the event that would inevitably trigger this is when a software bug caused a human death.
This is like saying you need a license to operate a Soda Vending Machine because some idiot decided tipping it over trying to get a free soda was a smart idea. You might have to put warnings on compilers like "do not code if you have no clue what you are doing", etc., but requiring a license won't ever happen. I am sure there will be lawsuits in the future regarding software bugs, but any software being used where an error could cause a human death is going to have a corporation behind it that can be held responsible.
I've read about this instance before, and I think it's attributable to ignorance on both the user and the developer. The software developer in this case knows the life of a human being is resting on his code, so it should have been nigh impossible to "trick" the software into allowing anything other than what the specs said it could do.
Proud member of the American Non Sequitur Society. We might not make much sense, but boy do we love pizza!
I found it a hard subject in school and have never used it practically, but it seems to be the only SURE way of proving the correctness of a program. Shouldn't we be using it by now, at least in real-time mission-critical applications? I think it needs to be stressed a lot more in school from the start, as compared to topics like web development and Java and all the other pragmatic things that can be learned more easily.
Life is about being a Phoenix!
The Russians got what they paid for. Not all espionage is successful; they learned not to believe something just because they wanted to. Besides, the CIA stopped WWIII from starting in 1992, and I don't see you walking around Langley with a sign offering free blow jobs.
Indeed. Causing a *NON FATAL* explosion in a country that imprisoned as many as 2.5 million political prisoners in Gulags at one time, and is estimated to have murdered upwards of 60 MILLION of its own citizens. Terrorism?
Terrorism is an act of mayhem designed to terrorize. This did not.
Sabotage? Yes.
Act of war? Probably.
Terrorism? Not even close.
Your statement is just a display of anti-American rhetoric with no basis in reality.
What a fool believes, he sees, no wise man has the power to reason away.
That is wrong. This is a myth that has been disproved several times. See for example the "IEEE Annals of Computer History" where Adm. Grace Hopper said that the term "bug" was used at least since the 30s, and maybe earlier, to describe an electrical problem in a system. See also here.
In an interview, Hopper confirmed that the notebook moth's caption, "First actual case of bug being found", clearly shows that it was a joke referring to a term that was already in use at the time.
Any idiot researching this anecdote for five minutes could have found out about it. I guess Wired couldn't be bothered. At this level of laziness and incompetence, one wonders why they don't just start publishing printouts of slashdot laced with ads. At least this place contains occasional nuggets of truth.
Once again, Wired blew it. Nice job, guys.
--
Mad science! Robots! Underwear! Cute girls! Full comic online! http://www.girlgeniusonline.com/
License to own a compiler is a bit extreme, don't you think?
One of the items from the article included a medical product that killed five people due to an error written by an inexperienced programmer. In situations like this, yes, a license-to-program would be fantastic. The people who wrote the software for the Toyota Prius, that too. The guy who wrote "Finger"... maybe not.
For potential severity, this one's worse than a few they listed.
Basically, the Navy was running critical ship systems on a Windows NT platform, and a divide-by-zero in a database caused a buffer overrun that resulted in a shutdown of the engines, leaving the ship dead in the water for 2.5 hours.
Fortunately, it was on maneuvers off of Cape Charles, and not at war off the coast of Yemen or something. Scratch a billion-dollar destroyer and most of her crew because of an NT bug, in that case.
Why isn't Outlook Express in here? Early versions basically changed unopened e-mail viruses from a hoax to reality, when Microsoft decided it was a *good* idea to automatically run any VB script that was received. That's cluelessness like trusting everyone to be good and decent human beings while you walk through a prison shower with "Please rape me" painted on your back.
Later versions tried to fix the problem while keeping the functionality, as if somehow the bad guys would intentionally include the Evil Bit in their code.
"No problem. I have the capacity to do infinite work so long as you don't mind that my quality approaches zero."-Dilbert
It is inherently broken by design. You pass it a buffer of indeterminate size (selecting one large enough for your purposes), but you don't have any way of telling the function how big the buffer is. If you read more data than the buffer can handle, bad things happen.
No, the size of the buffer cannot reliably be determined from inside the function. Not even if you make it a macro.
Why do they retain it? Because dropping it would break a LOT of existing code. So they have been modifying the compilers to generate warning messages when they see it being used.
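The post does not name the function, but the description matches C's gets(), whose safer cousin fgets() takes the buffer size as an argument. The same interface idea can be sketched in Python, where `readline(size)` from the standard `io` module caps how much is read. The helper name `read_line_into` is hypothetical and exists only to show the size being passed along.

```python
import io


def read_line_into(stream: io.BufferedIOBase, buf: bytearray) -> int:
    """Copy at most len(buf) bytes of the next line into buf; return the count.

    The buffer size travels with the call (here via len(buf)), which is
    exactly the information the function described above never receives.
    """
    chunk = stream.readline(len(buf))  # returns no more than len(buf) bytes
    buf[: len(chunk)] = chunk          # same-length slice assignment; buf never grows
    return len(chunk)
```

Given a 100-byte line and a 16-byte buffer, the call copies 16 bytes and leaves the rest in the stream for the next read, rather than writing past the end of the buffer.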
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
You keep using that word, 'terror'. Are you sure you know what it means?
The fact that there was an explosion of such magnitude doesn't bother me a bit. And I bet the majority of the citizens of the USSR weren't shaken a bit by this explosion, because (drum roll) they never knew such an accident had happened (and that's, for me, the scary part). And nothing spells success better than an act of terror no one finds out about, now does it?
Man is a slave because freedom is difficult, whereas slavery is easy.
What about the Y2K bug? I believe that had a greater economic impact than many of the other "worst."
What those who want activist courts fear is rule by the people.
However, you have been fooled. The parent comment is completely at odds with the article.
The article shows largely a series of examples where you DID have HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight, but nevertheless very significant bugs occurred. So, the real lesson from this article is not "you get what you pay for," but rather that "software development is very hard" and perhaps that "by nature of its hardness, we can expect critical flaws to pop up from time to time, even when highly trained, experienced, and monitored programmers are involved."
Except that the software didn't break well. It should have either reported that the action wasn't allowed or calculated correctly. It shouldn't look like it's working but give erroneous results. If a single block with a hole isn't supported, why are you allowed to select it?
"That CIA gas plant explosion 'bug' is disgusting and has America == No.1 Terrorist written all over it if true."
I might as well say: "Idiots like you that corrupt the language are worse than terrorists."
Both are absurd exaggerations that have nothing to do with reality, and only degrade the ability of our language to carry meaning.
Get Real. Terrorism is the deliberate use of violence against civilians in order to induce a state of terror in the general population, as a method intended to achieve political, religious or ideological goals.
The CIA were not using violence, they were attempting to cause stolen technology to fail.
The CIA were not targeting civilians. Moreover, AFAIK, not one person was even killed in the explosion, which happened in a very remote area, and the specific explosion was certainly not planned (they had no knowledge of or control over how the Soviets used the stolen technology).
The CIA were certainly neither attempting to induce a state of terror, nor to cause change by inducing a state of terror.
You want to oppose the US government? Great -- there are many good bases on which to do so. But please, before you speak up next time, get some facts, learn how to use the language, and THINK! You might then have a chance of convincing somebody of your point, instead of just annoying them with your ignorance.
It's not a software bug, it's a user error.
It's both. The program should not have accepted easily recognised invalid input and the user should not have entered it.
I don't care if it's not in the spec, it's commonly accepted programming practice that all input should be bounds checked and any program that doesn't do that is crap.
Your rm example is not equivalent as command line programs are by design flexible; in unusual circumstances it may be exactly what the operator wants to do.
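A minimal sketch of the kind of bounds checking meant here, using the four-block limit from the article quoted earlier. The data layout (a dict with an "outline" list and a "holes" count) and every name below are hypothetical; nothing here reflects how the real product stored its shapes.

```python
MAX_BLOCKS = 4  # limit taken from the article quoted earlier in the thread


class UnsupportedPlanError(ValueError):
    """Raised instead of computing a dose for input the software cannot handle."""


def validate_blocks(blocks: list) -> None:
    """Reject any shielding layout outside the supported envelope.

    Each block is a hypothetical dict such as {"outline": [...], "holes": 0}.
    """
    if len(blocks) > MAX_BLOCKS:
        raise UnsupportedPlanError(
            f"{len(blocks)} blocks given, at most {MAX_BLOCKS} supported"
        )
    for index, block in enumerate(blocks):
        if block.get("holes", 0) != 0:
            raise UnsupportedPlanError(f"block {index} contains a hole")
        if len(block.get("outline", [])) < 3:
            raise UnsupportedPlanError(f"block {index} is not a closed shape")
```

The point of raising an error is that the dose calculation never runs on a layout outside the spec, so the operator sees a refusal rather than a plausible-looking wrong number.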
---
Keep your options open!
No. When you design software that is explicitly intended to perform potentially lethal actions on human beings, you absolutely make sure it's foolproof. You do input validation at every freaking step, then double-check the result before you pull the trigger.
If I go in for LASIK and get my retina burned off because some technician turned the wrong dial up to 11, you bet your ass I'm suing the manufacturer right alongside the clinic. It should not be possible for the user to screw up the software when life is on the line.
Some of the bugs reported in the story were not so much the fault of programmers, but of management. The phone network bug was a misplaced { character in a nested if-else construct. The code had already been through extensive testing, and then a small change was needed. Because it was a "minor" change someone said it didn't need to go through the extensive (expensive) testing again. It's always easy to point at the code or the guy who wrote it. Especially when the boss is the one tasked with finding out what went wrong.
It doesn't matter how highly paid and trained your professionals are if the environment that produces the software is not conducive to eliminating these types of flaws: if they are not given enough resources to test and QA the projects they are assigned, if there is no organizational commitment to take the time and expense to document properly, or if leadership overrides technical objections to project timeframes, etc. Most of the cited projects could probably be classified as failures of project management rather than failures of the end product (the software) that these flawed projects produced. Yes, software is hard and the software profession should continue its efforts to improve quality, but that doesn't let the organizational culture, leadership and processes that produced the software in these cases off the hook.
Why is it that when the accounting profession makes spectacular mistakes that take down entire Fortune 500 class organizations, there is a critical analysis of the processes that led to these failures, and remedies often comprise prescriptive measures for these processes, but similar analysis of software failures focuses on the software flaw and not the environment that allowed the flaw to emerge? Now sometimes the remedy in the accounting case might not make complete sense (like SOX), but the point here is that people don't look at just the end result (the accounting system transactions) of the accounting process.
Look, I write software for control systems (and I design them electrically too). Just because programmers at Microsoft or EA Games have tight schedules where they are just too stressed to write code well doesn't mean all code needs to be written like that.
Back to what you were saying, if you have a system that could cause damage or whatever, then you start by writing your output routines, and you create rules to govern the machine (i.e. outputs A and B can't come on at the same time, or output C can't exceed this value). Then you write another module that monitors the inputs AND outputs looking for fault conditions that shuts down the machine if you do anything dangerous. Only this part of the code needs to be signed off by an engineer. Typically it's simple code, and easy to prove correct, with peer review. Then you write other modules that essentially make requests through the safety checks to do anything. You don't have to review the complex other code so much, because your output stage should catch any mistakes.
That's how you make a machine safe. Unfortunately, most engineers I know just go out and write the software figuring there's no difference, and that's how bad things happen. It comes from believing you won't make a mistake, or believing that testing will catch all problems. If you plan from the start that you're going to be making mistakes, you can catch them before damage is done. It's too bad this isn't taught, even in the software engineering classes I took at a Canadian university.
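The layering described above can be sketched like this: a small rule set, a gate that every other module must go through, and a fall-back to a safe state. The outputs A, B and C come from the post; the limit value, the class names and the assumption that "everything off" is the safe state are made up for illustration. The sketch only gates requests and leaves out the separate monitoring of inputs that the post also calls for.

```python
from dataclasses import dataclass

C_LIMIT = 100.0  # hypothetical ceiling for output C


@dataclass
class Outputs:
    a: bool = False
    b: bool = False
    c: float = 0.0


def violates_rules(out: Outputs) -> bool:
    """The small, reviewable rule set: A and B never together, C within its limit."""
    return (out.a and out.b) or not (0.0 <= out.c <= C_LIMIT)


class SafetyGate:
    """The only path by which other modules may change the outputs."""

    def __init__(self) -> None:
        self.current = Outputs()
        self.tripped = False

    def request(self, wanted: Outputs) -> Outputs:
        """Apply the request if it is safe; otherwise trip and hold the safe state."""
        if self.tripped or violates_rules(wanted):
            self.tripped = True
            self.current = Outputs()  # assumes "everything off" is safe here
        else:
            self.current = wanted
        return self.current
```

Only `violates_rules` and `SafetyGate` would need the engineer's sign-off in this arrangement; the more complex control logic calls `request` and cannot drive the outputs directly.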
"I have never let my schooling interfere with my education." - Mark Twain
1. Design reviews, by peers and independents
2. Code reviews, by peers and independents
3. Regular, organized unit testing
4. Correctness proving
5. Documentation, in about a bazillion forms
6. Defect tracking
7. Effective software process metrics measurement and improvement
8. Continuing education
9. Humility / egoless programming
This list was assembled in about a minute off the top of my head. I work in a CMM3/4 type organization, and although there are processes for these things, most people don't use them, or consider them a hassle.
So my point is, the parent is right -- creating good software, even when done by properly trained experts with great experience -- is hard. But the grandparent is right too -- doing all of the above to 'do it right' takes time and money, and many organizations, and by this I mean software process management as well as the actual engineers, don't understand the value / aren't willing to pay for or aren't willing to do all that work. And occasionally, as the article shows, the piper comes and takes his payment.
In Soviet Russia, us are belong to all your base.
Your example is incomplete. Imagine that you type "rm -rf / junk" and the system responds "Delete /junk?", so you answer "Y" and it then deletes the whole filesystem.
It is most certainly a bug. First, there is a mismatch between what is shown on the screen and what the system is doing. That is a bug by any definition. Second, the system obviously had gaps in its validation of input. This makes it no less of a bug than many of the others listed (e.g. the fingerd bug).
Furthermore, it is the responsibility of designers and developers of medical software to ensure that potential hazards are identified and mitigated. A hazard of "calculated dose does not match image shown on screen" is not some obscure hazard that no one would have thought of - it is the first that comes to mind!
Please tell me that these people are not involved in medical software anymore.
My guess is that the operator really wouldn't know either (although she would probably assure me that "it's very safe").
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading
If it weren't for two humans who said "fuck what the computer says!", we might be in a very different place right now.
I guess that is why they were there.
Computers are excellent at performing according to the logic that is programmed into them. For the most part, they cannot "think" or take a step back and say, "I'm sure I did everything right, but something still looks wrong". I used to put on my math tests something like, "I know this is not the right answer, but here is my work". To me, that is much more important than purporting that the answer is correct, and most of the time I had done something stupid that, given more time than the class allowed, I would have found.
Just recently, I had an issue with my bank because I kept just over the minimum balance needed to avoid a maintenance fee. One month I did dip below that minimum, and I put the money back in shortly after that. Anyway, for a few months I was still getting maintenance fees, because each fee itself pushed the balance below the minimum again. I would go to the bank, show them my balance history, and they would say sorry and refund my account. However, the refund was not backdated to the time of the incident but applied on the day it was issued, so next month I would get another fee.
Finally, I said, "Look, I can't keep coming in here to get this fee removed. Especially when the fee is because of a fee, and I've been able to keep the balance at the agreed-upon amount with the exception of when you keep billing me. I could put more money in the account to compensate for a fee so that it would not drop below the minimum, but in my eyes, that is similar to extortion. I can close the account if necessary." Finally, the banker put a fee hold on my account for 45 days or so, and it's only a memory now.
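The feedback loop in this story can be reproduced with a few lines of arithmetic. Everything numeric here is invented (a 1000 minimum, a 10 fee, a 1005 starting balance), and the fee rule is an assumption: the fee is charged at month end if the balance was below the minimum at any point in the month, and a refund is credited on the day it is issued rather than backdated.

```python
from typing import List, Optional

MINIMUM = 1000  # hypothetical minimum balance
FEE = 10        # hypothetical maintenance fee


def simulate(months: int, fee_hold_from: Optional[int] = None) -> List[bool]:
    """Return, for each month, whether a maintenance fee was charged."""
    balance = 1005  # hypothetical balance, just over the minimum
    pending_refund = False
    charged = []
    for month in range(months):
        lowest = balance  # the opening balance counts toward the monthly low
        if month == 0:
            lowest = MINIMUM - 1  # the one genuine dip, topped up again right away
        if pending_refund:
            balance += FEE  # credited today, so the low opening balance still counts
            pending_refund = False
        on_hold = fee_hold_from is not None and month >= fee_hold_from
        if lowest < MINIMUM and not on_hold:
            balance -= FEE
            pending_refund = True  # customer complains, bank agrees to refund
            charged.append(True)
        else:
            charged.append(False)
    return charged
```

Under these assumptions `simulate(4)` charges a fee every month, while `simulate(4, fee_hold_from=1)` charges only the first one, which matches how the banker's fee hold broke the cycle.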
Some of the bugs they listed are not truly bugs.
Soviet Gas Pipeline... This was a desired feature working just as intended (unless the CIA didn't want to blow up the pipeline).
Buffer Overflow in Berkeley - a worm is not a bug. It is a program designed to infiltrate a system and do something. While the people utilizing the program may not have intended this to happen (duh), the makers of the worm did.
A bug is an unwanted aspect of the code as implemented by the people who wrote (or edited) the code, but this does not include something affected by a virus/worm. A program that crashes every six minutes for no apparent or intended reason has a bug... a program that gets infected by a virus which causes it to crash every six minutes does not. Also, a piece of code that is intentionally inserted in the hopes of crashing a system is not a bug... it is a feature. It may be undesirable, but it is a feature.
I mod down so you can mod up. Your welcome.
That works great for the kind of machines you described, and I wasn't saying good code couldn't be written, I'm saying that it isn't usually written, and won't be written in off the shelf software. The problem is, if you look at something like a Mars lander, then you can't just shut it down if it gets some bad inputs. Also, even good inputs can result in the machine not doing what it needs to do. If it has to land on Mars, and some input tells it to fire the left rocket for 4 seconds, then the input may fall within proper values, but may push it way off course. There's no GPS in space, so you can't get your position very accurately, and if you go way off course, you may not have enough fuel to get back on course.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
It really doesn't matter what language you use: bugs can be written in any of them. In this case, the customer wanted a GUI workstation running on Windows, with the possibility of being cross-platform. Java was new and cool (1.1 had been out for six weeks when we started), and they decided to give it a shot. This is a company with fifty years' experience in medical systems, not some dotcom startup, so the procedures are in place to make sure that their products don't kill people.
As it turns out, JDK1.1 (along with a native-C library for quick image processing, and a custom PCI card for doing 30MB/sec image transfers) was just fine for the task. We had a team of seven testers working on the project full time for a year, and were able to ship with zero severity 1-2 defects.
We set a new record for lowest defects/KLOC at the customer (a major player in the medical systems industry), despite running JDK 1.1 on Windows NT 4. Our product was several times faster than the C-based product it replaced, had more functionality, and provided more accurate diagnosis for the patient.
Good design is the most important thing in developing good software. The language/runtime/OS can provide crutches to save you if you screw up, but bad design will result in defects no matter how sturdy the crutches are.
The problem is, if you look at something like a Mars lander, then you can't just shut it down if it gets some bad inputs.
True, it's just that in my line of work, off is usually the safe state, but what should be done is to go to some kind of safe state, whatever that may be. Sometimes you revert to a manual operation, for instance.
Also, even good inputs can result in the machine not doing what it needs to do.
Which is why you need to also hire a mechanical and an electrical engineer to design those aspects so that the mechanical and electrical systems fail in a safe and detectable way.
For instance, it used to be that stoplights were designed with a physical disk inside that rotated creating the "program" of the different lights. You also had interlocked electrical circuits so that both greens could never come on at the same time. These are mechanical and electrical ways to make the system fail to a safe condition. I have recently been at an intersection where I saw the traffic lights had green both ways (during a storm). This is because some vendor is selling a traffic light system on the market that is completely software based, and they hired a bargain basement programmer and/or engineer to design it, and we should find them and shoot them for their incompetence, but I doubt that will happen.
"I have never let my schooling interfere with my education." - Mark Twain
I think the whole thing has to do with the different spheres of knowing (IIRC - the actual title might be different):
1. Knowledge you have that you are aware of
2. Knowledge you have that you are ignorant of
3. Knowledge you are aware you are ignorant of
4. Knowledge you are not aware you are ignorant of
So, as you move knowledge from the other areas into area 1, you tend to pull things "up" if you will. Knowledge moves from 4 to 3 as well.
2 isn't a contradiction, just that you might not be aware that some "tip" is true, or may not realize at a certain time that certain stuff you know is actually relevant to the situation at hand.
The scary part is area 4 is a default, so the less you move "up", the less you are aware that you don't know things. This is why lots of people say things like you did - the more you learn, the more you learn you don't know.
Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
You are wasting your time. US is perfect, everyone else sux. Get it, ok! Now go vote for some NeoCons.
"If Engineers built buildings the way computer programmers wrote programs, the first woodpecker that came along would destroy civilization."
If engineers built buildings the way computer programmers wrote programs, an average engineer would be able to build an array of radio telescopes by himself in one evening. A team of 30 engineers would be able to build a ringworld in 3 months.
i.e. it would be nice if software were like designing an office, where there were 3 architects, 5 engineers, a building inspector, and 50 professional workmen to examine a system containing just a few hundred variables, and almost identical to the last 20 buildings they'd constructed.
And in case that didn't start a flamewar, how about...
"Just one unexpected input (of an aeroplane) caused the failure of two of New York's biggest civil engineering projects -- imagine how they'd cope with being attacked every 3 seconds like some internet software"
"Dude, focus here - democratically elected - not sham election"
Right, and you verified the elections of these people how? Let me give you an example: Hugo Chavez led a military coup to take over his country and then later was "democratically elected" according to Jimmy Carter. I am going to assume you consider his government legitimate.
I can tell you one thing: if the US hadn't intervened in many of those countries, they wouldn't have free democratic governments today. It is far easier to remove a dictator who is just a man than it is to eliminate an entrenched political party/idea like communism. Compare N. Korea to S. Korea or Taiwan to China... etc etc.
The war with islam is a war on the beast
The war on terror is a war for peace
Although in this incident there is a clear operator error (attempting to do some function clearly out of spec), the creators of the software are also to blame, if the problem was as you described it.
Changing the order of the vertices of a geometric figure should not change what counts as the "inside" of the figure, since the order of the points is irrelevant (geometrically, as in mathematics).
The software should probably have asked the user (in all cases) which area is the inside, rather than assuming something that is not clearly defined (especially since we're talking about a potentially lethal assumption).
As the sibling posts say, a better UI would have probably helped a lot, but there was a fatal mistake in the software from the beginning.
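One well-known way drawing direction can leak into a geometric calculation is the shoelace formula for polygon area, which returns a signed value: positive for counter-clockwise vertices, negative for clockwise. The sketch below only illustrates that general effect and how taking the absolute value removes it. It is not a claim about what the actual product computed, and the function names are made up.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(points: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise order, negative for clockwise."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the outline.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def area(points: List[Point]) -> float:
    """Orientation-independent area: the same outline gives the same answer."""
    return abs(signed_area(points))
```

For a 2 by 2 square, `signed_area` gives 4.0 when the corners are listed counter-clockwise and -4.0 when the same corners are listed in reverse, while `area` gives 4.0 both ways. Code that added or subtracted the signed value for a hole would therefore depend on which way the hole was drawn.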
You shouldn't call software made under insane management and disregarding procedures "rock solid" (especially if there are deaths involved). It is definitely not. I would have supposed that software developers would have taken a hint after Therac-25.
GPG 0x1B479C78