Examples of Programming Gone Wrong?
LightForce3 asks: "I'm a beginning CS student, and in my studies I've come across examples of programmer error causing very large problems, such as the Ariane 5 failure and the Therac-25 accidents, often as tales of caution to beginner programmers such as myself. My (morbid?) curiosity has been piqued, and I'm looking for other examples of programmer error leading to serious problems. After all, it is better to learn from the mistakes of others than from your own, right? ;) What programming-related accidents, incidents, and failures, both well-known and obscure, do Slashdot readers know about, and are there any good resources for researching these?"
...wherein a technique to save memory on older computers resulted in a massive media panic twenty years later. Oh, and it caused a couple of glitches.
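For anyone who hasn't seen the memory-saving trick in action, here is a minimal sketch of how storing only the last two digits of a year goes wrong at the century rollover. The function names and the pivot value of 70 are made up for illustration; real systems chose their own field layouts and windowing rules.

```python
def years_between_two_digit(start_yy: int, end_yy: int) -> int:
    """Elapsed years when only the last two digits of each year are stored."""
    return end_yy - start_yy


def expand_two_digit_year(yy: int, pivot: int = 70) -> int:
    """Expand a two-digit year using a fixed window.

    The pivot is a hypothetical choice: values at or above it are read
    as 19yy, values below it as 20yy.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 1900 + yy if yy >= pivot else 2000 + yy


def years_between_windowed(start_yy: int, end_yy: int) -> int:
    """Elapsed years after expanding both values to four digits."""
    return expand_two_digit_year(end_yy) - expand_two_digit_year(start_yy)


if __name__ == "__main__":
    # An account opened in '75 and checked in '00.
    print(years_between_two_digit(75, 0))   # negative span: the rollover bug
    print(years_between_windowed(75, 0))    # sensible span after windowing
```

Windowing only postpones the problem to the edge of the window, which is why storing the full four-digit year is the sturdier fix.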
Co-founder of GerbilMechs
Why not provide a link instead of saying "Oh yeah, I saw it way back when."
You people who say "use google to find it" or "this was already asked" are worse than the people who actually ask the question.
Their only problem (if it could be said to be a problem) is ignorance; your kind, however, are a much better example of the problem of self-righteous laziness.
Don't be so narrow in your approach. Is it a programming error if a stadium roof collapses because the engineers couldn't understand what the output of their computer model was saying?
What about when the construction crew quietly substituted what they thought was an equivalent design for the one the computer program came up with for a skywalk over a hotel lobby?
After almost 20 years in this field, I think that at least 80% of the serious "errors" I see are because the user didn't understand the results of the program, and only 20% of them are due to classic development errors.
The lesson to learn from this: the user interface matters. Give some thought to presenting the information in a meaningful manner (e.g., the infamous pre-Challenger graphs showing O-ring erosion vs. the post-Challenger graph that mapped damage by temperature at the time of launch), and allow users to see the information in the way that makes the most sense to them.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Yeah it is probably in the archives. I've read it before.
:-)
Problem is, the Slashdot search engine sucks. I haven't yet been able to query the archives and actually find what I'm looking for without needing to dig through hundreds of irrelevant discussions. Sometimes I think it might be faster to just scroll back through the "Older Stuff" section.
Or we could just have another discussion about it.
Ascalante: Your bride is over 3,000 years old.
Kull: She told me she was 19!
The database failure caused NT to crash. Good software design includes failure planning.
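As a rough illustration of what "failure planning" can look like at the application level, here is a minimal sketch that refuses to divide by a bad value instead of letting the error propagate. Elsewhere in this thread a divide-by-zero is named as the trigger; the function names, the fuel-rate calculation, and the message text are all hypothetical and not taken from the system being discussed.

```python
from typing import Optional


def fuel_rate(fuel_used: float, hours_run: float) -> Optional[float]:
    """Return fuel used per hour, or None when the divisor is unusable."""
    if hours_run <= 0:
        # Planned failure path: report "no answer" rather than raising.
        return None
    return fuel_used / hours_run


def report(fuel_used: float, hours_run: float) -> str:
    """Format a result, degrading gracefully when the input is bad."""
    rate = fuel_rate(fuel_used, hours_run)
    if rate is None:
        return "rate unavailable: hours_run must be positive"
    return f"{rate:.2f} units/hour"


if __name__ == "__main__":
    print(report(10.0, 4.0))
    print(report(10.0, 0.0))  # a zero in the data no longer stops the program
```

The point of the design is that the bad input produces a visible, bounded failure in one report line rather than taking down everything that depends on it.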
No, I meant read it until you understand it. I don't want anyone working for me that doesn't think understanding documentation is a good thing or doing something the correct way rather than "it works so I might be doing it right."
And there's a difference between not being able to code and not understanding a particular function. I may read a function's man page 2 or 3 times to make sure I understand correctly what is going on. Not necessarily because I'm incompetent, but because the wording may be confusing (wow, confusing wording in a manpage? Who would have thought..). That doesn't mean every single function for a particular language requires you to read the documentation for it multiple times. I assume nothing. Assuming something leads to bugs and insecurity. I've been programming in C for many, many, many years. When I do a little PHP programming to create some web interfaces I don't assume that just because both C and PHP have a function called strlen, and the general documentation says it returns the length of a string, that they work identically. So I read the entire strlen documentation for PHP to understand exactly what's happening. It took less than a minute, but now I'm not assuming. I know. This goes for lots of things. The more complex the functions you use, the more important it is to fully understand them.
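To show why "it returns the length of a string" is worth reading twice, here is a small sketch in Python (an analogy only; it does not reproduce C's or PHP's strlen). The helper names are made up for illustration. The same text has two defensible "lengths" depending on whether you count characters or encoded bytes.

```python
def length_in_characters(s: str) -> int:
    """Count Unicode code points in the string."""
    return len(s)


def length_in_bytes(s: str, encoding: str = "utf-8") -> int:
    """Count the bytes needed to store the string in the given encoding."""
    return len(s.encode(encoding))


if __name__ == "__main__":
    word = "caf\u00e9"  # "cafe" with an acute accent on the final letter
    print(length_in_characters(word))  # counts code points
    print(length_in_bytes(word))       # counts UTF-8 bytes, one more here
```

Code that sizes a buffer with one definition and fills it using the other is exactly the kind of bug that comes from assuming instead of reading.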
The point is coding correctly is the most important skill to learn. I have friends that hack together scripts and programs from examples and snippets of other code and a little bit of their own code to glue it together, with little to no understanding of what they are actually doing. Then months later something breaks that they can't fix, and they act as if it were the fault of whoever wrote the example code.
No, it's their fault. Not because they hacked together examples, but because they didn't take the time to make sure they knew what the examples were doing, that the examples were implemented correctly, and that they understood exactly how the code in the examples worked.
Take a look at OpenBSD's philosophy. You can learn a lot from it.
..There's a-dooin's a-transpirin'
From the site:
"In 1969, as part of its global empire, Union Carbide Corporation set up its pesticide formulation unit in the northern end of the city of Bhopal in central India. Initially it mixed and packaged pesticides imported from the US but was gradually expanded. In December 1979 its Methyl Iso Cyanate (MIC) plant with an installed capacity of 5000 tonnes went into production.
On the night of December 2, 1984, during routine maintenance operations in the Methyl Iso Cyanate (MIC) plant, at about 9.30 p.m., a large quantity of water entered storage tank no. 610 containing over 60 tonnes of MIC.
This triggered off a runaway reaction resulting in a tremendous increase of temperature and pressure in the tank and 40 tonnes of MIC along with Hydrogen Cyanide and other reaction products burst past the ruptured disc and into the night air of Bhopal at around 12.30 a.m. Safety systems were grossly under-designed and inoperative. Senior factory officials knew of the lethal build-up in the tank at least one hour before the leakage, yet the siren to warn neighbourhood communities was sounded more than one hour after the leak started.
By then, the poisons had enveloped an area of 40 sq.kms. killing thousands of people in its immediate wake. Over 500 thousand suffered from acute breathlessness, pain in the eyes and vomiting as they ran in panic to get away from the poison clouds that hung close to the ground for more than four hours."
Nothing to do with programming errors here that I can see. Sounds more like gross negligence and incompetence to me.
-A.
student of animation and the fine arts
"The best argument against democracy is a five minute chat with the average voter."
--Winston Churchill
of the Tacoma Narrows bridge falling. The *fault* was with the design, and hence, the designers.
An extended bolt puncturing the gas tank during a rear end collision was the *cause* of Ford Pintos exploding. The *fault* was with the design, and hence, the designers.
Both of these items could have been claimed to be perfectly free of design flaws while being used as "intended."
This argument did not save the designers from being found liable for their design flaws.
The divide by zero error was the *cause* of the operating system's failure. The *fault* was with the operating system. The *operating system* crashed. An operating system failure is *always* the fault of the operating system, and hence, its designers.
Read any textbook on the design of operating systems and in the first page or two you will find some sort of statement along the lines of, "A faulty app should never cause the operating system to fail." This is correct design.
Let me repeat. If an app fails, it is the fault of the app. If the operating system fails, no matter what an app has done, it is the fault of the operating system. An operating system must *assume* apps badly written by complete incompetents.
It doesn't matter what operating system. Windows, Linux, Mac or just the beads on your abacus.
*It is the responsibility of the operating system not to fail.*
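The principle can be sketched in miniature. This is only an analogy: a real operating system relies on process isolation and hardware memory protection, not on exception handling, and every name below is hypothetical. The idea it illustrates is that the supervisor treats each app as untrusted and keeps running whatever the app does.

```python
from typing import Callable, Dict


def well_behaved_app() -> int:
    """An app that finishes normally."""
    return 42


def divide_by_zero_app() -> int:
    """An app written by a 'complete incompetent': it always faults."""
    return 1 // 0


def run_all(apps: Dict[str, Callable[[], int]]) -> Dict[str, str]:
    """Run every app; a fault in one is recorded and the rest still run."""
    results: Dict[str, str] = {}
    for name, app in apps.items():
        try:
            results[name] = f"ok: {app()}"
        except Exception as exc:
            # The app failed; the supervisor did not.
            results[name] = f"failed: {type(exc).__name__}"
    return results


if __name__ == "__main__":
    outcome = run_all({"bad": divide_by_zero_app, "good": well_behaved_app})
    for name, status in outcome.items():
        print(name, "->", status)
```

Note that the faulty app runs first and the well-behaved one still completes, which is the whole point: the failure stays with the app that caused it.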
The fact that such failures can be explained away as the fault of the app by people who should know better makes me grieve for the state of engineering these days. It can only result in products being produced with greater and greater "craposity" factors eventually resulting in a culture of complete "crapitude."
KFG