Examples of Programming Gone Wrong?
LightForce3 asks: "I'm a beginning CS student, and in my studies I've come across examples of programmer error causing very large problems, such as the Ariane 5 failure and the Therac-25 accidents, often as tales of caution to beginner programmers such as myself. My (morbid?) curiosity has been piqued, and I'm looking for other examples of programmer error leading to serious problems. After all, it is better to learn from the mistakes of others than from your own, right? ;) What programming-related accidents, incidents, and failures, both well-known and obscure, do Slashdot readers know about, and are there any good resources for researching these?"
This was already an Ask Slashdot a while back. Check the archives.
This book is devoted to just that. It's what you're looking for...go get it and read it.
http://slashdot.org/articles/99/09/30/1437217.shtml
-- Kircle
1.) Patriot missile failure
2.) Intel f*cking up floating-point calculations in one of their chips
3.) High-tech toilet glitch (no, really!)
4.) Windows ME
If you celebrate Xmas, befriend me (538
What happened to Challenger wasn't a programming mistake, but rather a case of not following policy. The solid rocket boosters were never designed to operate in cold temperatures. The result of working outside of design specs was catastrophic failure, yes, but that wasn't the result of a programming error.
A Central Office (CO) switch is basically a mainframe-class computer programmed in assembler. A few years back, a newly-installed switch failed due to a bug in the code, causing a cascading failure of the phone system for a few hours.
Incorrect: This was not a programming issue. Nor was it a software issue at all. The problem was the O-ring seals in the SRBs (Solid Rocket Boosters). The manufacturer stated that they should not be operated under 53 degrees, and NASA overrode the recommendation and launched anyway. The expected happened.
NASA hasn't ever had a hardware problem. Or a software problem. Ever. Every problem can be directly tied to one specific person being a fscking moron. The closest you could come is that Mars probe that crashed because of mismatched units. And that was just poor communication among the software guys.
The RISKS Digest is a mailing list and usenet newsgroup that describes all kinds of situations where technology has gone wrong. Many of the stories involve programming errors.
Google's RISKs Archive
Here's the Link
-- Kircle
MIT runs a class called 6.033: Computer Systems Engineering. These lecture notes contain a list of projects that had great sums of money spent on them only to be abandoned. Also the reading list has a bunch of papers that discuss the "big splash" failures like Therac 25.
but couldn't find it.
Anyway, here are a couple of links.
Software horror stories
More horrors
I think it would be interesting to know when the first infinite loop occurred in the early days of programming, and how the programmers dealt with it. Obviously, back then they only had single-tasking machines.
Let's say you turned in some bad FORTRAN code to the university computer on a time share. What if nobody noticed for hours that your program was taking up all the processing time? That would make some people pretty pissed. :p
Come on now, that's the lazy way!
How about citing an actual example of Windows code bugs causing big problems? I'll go first. The USS Yorktown had to be towed back to harbor when the NT system that was automating most of the ship crashed.
That was not an error in the programming... some dumbass gave all the calculations in English units for acceleration to the programmer, who wrote his program using SI units (or metric... same thing...).
US shooting down Airbus 320
You're referring to the destruction of Iran Air flight 655 by the USS Vincennes near the Strait of Hormuz, on July 3, 1988. For one thing, it was an Airbus A300 (bigger and older than an A320). The failure there was mostly in human decision making, not in the AEGIS radar system, which faithfully reported that the airliner was travelling at 450 knots on a steady bearing towards Vincennes, roughly four miles outside the commercial air corridor, and not broadcasting IFF information (which of course they wouldn't, as a foreign civilian airliner). It was the officers of Vincennes who interpreted this information as a threat, misidentified the target as an Iranian F-14, and destroyed it.
Toronto-area transit rider? Rate your ride.
Wouldn't setting it to something like 0 be better?
In most areas of the world (unless you're flying over the Dead Sea, or Death Valley, or New Orleans), if your altimeter reads 0, you're probably already dead. Altimeters used for navigation read MSL (height above mean sea level), not AGL (height above ground). There are radar altimeters that read in AGL, but these are used for close-to-ground maneuvers like landing.
The database they were using faulted on a divide by zero. Nothing to do with NT.
I can't recommend the RISKS site too highly. (Redundant, I know.)
Risks To The Public In Computers And Related Systems
http://catless.ncl.ac.uk/Risks
On how to be 0wned by other people: Counterpane: Crypto-Gram. Shares with comp.risks the refrain of "I can't believe people don't learn from this."
Counterpane: Crypto-Gram
http://www.counterpane.com/crypto-gram.html
Don Norman's _The Design of Everyday Things_ and website also offer insight on how to avoid UI-related failures.
http://www.jnd.org/index.html
Also, get a copy of _Code Complete_ and/or _Code Write_ by Steve McConnell (pub: Microsoft Press, which is rich irony). Lots of mistakes and how to avoid them.
The cautionary note might be that most of these failures are human related at some level. Whether it be at the project level, or the UI level -- there are lots of ways to cause a failure.
Finally, avoid any kind of career in Software QA. There is no better way to just get kicked around at the expense of the people putting the bugs in the software in the first place.
Anybody can work under ideal circumstances. -- Jeff K. (January 4, 2001)
Someone here was claiming that NASA has never had a software bug. That sounded pretty unbelievable to me. And sure enough, it's not true. In the recent Mars missions alone, they had a bunch of software bugs resulting in things varying from non-fatal vehicle failures to outright loss of spacecraft.
Regarding the loss of the Mars Climate Orbiter spacecraft, from nasa.gov: "The 'root cause' of the loss of the spacecraft was the failed translation of English units into metric units in a segment of ground-based, navigation-related mission software"
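One common defence against this kind of unit mix-up is to make the unit part of the type, so a raw number can never cross a module boundary unlabelled. The following is only a minimal sketch of that idea, not a description of NASA's software: the `Impulse` class, its method names, and the choice of impulse (pound-force seconds versus newton-seconds) as the quantity are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Conversion factor: 1 lbf = 4.4482216152605 N (exact by definition).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical value type: an impulse always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        # Caller states explicitly that the number is already in SI units.
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Caller states explicitly that the number is in English units,
        # so the conversion happens exactly once, at the boundary.
        return cls(value * LBF_S_TO_N_S)


def total_impulse(readings):
    """Sum a sequence of Impulse values; all are already normalised to SI."""
    return Impulse(sum(r.newton_seconds for r in readings))
```

With this shape, code that receives an `Impulse` never has to guess which unit system the producer used; the only place a bare float appears is at the constructor, where the unit is spelled out in the method name.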
Also, several "software bugs" (their words) relating to the Mars Surveyor Lander Vehicle are described here. These bugs were detected and fixed in the field (i.e., Mars). At least one of the bugs caused a heater failure in the vehicle on Mars. This failure was recovered from.
Anyways, those are just two quickies, but NASA has their share of bugs. (And generally some pretty ingenious ways to reprogram and update vehicle software post-launch.)
On a related note, here's a paper from NASA entitled "The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software".
Much as I dislike NT, especially in critical environments, this problem had nothing to do with NT. It had everything to do with bad coding.
As we all know, information systems are only as smart as people make them. In the case of the USS Yorktown, an admin/operator entered data which caused a divide by zero condition in the application. Because the application did not have any exception handling built into it for a divide by zero condition, it died.
You can't blame the OS for this. The application should have had exception handling built into it in a couple of places. It probably should have checked any new entries before committing them to ensure the new data would not introduce such a condition, and the app itself should have had appropriate error handling to prevent a panic/dump when a divide by zero condition was encountered.
If the app was coded by the same people on another platform, the end result would have been the same.
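The two layers of defence described above (check new entries before committing them, and handle the divide by zero if it happens anyway) might look roughly like this. It is a sketch under assumptions, not the Yorktown application: the function names `validate_entry` and `safe_ratio` and the choice of a fallback value are hypothetical.

```python
def validate_entry(value: float) -> float:
    """First layer: refuse to commit an entry that would later be a zero divisor."""
    if value == 0:
        raise ValueError("entry must be non-zero; it is used as a divisor")
    return value


def safe_ratio(numerator: float, denominator: float, fallback: float = 0.0) -> float:
    """Second layer: if a zero slips through anyway, degrade instead of crashing."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Return a caller-chosen fallback rather than letting the error
        # propagate and take the whole application down.
        return fallback
```

Whether a silent fallback is appropriate depends on the system; in a control application it may be better to log the event and hold the last known good value, but the point is that the failure stays local.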
Idiot, n. A member of a large and powerful tribe whose influence in human affairs has always been dominant
Actually, the switching code was in C and the crash was due to a programmer's apparent misunderstanding of the 'break' statement. See full details at: http://www.csc.calpoly.edu/~jdalbey/SWE/Papers/att_collapse.html
http://wwwzenger.informatik.tu-muenchen.de/persons/huckle/bugse.html
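Back when C++ was new, there was an insidious problem with the comment syntax that never showed up during compilation: a // comment that ends in a backslash continues onto the next line.
    //check for \
    if(c=='\\')   //found one, handle path
        slashfound=1;
    ++index;
Here the trailing backslash splices the if line into the comment, so slashfound is set on every pass. Code similar to this delayed shipment of a commercial product because it caused serious instability.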
Some of the tips, which may appear obvious to some of us, include:
--- Fox
Unfortunately, it has been conclusively proven by experience that the risk of an incapacitated pilot causing an accident is much, much less than the risk of a pilot and computer being at odds over the correct course of action in an emergency, or the risk of computer settings confusing the pilot. I prefer the Boeing design philosophy, which is that the pilot is the final authority on the operation of the airplane, not the computer. The pilot, not the software engineer, is on board the airplane, and therefore has a much higher interest in ensuring that the vehicle gets on the ground in one piece.
Perhaps true, but back in the days of punchcards and COBOL you weren't storing an integer for a date, you were storing a string.
You're all sorta right. Here is one of my favorite aviation pages. It'll tell you more than you ever wanted to know about airplane physics (from a pilot's point of view). Chapter 1 covers these altitude/speed/power concepts...
A "large quantity of water" entered the storage tank because an employee who had just been fired dropped a hose into it out of spite (he didn't know what would happen, he just wanted to ruin something). Yes, the safety precautions were under-par, but when someone with legitimate access wants to destroy something it's pretty hard to prevent.
:).
And yes, this has nothing to do with programming error
In particular, there was a management decision that the software for the previous model would be used, even though the design criteria for the new model were different. Specifically, the Ariane 5 was capable of acceleration that overflowed variables in the program written for Ariane 4.
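To make the overflow concrete, here is a small sketch of what happens when a value that fits comfortably in a floating-point number is forced into a narrower integer. It assumes a 16-bit signed target purely for illustration and does not model the actual flight software; the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(value: float) -> int:
    """Unchecked narrowing: truncate toward zero, keep only the low 16 bits.

    This mimics what a silent two's-complement conversion would produce.
    """
    raw = int(value) & 0xFFFF
    return raw - 0x10000 if raw >= 0x8000 else raw


def to_int16_checked(value: float) -> int:
    """Checked narrowing: raise instead of returning a meaningless number."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return n
```

A reading of 40000.0 goes through the unchecked path and comes out as a large negative number, while the checked path forces the caller to decide what to do. The range that was safe for the old vehicle is an assumption baked into the code, and reusing the code means re-examining that assumption.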
Speaking of aviation: This SAAB Gripen crash was attributed to the coding of the control laws in the flight control computer. So was this one. And this F-22. And let's all remember the Apollo 11 incident.
Equine Mammals Are Considerably Smaller
I think I've recommended this book several times on Slashdot. Simply put, THE collection of computing-related horror stories.
http://www.amazon.com/exec/obidos/tg/detail/-/020155805X/qid=1035769692/sr=8-13/ref=sr_8_13/104-4078673-1863905?v=glance&n=507846
I swear by MacOS X. Although I used to swear *at* MacOS 9...
Just for the record: he never went to jail.
From my Software Engineering textbook (author: Vliet if you're interested), a few references you might like:
- http://www.csl.sri.com/users/neumann/neumann-book.html
- http://www.rothstein.com/slbooks/sl296.htm
Also, you might like:
"Design Paradigms: Case Histories of Error and Judgment in Engineering" by H. Petroski (not restricted to Software Eng)
Enjoy,
Rod
"I respect faith but doubt is what gets you an education." --who knows
On a slightly different tack - the Sleipner A oil platform sank because of a bad design, caused by inaccurate computer-based modelling (using an FEA tool inappropriately). In this case it was the data, not the software.
Of course they didn't. The Patriot was specifically designed to detonate itself CLOSE TO the offending missile and, hopefully, in the process destroy the latter. This is, in fact, what happened: Tel Aviv and surrounding areas were rained on by falling Scud parts. These were pieces of the Scuds intercepted by the Patriots.
The problem of intercepting a moving target is difficult, but it becomes much easier when the goal is to simply get "near enough" to disable it with an explosion.
... is whot bwings os tugevza tsuzay.