History's Worst Software Bugs
bharatm writes "Wired has an article on the 10 worst software bugs. From the article: 'Coding errors have sparked explosions, crippled interplanetary probes -- even killed people. Here's our pick for the 10 worst bugs ever, but the judging wasn't easy.'"
Wonderful article. Twenty years ago I believed that writing software would soon become a licensed profession. (Need a
license to own a compiler, for instance.) I thought that the event that would inevitably trigger this is when a software
bug caused a human death.
I still believe that programming will eventually require a license, but I now think that lobbying by the big media
companies will be the cause. Depressing, huh?
When you are writing software for life-critical applications, there are various tools and techniques that ensure bug-free code. Just look at all the airplanes, power plants, car computers, etc. It's not at all usual to see one fail critically.
Send email from the afterlife! Write your e-will at Dead Man's Switch.
The moth was trapped, removed and taped into the computer's logbook with the words: "first actual case of a bug being found."
Why would they say that, if the term "bug" didn't exist? I mean, you wouldn't find a rat in your car and say "First actual case of a car 'rat' being found" if you didn't use it as a term to indicate something. You'd just say "this bug caused computing errors". I smell a car rat.
Send email from the afterlife! Write your e-will at Dead Man's Switch.
Yes, I saw that too and I guess they have forgotten the most devastating MS bug which is present in all releases from NT 3.1 and at least up to 2k. I haven't tested XP.
I couldn't find the description right now, but I'm sure others know the bug. It's the one where you can basically print a special text file using the type command or similar, and it will BSOD the machine. The file consists of tabs, spaces and newline/carriage return pairs and nothing else. MS never fixed the bug.
If you mod me down, I *will* introduce you to my sister!
If continuing education for programmers was a priority, quality would be better.
This also requires more than the current courses, which are pretty much all starter level. It is sad that after spending just a few days with a language before a course, you will already find mistakes/bugs or simply better ways to do things than what is promoted in the course.
For example after a 3 day crash course (I missed day 1, else it would have been 4 days), I became a certified Stellent developer. So a "real" test at the end to determine if you are worth it or not.
My wife's sketchblog Blob[p]: Gastrono-me
And I suppose if I had a "broken" gun in my basement and you broke in and stole it, then tried to use it and injured yourself, you could sue me, right?
Sorry, I am having a hard time seeing the correlation to terrorism here. It seems that you have a predisposition about the US's stance on terror and are desperately trying to make a connection for a political statement. Unfortunately, typical slashdot readers will agree with you =)
This would be very different if the US had broken into the USSR and altered their software to malfunction. That definitely would be a criminal act, but perhaps more importantly an act of war. I doubt the Soviets ever figured out what happened until they were told.
Are you intolerant of intolerant people?
I don't think you can justify the largest non-nuclear explosion ever just because it was a "side-effect" of economic damage. Otherwise it becomes very easy to justify 9/11, since all the targets were economic/military.
On a much smaller scale, I think it's illegal here in the UK to set "traps", for example a landmine in your house in case thieves break in. I believe the reasoning is the indiscriminate nature: it could kill a fireman trying to save the house from burning down.
Similarly, even in war, indiscriminate killing is ethically wrong, and I doubt the gas workers were wearing military uniforms (and I guess the US still pretended to care about the Geneva Convention back then).
That's easier said than done. After all, buggy software is usually better than no software. But who's to say that it will even prevent the problem.
Mariner I software was correct, but failed because the software was incorrectly typed into the computer. The Ariane 5 software was correct when it was written for Ariane 4. The only way to find that bug is with simulation of the whole system. The Therac software was correct because it was part of a system of hardware interlocks. Later machines took half the system without replacing the other half and people expected it to work the same way.
Formal specs won't help you if your software is not being used as designed or if the designer can't know all possible inputs (such as fly-by-wire software for aircraft).
dom
I designed and built a diagnostic radiology workstation (in 1997, in Java 1.1, 4x5 megapixel monitors, still in use today). During the development effort we were regaled with stories of software glitches in medical systems resulting in disaster. It really keeps you focused.
In one case, a radiation treatment system had a bug where if you used the backspace key when entering the dose a patient received, the display would show you had deleted the last digit, but internally you hadn't. So the patient would receive 10^backspace times the intended dose of radiation. Not a big deal normally, since the techs would typically shut the machine off between treatments. Until one day when they had two patients needing treatment back to back. The tech knew something was wrong when the machine was running for an unusually long time. The patient knew something was wrong when he died.
On our team a defect that crashed the system was considered severity 2. Severity 1 was reserved for defects that could result in a mis-diagnosis, which most patients agree is worse than a crash.
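For anyone who wants to see the shape of that backspace bug, here is a rough sketch in Python. Everything in it (the `DoseEntry` class, the `buggy` flag, the `type_keys` helper) is made up for illustration and is not the real machine's code. It only shows how the screen and the stored value can drift apart, plus one defensive check that would refuse to proceed when they disagree.

```python
class DoseEntry:
    """Hypothetical dose-entry field; not the actual machine's code."""

    def __init__(self):
        self.shown = ""    # what the operator sees on screen
        self.stored = ""   # what the machine would actually use

    def key(self, ch, buggy=False):
        if ch == "\b":
            # Backspace always updates the display...
            self.shown = self.shown[:-1]
            # ...but the buggy variant forgets the internal buffer.
            if not buggy:
                self.stored = self.stored[:-1]
        else:
            self.shown += ch
            self.stored += ch

    def commit(self):
        # Defensive check: never act on a value the operator cannot see.
        if self.shown != self.stored:
            raise ValueError("display and stored dose differ")
        return int(self.stored)


def type_keys(keys, buggy):
    """Feed a string of keystrokes into a fresh entry field."""
    entry = DoseEntry()
    for ch in keys:
        entry.key(ch, buggy=buggy)
    return entry
```

Typing `200` followed by one backspace leaves the buggy field showing `20` while still holding `200`, which is the factor of ten per backspace described above. With the check in `commit`, that mismatch raises an error instead of being used.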
I think it would better be called terrorism.
Why? Because code that the Soviets stole from the US turned out to be (from their perspective) defective? I don't think it's terrorism if my car blows up while you, having stolen it, are driving it around.
More to the point, though, the CIA's objective was to cripple the cash flow of the Soviet Union, an entity that really was busy terrorizing much of the world. Their murderous, oppressive grip on Eastern Europe and attempts at foisting their cheerful utopia on South America and Africa weren't going to get anywhere without the cash they were trying to raise by selling Siberian natural gas to the West. Making the Soviet government's cold cash sales operation less workable for them was part of what finished pulling the rug out from under that hellish government. That they so desperately needed Western cash was a sign of how hollow that regime actually was, and that event just added clarity to the picture. I doubt the CIA expected that exact outcome, but you never really know what someone's going to do with the stuff they steal from you. Makes you wonder what's ticking under the hood in North Korea's squalid little IT universe, doesn't it? No doubt our team, and China's as well, have planted similar things in case they're needed. Tactics like that are going to be more subtle now, probably.
Don't disappoint your bird dog. Go to the range.
I actually did a research report on the Therac-25 incident while I was in Software Engineering class a few semesters ago (I was also in Technical Writing at the time, so I could kill two assignments with one report!) ;-) The details of the incident(s) are actually quite fascinating and sometimes spine-chilling.
Here's the report in PDF if anyone's interested: reportfinal.pdf
And in HTML for those of you who prefer it: link
And what makes you think that phone network software isn't peer reviewed?
regards,
treefrog
From Wiki page:
It also found that FirstEnergy did not take remedial action or warn other control centers until it was too late because of a bug in the Unix-based General Electric Energy's XA/21 system that prevented alarms from showing on their control system, and they had inadequate staff to detect and correct the software bug. The cascading effect that resulted ultimately forced the shutdown of more than 100 power plants.
"1982 -- Soviet gas pipeline. Operatives working for the U.S. Central Intelligence Agency allegedly plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline"
Can this really be considered a bug? It was an intentional software error.
My dad tells this story from time to time. I don't know if it's true, but it makes a good story. Back in the early days of computers when only big corporations had them, most software was written in-house by staff programmers. One of the major soda manufacturers had a new mainframe and had one of their top programmers write an accounting package for them. It so happens that the manufacturer was a major competitor of 7-Up. Well, for whatever reason the programmer left the company on not-too-good terms. The very next time the manufacturer went to print out a report from the accounting package, every 7th page contained the phrase "Drink 7-Up" in big block letters. They had their remaining programmers go back through the code and try to remove this new "feature" but they were unable to. This guy was so good that he'd embedded the logic for this nastygram right into the actual logic of the accounting package. Supposedly there was code that would dynamically generate other instructions that, when executed, would generate other instructions, etc. They were supposedly unable to get rid of the 7-Up message without breaking other parts of the program, so they ended up having to go back to square one and write a whole new accounting package.
So the story goes...
From the post:
The resulting event is reportedly the largest non-nuclear explosion in the planet's history.
The actual quote from a hyperlink in the article mentioned in the post:
"The result was the most monumental non-nuclear explosion and fire ever seen from space"
The actual largest non-nuclear explosion occurred during World War One in Halifax Harbour when a munitions ship collided with another ship and exploded. It is known as the Halifax Explosion. It was picked up on seismographs and created an 18 metre tsunami.
-- I ignore anonymous replies to my comments and postings.
Years ago, while working on a project for a medical firm, I found out first hand just how horribly things can go wrong with what we eventually agreed was a "bug" but was more of a "human bug" issue, one that made me sit up and realize that it's not just programmers who will use our programs.
Without getting too detailed, the end users were allowing certain conditions to go unchecked as the software was telling them it was "OK". There was a rather neat explosion (read, small) that hurt nobody and damaged some equipment because instead of being "OK" it was telling the operator that there was exactly "ZERO K" of space available for data storage on a recording device and the test needed to be shut down.
Now, the operators were told that when the counter got low they would see a warning and be told to stop the tests. So was it a bug, or was it my assumption that these 11.95/hour service techs would "understand" what "0K" means from "OK" (that's a zero (0) and an O there)? Either way, there was some damage, we had a bit of a laugh, but at least nobody got irradiated and died.
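One way to dodge the "0K" versus "OK" trap is to never show a bare number-plus-unit that can be misread as a word. A minimal sketch in Python follows; the function name `storage_status`, the 64 KB threshold and the message wording are all invented here and have nothing to do with the original system.

```python
def storage_status(free_kb, warn_below_kb=64):
    """Return an operator message for the remaining recorder space.

    The threshold and the wording are assumptions for illustration.
    """
    if free_kb < 0:
        raise ValueError("free_kb must not be negative")
    if free_kb == 0:
        # Spell out the action; do not rely on the reader parsing "0K".
        return "STOP TEST: recorder full (0 KB free)"
    if free_kb < warn_below_kb:
        return f"WARNING: only {free_kb} KB free, stop the test soon"
    return f"Storage fine: {free_kb} KB free"
```

The point is that the zero case gets its own message that leads with the required action, so the operator does not have to tell a zero from a letter O.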
Why do overlook and oversee mean opposite things?
Right, but back then you had to know how they worked to operate them. In fact I've never seen a modernized steam engine that ran itself. You couldn't even crest a hill too fast, or you'd have a flash in the boiler and blow the thing up, potentially killing people who weren't even in or near the engine at the time since there's a lot of energy involved in phase change and the boiler parts are all heavy. Thus steam engineers actually knew something, or they (and many people around them) were in a lot of trouble.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
> First, the ship did not need to be towed back into port, though it did sit dead in the water for a bit ..
"The ship had to be towed into the Naval base at Norfolk, Va., because a database overflow caused its propulsion system to fail, according to Anthony DiGiorgio, a civilian engineer with the Atlantic Fleet Technical Support Center in Norfolk."
"Using Windows NT, which is known to have some failure modes, on a warship is similar to hoping that luck will be in our favor," DiGiorgio said
Curiously enough, DiGiorgio later wrote a retraction and 'resigned' from the Navy, as did Vice Adm. Henry Giffin.
"DiGiorgio denies reported statements"
"I did not say that the Yorktown was towed into Norfolk"
http://www.gcn.com/17_20/news/33292-1.html
"Ron Redman, deputy technical director of the Fleet Introduction Division of the Aegis Program Executive Office, said there have been numerous software failures associated with NT aboard the Yorktown."
"Refining that is an ongoing process," Redman said. "Unix is a better system for control of equipment and machinery, whereas NT is a better system for the transfer of information and data. NT has never been fully refined and there are times when we have had shutdowns that resulted from NT."
"The Yorktown has been towed into port several times because of the systems failures" [Ron Redman - Aegis]
"This is the only time this casualty has occurred and the only propulsion casualty involved with the control system since May 2, 1997, when software configuration was frozen," Vice Adm. Henry Giffin
> Second, the problem was in the software running on top of Windows
But the software made a call to Windows to divide by zero and Windows made a call to the fpu which did just that.
http://www.slothmud.org/~hayward/mic_humor/nt_nav
http://www.jerrypournelle.com/reports/jerryp/york
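Whichever layer ends up with the blame, a divide by zero is cheapest to stop where the value first enters the calculation. A minimal sketch in Python, with hypothetical names (`fuel_rate`, `safe_fuel_rate`) and made-up units that are not taken from the ship's software:

```python
def fuel_rate(consumed_litres, elapsed_hours):
    """Hypothetical calculation; names and units are illustrative only."""
    if elapsed_hours == 0:
        # Reject the bad input here instead of passing it further down.
        raise ValueError("elapsed_hours must be non-zero")
    return consumed_litres / elapsed_hours


def safe_fuel_rate(consumed_litres, elapsed_hours, fallback=None):
    """Return the rate, or a fallback value when the input is unusable."""
    try:
        return fuel_rate(consumed_litres, elapsed_hours)
    except ValueError:
        return fallback
```

The caller then decides what a missing value means, rather than letting one bad field take the rest of the system down with it.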
According to several docs, this system was taken down by mod.
January 15, 1990 -- AT&T Network Outage. A bug in a new release of the software that controls AT&T's #4ESS long distance switches causes these mammoth computers to crash when they receive a specific message from one of their neighboring machines -- a message that the neighbors send out when they recover from a crash.
One day a switch in New York crashes and reboots, causing its neighboring switches to crash, then their neighbors' neighbors, and so on. Soon, 114 switches are crashing and rebooting every six seconds, leaving an estimated 60 thousand people without long distance service for nine hours. The fix: engineers load the previous software release.
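The chain reaction in that description is easy to play with as a toy model. The Python sketch below assumes a made-up network map and a simplified rule (a switch crashes the first time a recovering neighbour notifies it); the real switches kept crashing and rebooting repeatedly, which this does not model. The function name `cascade` and the site names are hypothetical.

```python
from collections import deque


def cascade(neighbours, first):
    """Return the switches in the order they first crash.

    Toy rule: a recovering switch notifies its neighbours, and the
    buggy release crashes on receiving that notice.
    """
    crashed = [first]
    seen = {first}
    queue = deque([first])
    while queue:
        switch = queue.popleft()
        # On recovery, this switch notifies every neighbour.
        for peer in neighbours[switch]:
            if peer not in seen:
                seen.add(peer)
                crashed.append(peer)
                queue.append(peer)
    return crashed
```

With a map such as `{"NY": ["CHI", "ATL"], "CHI": ["NY", "DEN"], "ATL": ["NY"], "DEN": ["CHI"], "SEA": []}`, starting from `"NY"` takes down every switch reachable from it, and only the isolated one survives.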
Engines have existed for centuries. Roman engineers, for example, built siege engines (ballistas, and the like).
Dictionary.com
Engineer
[Middle English enginour, from Old French engigneor, from Medieval Latin ingenitor, contriver, from ingenire, to contrive, from Latin ingenium, ability. See engine.]
Engine
n. 1.1. A machine that converts energy into mechanical force or motion.
2.1. A mechanical appliance, instrument, or tool: engines of war.
[Middle English engin, skill, machine, from Old French, innate ability, from Latin ingenium. See gen- in Indo-European Roots.]
Some person down the line noticed that the Russians didn't have that many missiles, couldn't have launched them all with such synchronization, and that there were an awful lot of two's in the report ... actually, every digit of every number was a two. It turned out to be a fried chip somewhere, always pumping out the same bit regardless of input (I have no understanding of the technical side of the issue; maybe it hit the 32-bit limit and the int->string function reacted with 2's).
Good thing we were not too automated, and that we employed somebody smart enough to critically examine his printouts.
Disclaimer, this is a favorite tidbit of one of my professors ... I have no real source to refer to.
Use my userscript to add story images to Slashdot. There's no going back.
Guess what: They don't, although they appear to be hedging their bets with safety critical software.
An interesting read...
"Prepare for the worst - hope for the best."
Consider how much software is written by people with five years or less of professional experience, on short schedules, with no time allocated for continuing education. If software projects weren't always rush jobs, and on relative shoestring budgets, the quality would be better.
The software reliability crisis has very little to do with greed, engineering incompetence or the lack of big budgets, in my opinion. There is something fundamentally wrong with the way we program our computers, something that no amount of quality control measures can ever cure.
The reason that software is bad has to do with a custom that is as old as the computer: the practice of using the algorithm as the basis for software construction. Switch to a synchronous, signal-based approach and the problem will disappear. Complex algorithmic software is essentially unreliable, something that Fred Brooks has shown in his now famous "No Silver Bullet" paper back in 1987. For an alternative approach to software construction see this article in The Silver Bullet News.
Regardless of what has been said in the past, the problem can be solved. Otherwise, we are in big trouble, very big trouble.