History's Worst Software Bugs
bharatm writes "Wired has an article on the 10 worst sofware bugs.. From the article 'Coding errors have sparked explosions, crippled interplanetary probes -- even killed people. Here's our pick for the 10 worst bugs ever, but the judging wasn't easy.'"
Wonderful article. Twenty years ago I believed that writing software would soon become a licensed profession. (Need a
license to own a compiler, for instance.) I thought that the event that would inevitably trigger this is when a software
bug caused a human death.
I still believe that programming will eventually require a license, but I now think that lobbying by the big media
companies will be the cause. Depressing, huh?
When you are writing software for life-critical applications, there are various tools and techniques that help ensure bug-free code. Just look at all the airplanes, power plants, car computers, etc. It's quite unusual to see one fail critically.
Send email from the afterlife! Write your e-will at Dead Man's Switch.
I wouldn't say they are the 10 worst bugs ever... more like the 10 most widely known, media-announced bugs. Okay, I have no examples of any others, but I'm sure there must be worse bugs out there...
Can anyone think of any others?
why?
That's because it's quality vs. quantity.
sarchasm
Bringing down the company's intranet countless times over the years almost seems like an amusing little distraction. No one died, nothing blew up, and I've even managed to keep my job. It must be that people are getting used to these "software bug" excuses for the various problems that pop up with computers. I'll have to remember that for next time.
Caller: "My computer exploded and I'm bleeding profusely!"
911 Operator: "Must be a software bug."
The fact the site appears to be buggered... :|
So nobody's hit on the really big one yet.
Deleted
Not everyone with a video camera can go out and film a network TV series. Does that mean we should require them to become licensed before they can operate their cameras?
No.
There will always be a difference between professional and amateur grade. You'll never need a license to run a compiler.
The moth was trapped, removed and taped into the computer's logbook with the words: "first actual case of a bug being found."
Why would they say that, if the term "bug" didn't exist? I mean, you wouldn't find a rat in your car and say "First actual case of a car 'rat' being found" if you didn't use it as a term to indicate something. You'd just say "this bug caused computing errors". I smell a car rat.
Send email from the afterlife! Write your e-will at Dead Man's Switch.
1995/1996 -- The Ping of Death. A lack of sanity checks and error handling in the IP fragmentation reassembly code makes it possible to crash a wide variety of operating systems by sending a malformed "ping" packet from anywhere on the internet. Most obviously affected are computers running Windows, which lock up and display the so-called "blue screen of death" when they receive these packets. But the attack also affects many Macintosh and Unix systems as well.
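For anyone wondering what the missing "sanity check" would look like, here is a minimal sketch in Python. It is not taken from any real IP stack; the function name and the default 20-byte header are assumptions made for illustration. The idea is simply that a fragment's offset plus its payload must never reach past the 65535-byte limit of an IPv4 datagram.

```python
# Hypothetical sketch: names and structure are illustrative only,
# not copied from any real operating system's reassembly code.
MAX_IP_DATAGRAM = 65535  # the IPv4 total-length field is 16 bits wide


def fragment_fits(offset_units: int, payload_len: int, header_len: int = 20) -> bool:
    """Return True if a fragment stays inside a legal reassembled datagram.

    offset_units is the 13-bit fragment offset field, counted in 8-byte units.
    header_len defaults to 20 bytes (an assumption: a header with no options).
    """
    if not 0 <= offset_units <= 0x1FFF:
        return False
    if payload_len < 0:
        return False
    end = header_len + offset_units * 8 + payload_len
    return end <= MAX_IP_DATAGRAM


if __name__ == "__main__":
    print(fragment_fits(0, 1480))     # an ordinary first fragment
    print(fragment_fits(8189, 1480))  # a last fragment that overshoots the limit
```

A reassembler that ran a check like this before copying each fragment into its buffer would drop the oversized packet instead of writing past the end of the buffer.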
===
WinNuke made it...
MoM++ - A Classic Expanded - [Master of Magic 1.5]
http://mompp.sourceforge.net/
The last one on the list is this
... to me that sounds like a user not using the software correctly..
"Multidata's software allows a radiation therapist to draw on a computer screen the placement of metal shields called "blocks" designed to protect healthy tissue from the radiation. But the software will only allow technicians to use four shielding blocks, and the Panamanian doctors wish to use five.
The doctors discover that they can trick the software by drawing all five blocks as a single large block with a hole in the middle. What the doctors don't realize is that the Multidata software gives different answers in this configuration depending on how the hole is drawn: draw it in one direction and the correct dose is calculated, draw in another direction and the software recommends twice the necessary exposure.
At least eight patients die, while another 20 receive overdoses likely to cause significant health problems. The physicians, who were legally required to double-check the computer's calculations by hand, are indicted for murder."
why?
Yes, I saw that too and I guess they have forgotten the most devastating MS bug which is present in all releases from NT 3.1 and at least up to 2k. I haven't tested XP.
I couldn't find the description right now, but I'm sure others know the bug. It's the one where you can display a special text file using the type command or similar and it will BSOD the machine. The file consists of tabs, spaces and newline/carriage return pairs and nothing else. MS never fixed the bug.
If you mod me down, I *will* introduce you to my sister!
The doctors wanted to trick the software. But then the software didn't work as intended. A really unexpected outcome, really :P
Oh wait, it wasn't
Engineers so good they had to steal their pipeline control software. And, apparently, a ton of other Western engineering too.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
Why do they have the Intel Pentium floating point divide error listed as a bug? That was a hardware design error in the circuit, it was not a software bug. Of course it caused software to behave unexpectedly, but still I'm surprised that Wired put that one in there.
Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
Since that WAS one of the ten.
July 28, 1962 -- Mariner I space probe. A bug in the flight software for the Mariner 1 causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.
1982 -- Soviet gas pipeline. Operatives working for the U.S. Central Intelligence Agency allegedly (.pdf) plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline. The Soviets had obtained the system as part of a wide-ranging effort to covertly purchase or steal sensitive U.S. technology. The CIA reportedly found out about the program and decided to make it backfire with equipment that would pass Soviet inspection and then fail once in operation. The resulting event is reportedly the largest non-nuclear explosion in the planet's history.
1985-1987 -- Therac-25 medical accelerator. A radiation therapy device malfunctions and delivers lethal radiation doses at several medical facilities. Based upon a previous design, the Therac-25 was an "improved" therapy system that could deliver two different kinds of radiation: either a low-power electron beam (beta particles) or X-rays. The Therac-25's X-rays were generated by smashing high-power electrons into a metal target positioned between the electron gun and the patient. A second "improvement" was the replacement of the older Therac-20's electromechanical safety interlocks with software control, a decision made because software was perceived to be more reliable.
What engineers didn't know was that both the 20 and the 25 were built upon an operating system that had been kludged together by a programmer with no formal training. Because of a subtle bug called a "race condition," a quick-fingered typist could accidentally configure the Therac-25 so the electron beam would fire in high-power mode but with the metal X-ray target out of position. At least five patients die; others are seriously injured.
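To make the "race condition" idea concrete, here is a toy Python sketch. It is not a reconstruction of the Therac-25 code; the class and method names are invented for illustration. It shows the two usual defenses: apply a multi-field setup atomically under one lock, and re-check the real state at the moment of firing rather than trusting what was entered earlier.

```python
import threading


class BeamController:
    """Toy model: a high-power beam is only safe when the X-ray target is in place."""

    def __init__(self):
        self._lock = threading.Lock()
        self.power = "low"
        self.target_in_place = False

    def configure(self, power: str, target_in_place: bool) -> None:
        # Update both fields under one lock so no other thread
        # can observe a half-applied configuration.
        with self._lock:
            self.power = power
            self.target_in_place = target_in_place

    def fire(self) -> str:
        # Software interlock: check the actual state when firing.
        with self._lock:
            if self.power == "high" and not self.target_in_place:
                return "refused"
            return "fired"


if __name__ == "__main__":
    c = BeamController()
    c.configure("high", target_in_place=False)
    print(c.fire())
```

Without the lock and the check in fire(), a fast edit that lands between "set power" and "move target" could leave the machine in exactly the unsafe combination the article describes.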
1988 -- Buffer overflow in Berkeley Unix finger daemon. The first internet worm (the so-called Morris Worm) infects between 2,000 and 6,000 computers in less than a day by taking advantage of a buffer overflow. The specific code is a function in the standard input/output library routine called gets() designed to get a line of text over the network. Unfortunately, gets() has no provision to limit its input, and an overly large input allows the worm to take over any machine to which it can connect.
Programmers respond by attempting to stamp out the gets() function in working code, but they refuse to remove it from the C programming language's standard input/output library, where it remains to this day.
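Since gets() itself is C, the following is only a Python simulation of the idea, with made-up names and a made-up memory layout: an 8-byte buffer followed directly by 8 bytes of "something important." The first function copies like gets(), with no idea where the buffer ends; the second copies like fgets(buf, size, ...), which is told the size and keeps at most size - 1 bytes plus a terminating zero.

```python
BUF_SIZE = 8  # size of the simulated input buffer (an arbitrary choice)


def new_memory() -> bytearray:
    """Simulated memory: an 8-byte buffer followed by 8 bytes of adjacent data."""
    return bytearray(BUF_SIZE) + bytearray(b"RETADDR!")


def unbounded_copy(memory: bytearray, start: int, data: bytes) -> None:
    """Mimics gets(): writes every input byte, ignoring where the buffer ends."""
    memory[start:start + len(data)] = data


def bounded_copy(memory: bytearray, start: int, size: int, data: bytes) -> int:
    """Mimics fgets(buf, size, ...): keeps at most size - 1 bytes, then a zero."""
    kept = data[:size - 1]
    memory[start:start + len(kept)] = kept
    memory[start + len(kept)] = 0
    return len(kept)


if __name__ == "__main__":
    mem = new_memory()
    unbounded_copy(mem, 0, b"A" * 12)
    print(bytes(mem))  # the adjacent data has been partly overwritten

    mem = new_memory()
    bounded_copy(mem, 0, BUF_SIZE, b"A" * 12)
    print(bytes(mem))  # the adjacent data is intact
```

This is also why gets() cannot simply be patched: its signature takes only the buffer pointer, so it has no way to learn the buffer's size. It was eventually removed from the language in the C11 standard.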
1988-1996 -- Kerberos Random Number Generator. The authors of the Kerberos security system neglect to properly "seed" the program's random number generator with a truly random seed. As a result, for eight years it is possible to trivially break into any computer that relies on Kerberos for authentication. It is unknown if this bug was ever actually exploited.
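A rough Python sketch of why a poorly seeded generator is "trivially" breakable. This does not reproduce the Kerberos code; the function names, the 56-bit key size, and the seed range are assumptions for illustration. If the seed comes from a small, guessable range (say, a timestamp), an attacker can just try every candidate seed until the output matches.

```python
import random


def make_session_key(seed: int) -> int:
    """Derive a 56-bit key from a deterministic PRNG (illustrative only)."""
    rng = random.Random(seed)
    return rng.getrandbits(56)


def recover_seed(observed_key: int, candidate_seeds):
    """Brute-force the seed by regenerating keys for every candidate."""
    for seed in candidate_seeds:
        if make_session_key(seed) == observed_key:
            return seed
    return None


if __name__ == "__main__":
    secret_seed = 1_000_123  # pretend this came from a clock reading
    key = make_session_key(secret_seed)
    # The attacker only knows the seed lies in a window of 1000 values.
    print(recover_seed(key, range(1_000_000, 1_001_000)))
```

The key space looks like 2^56, but the real search space is only as large as the set of possible seeds, which is the whole problem.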
January 15, 1990 -- ATT Network Outage. A bug in a new release of the software that controls ATT's #4ESS long distance switches causes these mammoth computers to crash when they receive a specif
Something about their latest toy... ahm, ship that had to be towed back to port because the Windows NT they used to run everything on the ship kept blue-screening.
ELOI, ELOI, LAMA SABACHTHANI!?
Consider how much software is written by people with five years or less of professional experience, on short schedules, with no time allocated for continuing education. If software projects weren't always rush jobs, and on relative shoestring budgets, the quality would be better. If continuing education for programmers was a priority, quality would be better. If a couple of decades of experience was properly appreciated, quality would be better.
Probably more a case of bad design than a coding error, but I'm sure many of us have experienced the crippling pain of resolution changes in games etc. that do not default back to the original working one, leaving you with an unintelligible smear of a display and forcing you to fumble around blindly, vainly hoping that the menu sounds will help you restore the resolution. That last happened to me with NFSU2, and it is f***ing unacceptable for any non-amateur software maker to cause this type of rage!
Hehehe.... This reminds me of a Dilbert cartoon. Here is what I can remember:
Some guy: And here is our random number generator.
Another guy: 2 2 2 2 2 2 2 2 2 2 2 2.
Dilbert: That isn't very random though.
Some guy: He is randomly getting the same number.
Does anyone actually know which comic I am thinking of?
Ooo man the floppy drive is broken. No wait. The computer is just upside down.
This is my favourite.
I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
I've read about this instance before, and I think it's attributable to ignorance on the part of both the user and the developer. The software developer in this case knows the life of a human being is resting on his code, so it should have been nigh impossible to "trick" the software into allowing anything other than what the specs said it could do.
Proud member of the American Non Sequitur Society. We might not make much sense, but boy do we love pizza!
I found it a hard subject in school and have never used it practically, but it seems to be the only SURE way of proving the correctness of a program. Shouldn't we be using it by now, at least in real-time mission-critical applications? I think it needs to be stressed a lot more in school from the start, compared to topics like web development and Java and all the other pragmatic things that can be learned more easily.
Life is about being a Phoenix!
Jesus Christ. I'd somehow never heard of this bug, and I've been developing for Windows machines for years.
How on earth was such a basic and low-level bug ignored for so long? It doesn't seem like rocket-science to fix it with a small bounds-checking if statement!
Everything in moderation, including moderation itself
Ahh, it sounded like WinNuke; apparently it is different. Thanks for the info :)
I decided to look up Ping of Death, too... it pretty much cleared up any confusion, between the 2 articles...
MoM++ - A Classic Expanded - [Master of Magic 1.5]
http://mompp.sourceforge.net/
But machinery nonetheless, designed and built by humans. As such it is not necessarily any more reliable than physical engineering--and it's harder to verify.
I voted this morning on an electronic machine that did not produce a paper ballot. You better believe software security and reliability are on my mind.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
Since MS wasn't on the list, I'm sure we are going to see a lot of flames!
Do people just open an article, do a Ctrl+F and type microsoft to find something 'juicy'? If you had RTFA, you would have seen that the 'Ping of Death' was mentioned, which did impact Windows machines.
- what is the definition of simultanagnosia?! I've been meaning to look it up!
I think this is an interesting concept, but I would question the criterion that the worst bugs are those that cause fatalities at that moment. I'd argue that a bug that affects a function's efficiency, or a bug that opens a vulnerability, could impact many (millions?) of people, for perhaps minutes or hours.
:)
That impact could result in lots of people losing their jobs, losing their minds, and losing their families and lives through suicide or abuse to compensate.
Think about what a virus (maybe exposed through a bug or vulnerability) could do to a small company.
Think about what a bug could do to a computer science student
===
"This is a really small fix..." -said by all developers
Its intent was not to cause terror, but to inflict economic damage. I heard about a similar incident where a Japanese shipbuilder was stealing blueprints from a UK shipyard tendering for a contract and undercutting them. The UK shipbuilder deliberately designed a ship that would capsize on launch, which the Japanese duly stole, built, and launched. I don't know if anyone was killed, but ethically it's a tricky one.
Because it was implemented as microcode into the processor.
Indeed. Causing a *NON FATAL* explosion in a country that imprisoned as many as 2.5 million political prisoners in Gulags at one time, and is estimated to have murdered upwards of 60 MILLION of its own citizens. Terrorism?
Terrorism is an act of mayhem designed to terrorize. This did not.
Sabotage? Yes.
Act of war? Probably.
Terrorism? Not even close.
Your statement is just a display of anti-American rhetoric with no basis in reality.
Yeah, I'm fairly certain that Windows ME (or Windows: MoneyGrab) was one of the largest software bugs ever written.
Coding with assembly is like playing with Legos. Coding an application in assembly is like building a car with Legos.
This is like saying you need a license to operate a soda vending machine because some idiot decided that tipping it over to get a free soda was a smart idea. You might have to put warnings on compilers, like "do not code if you have no clue what you are doing," but requiring a license won't ever happen. I am sure there will be lawsuits in the future regarding software bugs, but any software used where an error could cause a human death is going to have a corporation behind it that can be held responsible. Actually, engineers obtain licenses, so why not software programmers? You are not even allowed to call yourself an engineer without getting that license. Basically, if this system were in place for programmers, the programmer would have to take legal responsibility for his code.
Ooo man the floppy drive is broken. No wait. The computer is just upside down.
http://www.alexisparkinn.com/photogallery/Videos/Airbus320_trees.mpg/
(Let the slashdotting begin! (poor servers))
All things considered, I don't know if the pilots survived.
----------
Any problem can be made unsolvable if there are enough meetings made to discuss it.
Because it was actually implemented as microcode and stored into the CPU, whether as mask rom or some other means of storing, but it was indeed software either way you look at it.
That is wrong. This is a myth that has been disproved several times. See for example the "IEEE Annals of Computer History" where Adm. Grace Hopper said that the term "bug" was used at least since the 30s, and maybe earlier, to describe an electrical problem in a system. See also here.
In an interview, Hopper confirmed that the notebook moth's caption, "First actual case of bug being found", clearly shows that it was a joke referring to a term that was already in use at the time.
Any idiot researching this anecdote for five minutes could have found out about it. I guess Wired couldn't be bothered. At this level of laziness and incompetence, one wonders why they don't just start publishing printouts of slashdot laced with ads. At least this place contains occasional nuggets of truth.
Once again, Wired blew it. Nice job, guys.
--
Mad science! Robots! Underwear! Cute girls! Full comic online! http://www.girlgeniusonline.com/
According to this article the damage was purely economic. Of course, I'm sure there was significant environmental damage.
As far as I can figure, people go crazy in times of war, even cold wars.
I: ..... ..... ..... ..... ..... .....
100
200
300
399
499
How this VAX decided that 99 times through an i = 1 to 100 loop was OK, I'll never know. I re-ran the code with identical inputs and it never did this again.
Two wrongs don't make a right, but three lefts do.
According to the caption, "the term "bug" had been in use for many years previously by engineers to indicate an indefinite problem".
Laws do not persuade just because they threaten. --Seneca
And I suppose if I had a "broken" gun in my basement and you broke in and stole it, then tried to use it and injured yourself, you could sue me, right?
Sorry, I am having a hard time seeing the correlation to terrorism here. It seems that you have a predisposition regarding the US's stance on terror and are desperately trying to make a connection for a political statement. Unfortunately, typical slashdot readers will agree with you =)
This would be very different if the US had broken into the USSR and altered their software to malfunction. That definitely would be a criminal act, but perhaps more importantly an act of war. I doubt the Soviets ever figured out what happened until they were told.
Are you intolerant of intolerant people?
Is it just me, or does the page fail to load in Firefox? I don't see the article text, but I do see "href='http://mediauk.247realmedia.com/..." where the article should be. Could be Flashblock or Adblock gumming it up.
I don't think you can justify the largest non-nuclear explosion ever just because it was a "side-effect" of economic damage. otherwise it becomes very easy to justify 9/11 since all the targets were economic/military.
on a much smaller scale, I think it's illegal here in UK to set "traps", for example a landmine in your house in case of thieves breaking in. I believe the reasoning is the indiscriminate nature - it could kill a fireman trying to save the house from burning down.
similarly, even in war, indiscriminate killing is ethically wrong and I doubt the gas workers were wearing military uniforms (and I guess the US still pretended to care about the Geneva convention back then)
Remember when the LA air traffic control tower crashed, due to a bug in MS software after 49 days. I would think that this would make it up there. http://www.itgarage.com/node/459
-----BEGIN PGP SIGNATURE-----
12345
-----END PGP SIGNATURE-----
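On the "49 days" figure above: the linked report isn't quoted here, so treat this as an assumption, but that number is exactly what you get when a 32-bit counter of milliseconds since boot wraps around. A quick Python check of the arithmetic (function names are made up for illustration):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def days_until_wrap(bits: int = 32) -> float:
    """Days before an unsigned millisecond counter of the given width overflows."""
    return 2 ** bits / MS_PER_DAY


def tick_after(start_tick: int, elapsed_ms: int, bits: int = 32) -> int:
    """Value of a wrapping millisecond counter after elapsed_ms more milliseconds."""
    return (start_tick + elapsed_ms) % (2 ** bits)


if __name__ == "__main__":
    print(days_until_wrap())          # a little under 50 days
    print(tick_after(2 ** 32 - 1, 1)) # the counter rolls over to zero
```

Any code that compares "now" against an earlier tick value without allowing for the rollover will misbehave once the machine has been up that long, which is why rebooting on a schedule hides the problem.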
1982 -- Soviet gas pipeline. Operatives working for the U.S. Central Intelligence Agency allegedly (.pdf) plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline. The Soviets had obtained the system as part of a wide-ranging effort to covertly purchase or steal sensitive U.S. technology. The CIA reportedly found out about the program and decided to make it backfire with equipment that would pass Soviet inspection and then fail once in operation. The resulting event is reportedly the largest non-nuclear explosion in the planet's history.
I didn't know what it was called, though. Thanks for the link!
Laws do not persuade just because they threaten. --Seneca
I think the last line is actually something like:
Dilbert: That isn't very random though.
Some guy: That's the trouble with randomness - you can never tell.
Yes that is it. Thank you.
Ooo man the floppy drive is broken. No wait. The computer is just upside down.
Instead of removing all references to gets() in existing code and keeping the faulty gets() in the standard C library, why not just improve gets() to make it secure and propagate the new version?
Or has that also been done?
I don't write a lot of C code, so I really don't know...
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
Its intent was not to cause terror, but to inflict economic damage.
In a country that was already rife with the underpinnings of civil war due to economic issues, don't you think that its intent was EXACTLY that? If those people had any more economic problems to face, a lot of people were going to die fighting.
I'd say that its intent was to INDIRECTLY cause terror.
That's easier said than done. After all, buggy software is usually better than no software. But who's to say that it will even prevent the problem?
Mariner I software was correct, but failed because the software was incorrectly typed into the computer. The Ariane 5 software was correct when it was written for Ariane 4. The only way to find that bug is with simulation of the whole system. The Therac software was correct because it was part of a system of hardware interlocks. Later machines took half the system without replacing the other half and people expected it to work the same way.
Formal specs won't help you if your software is not being used as designed or if the designer can't know all possible inputs (such as fly-by-wire software for aircraft).
dom
Faster! Faster! Faster would be better!
I designed and built a diagnostic radiology workstation (in 1997, in Java 1.1, 4x5 megapixel monitors, still in use today). During the development effort we were regaled with stories of software glitches in medical systems resulting in disaster. It really keeps you focused.
In one case, a radiation treatment system had a bug where if you used the backspace key when entering the dose a patient received, the display would show you deleted the last digit, but internally you hadn't. So the patient would receive 10^backspace times the intended dose of radiation. Not a big deal normally, since the techs would typically shut the machine off between treatments. Until one day when they had two patients needing treatment back to back. The tech knew something was wrong when the machine was running for an unusually long time. The patient knew something was wrong when he died.
On our team a defect that crashed the system was considered severity 2. Severity 1 was reserved for defects that could result in a mis-diagnosis, which most patients agree is worse than a crash.
This article was one of the ones on the sidebar when I viewed the main Wired article. Just an isolated incident, but certainly if it were a symptom of a more widespread "bug" it could be serious enough to make the books (not that anyone would ever know).
Does bharatm perhaps have a bug in his/her spell-checker? "sofware" bugs?
No - "software bugs".
Sigh. My id isn't prime. 2 2 2 2 2 3 5 313
There are software engineers in Canada now. They can legally sign off on a software project. The problem is that you don't want every one of your programmers to be a licensed software engineer, each signing off on their own code. It would be too expensive to hire that many engineers, and managing all the signatures when different people work on the same piece of code would be a nightmare. Basically you'd have to have one engineer, or a team thereof, overseeing the entire project to be sure that proper methods are being followed to ensure that there aren't any bugs. What you're asking for is more like saying that everyone building a bridge should be licensed, and that they should all have to sign off on every rivet they put in.
The problem is that most companies producing software do not want to pay for an engineer to oversee their project. Also, the way most software operations are run, you wouldn't see an engineer signing off on the projects. The engineer would force things to be tested much more in order to ensure that they were actually worthy of being signed off on. There is lots of this kind of software being built for planes and other situations where it really matters if there are bugs. I don't think this kind of situation will ever happen with off-the-shelf software. For one thing, software would cost too much, and most people aren't willing to pay $2000 to run an operating system on their home computer. For another, most engineers wouldn't sign off on a system when they didn't know the computer their software would run on. There are too many variables on a home computer to be able to guarantee, at that level, that your software will operate completely as expected.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
It almost seems like peer review should be required in critical software like phone switches, medical equipment, etc. I mean, scientists need peer review for their work, so how about some practical real-world peer review? Maybe some sort of non-profit gets set up, or an international body is formed like the IEEE or something, to review this sort of thing.
I also wanted to throw in my 2 cents and say that this was a great article. Lots of stuff on slashdot you have to wonder what they were thinking, but this was pretty interesting.
This article has recently been linked from Slashdot. Please keep an eye on the page history for errors or vandalism.
For potential severity, this one's worse than a few they listed.
Basically, the Navy was running critical ship systems on a Windows NT platform, and a divide-by-zero in a database caused a buffer overrun that resulted in a shutdown of the engines, leaving the ship dead in the water for 2.5 hours.
Fortunately, it was on maneuvers off of Cape Charles, and not at war off the coast of Yemen or something. Scratch a billion-dollar destroyer and most of her crew because of an NT bug, in that case.
Well, several of them result in the death of medical patients or the destruction of a multi-billion-dollar rocket or spacecraft. I found this one near the top of the list a curious addition, though:
Now, yeah, that's bad in the amount of money it costs Intel. But being a non-destructive, non-lethal bug that almost everyone's forgotten about by now, I think it pales in comparison to the Y2K bug, which cost the entire worldwide software industry far more money over a far greater length of time and still crippled credit card readers, financial software, and other computer software that cost real lost productivity for plain ol' consumers.
Actually, you're confusing the title "P.E." (professional engineer) with the generally accepted term "engineer." One (the P.E.) is a licensed engineer, and others are used traditionally and arbitrarily with no legal recourse. For example, I and my co-workers are bona fide engineers, and most of us have engineering or engineering technology degrees. None of what we do requires a P.E. to sign off on anything, although there are other aspects of our business (and many other businesses) that do require a P.E.
Of course, there are all kinds of "engineers" that have that title but don't truly merit it -- customer service engineer; field service engineer; applications engineer; and so on. Most of these don't hold engineering degrees. For many of them, I don't begrudge them their title, either. But we also know that they're not P.E.'s.
--Jim (me)
The problem on the Soviet pipeline wasn't a bug. It was deliberate sabotage. I think that there's a slight difference.
Yes, that is the one.... But it was publicly known way before that. I tested it back in the late 90's.
If you mod me down, I *will* introduce you to my sister!
Sure they are. Re-read the section on the Ping of Death.
From TFA:
Most obviously affected are computers running Windows, which lock up and display the so-called "blue screen of death" when they receive these packets.
The string "Microsoft"? No. The string "Windows", yes.
1995/1996 -- The Ping of Death. A lack of sanity checks and error handling in the IP fragmentation reassembly code makes it possible to crash a wide variety of operating systems by sending a malformed "ping" packet from anywhere on the internet. Most obviously affected are computers running Windows, which lock up and display the so-called "blue screen of death" when they receive these packets. But the attack also affects many Macintosh and Unix systems as well.
The importance of coding guidelines (not the ones that specify trivial, cosmetic issues) is starting to be appreciated. It is a fact of life that much code is written by relatively inexperienced developers. Guidelines, at least well-thought-out ones, are essentially tips on what to look out for or avoid, based on the know-how of more experienced people. Coding guidelines can operate at a number of levels; the language level (i.e., what language constructs to avoid or be careful when using) is now the subject of an international study group.
2004 Luxembourg blackout
Patriot Missile - Missiles had to be shut down once a day because the targeting system would cycle every minute and shift the internal coordinate system by a fraction of a degree. Over the course of a few days the targeting system would be completely useless.
PS/2 shutdown bug - The fuser components of analog copiers at the time worked at the same frequency as the processor's shutdown signal.
Minus World - Super Mario Bros. - A hidden water glitch
ErMac - Mortal Kombat
You say things that offend me and I can deal with it. Can you?
What about the program called 'Microsoft Windows'?
Ooo man the floppy drive is broken. No wait. The computer is just upside down.
Wired includes an advert through JavaScript. In this case, the script inserts malformed HTML, which causes FF to render incorrectly.
SOLUTION: turn off JavaScript, or fire up IE.
Why isn't Outlook Express in here? Early versions basically changed unopened e-mail viruses from a hoax to reality, when Microsoft decided it was a *good* idea to automatically run any VB script that was received. That's cluelessness like trusting everyone to be good and decent human beings while you walk through a prison shower with "Please rape me" painted on your back.
Later versions tried to fix the problem while keeping the functionality, as if somehow the bad guys would intentionally include the Evil Bit in their code.
"No problem. I have the capacity to do infinite work so long as you don't mind that my quality approaches zero."-Dilbert
According to Wired:
"A bug in the flight software for the Mariner 1 causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory."
I heard a different version of the story. In the one I heard, the formula written on the paper was put into the computer properly -- but it was written on the paper wrong. Who's right?
Coder's Stone: The programming language quick ref for iPad
The very first "engineers" ran steam engines.
"Eve of Destruction", it's not just for old hippies anymore...
It can't be a feature, it was never documented.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
The slashthink holds that this event was the impetus for softcode in processors; I'm not convinced because the Pentium began the muddling of RISC and CISC architectures with CISC words and RISC internals through microcode. The detailed explanation of the FDIV bug at http://www.cs.earlham.edu/~dusko/cs63/fdiv.html seems to indicate that the flaw was in the microcode (without explicitly stating so, and I don't know enough about the core behaviour of Pentiums to say).
[the USSR's] murderous, oppressive grip on Eastern Europe and attempts at foisting their cheerful utopia on South America and Africa
As opposed to the US's murderous, oppressive grip on third world countries generally and attempts at foisting their cheerful utopia on the rest of the world.
It's fair to say the US's grip wasn't as thorough, but it sure was oppressive, and it encompassed more of the world than the USSR for more years. How many legitimate governments did the US overthrow because they didn't like them?
Terrorism is terrorism. Justifying the largest non-nuclear explosion in the name of fighting terrorism belongs in George Orwell's literature.
Infuriate left and right
I used to work with the lead programmer on this software package from Multidata. We worked together at two different companies for a total of about four years.
Multidata's software allows a radiation therapist to draw on a computer screen the placement of metal shields called "blocks" designed to protect healthy tissue from the radiation. But the software will only allow technicians to use four shielding blocks, and the Panamanian doctors wish to use five.
This is also made very clear in the documentation. This isn't a bug at all; the dosimetrists misused the software.
The doctors discover that they can trick the software by drawing all five blocks as a single large block with a hole in the middle. What the doctors don't realize is that the Multidata software gives different answers in this configuration depending on how the hole is drawn: draw it in one direction and the correct dose is calculated, draw in another direction and the software recommends twice the necessary exposure.
Exactly. They tried to create a feature that the software did not support, and they did so in a manner that broke the software.
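I have no idea what Multidata's code actually did internally, so take this purely as an illustration of how "which direction you drew it" can change a geometric result. A minimal sketch, assuming areas are computed with the shoelace formula (my assumption, not something the article states); the names `signed_area`, `open_area_naive` and `open_area_robust` are made up for the example:

```python
def signed_area(points):
    """Shoelace formula: positive for counter-clockwise vertices, negative for clockwise."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def open_area_naive(outer, hole):
    """Hypothetical fragile version: silently assumes the hole was drawn clockwise."""
    return signed_area(outer) + signed_area(hole)


def open_area_robust(outer, hole):
    """Normalises both shapes, so drawing direction no longer matters."""
    return abs(signed_area(outer)) - abs(signed_area(hole))


outer = [(0, 0), (4, 0), (4, 4), (0, 4)]      # 4 x 4 block, counter-clockwise
hole_ccw = [(1, 1), (3, 1), (3, 3), (1, 3)]   # 2 x 2 hole, counter-clockwise
hole_cw = list(reversed(hole_ccw))            # same hole, drawn the other way

print(open_area_naive(outer, hole_cw), open_area_naive(outer, hole_ccw))
print(open_area_robust(outer, hole_cw), open_area_robust(outer, hole_ccw))
```

The naive version gives two different answers for the same shape, while the robust one does not. Whether anything like this was behind the Panama case is not something the article tells us.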
At least eight patients die, while another 20 receive overdoses likely to cause significant health problems. The physicians, who were legally required to double-check the computer's calculations by hand, are indicted for murder.
It's not a software bug, it's a user error. This isn't a bug any more than it's a "bug" that your Linux box stops working properly if you do sudo rm -rf /. The users of the product knew better.
To be fair, Multidata was not a great shop from a procedural standpoint - the guy who ran it was insane, but the software was rock solid. I actually worked with a number of former Multidata employees who jumped ship and went to a rival shop that builds similar software, and they were all fairly competent and intelligent.
"I have never won a debate with an ignorant person." -Ali ibn Abi Talib
From Wiki page:
It also found that FirstEnergy did not take remedial action or warn other control centers until it was too late because of a bug in the Unix-based General Electric Energy's XA/21 system that prevented alarms from showing on their control system, and they had inadequate staff to detect and correct the software bug. The cascading effect that resulted ultimately forced the shutdown of more than 100 power plants.
Read the best of all of Slash: seenonslash.com
Testing techniques abound - unit testing, integration testing, data flow testing, and mutation testing to name a few. Scripting tests makes them repeatable, and if we test and test again, we can have some certainty of the reliability of the software. How? Software reliability engineering. See the book by John D. Musa. (See his web site, too.) It's all about using statistics and probability to analyze the likelihood of another failure in a certain amount of time. We all know it's cheaper to fix a problem earlier, so it's best to design the system so that, given the frequency of observed failures during testing and the cost of a failure, you set an acceptable risk and build the software to match the risk.
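To make the arithmetic concrete, here is a minimal sketch. It is not one of Musa's actual models: it assumes failures arrive at a constant rate (a plain exponential model, my simplification), and the function names and numbers are invented for illustration.

```python
import math


def failure_rate(failures_observed, test_hours):
    """Point estimate of failures per hour from what was seen during testing."""
    if test_hours <= 0:
        raise ValueError("test_hours must be positive")
    return failures_observed / test_hours


def prob_failure_within(rate_per_hour, hours):
    """P(at least one failure in `hours`), assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def expected_cost(rate_per_hour, hours, cost_per_failure):
    """Expected failure cost over `hours` under the same constant-rate assumption."""
    return rate_per_hour * hours * cost_per_failure


# Hypothetical numbers: 3 failures seen in 1500 hours of scripted testing.
rate = failure_rate(failures_observed=3, test_hours=1500.0)
risk = prob_failure_within(rate, hours=100.0)
cost = expected_cost(rate, hours=100.0, cost_per_failure=50_000.0)
print(f"rate={rate:.4f}/h  risk over 100h={risk:.3f}  expected cost={cost:.0f}")
```

If the risk or expected cost is above what you decided was acceptable, you keep testing and fixing; if it is below, you can stop. Real reliability growth models are more involved than this.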
I hate call waitin`~+~~~
NO CARRIER
The depressing nature of Microsoft's vast array of bugs induces a general cumulative increase in malaise in society as a whole, rather than having any incidents that jump out and grab you.
But certainly a tactic the RIAA would use if they could get away with it...
"Eve of Destruction", it's not just for old hippies anymore...
"1982 -- Soviet gas pipeline. Operatives working for the U.S. Central Intelligence Agency allegedly plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline"
Can this really be considered a bug? It was an intentional software error.
You keep using that word, 'terror'. Are you sure you know what it means?
The fact that there was an explosion of such magnitude doesn't bother me a bit. And I bet the majority of the citizens of the USSR weren't shaken a bit by this explosion, because (drum roll) they never knew such an accident had happened (and that's, for me, the scary part). And nothing spells success better than an act of terror no one finds out about, now does it?
Man is a slave because freedom is difficult, whereas slavery is easy.
Right. I could. Also, if I were trespassing on your property and, say, slipped on your pool deck and got injured, I could sue you for that too. Shall I go on?
The World Wide Web is dying. Soon, we shall have only the Internet.
My dad tells this story from time to time. I don't know if it's true, but it makes a good story. Back in the early days of computers when only big corporations had them, most software was written in-house by staff programmers. One of the major soda manufacturers had a new mainframe and had one of their top programmers write an accounting package for them. It so happens that the manufacturer was a major competitor of 7-Up. Well, for whatever reason the programmer left the company on not-too-good terms. The very next time the manufacturer went to print out a report from the accounting package, every 7th page contained the phrase "Drink 7-Up" in big block letters. They had their remaining programmers go back through the code and try to remove this new "feature" but they were unable to. This guy was so good that he'd embedded the logic for this nastygram right into the actual logic of the accounting package. Supposedly there was code that would dynamically generate other instructions that, when executed, would generate other instructions, etc. They were supposedly unable to get rid of the 7-Up message without breaking other parts of the program, so they ended up having to go back to square one and write a whole new accounting package.
So the story goes...
Our second-year professor spent 2-3 classes talking about Therac-25 in a C/C++ intro class. It always reminds me how much damage a serious bug can do.
I find it a bit ironic that while your username states "Yahweh Doesn't Exist", you still keep calling up His name...
I'm really surprised that the Prius recall got in there over this bug: http://www.ima.umn.edu/~arnold/455.f96/disasters.html
Maybe I'm old fashioned, but loss of life is a bigger problem than loss of profit.
And they said zombies weren't real!
How tactful! 'Operatives'? I'm sure the Chinese or Cuban counterpart would be called a 'spy'.
I think the two worst computer bugs of all time are the two that quite possibly could have wiped us all out. More information here.
(Copied from the article:)
* November 9, 1979, when the US made emergency retaliation preparations after NORAD saw on-screen indications that a full-scale Soviet attack had been launched. No attempt was made to use the "red telephone" hotline to clarify the situation with the USSR and it was not until early-warning radar systems confirmed no such launch had taken place that NORAD realised that a computer system test had caused the display errors. A Senator at NORAD at the time described an atmosphere of absolute panic. A GAO investigation led to the construction of an off-site test facility, to prevent similar mistakes subsequently. A fictionalized version of this incident was filmed as the movie WarGames, in which the test system is inadvertently triggered by a teenage hacker believing himself to be playing a video game.
* September 26, 1983, when Soviet military officer Stanislav Petrov refused to launch ICBMs, despite computer indications that the US had already launched.
If it weren't for two humans who said "fuck what the computer says!", we might be in a very different place right now.
She loves me: 09F911029D74E35BD84156C5635688C0 She loves me not: 09F911029D74E35BD84156C5635688BF
I looked at this a while back because many millennia ago, I worked at the company that produced the telemetry/control system for the Trans-Sib pipeline. It was a specialised outfit based in Warwickshire, UK. It is very doubtful that their systems could have been nobbled by anyone. The network was closed, based on an X.25ish HDLC, and the software was blown on to UV erasable EPROMs. The CIA may have modified the s/w at the pump stations, but again it is doubtful.
See my journal, I write things there
From the post:
The resulting event is reportedly the largest non-nuclear explosion in the planet's history.
The actual quote from a hyperlink in the article mentioned in the post:
"The result was the most monumental non-nuclear explosion and fire ever seen from space"
The actual largest non-nuclear explosion occurred during World War One in Halifax Harbour when a munitions ship collided with another ship and exploded. It is known as the Halifax Explosion. It was picked up on seismographs and created an 18 metre tsunami.
-- I ignore anonymous replies to my comments and postings.
Years ago, while working on a project for a medical firm, I found out first hand just how horribly things can go wrong with what we eventually agreed was a "bug" but was more of a "human bug" issue that made me sit up and realize that it's not just programmers who will use our programs.
Without getting too detailed, the end users were allowing certain conditions to go unchecked as the software was telling them it was "OK". There was a rather neat explosion (read: small) that hurt nobody and damaged some equipment, because instead of being "OK" it was telling the operator that there was exactly "ZERO K" of space available for data storage on a recording device and the test needed to be shut down.
Now, the operators were told that when the counter got low they would see a warning and be told to stop the tests. So was it a bug, or was it my assumption that these 11.95/hour service techs would "understand" what "0K" means from "OK" (that's a zero (0) and an O there)? Either way, there was some damage, we had a bit of a laugh, but at least nobody got irradiated and died.
Why do overlook and oversee mean opposite things?
What about the Y2K bug? I believe that had a greater economic impact than many of the other "worst."
What those who want activist courts fear is rule by the people.
Would licensing prevent the multi-million dollar debacles created by large consulting firms that seem to routinely escape any means of accountability? Would an individual programmer's license have any effect whatsoever on the QA portion of a development project? Or the management? Or any decisions that are completely outside the control of any individual license holder?
I certainly don't advocate sloppy programming, but I'm not entirely sure that licensing would have the desired impact. If anything, it's just more red tape.
Speaking of NASA foulups, remember this one? "(CNN) -- NASA lost a $125 million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency's team used the more conventional metric system for a key spacecraft operation, according to a review finding released Thursday."
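One common defence against this kind of mix-up is to never pass bare numbers between teams. A minimal sketch, assuming the quantity being exchanged is an impulse (the CNN quote above doesn't say which quantity it was); `Impulse` and `apply_thruster_correction` are hypothetical names for illustration, not anything from the actual flight software:

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in newton-seconds, whatever unit it arrived in."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value):
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value):
        return cls(float(value) * LBF_TO_NEWTON)


def apply_thruster_correction(impulse):
    """Hypothetical consumer: refuses bare numbers, so the unit is never guessed."""
    if not isinstance(impulse, Impulse):
        raise TypeError("pass an Impulse, not a bare number")
    return impulse.newton_seconds


# The producer has to state its unit at the boundary.
reading = Impulse.from_pound_force_seconds(10.0)
print(apply_thruster_correction(reading))
```

The point is only that the unit is stated once, at the boundary, and a plain float is rejected instead of silently interpreted.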
"How many light bulbs does it take to change a person?" --BMcC-->
If they were that worried about bugs and so forth they'd have written it
in Ada or a similar far more reliable language than that piece of
junkware. Back in 1997 you couldn't rely on Java and its VM to run a
clock without crashing, never mind a life critical piece of equipment!
What the fsck was your company thinking???!
Second paragraph, first sentence: Spelling error. "Pruis."
However, you have been fooled. The parent comment is completely at odds with the article.
The article shows largely a series of examples where you DID have HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight, but nevertheless very significant bugs occurred. So, the real lesson from this article is not "you get what you pay for," but rather that "software development is very hard" and perhaps that "by nature of its hardness, we can expect critical flaws to pop up from time to time, even when highly trained, experienced, and monitored programmers are involved."
The Soviets stole blueprints and built a gas line using it. What part of planting faulty plans is unlawful? They were stolen.
The US action was not unlawful. Hence, the action was not terrorism.
Now, do I think the CIA engages in terrorist activity? Well, quite possibly. But this doesn't even fall near that. It was a counter-espionage act designed to foil the success of an enemy's spying activities. And it worked.
When I went through college, the computer science program had a required course on the "Social Impacts of Computing" -- everything from the privacy implications of data mining to deaths. The Therac-25 case was required reading.
One thing that stuck in my mind was a point that it takes decades to see the true impact of any new technology. The telephone, the automobile, the airplane. Look at the US highway system and suburbia for impacts of cars that took 50 years to really hit. We're just beginning to see the wider impacts of computers.
"That CIA gas plant explosion 'bug' is disgusting and has America == No.1 Terrorist written all over it if true."
I might as well say: "Idiots like you that corrupt the language are worse than terrorists."
Both are absurd exaggerations that have nothing to do with reality, and only degrade the ability of our language to carry meaning.
Get Real. Terrorism is the deliberate use of violence against civilians in order to induce a state of terror in the general population, as a method intended to achieve political, religious or ideological goals.
The CIA were not using violence, they were attempting to cause stolen technology to fail.
The CIA were not targeting civilians. Moreover, AFAIK, not one person was even killed in the explosion, which happened in a very remote area, and the specific explosion was certainly not planned (they had no knowledge of or control over how the Soviets used the stolen technology).
The CIA were certainly neither attempting to induce a state of terror, nor to cause change by inducing a state of terror.
You want to oppose the US government? Great -- there are many good bases on which to do so. But please, before you speak up next time, get some facts, learn how to use the language, and THINK! You might then have a chance of convincing somebody of your point, instead of just annoying them with your ignorance.
I guess the difference is that that bug isn't really likely to come up in standard usage. Although it was widespread and known for so long, it never caused any widespread problems, unlike every example in that rundown, which are all worse.
I'm not saying it's not serious — it clearly is. It's just less serious than each of the ones in the article.
I hate to be pedantic (well no, I love it), but according to the Jargon file's entry on "bug":
But then again, why expect more from Wired.
> The article shows largely a series of examples where you DID have HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight... software development is very hard
You are absolutely right, it is hard. You really are making the same point I am. What they thought was enough, was not. It's not a matter of paying the same people more money. It's a matter of taking more time and using more developers, and more experienced developers. It's also a matter of more careful design (ie more time and expense), more design reviews, more code reviews, more testing, all by more experienced people. It sounds like you are throwing up your hands and saying "It's hard, so accept the bugs and get over it". OTOH I'm saying that more and better resources, properly developed and managed, can make a critical difference.
Interesting???? More like a troll.
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Looks like they haven't found the bugs in my code yet
That was a feature, not a bug.
Technically, the parent was responding to his parent, not the article. In any case, I think agreeing with (or repeating) an article makes insightfulness more difficult. Some of the most insightful statements come from disagreement and tangential thinking.
See this comment. Of course you can always screw something up, but as the poster says, you can mathematically prove that some classes of bugs will not happen. But well, you can take my "ensures" to mean "lowers the probability of an error to a negligible amount" if you want.
This is fair enough, but if the claimed "insightfulness" is so much at odds with what has just been so clearly demonstrated via serious examples in the article, then in my book a little more than blatant assertion is needed to qualify as "insightful."
I wonder how many people reading this thread are working on software that could potentially cause bodily injury? I unfortunately have to count myself in that category, as I am coding for a new geophysical transmitter which handles a considerable amount of current, and there are both software and hardware fail-safes that need to work. Some of the key electronics are fibre-optically isolated, but you can only go so far...
"MS isn't mentioned ONCE."
It's less striking when you read the article. Most people wouldn't use Windows to control a rocket.
"Derp de derp."
Make no mistake, Microsoft software does have flaws and they cost society billions of dollars a year. They're just not as blatantly obvious as having a rocket explode on lift-off or shooting a patient full of too much radiation.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
While I'm not sure if anyone has been directly killed by a machine running a windows product, I can probably be assured that many monitors, keyboards, and computer cases have been smashed and thrown out of buildings because of their software.
10 PRUNT "Hello World
AT&ROFLMAO
I saw the idiot's reply to my parent post, daring me to name any legitimate government the US overthrew, knew that would happen, and dreaded having to respond to anyone that ignorant because I'd want to go get dates and references and everything ...
:-)
And you guys came thru for me! Hot damn that is good
According to several docs, this system was taken down by mod.
January 15, 1990 -- AT&T Network Outage. A bug in a new release of the software that controls AT&T's #4ESS long distance switches causes these mammoth computers to crash when they receive a specific message from one of their neighboring machines -- a message that the neighbors send out when they recover from a crash.
One day a switch in New York crashes and reboots, causing its neighboring switches to crash, then their neighbors' neighbors, and so on. Soon, 114 switches are crashing and rebooting every six seconds, leaving an estimated 60 thousand people without long distance service for nine hours. The fix: engineers load the previous software release.
Some of the bugs reported in the story were not so much the fault of programmers, but of management. The phone network bug was a misplaced { character in a nested if-else construct. The code had already been through extensive testing, and then a small change was needed. Because it was a "minor" change someone said it didn't need to go through the extensive (expensive) testing again. It's always easy to point at the code or the guy who wrote it. Especially when the boss is the one tasked with finding out what went wrong.
...is not a bug. It was planted on purpose. It is malicious code, a trojan or virus.
I looked him up in the wikipedia, and don't understand what you mean. The US has fooled around in countless elections and domestic affairs around the world, but I don't know what you mean in this case. Care to elaborate?
It doesn't matter how highly paid and trained your professionals are, if the environment that produces the software is not conducive to eliminating these types of flaws. For example, they may not be given enough resources to test and QA the projects they are assigned, there may be no organizational commitment to take the time and expense to document properly, or leadership may override technical objections to project timeframes. Most of the cited projects could probably be classified as failures of project management rather than failures of the end product (the software) that these flawed projects produced. Yes, software is hard and the software profession should continue its efforts to improve quality, but that doesn't let the organizational culture, leadership and processes that produced the software in these cases off the hook.
Why is it that when the accounting profession makes spectacular mistakes that take down entire Fortune 500 class organizations, there is a critical analysis of the processes that led to these failures, and remedies often comprise prescriptive measures for these processes, but similar analysis of software failures focuses on the software flaw and not the environment that allowed the flaw to emerge? Now sometimes the remedy in the accounting case might not make complete sense (like SOX), but the point here is people don't look at just the end result (the accounting system transactions) of the accounting process.
I was informed that in the state of Texas it is illegal to use the word "engineer" in your job title unless you a) have passed the P.E. exam or b) drive a train.
And that's the way it should be.
09F911029D74E35BD84156C5635688C0
Jesus loves you, I think you suck
No they use Command+F ;-)
Some outside observers, however, said they are not convinced NT is blameless.
"It still boggles the mind that any divide by zero error on NT would cause a system to crash, let alone" 27 end-user terminals, said Gil Young, corporate network engineer for a systems integration firm in Orlando, Fla. "I don't care what operating system, computer or application I'm using, I should be able to type in a zero and expect the computer not to crash, especially if that zero is to represent a closed valve."
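The engineer's point is easy to show in miniature. A minimal sketch, assuming a made-up calculation where a valve's flow rate ends up as a divisor; `time_to_fill` and its parameters are hypothetical and have nothing to do with the actual Yorktown software:

```python
def time_to_fill(volume_litres, valve_flow_lps):
    """Seconds needed to move `volume_litres`, or None when the valve is closed.

    A flow of zero is a legitimate state (closed valve), so it is handled
    explicitly instead of being allowed to reach the division.
    """
    if valve_flow_lps < 0:
        raise ValueError("flow rate cannot be negative")
    if valve_flow_lps == 0:
        return None  # closed valve: nothing will ever arrive
    return volume_litres / valve_flow_lps


print(time_to_fill(100.0, 4.0))   # open valve
print(time_to_fill(100.0, 0.0))   # closed valve, no crash
```

Returning an explicit "no answer" value is one design choice; rejecting the zero at data entry would be another. Either way the zero is dealt with where it is understood, not wherever the division happens to be.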
In fact, what MS products have goes beyond what the weak word "bug" conveys (check this movie poster for a small example), unless you put the bugs from Heinlein's Starship Troopers in that category.
Anyway, I would have put on that list the time Windows NT killed a Navy ship... maybe losing a rocket was more expensive, but Windows NT is more widespread and probably still used in critical places.
Look, I write software for control systems (and I design them electrically too). Just because programmers at Microsoft or EA Games have tight schedules where they are just too stressed to write code well doesn't mean all code needs to be written like that.
Back to what you were saying, if you have a system that could cause damage or whatever, then you start by writing your output routines, and you create rules to govern the machine (i.e. outputs A and B can't come on at the same time, or output C can't exceed this value). Then you write another module that monitors the inputs AND outputs looking for fault conditions that shuts down the machine if you do anything dangerous. Only this part of the code needs to be signed off by an engineer. Typically it's simple code, and easy to prove correct, with peer review. Then you write other modules that essentially make requests through the safety checks to do anything. You don't have to review the complex other code so much, because your output stage should catch any mistakes.
That's how you make a machine safe. Unfortunately, most engineers I know just go out and write the software figuring there's no difference, and that's how bad things happen. It comes from believing you won't make a mistake, or believing that testing will catch all problems. If you plan from the start that you're going to be making mistakes, you can catch them before damage is done. It's too bad this isn't taught, even in the software engineering classes I took at a Canadian university.
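A minimal sketch of that layering, with everything invented for illustration: three hypothetical outputs A, B and C, the two example rules from above (A and B never on together, C capped at an assumed limit), and a single small gate that every other module has to go through.

```python
class SafetyFault(Exception):
    """Raised when a request would break a safety rule."""


class OutputGate:
    """The small, reviewable part: all output changes pass through here."""

    MAX_C = 100.0  # hypothetical limit for output C

    def __init__(self):
        self.outputs = {"A": False, "B": False, "C": 0.0}
        self.shut_down = False

    @staticmethod
    def check(state):
        """Return a fault description, or None if the state obeys the rules."""
        if state["A"] and state["B"]:
            return "outputs A and B may not be on together"
        if not 0.0 <= state["C"] <= OutputGate.MAX_C:
            return "output C out of range"
        return None

    def request(self, **changes):
        """Called by the complex, less-reviewed code to change outputs."""
        if self.shut_down:
            raise SafetyFault("machine is shut down")
        unknown = set(changes) - set(self.outputs)
        if unknown:
            raise SafetyFault(f"unknown outputs: {sorted(unknown)}")
        proposed = {**self.outputs, **changes}
        fault = self.check(proposed)
        if fault:
            self.shutdown()
            raise SafetyFault(fault)
        self.outputs = proposed

    def monitor(self, sensed):
        """Independent check of what the machine is actually doing."""
        fault = self.check(sensed)
        if fault:
            self.shutdown()
        return fault

    def shutdown(self):
        self.outputs = {"A": False, "B": False, "C": 0.0}
        self.shut_down = True


gate = OutputGate()
gate.request(A=True, C=40.0)
try:
    gate.request(B=True)  # a bug elsewhere asks for A and B together
except SafetyFault as err:
    print("refused and shut down:", err)
```

Only `OutputGate` needs the heavy review; the code calling `request` can be as complicated as it likes, because a bad request ends in a shutdown rather than a bad output. A real machine would back this with hardware interlocks as well.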
"I have never let my schooling interfere with my education." - Mark Twain
But isn't that true for any system, and not only software? That if it is used other than for what it was designed for, it could fail. Suppose I need to make a simple function for generating prime numbers from 1000 to 500000, then that would be my interface and then I could verify its correctness using a formal method verification system (or an automatic verifier). Of course this is making it extremely simplistic...but I think there are occasions where formal verification can, if not eliminate, highly reduce the number of bugs. Although, at the state that I had read about it, it would probably take years to write a system of medium complexity using such a method!
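To be clear, what follows is not formal verification, just a rough sketch of the "narrow interface plus an independent check" idea using the example above. The names `primes_in_range` and `is_prime_naive` are made up; the precondition check is what pins the function to the range it was designed for.

```python
import math


def primes_in_range(lo, hi):
    """Primes p with lo <= p <= hi; only defined for 1000 <= lo <= hi <= 500000."""
    if not (1000 <= lo <= hi <= 500_000):
        raise ValueError("outside the range this function was designed for")
    sieve = bytearray([1]) * (hi + 1)   # sieve[n] == 1 means "n may be prime"
    sieve[0:2] = b"\x00\x00"            # 0 and 1 are not prime
    for n in range(2, math.isqrt(hi) + 1):
        if sieve[n]:
            # Cross off n*n, n*n + n, ... up to hi.
            sieve[n * n::n] = bytearray(len(range(n * n, hi + 1, n)))
    return [n for n in range(lo, hi + 1) if sieve[n]]


def is_prime_naive(n):
    """Slow but obviously-correct reference used to cross-check the sieve."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


found = primes_in_range(1000, 2000)
reference = [n for n in range(1000, 2001) if is_prime_naive(n)]
print(found == reference, found[:3])
```

A formal proof would cover every input in the stated range at once; the cross-check above only covers the inputs you actually run it on.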
Life is about being a Phoenix!
It's important that life critical software (and hardware) not only have few bugs (and good requirements yada, yada), but that it is designed defensively - which may require additional sensors and additional hardware in addition to watchdog monitors etc.
For example, even if you're pretty sure that the software in a machine won't decide to pump a liter of morphine into the patient within ten seconds, you should have some hardware/software combination that stops such absurd and deadly results even if the primary software fails (or the operator made an absurd error which the primary software failed to catch).
For another example (since I didn't RTFA), I don't know if they list the irradiation device that killed at least one patient because a bug (in a driver associated with keyboard input IIRC) resulted in a shield (which should ALWAYS have been in place if the "more powerful" radiation source were used) not being moved into place before the more powerful radiation source was activated. IMHO, this system should have had an independent (probably electromechanical) system that just REFUSED to activate that radiation source if independent sensors didn't detect that the shield was REALLY in place (and, perhaps even, that the shield had just recently been retracted and extended - to help detect failures in the "shield sensing" device or an operator's misguided attempts to fool the system).
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading
1. Design reviews, by peers and independents
2. Code reviews, by peers and independents
3. Regular, organized unit testing
4. Correctness proving
5. Documentation in about a bazillion forms
6. Defect tracking
7. Effective software process metrics measurement and improvement
8. Continuing education
9. Humility / egoless programming
This list was assembled in about a minute off the top of my head. I work in a CMM3/4 type organization, and although there are processes for these things, most people don't use them, or consider them a hassle.
So my point is, the parent is right -- creating good software, even when done by properly trained experts with great experience -- is hard. But the grandparent is right too -- doing all of the above to 'do it right' takes time and money, and many organizations, and by this I mean software process management as well as the actual engineers, don't understand the value / aren't willing to pay for or aren't willing to do all that work. And occasionally, as the article shows, the piper comes and takes his payment.
In Soviet Russia, us are belong to all your base.
Remember when the LA air traffic control tower crashed
A control tower crashed? How does a control tower crash?
I mod down so you can mod up. Your welcome.
A 1999 PC World story, Software Bugs Run Rampant, looked at consumer software bugginess, including Microsoft's.
I love that donkey. Hell, I love everybody.
I think more interesting is the fact that you can trace the growth of software engineering through these bugs. The most recent bug on the list (2000) was largely a case of good software being misused. It's not surprising that recent developments in software development have been largely focused not on the engineering output, but on the user experience: processes designed to get programmers thinking about not only how the system works at the code level, but how it will be used as well.
I do think the original conclusion is reasonable, although not supported. Software CAN be largely free of error, but it requires a dedication to engineering (process), skill, and expertise all at once. As the serious bugs in the list occurred, software development has evolved to keep them from happening in the future. The scary thing is that a great many companies, working on critical systems, still refuse to adopt the modern practices that give them the best chance of NOT killing anybody. That fact alone makes the argument for certification quite compelling.
Turn s60 photos into awesome videos with mScrapbook for all S60 3rd edition phones!
a club that began in 1947 when engineers found a moth in Panel F, Relay #70 of the Harvard Mark 1 system. The computer was running a test of its multiplier and adder when the engineers noticed something was wrong. The moth was trapped, removed and taped into the computer's logbook with the words: "first actual
The phrase about buggy software was coined by Rear Admiral Grace Hopper. She discovered the moth that crawled into the computer and caused the error. There's a small exhibit showing her picture and holding a moth at the Pentagon.
You can find a sample article Here
In Argentina, in some provinces, you must be "licensed" to program, or even to call yourself "computer consultant". Hard to believe? Read about it here or here, or here for an opinion against it (sorry, all in Spanish).
from one of the links, (my own translation):
"[professional license] assure that professional exercise is made exclusively by the people who certify corresponding academic formation as well as ethical integrity in their performance, as a minimum guarantee of the quality..."
DNA in your Linux: DNALinux
Ok, so if the machine is known to have the issue, and isn't fixed, that's not MS's fault it's the admin's.
And if you disagree, take it up with the dozens of individuals who made exactly the same argument when discussing the "new" Linux worm earlier today. But post it in that thread, just to see what happens.
How pathetic are you that you follow me from topic to topic and waste all your mod points at once modding me down?
My guess is that the operator really wouldn't know either (although she would probably assure me that "it's very safe").
You missed one.
Item 0: Requirements reviews by peers and independents. If you don't have good requirements you obviously don't know things well enough to be building them. Sure you can catch some requirements issues in 1 and 2 but the longer you wait the costlier it is to fix.
A MSCS is NOT a Software Engineering Degree, so why WOULD you take courses in SE? I'd say that CS and SE are two different professions. There are places to get a MS SE (Texas Tech comes to mind) if you are interested.
Some of the bugs they listed are not truly bugs.
Soviet Gas Pipeline... This was a desired feature working just as intended (unless the CIA didn't want to blow up the pipeline).
Buffer Overflow in Berkeley - a worm is not a bug. It is a program designed to infiltrate a system and do something. While the people utilizing the program may not have intended this to happen (duh), the makers of the worm did.
A bug is an unwanted aspect of the code as implemented by the people who wrote (or edited) the code, but this does not include something affected by a virus/worm. A program that crashes every six minutes for no apparent or intended reason has a bug...a program that gets infected by a virus which causes it to crash every six minutes does not. Also, a piece of code that is intentionally inserted in the hopes of crashing a system is not a bug...it is a feature. It may be undesirable, but it is a feature.
Do it now, then you'll understand.
"The simple root of the problem on Yorktown was that politics were played in the assigning of the contract -- there was not a discussion of engineers, it was just a very small group of people pitching for it," said an engineer close to the project, who spoke on the condition of anonymity.
In a statement issued this week on why NT was chosen over Unix, the Navy said that while Windows NT was specified in the Statement of Work as the operating system for the workstations in question, other components of a coming upgrade will primarily utilize Unix-based systems.
"They rushed this stuff on the ship, there was no real prototype, and then they tried to make things work as they went along," the source said. "I don't think that Unix or NT were ever really evaluated -- it was just somebody thinking this was good, with no knowledge."
The article is very clear. It makes the case that this was a poor job of designing and implementing the system. The software just happened to be NT.
How pathetic are you that you follow me from topic to topic and waste all your mod points at once modding me down?
The more I learn, the less I know
I heard that a few years ago. The older I get, the truer it becomes. The more I learn about life in general gives me insight as to what's possible out there, and as to what I haven't accomplished!
If an officer ever threatens to taze you, say you have a pacemaker.
FTA: A radiation therapy device malfunctions and delivers lethal radiation doses at several medical facilities. Based upon a previous design, the Therac-25 was an "improved" therapy system that could deliver two different kinds of radiation: either a low-power electron beam (beta particles) or X-rays. The Therac-25's X-rays were generated by smashing high-power electrons into a metal target positioned between the electron gun and the patient. A second "improvement" was the replacement of the older Therac-20's electromechanical safety interlocks with software control, a decision made because software was perceived to be more reliable.
What engineers didn't know was that both the 20 and the 25 were built upon an operating system that had been kludged together by a programmer with no formal training.
Worse: I recall this from memory
Worst: and checked if on the cartoon, pinned on a wall a few offices away.
Some person down the line noticed that the Russians didn't have that many missiles, couldn't have launched them all with such synchronization, and that there were an awful lot of two's in the report ... actually, every digit of every number was a two. It turned out to be a fried chip somewhere, always pumping out the same bit regardless of input (I have no understanding of the technical side of the issue; maybe it hit the 32-bit limit and the int->string function reacted with 2's).
Good thing we were not too automated, and that we employed somebody smart enough to critically examine his printouts.
Disclaimer, this is a favorite tidbit of one of my professors ... I have no real source to refer to.
Use my userscript to add story images to Slashdot. There's no going back.
I've come in contact with a number of (U.S.) companies in the last few years that write various critical applications (i.e., ones that could kill someone if done wrong), and all of them have had CMMI level 3 or higher, or Six Sigma or ISO processes, in place. This is not because they're being benevolent, but because their own lawyers would never allow them to sell such a product without covering their own asses. Also, financially it makes more sense to do testing and reviews first to CYA than to put up with the possible bad P.R., lawsuits, etc.
The Yorktown's failures encompass a large number of flaws and single points of failure. I guess it's more a testament to bad architecture than any one single bug.
The cesspool just got a check and balance.
That works great for the kind of machines you described, and I wasn't saying good code couldn't be written. I'm saying that it isn't usually written, and won't be written in off-the-shelf software. The problem is, if you look at something like a Mars lander, then you can't just shut it down if it gets some bad inputs. Also, even good inputs can result in the machine not doing what it needs to do. If it has to land on Mars, and some input tells it to fire the left rocket for 4 seconds, then the input may fall within proper values, but may push it way off course. There's no GPS in space, so you can't get your position very accurately, and if you go way off course, you may not have enough fuel to get back on course.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Guess what: They don't, although they appear to be hedging their bets with safety critical software.
An interesting read...
"Prepare for the worst - hope for the best."
The article shows largely a series of examples where you DID have HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight, but nevertheless very significant bugs occurred
first bug was improper process
second was sabotage
third was INEXPERIENCE
fourth was new tech experiment, professionals involved but probably mostly grad students with varying experience
fifth - I'll grant you this one
sixth - I've worked with telcos for 10 years, chances are, it was a mod rushed out by marketing, switch bugs are, like in any other programming field, par for the course.
seventh - poor oversight and testing
eighth - university code base, generally minimal oversight.
ninth - improper testing
tenth - might qualify, but the doctors clearly went outside the allowed specs and did not check the calculations.
So, out of the 10, only one and a half can be said to have had "HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight". I would say that your statement is at odds with the article, not the parent post.
Did you RTFA? It stated that the CIA found out the Soviets were going to purchase a Canadian computer system, and decided to knowingly sabotage the equipment in such a way that it would pass initial test but would fail in actual operation. So ditch the "-might-".
and testing... don't forget testing... one was a case of a perfectly cromulent program which triggered a bug in the underlying OS... another was the case of a passable program when used properly, but which allowed users to "hack the system" to get a desired result, which then revealed an unexpected bug.
This seems like a good thread to plug my essay Better Languages for Better Software (again). In a nutshell: many (probably the vast majority) of bugs that are found in software nowadays can be entirely eliminated by using safer programming languages. As an added bonus, these languages are often more concise than languages in popular use today, which means more productivity and/or time to fix bugs.
Please correct me if I got my facts wrong.
Or does a that's inherently insecure (that is, it's not actually possible to fix the underlying security flaw without changing the design) not count as a bug?
Consider how much software is written by people with five years or less of professional experience, on short schedules, with no time allocated for continuing education. If software projects weren't always rush jobs, and on relative shoestring budgets, the quality would be better.
The software reliability crisis has very little to do with greed, engineering incompetence or the lack of big budgets, in my opinion. There is something fundamentally wrong with the way we program our computers, something that no amount of quality control measures can ever cure.
The reason that software is bad has to do with a custom that is as old as the computer: the practice of using the algorithm as the basis for software construction. Switch to a synchronous, signal-based approach and the problem will disappear. Complex algorithmic software is essentially unreliable, something that Fred Brooks has shown in his now famous "No Silver Bullet" paper back in 1987. For an alternative approach to software construction see this article in The Silver Bullet News.
Regardless of what has been said in the past, the problem can be solved. Otherwise, we are in big trouble, very big trouble.
From the following website:
Blood-forming organ (Bone marrow) syndrome (>100 rad) is characterized by damage to cells that divide at the most rapid pace (such as bone marrow, the spleen and lymphatic tissue). Symptoms include internal bleeding, fatigue, bacterial infections, and fever.
Gastrointestinal tract syndrome (>1000 rad) is characterized by damage to cells that divide less rapidly (such as the linings of the stomach and intestines). Symptoms include nausea, vomiting, diarrhea, dehydration, electrolytic imbalance, loss of digestion ability, bleeding ulcers, and the symptoms of blood-forming organ syndrome.
Central nervous system syndrome (>5000 rad) is characterized by damage to cells that do not reproduce such as nerve cells. Symptoms include loss of coordination, confusion, coma, convulsions, shock, and the symptoms of the blood forming organ and gastrointestinal tract syndromes. Scientists now have evidence that death under these conditions is not caused by actual radiation damage to the nervous system, but rather from complications caused by internal bleeding, and fluid and pressure build-up on the brain
Other effects from an acute dose include:
200 to 300 rad to the skin can result in the reddening of the skin (erythema), similar to a mild sunburn and may result in hair loss due to damage to hair follicles.
125 to 200 rad to the ovaries can result in prolonged or permanent suppression of menstruation in about fifty percent (50%) of women.
600 rad to the ovaries or testicles can result in permanent sterilization.
50 rad to the thyroid gland can result in benign (non cancerous) tumors.
As a group, the effects caused by acute doses are called deterministic. Broadly speaking, this means that severity of the effect is determined by the amount of dose received. Deterministic effects usually have some threshold level - below which, the effect will probably not occur, but above which the effect is expected. When the dose is above the threshold, the severity of the effect increases as the dose increases.
"Rocky Rococo, at your cervix!"
The problem is, if you look at something like a mars lander, then you can't just shut it down if it gets some bad inputs.
True, it's just that in my line of work, off is usually the safe state, but what should be done is to go to some kind of safe state, whatever that may be. Sometimes you revert to a manual operation, for instance.
Also, even good inputs can result in the machine not doing what it needs to do.
Which is why you need to also hire a mechanical and an electrical engineer to design those aspects so that the mechanical and electrical systems fail in a safe and detectable way.
For instance, it used to be that stoplights were designed with a physical disk inside that rotated creating the "program" of the different lights. You also had interlocked electrical circuits so that both greens could never come on at the same time. These are mechanical and electrical ways to make the system fail to a safe condition. I have recently been at an intersection where I saw the traffic lights had green both ways (during a storm). This is because some vendor is selling a traffic light system on the market that is completely software based, and they hired a bargain basement programmer and/or engineer to design it, and we should find them and shoot them for their incompetence, but I doubt that will happen.
"I have never let my schooling interfere with my education." - Mark Twain
The problem is, there isn't always a safe state. In many cases there is, but not in all cases. What's the safe state for a Mars lander that's drifting off course? Manual override is impossible, because signals take 20 minutes to get there, and by then it may be too late. It may not even have line of sight at the time, in which case communication is impossible.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
No question, the bugs in this top ten list do not represent history's worst software bugs, but rather some of the most newsworthy (Google-able). I tend to think of Wired as being to technology what Omni magazine was to hard science.
The article mentions the radiation therapy device killed five people. I was only able to find three fighter shootdowns blamed on the Patriot. What others are there?
In other words, the Patriot missile system used during the 1st Gulf War was just a quick kludge, and considering that, it actually worked pretty well.
BTW, by saying that the Patriot missile system killed people, are you implying that its failure to shoot down Scuds makes it guilty of murder?
All you really have to do is force everyone at the company to use an include file with this:
#define gets() DONT_USE_GETS_YOU_MORON()
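For what it's worth, here is a minimal sketch of the kind of bounded replacement a codebase could steer people toward once gets() is off the table. It uses only the standard fgets(); the helper name read_line is made up for illustration and does not come from the comment above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf without ever writing past
 * size bytes. Returns 0 on success, -1 on EOF, read error, or bad arguments.
 * Lines longer than the buffer are truncated; the rest stays in the stream. */
int read_line(FILE *in, char *buf, size_t size) {
    if (in == NULL || buf == NULL || size == 0) {
        return -1;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    /* fgets keeps the newline when it fits; strip it so callers get bare text. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Unlike gets(), the caller has to state how big the buffer is, so an overlong line gets cut off instead of running over whatever sits next to the buffer.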
USA = Good Guys.
All others = Bad guys
More practically:
USA = our side
All opponents = enemy
Preferred winner: our side (USA)
Preferred loser: enemy
One of the systems mentioned was cobbled together by someone with NO prior programming experience.
The other 9 didn't mention the experience of the programmers.
Other examples did show a lack of basic interface skills.
I would wager nearly all these issues would have been caught if parallel testing had occurred. Since so few people even know what parallel testing is, much less how to apply it, it's no surprise the industry is in such a sorry state.
The Kruger Dunning explains most post on
highlights the number one reason bugs get to the end user, Improper testing.
Except in my case. In my case it's do to Secret opratives sneaking in and foikling my code. Also, they mess up my spelling.
The Kruger Dunning explains most post on
Tell that to all my co-workers who had their machine BSOD on boot thanks to me adding "type c:\bsod.txt" to the startup commands.
Just because a bug is not easily exploitable doesn't mean it is not a serious bug.
If you mod me down, I *will* introduce you to my sister!
Where the heck did you get this from? I did some googling and the 49-day problem seems to have been with Windows.
From the linked article: http://www.msnbc.msn.com/id/4394002
Vuja De: That sinking feeling that this is going to happen again. Often occurs in meetings with Product Managers.
No.
It shot down at least several aircraft due to friendly fire.
Further to that, at least several of the blasts in Riyadh and other places around the Gulf in Gulf War 1 did not look anything like Scud blasts. They were smaller, high-explosive blasts, some of them right above the ground instead of after hitting it. I remember even CNN voicing the suspicion that it was not a Scud blast, but the Patriot self-destructing (and this part disappearing immediately from all repeats of the coverage).
If I recall correctly, at least one of these strange blasts had casualties.
So no, I am not blaming Patriot for failing to shoot down R1s. I am blaming it for blowing up civilian ground targets while trying to do so. Along with a few friendly aircraft for good measure.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Exactly, sir. You are right. Humans are fallible by nature, so even the best and brightest minds will inevitably make mistakes. Couple that with the fact that it's not easy to debug or proofread thousands to millions of lines of code (it's almost like finding a needle in a haystack), and unpredictable things can and will occur.
I'd say it was rather the decision to use Windows as OS for a system that important that was the error here.
I really think the default safe state for the Mars lander while safely on the ground on Mars is "do nothing and wait for program download". The safe state while re-entering is either "do nothing and crash" or "take a stab at when to fire the retro-rockets and probably crash". Either way, it doesn't matter from a safety perspective - there's nobody to kill on Mars (as far as we know).
This applies more to medical equipment and big machinery back here on Earth.
"I have never let my schooling interfere with my education." - Mark Twain
This is not a "single" bug but rather a train wreck. A botched Siebel CRM upgrade cost AT&T thousands of new customers and an estimated $100 million in lost revenue. This eventually led to the sale of AT&T Cellular to Cingular at a less than optimal price. The article is great in explaining how management and big "personalities" lead to these kinds of disasters.
A full story of the bug:
http://www.cio.com/archive/041504/wireless.html
In agreeing with you I will also argue that you typically require HIGHLY PAID and HIGHLY trained professionals to get these really nasty bugs.
If you have someone who is an amateur, they are likely to make obvious mistakes which will cause obvious (but non-lethal) problems in the code, and therefore it won't pass the most basic of quality testing.
On the other hand, with highly paid professionals, they know how to write code that passes quality testing protocols. So, their work will more easily be put into production and out in the real world where it has a chance to seriously harm users when the subtle bug is discovered.
When you have an owner, a CTO, a CEO, end users, product maintenance, new R&D, etc. all asking for your attention and you don't have time to relax and concentrate, errors fall through the cracks. You'll learn this.
Yes, it's a fuck-up. That's exactly what it is. I personally fucked up a line - ONE FUCKING LINE - of code that caused a worldwide recall on our products. (I used = instead of |= ) It's unlikely, but there's a chance that someone's going to die if the unit doesn't get refitted before it gets used.
It was missed in my tests, it was missed in the pre-production tests, it was missed in the post-production tests, and only one repeat ONE customer told me that it was wrong, MONTHS after other customers told me that they LOVED the new firmware version.
Some employers won't let you say "fuck", let alone "fuck-up". Right now, there's a guy getting a red flag at DOD / DND for having "Fuck" show up too many times on a web page. The moniker "bug" was coined over 100 years ago, when "damn" was offensive. If you want to use your own nomenclature, go ahead. Just don't be surprised when someone asks you what the fuck you're talking about.
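To make the = versus |= slip mentioned above concrete, here is a small sketch. The flag names and values are invented for illustration; the actual firmware line is not shown anywhere in the comment.

```c
#include <stdint.h>

/* Hypothetical status bits, one per feature. */
#define FLAG_ALARM_ENABLED 0x01u
#define FLAG_LOGGING       0x02u

/* Buggy version: plain assignment throws away every bit that was already set. */
uint8_t set_flag_buggy(uint8_t flags, uint8_t flag) {
    flags = flag;
    return flags;
}

/* Intended version: OR-assignment adds the new bit and keeps the old ones. */
uint8_t set_flag_fixed(uint8_t flags, uint8_t flag) {
    flags |= flag;
    return flags;
}
```

Starting from FLAG_ALARM_ENABLED and turning on FLAG_LOGGING, the fixed version returns 0x03 while the buggy one returns 0x02, silently dropping the alarm bit. Any test that only checks the bit being set passes both versions, which may be one way a slip like this gets through several rounds of testing.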
---
ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
The "strange blasts" seem to be an invention of your memory, or poor reporting at the time.
Do you have a source you can cite? I've looked through a few online sources critical of the Patriot's poor performance in Gulf War I, and have found no mention of any that self-destructed near the ground or of civilian deaths. For example, here's a House committee report referred to by many other sources who decry the Patriot's performance.
In Gulf War II, they've taken out two coalition planes, though. Maybe you were thinking of that?
While she has the honor of finding the first actual bug in a computer (the moth), the term "bug" had been in use for at least 70 years prior. For example, Edison used it in letters to associates in 1878. Wikipedia is your friend.
Clear, Dark Skies
The Therac-25 error is quite famous. That's why it blows my mind that there is another lethal radiation therapy software error listed. You'd think that people would have learned from the first mistake.
OTOH today I saw some medical machines (probably for blood-analysis and similar things), which were running embedded windows.
It definitely made me want to run away crying.
Andries
The description of the Pentium floating point bug reads like a damage control press release from Intel. In spite of what the article says, this bug could result in enormous errors in situations where cancellation error came into play.
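The cancellation point can be illustrated with the division that was widely circulated as a test case at the time, 4195835 / 3145727. This is only a sketch and is not taken from the Wired article: the flawed quotient below is the commonly reported figure for affected chips, hard-coded here as an assumption, since current hardware will not reproduce it. The function names are made up.

```c
/* Operands of the widely circulated test case (assumed, not from this thread). */
#define FDIV_X 4195835.0
#define FDIV_Y 3145727.0
/* Quotient commonly reported for affected Pentiums (assumption). */
#define FDIV_REPORTED_FLAWED_QUOTIENT 1.333739068902037589

/* x - quotient * y is close to zero when quotient is a good value for x / y.
 * The subtraction cancels the leading digits and exposes the error. */
double residual(double x, double y, double quotient) {
    return x - quotient * y;
}

/* Relative error of approx with respect to a positive exact value. */
double relative_error(double approx, double exact) {
    double diff = approx - exact;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / exact;
}
```

Under those assumptions the flawed quotient is off by only a few parts in 100,000 relative to FDIV_X / FDIV_Y, which sounds harmless. Feed it through residual(), though, and the answer comes out around 256 where it should be essentially zero.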
that effectively left PC users with only 640K of memory
for ever and ever.
They just need to accept there are two professions in the software industry: coders and proofreaders. Proofreading code, making corrections, and forwarding required changes is a separate skill set from the creative writing of compact, efficient code.
Chaos - everything, everywhere, everywhen
1988 Word Perfect 1.0 for Amiga
1990 Word Perfect 1.0 for AtariST
Gravely injured one -- fatal for the others.
Well, it's a bug in the OS that MS provided to run this system. It's not the admin's fault, and I'm sure the admins probably did everything to get rid of it. But it's a known bug in the OS. So well known that a maintenance procedure has been implemented to keep the bug from showing itself by rebooting the computer system every 30 days. This one time they didn't reboot it after 30 days and it got to day 49, so the system crashed. Now why does this OS problem get blamed on the admin? No OS should need to be rebooted on a fixed schedule on a system as important as this. I'm sorry, but this is not a bad setup by an admin, the kind of thing the Linux worm would need in order to take hold.
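A number like 49 days is what you would expect if uptime is kept as a count of milliseconds in an unsigned 32-bit integer. That is an assumption on my part; the comment above does not say how the counter was stored. A quick sketch of the arithmetic, with made-up function names:

```c
#include <stdint.h>

/* Days until an unsigned 32-bit millisecond counter wraps back to zero:
 * 2^32 ms divided by the number of milliseconds in a day. */
double days_until_wrap_ms32(void) {
    const double ms_per_day = 24.0 * 60.0 * 60.0 * 1000.0;
    return 4294967296.0 / ms_per_day;
}

/* Elapsed time computed with unsigned subtraction stays correct across a
 * single wrap, which is the usual way code is made to survive it. */
uint32_t elapsed_ms(uint32_t start, uint32_t now) {
    return (uint32_t)(now - start);
}
```

That works out to roughly 49.7 days, so a reboot every 30 days never lets the counter get there. Code that compares raw tick values instead of differences is the kind that tends to fall over at the wrap.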
-----BEGIN PGP SIGNATURE-----
12345
-----END PGP SIGNATURE-----
But in the case of programming bugs, I believe the cause is the programming language. Currently, too much information is required by the programmer in order to get a program to run properly.
If a program is the sum of all information required to get it to run as intended, one part of it is the computer part--- memory, source code, data assets, other deterministic inputs. But the other part is the programmer or group of programmers who have a collective knowledge that is required to get the program to run correctly.
For example, if I write an API that has three functions A(), B(), and C(), the computer has the same information as if I write an API that has the three functions A(), Init_A(), Cleanup_A(). But the programmer has *extra* information that the computer does not know, namely that there is a relationship and interdependency between the three functions.
Furthermore, as demonstrated above, this relationship is often conveyed by ad-hoc naming conventions that are hardly standardized between two programmers at the same company, much less at different companies.
There are thousands, possibly millions of pieces of implicit information like this in the meat-memory of programmers. We tend to call this "programming experience". That is, each new programmer learns through trial and error or by cut and paste that function calls need to look like this:
Init_A();
A();
Cleanup_A();
Rather than this:
A();
Cleanup_A();
Init_A();
But due to event-driven programming, we are sometimes uncertain what order each function will be called. So, we place ASSERTS and conditionals to handle the out-of-order cases.
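A minimal sketch of what those conditionals tend to look like in C. Only the three function names come from the text above; the state enum and the convention of returning false on an out-of-order call are assumptions made for illustration.

```c
#include <stdbool.h>

/* Lifecycle the programmer normally keeps in their head, written down. */
typedef enum { A_UNINITIALIZED, A_READY, A_CLEANED_UP } a_state;

static a_state current_state = A_UNINITIALIZED;

/* Each call checks the state first and refuses to run out of order. */
bool Init_A(void) {
    if (current_state != A_UNINITIALIZED) {
        return false;
    }
    current_state = A_READY;
    return true;
}

bool A(void) {
    if (current_state != A_READY) {
        return false; /* called before Init_A() or after Cleanup_A() */
    }
    /* ... the real work would go here ... */
    return true;
}

bool Cleanup_A(void) {
    if (current_state != A_READY) {
        return false;
    }
    current_state = A_CLEANED_UP;
    return true;
}
```

The ordering rule now lives in the program instead of in a naming convention, but every caller still has to check the return value, so the burden has only partly moved.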
What this all leads to is bugs, because the memory state and mixed conventions of all of the various programmers are always in some uninitialized, unknown state. If a chain is only as strong as its weakest link, then a program is only as strong as one of its programmers on their worst day--- and we tend to stress our programmers a lot with long hours and tight schedules.
The undetermined state of the collective minds of all of the programmers also explains why there is a mythical man-month. If you regard the program as not just the source and its assets, but include its programmers as well, you can easily see where your systems will fail in how arbitrarily stringently you treat your source code, but how lax you treat documentation and programming conventions.
The solution, which is easier to say than to implement, is to take as much implicit information as possible away from the programmer and put it into the program. Certainly, as memory capacity grows, we shouldn't be using the same programming languages that we had when memory capacity was 1/1000th of what it is now.
Once this information is included in the program, Init_A(), A(), Cleanup_A() don't need to be named as such. Their relationship can be visualized by the programmer through a documentation tool that shows the relationship of each function to each other function that has a relationship to it (sort of like a LinkedIn network, but for functions).
I don't know if all of the implicit information in a programmer's head can be transcribed into explicit information for the computer to use. If, during the programming process, all of this information can be described efficiently and simply, then functions themselves may attempt to initialize themselves by searching their own relationships for other functions that satisfy their parameters.
Then, we can write something like "draw_pixel(10, 12, YELLOW);" and expect it to work properly as a stand alone line because the function itself can figure out how to call all of its preconditions.
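Short of a new language, the closest thing in plain C is probably lazy initialization, where the function checks and satisfies its own precondition. A rough sketch follows; the in-memory framebuffer, its size, and every name other than draw_pixel and YELLOW are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum { WIDTH = 32, HEIGHT = 24 };
typedef enum { BLACK = 0, YELLOW = 1 } color;

static unsigned char framebuffer[HEIGHT][WIDTH];
static bool display_ready = false;
static int init_count = 0;

/* The precondition, satisfied on demand instead of by the caller. */
void ensure_display(void) {
    if (!display_ready) {
        memset(framebuffer, BLACK, sizeof framebuffer);
        display_ready = true;
        init_count++;
    }
}

/* Works as a stand-alone line: no separate init call is needed first. */
bool draw_pixel(int x, int y, color c) {
    ensure_display();
    if (x < 0 || x >= WIDTH || y < 0 || y >= HEIGHT) {
        return false;
    }
    framebuffer[y][x] = (unsigned char)c;
    return true;
}

/* Returns the stored color, or -1 for coordinates outside the buffer. */
int pixel_at(int x, int y) {
    ensure_display();
    if (x < 0 || x >= WIDTH || y < 0 || y >= HEIGHT) {
        return -1;
    }
    return framebuffer[y][x];
}

int display_init_count(void) {
    return init_count;
}
```

This only covers the one dependency that was hand-wired into ensure_display(). Having functions discover their own preconditions, as proposed above, is a much bigger ask.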
This is perhaps something serious computer scientists can answer. I don't know how much implicit knowledge can be transferred from the programmer's brain and into an explicit structured code in the computer's memory and source. Certainly some, unlikely all, but perhaps most. The more the better.
At this point, if some advanced alien society were to look at the primitive state of our information age advances, they would laugh at us, ridicule us, and perhaps consider us "cute" in the same way as when we watch monkeys poking a stick into a hollow log for grubs.
Where's the Y2K bug? Just because we fixed it before it manifested doesn't mean it wasn't a whopper. Tests done on various equipment from oil & gas pipelines to telephone switching equipment showed the potential for serious consequences. It should definitely be on the list.
Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
At this point, I'd just like to add that,
;)
aside from the bickering about who's got the
biggest example of stupidity...
We also have allowed the example of humans
used as live error checkers in a mission-critical
environment (the early-warning system controllers
in the nuke story) to be presented.
I find it humorous that, in this age, we're often
presented with the Orwellian as commonplace,
and perhaps in five years, we'll actually ignore
several instances in the media of politicians
asking the public, "What are you going to believe?
what you see, or what we tell you is the truth?"
Oh, and thank God(tm) there's no peer review at Wired.
I mean, how else would it be the world's most-widely-read
computer-savant satire magazine? If those guys actually
knew about what they were talking, I'd be worried
I've heard roughly the same thing. There seems to be disagreement about whether it was caused by the CIA-planted bug or operator error. One story has it that the operators cranked up the pressure to push gas through because hard-to-locate leaks were draining needed pressure. Rather than investigate and fix the probable leaks, the operators simply cranked up the pressure to compensate because they were being pressured (pun) to deliver gas.
Table-ized A.I.
I love when people post, and try to seem reasonable and intelligent, but then do things like this.
"Now why does this OS problem get blamed on the admin?"
Well, let's see... WAIT!!! You answered your own question!
"it's a known bug in the OS. So well known that a maintenance procedure has been implemented to keep the bug from showing itself by rebooting the computer system every 30 days."
That's why.
God you made that easy.
How pathetic are you that you follow me from topic to topic and waste all your mod points at once modding me down?
Then how come you make such a blatant logical error?
His argument was that highly trained / experienced professionals would lead to less likelihood of errors happening, or something to that extent.
My point was that the list is counter-evidence to this theory, since in those cases highly experienced programmers also made catastrophic mistakes. Even if we accept your dodgy count of 1 or 2 out of ten ("poor oversight and testing", according to you, has nothing to do with the skill/experience of those involved), my basic point still stands. For my point to be valid, I don't have to show that all ten were caused by experienced software people, but simply that some were. If you showed me a list of the 100 worst software mistakes of all time and it turned out that 99 were caused by junior programmers, then you might have a point. But as is, you don't.
I am sure you are a smart person, but consider taking a course in basic logic. Try http://www.philosophypages.com/lg/, especially the section marked "Quantification Theory."
Shutting a system down might be even more catastrophic. E.g. fly-by-wire software.
According to the official report, the Ariane 5 bug was an overflow during a conversion, caused by an excessively large lateral speed in Ariane 5's kinematics. Ultimately, this bug was due to the lack of 'requirement traceability' between Ariane 5 and Ariane 4. Another root cause was the lack of proper test simulation software.
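For readers who haven't seen this class of bug, here is a sketch of a guarded narrowing conversion in C. It assumes the conversion in question went from a floating point value to a 16-bit signed integer, which is how the failure is usually described; the comment above does not give the widths. The function names are made up and this is not the flight code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True only if value can be stored in an int16_t without overflow.
 * Written so that NaN fails the test as well. */
bool fits_int16(double value) {
    return value >= (double)INT16_MIN && value <= (double)INT16_MAX;
}

/* One possible policy: clamp to the nearest representable value rather
 * than let the conversion overflow. NaN maps to 0 here by choice. */
int16_t to_int16_saturating(double value) {
    if (value != value) {
        return 0; /* NaN */
    }
    if (value > (double)INT16_MAX) {
        return INT16_MAX;
    }
    if (value < (double)INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)value;
}
```

Whether clamping is the right response is a requirements question, which is the point made above: the range assumption carried over from Ariane 4 has to be written down somewhere before anyone can test it.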
See my other replies in this thread. Switching to a "manual mode" may be an alternative.
"I have never let my schooling interfere with my education." - Mark Twain
I guess this might turn into flamefest, but whatever....
Then how come you make such a blatant logical error?
I disagree on making a logical mistake. I was responding to (and quoted)
The article shows largely a series of examples where you DID have HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight, but nevertheless very significant bugs occurred
The emphasis in bold is mine, but the words were yours. One or two of TEN does NOT make "largely", which implies a majority. Therefore at least five should fall into your rather specific categorization. This was not the case.
Perhaps you meant to say something else in the original post. But the words you used were inaccurate. Perhaps you meant that "HIGHLY PAID and HIGHLY trained professionals with plenty of experience and oversight" CAN produce serious bugs. That was NOT the theme of the article either, but it is a perfectly valid comment.
Two different ways to input the same shape resulted in visually indistinguishable outputs, but one of them would cause a fatal miscalculation. If not a bug, at least a serious design flaw in the user interface.
There are software engineers in Canada now. They can legally sign off on a software project. The problem is that you don't want to have every one of your programmers be licensed software engineers, all signing off on their own code.
It seems to me that what we need is a well defined set of conditions that require a project to be signed off on. All others would not have the requirement.
It's reasonable that software that could be a danger to life and limb would require signoff. For other software, the expense just isn't worth it.
Even in life-critical applications, there are design methods that could restrict the signoff requirement to only the critical modules. For example, design a sort of operating shell that doesn't allow unsafe operational conditions. That code is developed in the most rigorous (and therefore expensive) manner and gets a sign-off. Code that runs inside it is allowed to be less strictly developed and is meant to provide non-critical features and perhaps come up with more efficient operation. As long as the shell is sure to countermand unsafe operation and carry out critical functions by itself, that is fine.
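A toy version of such a shell, as a sketch only. It assumes the hazard can be expressed as a single numeric setpoint with fixed safe bounds, and that the lower bound is the safe fallback, which real systems rarely make so simple. The function name is hypothetical.

```c
/* Signed-off part: whatever the inner, less rigorously developed code asks
 * for, the value that reaches the hardware stays within [safe_min, safe_max].
 * An invalid request (NaN) falls back to safe_min. */
double shell_limit(double requested, double safe_min, double safe_max) {
    if (!(requested >= safe_min)) {
        return safe_min; /* too low, or NaN */
    }
    if (requested > safe_max) {
        return safe_max;
    }
    return requested;
}
```

The inner code can be as clever as it likes about choosing a setpoint. Only this function, plus whatever supplies the bounds, would need the expensive review.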