Slashdot Mirror


Debug your Code, or Else!

Trevor Lovett writes "I ran across a collection of famous software bugs that have caused large-scale disasters, including the explosion of the Ariane 5 rocket due to integer overflow and the misfiring of a US Patriot missile that caused 28 deaths because of accumulated floating-point error."
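
As a rough illustration of how such an error accumulates: the Patriot clock is commonly described as counting tenths of a second and multiplying by a chopped binary approximation of 0.1. The sketch below assumes 23 fractional bits and about 100 hours of uptime, the figures usually cited for the incident; neither number, nor the function names, comes from the linked collection.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: fractional bits kept when 0.1 was chopped


def chopped_tenth(bits=FRACTION_BITS):
    """Return 0.1 truncated (not rounded) to `bits` binary fraction digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Seconds of drift after `hours` of counting 0.1 s ticks with the chopped value."""
    ticks = hours * 3600 * 10                      # one tick per tenth of a second
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(ticks * error_per_tick)


if __name__ == "__main__":
    print(f"drift after 100 hours: {clock_drift_seconds(100):.4f} s")
```

The per-tick error is under a ten-millionth of a second; it only matters because it is added up millions of times without ever being corrected.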

192 of 485 comments

  1. but dont forget by rosewood · · Score: 5, Funny

    Remember that time when that kid dialed into NORAD and used that security exploit to get into the Thermo-nuclear war simulator and everyone thought it was real until he and the inventor were able to trick the computer into playing Tic-Tac-Toe? I see a LOT of bugs in the software there but no one ever seems to care about that...

    1. Re:but dont forget by btellier · · Score: 2, Redundant

      Also, I remember hearing about this good looking hacker chick who used extremely large fonts and a camouflaged computer. Due to Penn Jillette's incompetence he failed to notice this good looking individual's ability to literally fly through the network, superman style, before crashing into the garbage file. Teller had neglected to take out the trash and was summarily beaten. In the resulting hilarity, more large fonts are exchanged and a virus is disassembled via Matrix-style code-fu. Near the end of this caper one of the hax0rs in question has sex with another human being, possibly as a result of his cult following of thousands of IRC kiddies boasting knockoffs of his nick.

    2. Re:but dont forget by coolgeek · · Score: 2

      Ummm I believe two-way communication with guided ordnance qualifies as a "connection to the outside world"

      --

      cat /dev/null >sig
  2. Missing From The List by BiggestPOS · · Score: 5, Funny
    1) 1999 - Buffer Overflow causes Half-Life to crash while I'm in an important clan match (counter-strike) we lose the match, and I lose many friends.

    2) 2000 - Poorly coded garbage collection causes Word 97 to crash, lose last 2 hours of research paper. Class was in 30 minutes, paper was late. I lost my scholarship.

    3) 2002 - IE Crashes while writing AWESOME first post for /., My karma never recovered.

    --
    What, me worry?
    1. Re:Missing From The List by Novus · · Score: 5, Funny

      shock moment: Word has garbage collection!?

      Yes. It collects megabytes of garbage in files with the extension ".DOC".

    2. Re:Missing From The List by liquidsin · · Score: 3, Funny

      If we can attribute the buffer overflow to a problem with DirectX and not with Half-Life itself, then all three of your most horrific moments ever are the direct result of MS negligence. Try to collect damages...they should be available for more court dates sometime around 2017 ;)

      --
      do not read this line twice.
    3. Re:Missing From The List by GafTheHorseInTears · · Score: 5, Funny

      4) 2002 - Windows Media Player freezes up while I'm whacking it to porn. Unfortunately, it freezes on one of those annoying shots where they cut away to the dude's face, and I'm too close to the finish line to be able to stop. Afterwards, I feel embarrassed and uncomfortable, yet strangely aroused.

      --
      "You're just scared like a little white pussy. I'll fuck you till you love me, you faggot!"
    4. Re:Missing From The List by forged · · Score: 2

      Damn, does that remind me of the last time I spanked the monkey. Funny eh, like people will repeat the same scenes over and over again. Must be in the genes :->

    5. Re:Missing From The List by TheOnlyCoolTim · · Score: 2

      It actually can use either OpenGL or Direct3D, and of course it still uses the other parts of DirectX even if you use OpenGL rendering.

      Tim

      --
      Omnia vestra castrorum habetur nobis.
  3. Looking over the list by SplendidIsolatn · · Score: 3, Interesting

    The surprise isn't how many situations have cropped up because of software bugs, but rather how few. If you think of all the things that code is written for, and yet there hasn't been any major 'disaster'. Yes, the deaths and accidents are tragic, but on the grander scale of things, it's amazing that nothing truly catastrophic has happened.

    --
    sig--we don't need no goddamn sig
    1. Re:Looking over the list by jgerman · · Score: 2

      Umm, define catastrophic, because I define it as loss of life. I'm sure the fly-by-wire Airbus passengers who went down would consider it catastrophic, or the medical patients who received lethal doses of radiation. If you don't think death due to a bug is catastrophic, send me your resume; I have some dosage control software I need a test subject for.

      --
      I'm the big fish in the big pond bitch.
    2. Re:Looking over the list by Drachemorder · · Score: 2
      actually, I should take that joke back... I know a few people who write code to run nuclear reactors...

      Homer Simpson?

    3. Re:Looking over the list by Tony-A · · Score: 2

      Define catastrophic.
      (Any) loss of life doesn't work unless you are willing to ban automobiles, and even then you have problems, even with walking.
      Methinks that catastrophic carries the sense of a profound change in the choices of the survivors. If the odds are better with computer than manually administered dosage, unfortunate is maybe a better term than catastrophic. Would it be any less "catastrophic" if it were administered manually and the patients died because it wasn't quite accurate enough? Not an excuse for the bug, but catastrophes are big in scope.

      Black Death maybe qualifies.
      Smallpox II.
      Another asteroid, like 65 million years ago.
      Yellowstone blows its top again.
      Time bomb in (almost all) antivirus software. Clobbers BIOS and runs monitors so they burn up.
      Something small but the effects keep getting bigger as they cascade. For want of a nail type of thingee.

    4. Re:Looking over the list by Tony-A · · Score: 2

      I know a few people who write code to run nuclear reactors...
      But do you know the people who wrote code to run Clippy?
      Now really, would you trust Clippy to always open that valve on time?

  4. Millennium Bridge by rde · · Score: 4, Interesting

    I'd take issue with the inclusion of the London Millennium Bridge; that wasn't so much a failure of software, but a flawed model, that failed to take into account the effects of swaying pedestrians. After it was rectified, there were new data - never used in any bridge model - incorporated into such models so that it won't happen again. That's science; not a bug.

    1. Re:Millennium Bridge by rde · · Score: 2

      you don't complain about society because someone did something the programmer(s) didn't expect.

      This looks like it could turn into one of those really annoying semantic arguments. But what the hell.

      I don't consider flaws in mathematical models to be bugs; models, after all, simulate reality; this is something they can never do perfectly. Therefore all models are flawed. Are they all buggy? I'd contend that they aren't. Any model is continually in a state of refinement. When something that never occurred to you actually happens, you include it. Until that unanticipated phenomenon happens, it's unreasonable to expect its inclusion.
      You could maintain that the model was buggy because it didn't predict the swaying. I don't think that this is the case, however; the model was complete as far as the state of the art at the time was concerned. That state of the art has since moved on; any models that now fail to take the new phenomenon into account may be described as buggy, but I still prefer 'incomplete'.

    2. Re:Millennium Bridge by dachshund · · Score: 2, Funny
      I'd take issue with the inclusion of the London Millennium Bridge; that wasn't so much a failure of software, but a flawed model, that failed to take into account the effects of swaying pedestrians

      Pedestrians were also asked not to sway anymore.

    3. Re:Millennium Bridge by not_cub · · Score: 2
      That's science; not a bug.

      Actually the phenomenon of resonance is well documented in science. Engineering involves discarding the pieces of science that are unimportant. For example, most of quantum theory can be ignored when modelling a bridge, which simplifies things a bit. In the case of the Millennium Bridge, they oversimplified, and ignored the horizontal force exerted by people walking, which is what caused the swaying. So this is not science, not software, just an engineering error.

      not_cub

      --
      q='echo "q=$s$q$s;s=$b$s;b=$b$b;$q"';s=\';b=\\;echo "q=$s$q$s;s=$b$s;b=$b$b;$q"
  5. speaks more to TESTING by teambpsi · · Score: 5, Insightful

    It's really amazing how many software project managers don't fully understand what regression testing is all about.

    Software engineers simply cannot be trusted to do more than small unit level testing! We get into a pattern of behavior, we know what to expect, and simply do not stress test the system.

    That's why I like hiring sales people and 2-year-olds to test my code at the unit/integration level.

    --

    Old age and treachery almost always overcome youth and skill.
    1. Re:speaks more to TESTING by dgb2n · · Score: 3, Informative

      Testing is critical.

      Others would argue that testing alone may not suffice. Particularly for these kinds of mission-critical applications, nothing short of formal methods of software engineering will suffice. Formal, as opposed to natural-language, specifications can reduce ambiguity. Safety conditions can then be derived and verified through rigorous mathematical proofs.

      Of course none of this obviates the need for testing but it can lead to a more predictable system.

    2. Re:speaks more to TESTING by billnapier · · Score: 5, Funny

      That's why I like hiring sales people and 2-year-olds to test my code at the unit/integration level

      You didn't need to repeat yourself

    3. Re:speaks more to TESTING by Junks+Jerzey · · Score: 4, Insightful

      It's really amazing how many software project managers don't fully understand what regression testing is all about.

      Not in important fields like telecom. In those fields you live and die by testing, and you can be held accountable for bugs found in your code. If there are too many, you might be in for it.

      What's shocking to me is that almost no open source authors or advocates give a hoot about automated testing of any kind. The only free software I've found with a test suite is gcc. As much as I hate to say it, there's a good chance that the relative inexperience of most open source authors is a factor here.

    4. Re:speaks more to TESTING by slamb · · Score: 5, Informative

      What's shocking to me is that almost no open source authors or advocates give a hoot about automated testing of any kind. The only free software I've found with a test suite is gcc. As much as I hate to say it, there's a good chance that the relative inexperience of most open source authors is a factor here.

      Perl is really good about this. The Test::Harness and Test::More modules make it very easy to write test suites, so CPAN modules have lots of automated tests. It might even be a requirement to get a module into CPAN; I'm not sure.

      PostgreSQL has regression tests.

      There's a really nice test environment for Java code called JUnit. Lots of stuff is using it. Lots of articles about how to write effective tests. There's a project to develop mock versions of common objects (servlet requests, SQL queries) that fail in interesting, predefined ways. I'm using a C++ workalike called CppUnit in one of my projects.

      The Boost code has automated testing.

      There's a project called qmtest.

      The Wine people have recently started using regression tests.

    5. Re:speaks more to TESTING by qslack · · Score: 5, Funny

      What do you have against 2-year-olds!? That was simply uncalled for.

    6. Re:speaks more to TESTING by Kidbro · · Score: 3, Insightful

      What's shocking to me is that almost no open source authors or advocates give a hoot about automated testing of any kind. The only free software I've found with a test suite is gcc.

      Bullshit

    7. Re:speaks more to TESTING by rutledjw · · Score: 2
      Not in important fields like telecom
      Really? I knew a person who worked on the DSL project at Qwest (when DSL was BRAND new). She told me they did integration with another portion of the system and rolled it out without testing it since Qwest had told customers that DSL would be rolled out on that day. Then it was the standard process:

      • Developers protest
      • Developers ignored
      • System crashes (within 5 minutes, although the fact that it came up I thought was impressive)
      • Developers blamed
      I'm sure that wasn't a common occurrence. Although as a Qwest DSL customer it cleared up a lot for me...

      ;)

      --

      Computer Science is Applied Philosophy
    8. Re:speaks more to TESTING by Fweeky · · Score: 2

      > Perl is really good about this. The Test::Harness
      > and Test::More modules make it very easy to write
      > test suites, so CPAN modules have lots of
      > automated tests.

      Ruby also has a number of unit test frameworks which are used by a surprising (or not, depending on your PoV) amount of software written in it.

      Run ruby <name_of_library.rb>, and fairly often you'll see it run some unit tests that were wrapped in an if $0 == __FILE__ block.
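
      For comparison, the same "run the library file directly to test it" idiom can be sketched in Python, where the guard is if __name__ == "__main__". The clamp function, the test class, and run_self_tests are made-up names for illustration, not taken from any library mentioned in this thread.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical library function: restrict value to the range [low, high]."""
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_outside_range_is_pinned(self):
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(42, 0, 10), 10)


def run_self_tests():
    """Run the embedded tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


# Importing this file runs nothing; executing it directly runs the tests.
if __name__ == "__main__":
    run_self_tests()
```

      Keeping the tests in the same file means they ship with the library and cost nothing to run, which is much of the appeal described above.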

    9. Re:speaks more to TESTING by 0x0d0a · · Score: 2, Insightful

      Regression tests are no fun. Open Source is about having fun writing software.

      The easier the test-suite-making-tools are, the better.

  6. another bug page by blooher · · Score: 5, Insightful

    Software Horror Stories linked from the post's link

  7. Hi-tech toilet swallows woman by DeadSea · · Score: 5, Funny
    Of all of them this is my favorite. It doesn't say if it was a software bug or not though.
    [Source: Article by Lester Haines, 17 Apr 2001, via Brian Randell]
    A 51-year-old woman was subjected to a harrowing two-hour ordeal [on 16 Apr 2001] when she was imprisoned in a hi-tech public convenience. Maureen Shotton, from Whitley Bay, was captured by the maverick cyberloo during a shopping trip to Newcastle-upon-Tyne. The toilet, which boasts state-of-the-art electronic auto-flush and door sensors, steadfastly refused to release Maureen, and further resisted attempts by passers-by to force the door. Maureen was finally liberated when the fire brigade ripped the roof off the cantankerous crapper. Maureen's terrifying experience confirms that it is a short step from belligerent bogs to Terminator-style cyborgs hunting down and exterminating mankind.
  8. Pain & Suffering caused by fatal VxD errors an by mcwop · · Score: 2, Redundant

    the BSOD. I still have nightmares.

    --

    "I don't think it's selfish, to eat defenseless shellfish." -NOFX

  9. Re:especially important in healthcare.. by sisukapalli1 · · Score: 3, Insightful

    I believe more patients' lives are lost because of mistakes by doctors/hospitals/nurses, or sheer negligence. In some parts of India, for example, private hospitals are afraid to admit victims of accidents or crimes because the hospital itself might get into some trouble. Personally, I have seen doctors giving stupid advice, and people losing lives.

    To put things in perspective, fatalities caused by human errors (non programming related) outnumber those caused by software errors by orders of magnitude, in many fields (except, say in launching unmanned space vehicles).

    S

  10. Re:And no one will ever know... by ultramk · · Score: 2, Funny

    I seem to remember PowerBook 3500C was known to catch fire. Not a bug, per se, but it could have killed somebody. I know I almost threw mine out the Window and that could have caused serious damage.


    I know a lot of people who threw out Windows when they got their Macs...

    Michael-

    --
    You catch enchiladas by picking them up behind the head and holding them underwater until they don't kick anymore -VeGas
  11. Pentium bug in perspective by Alomex · · Score: 5, Informative
    Just to be clear, all processors out there have bugs. The Pentium bug is in no way exceptional. The only reason it deserves to be there is because the list is called "a collection of famous software bugs that caused large scale disasters".


    The Pentium bug is certainly famous because every idiot and his brother thinks it is rare for a CPU to be buggy. The second condition in the list is "caused a large scale disaster". This condition is, sadly, also met. It caused a large scale public relations disaster for Intel because once again said idiots thought that a CPU bug is rare.

    1. Re:Pentium bug in perspective by jrstewart · · Score: 3, Informative

      Just to be clear, all processors out there have bugs. The Pentium bug is in no way exceptional. The only reason it deserves to be there is because the list is called "a collection of famous software bugs that caused large scale disasters".

      What is exceptional is that instead of just announcing a new erratum (which is what Intel and most cpu makers normally do in such a case), Intel tried to bury the problem, initially denying that it existed and then denying that anyone would ever run into the problem. This really pissed off the numerical computing community and destroyed confidence in the accuracy of intel's floating point unit. That's why it was a public relations fiasco.

      see:

    2. Re:Pentium bug in perspective by Alomex · · Score: 2

      How could software distributed under the GPL be any more free?

      Simple: remove the condition that all mods have to be made public.

      Finally, let me address the use of the term "viral" to describe the GPL.

      Let me quote from devlinux.org: The GPL is "viral" in the sense that one cannot combine GPLed work with other work governed by different licenses. If one were to enhance a GPLed work, then your enhancements would also fall under the GPL terms. Like viral marketing.

      One last thing. Please note that I'm not saying Open Source code or even the GPL are bad (in fact I have contributed to Open Source projects myself). My .sig simply points out that the marketing by-line of RMS is, shall we say, inexact.

    3. Re:Pentium bug in perspective by fishebulb · · Score: 2

      The GPL gives you both the right and the responsibility to make any public modifications available in source form. The privilege the original author gave to the modifier must in turn be given to the user of the modified software.

      It is a privilege to be allowed to modify source and redistribute it. But you must then give that same opportunity to others. A person complaining about the GPL wants to use someone else's hard work without restriction. Other licenses exist that allow that. Modify software under those licenses, then, and ignore GPL software.

    4. Re:Pentium bug in perspective by danro · · Score: 2

      Well, call me old fashioned, but I want my processor to count both faster and more correctly than me... ;)

      --

      "First lesson," Jon said. "Stick them with the pointy end."
    5. Re:Pentium bug in perspective by pete-classic · · Score: 2

      AC, you have never been good with details.

      You certainly can "combine" GPL software with non-GPL. That is a true statement. You may link (run or compile-time) with any software under any GPL compatible Free Software license.

      You certainly can "combine" GPL software with any non-GPL. That is a false statement.

      You certainly can "combine" GPL software with proprietary. That is also a false statement.

      Furthermore "You can link intact libraries." is also false when using the assumption that we are talking about proprietary software (which you seem to be), UNLESS the copyright holder has made a "clarification" (commonly known as an "exception") as L. Torvalds has done for the Linux kernel.

      It might help you in the future to 1. read the GPL 2. work on your reading comprehension skills and 3. study the fundamentals of rational thought.

      Good luck AC. We're all pulling for you!

      -Peter

    6. Re:Pentium bug in perspective by Reziac · · Score: 2

      AMD also "buried" some bugs, most notably a whole bad batch of K6-2 300MHz CPUs from Sept. 1998. That one, I have personal experience of, and had inside information that it was indeed a known issue. Even so, to this day AMD denies it ever happened, and won't warranty the chip. (Got a K6-2 that linux inexplicably won't run on? That's one of the symptoms.)

      Then there was that recent bug AMD tried to blame variously on nVidia, VIA, and [I forget who the 3rd victim was] before finally owning up to it.

      Anyway, don't think Intel is alone in trying to deny when they have a problem product, or that just because a company is the geek-approved "underdog" they must be 100% honest.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
  12. One that we did - killing long distance nighty by mesocyclone · · Score: 5, Interesting
    Back in 1973 we built a system for hotel reservations that had over 1000 mini-computers distributed in hotels all over the US. These computers periodically dialed an 800 number in to get outstanding messages (it was cheaper for them to dial in than for us to dial out to them).


    I wrote the algorithm that scheduled the dialins. It used a pseudo-random approach during the day, weighted by outstanding traffic.


    But at night, there was a period during which we had to unload all messages before the next day's processing. During this time, the pseudo-random algorithm was replaced by a deterministic one that assigned computers time slots.


    The computers also had auto-retry in the case of failure, so each call could result in several if it were blocked.


    Unfortunately, during coding I had put in the number of modems answering phones at 20 (as an arbitrary number for testing). During the hectic rollout, this never was changed to the actual number which was much smaller.


    Once the system came on line, every night at 1AM portions of Omaha (which included lots of call centers) would lose all long distance service for a couple of hours, as all these computers called in and retried several times.


    Eventually the phone company figured it out and contacted us, and we discovered and corrected the discrepancy.
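
    The arithmetic behind that failure can be sketched roughly as follows. The post only says the real modem count was "much smaller" than 20, so the figure of 8 below is a made-up stand-in, and the function names and the one-group-per-slot layout are assumptions rather than the original algorithm.

```python
from collections import Counter


def assign_slots(num_computers, modems_assumed):
    """Deterministically pack computers into slots, modems_assumed callers per slot."""
    return [i // modems_assumed for i in range(num_computers)]


def blocked_per_slot(slots, modems_actual):
    """For each slot, count callers that find every real modem busy."""
    callers = Counter(slots)
    return {slot: max(0, n - modems_actual) for slot, n in sorted(callers.items())}


if __name__ == "__main__":
    slots = assign_slots(num_computers=1000, modems_assumed=20)
    blocked = blocked_per_slot(slots, modems_actual=8)  # 8 is hypothetical
    print("slots used:", len(blocked))
    print("blocked first attempts:", sum(blocked.values()))
```

    Every blocked first attempt then triggers the auto-retry, so the number of calls hitting the exchange is a multiple of the blocked count, all concentrated in the same window.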


    Another issue was that we had a number of hotels that were using pulse dialing (this was a long time ago in a galaxy far far away). Sometimes these would be off by one due to the inherent unreliability of pulse dialing, and the result was a lot of calls to certain numbers related to the 800 number, all in the middle of the night.


    BTW... as far as I know, this was the first large widely distributed commercial computing system to use switched telephone circuits for communications (but no doubt some other grey-haired slashdotter knows of another).

    --

    The only good weather is bad weather.

    1. Re:One that we did - killing long distance nighty by mesocyclone · · Score: 3, Informative

      No, it was a we. Someone else knew about the number of lines. They didn't give me the number.

      --

      The only good weather is bad weather.

    2. Re:One that we did - killing long distance nighty by edibleplastic · · Score: 3, Interesting
      We had a similar situation where we accidentally ddos'd our university's engineering school. We were working on a file-sharing service that had over 600 people sharing at any one time. The lead programmer made a change to how the clients and main server pinged each other in order to make it more compatible with firewalls. The way he did it was that the client would send out a ping, the server would catch it, and throw it back, and so on. The problem was that he forgot to set a delay for this.

      One night our system vanished from the web. Our clients couldn't connect, the website was gone, and we couldn't ssh in. Later on we found out what happened. As more and more clients auto-updated to the new version they began pinging the server to alert it to their presence. It in turn responded, and soon it was doing nothing but sending and receiving pings. To over 700 computers. As fast as it could.

      Somewhere between 700-800 clients the router died, bringing with it the internet connection to the entire engineering school. Somehow we were never disciplined and everything was brought back online within the next day or so. Now that's something to put on a resume: Effectively launched a 700+ system DDoS on own university. Now remember kids, make sure you trust the company that makes your P2P software!
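
      A back-of-the-envelope sketch of why the missing delay mattered, assuming each client waits for the echo and then immediately pings again. The 1 ms round trip and the 30 s delay are invented numbers, and pings_per_second is a hypothetical helper, not part of the software described.

```python
def pings_per_second(clients, round_trip_s, delay_s):
    """Aggregate ping rate at the server when each client re-pings after
    one round trip plus a fixed delay."""
    return clients / (round_trip_s + delay_s)


if __name__ == "__main__":
    # No delay: the only thing limiting the rate is the network itself.
    print(pings_per_second(clients=700, round_trip_s=0.001, delay_s=0.0))
    # A modest keep-alive delay brings the load down to a trickle.
    print(pings_per_second(clients=700, round_trip_s=0.001, delay_s=30.0))
```

      With no delay term, the load grows as the network gets faster, which is the opposite of what a keep-alive should do.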

  13. links by Lord+Omlette · · Score: 2

    Click on the links, otherwise you don't get all the details. French-friendly exocet missile? Huh? Unless you click through you don't realize that the British radar thought the exocet missile was a friendly munition since the British arsenal included exocets. Any munition headed straight at you is probably not friendly.

    --
    [o]_O
    1. Re:links by scott1853 · · Score: 3, Funny

      <private> Incoming missile sir! What do we do?
      <officer> Don't worry, it's one of ours.
      <private> But sir, it's still going to HIT us!

      This not only sounds like something that belongs in a Dilbert strip, but also the basis for the logic that allows the spreading of all these e-mail viruses.

  14. My prof at Georgia Tech stressed this a lot by delphin42 · · Score: 5, Informative

    He was considering making Fatal Defect required reading for the C programming course I took. From Amazon.com:

    In Fatal Defects: Chasing Killer Computer Bugs, Ivars Peterson describes dozens and dozens of hoary computer bugs and gives biographical sketches of the bug detectives who located and fixed them. This book, which reads like a novel, is both entertaining and informative. Many of the bugs that Peterson discusses are not in computer programs per se but in the human systems that run and operate the computers. Very often the operator fails to understand what the computer program requires as input and types in an incorrect command. The computer then executes the command, with potentially disastrous results. Fatal Defects has important lessons for both those who design computers and those who use them.

    He also insisted that we not call them bugs. "They are ERRORS, calling them bugs makes it sound like they are cute little accidental things that pop up when actually they are programming mistakes."

    --
    -- Adam
    1. Re:My prof at Georgia Tech stressed this a lot by Peyna · · Score: 2

      I bet he's one of those people that refers to automobile accidents as 'collisions'. It's just a term that has been in use so long that we still use it. I'm sure most people know that nearly all automobile 'accidents' are preventable at some point, just like we all know that a bug is the result of a human error. It's just the origin of the word that made it the way it is today. 'Bug' more defines how it appears from the user's perspective. They are seen as odd quirks, etc. It is notable that most people still know where to place blame for them, most of the time (i.e. blaming Windows for a bug in a particular piece of software running on it).

      --
      What?
    2. Re:My prof at Georgia Tech stressed this a lot by cornflux · · Score: 2
      along the same lines, but aimed at the design and requirements phase of software is The Case of the Killer Robot by Richard Epstein.

      it was required reading in my CS department's capstone class at college... good book, good requirement.

    3. Re:My prof at Georgia Tech stressed this a lot by Waffle+Iron · · Score: 2
      He also insisted that we not call them bugs. "They are ERRORS, calling them bugs makes it sound like they are cute little accidental things that pop up when actually they are programming mistakes."

      When my boss comes around and pesters me about problems with the code, I tell him: "They are FEATURES, calling them bugs makes it sound like they are accidental things that pop up when actually I never make programming mistakes."

    4. Re:My prof at Georgia Tech stressed this a lot by Tablizer · · Score: 2

      (* [professor] also insisted that we not call them bugs. "They are ERRORS, calling them bugs makes it sound like they are cute little accidental things that pop up when actually they are programming mistakes." *)

      That is not very good training for the real world. They should teach *more* BS and euphemisms, not less.

      BS is a matter of survival, not luxury. I wish I had more BS classes and fewer geek classes. The geek technology changes and becomes mostly irrelevant, but BS skills have been nearly the same since Grog tricked Lork into being dinosaur bait.

  15. Read comp.risks by kzinti · · Score: 5, Informative

    Make reading the ACM's RISKS digest a part of your regular routine, and you'll hear about these kinds of software-related problems and many others - usually shortly after they happen. The RISKS digest is available on Usenet as comp.risks, as a mailing list, and on the WWW at http://catless.ncl.ac.uk/Risks. A new issue is published on a semiregular basis, every one to two weeks. It's not only informative but interesting too.

    --Jim

  16. Happy to hear it... by Anonymous Coward · · Score: 3, Insightful

    Sure, some people here gripe about this not being newsworthy. But as a hardware guy, I am happy to see that software guys are finally going to be held to some sort of standard.

    In electronics, if your hardware has ONE little problem, it's almost bankruptcy time. Remember the Pentium FP bug? And how it would have affected very little? Remember the hoopla, people wanted new processors, etc..

    But software bugs? Who cares! It's NORMAL, it's EXPECTED. Well, geeks and nerds, time to get your asses in gear and live up to the same standards mechanical and electrical engineers have been living up to for decades.

    I'm tired of being held to a standard of perfection that the software people (who make more money than me!) don't even KNOW about.

    1. Re:Happy to hear it... by jc42 · · Score: 5, Interesting

      There is one highly relevant difference between the way that we deal with hardware and software. With hardware, inner details, schematics, and the like are usually easily available. Often this is required by law in any critical applications.

      With software, most programmers are writing code to run on systems (kernels, runtime libraries, and the like) that are usually proprietary. The inner details are not just neglected; the companies intentionally keep them secret and prosecute people who leak them.

      As a result, software can't be made reliable, not even in principle.

      We do have a few exceptions, e.g. linux and all the GNU stuff. If *everything* underneath your code is Open Source, then in principle you can examine it and find problems. (It ain't easy, but at least it's doable if your employer will permit the time that it takes).

      But we're facing a major battle just getting Open Source software accepted by a tiny part of the market. In most jobs, you are required to write code for systems whose inner workings you are not permitted to know.

      The US government is even using proprietary, binary-only computer systems in secure and mission-critical situations. Anyone who expects the code in such situations to be reliable is either utterly ignorant or actively malicious.

      Myself, I'd welcome rules that make me and other software developers responsible for bugs in our code. If there were such a legal requirement, I could point to it when someone denies me access to the information that I need, and say "I can't possibly write correct code when you are keeping vital information from me. Show me the inner details of these parts of the system, and I'll agree to write reliable code for it."

      Of course, in a couple of cases, when I've gotten my hands on such details, I've proceeded to write a proof that certain things could not be done reliably on that system. "Fix that bug in that library, and I'll vouch for my code. Until then, here's my bug report describing exactly how it will fail."

      Unfortunately, when I've done this, the usual result was that I was looking for another job soon thereafter.

      (One such lost job was when I proved that certain sensors in a nuclear power plant could not be made to work reliably due to their software. But that was 20 years ago; maybe they've fixed it by now. ;-)

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    2. Re:Happy to hear it... by jc42 · · Score: 2

      > If the code behind the interface doesn't match the documentation, it's not for you to fix.

      Hey, thanks! That's an incredibly elegant summary of what I was saying. It explains exactly why I can't vouch for the correctness of any of my software.

      (Yeah, I know; you didn't intend it that way. But the fact is you're exactly right. In many cases, even when I know where the bug is, I'm not permitted to fix it. That's the way commercial software development works. It explains a lot.)

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    3. Re:Happy to hear it... by Darth_Burrito · · Score: 2

      that its OK to check in changes without bothering to test them at all

      Sometimes it is temporally impossible to test all possible areas your change could have impacted. I've almost never seen anyone actually not do any testing. By the way, you used its when you should have used it's. Sometimes it is more important to get your changes into the source base so someone else can start their work. You don't always have time to check everything.

      sometimes checking in changes that don't even compile

      The vast majority of the time I have encountered a problem like this it is a result of dll hell. Do you have all the latest stuff, have you re-registered any new typelibs? Did some goof register some 2000 specific dll so now it won't work on NT? When you are developing software, you can usually only be sure it will compile on your own configuration.

      If engineers were a tenth as sloppy as software developers,

      In my experience, some of the worst programmers are engineers who routinely have to do a little coding. They tend to write stuff their own way, often reinventing the wheel in the process. They also tend to be overly optimistic. Furthermore, I suspect that engineers are every bit as sloppy as software developers; it's just that they are given enough time to correct their mistakes before release.

    4. Re:Happy to hear it... by Tony-A · · Score: 2

      If every programmer needs to understand every piece of code in the entire system, the size of systems that can be developed is severely restricted.
      That's a bit like having to know every street in a city to be able to go back and forth from home to work. The basic problem is that which subset of the code must be known belongs to the power set, exponentially more complex if you set arbitrary barriers.

      You don't need the code behind the interface if the interface is fully documented.
      It never is. Not fully.
      In say FORTRAN, 2+2 should be 4, but after calling a subroutine with a parameter 2 that gets changed to 7, the result is now 14. I think Ada puts a stop to at least some such, at a price. There's no substitute for being able to find out EXACTLY what is being done.
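
      The FORTRAN surprise described above can be sketched as a toy model. This assumes an old-style compiler that passes every argument by reference and stores each literal once in a shared constant pool; the names ConstantPool and clobber are made up for illustration, and this is Python standing in for FORTRAN, not what any particular compiler did.

```python
class ConstantPool:
    """Toy model of a compiler that stores each literal once in shared memory."""

    def __init__(self):
        self._cells = {}

    def literal(self, value):
        # Every use of the same literal gets the same mutable cell (a one-item list).
        return self._cells.setdefault(value, [value])


def clobber(arg):
    """Stands in for a subroutine that assigns 7 to its dummy argument."""
    arg[0] = 7


pool = ConstantPool()

# "2 + 2" before the call: both operands read the shared cell for the literal 2.
before = pool.literal(2)[0] + pool.literal(2)[0]

# "CALL CLOBBER(2)": the subroutine receives a reference to the literal's cell.
clobber(pool.literal(2))

# "2 + 2" after the call: the cell now holds 7, so the sum is 7 + 7.
after = pool.literal(2)[0] + pool.literal(2)[0]

print(before, after)
```

      Nothing in the caller's source text changes between the two sums, which is why no amount of interface documentation helps unless it says what the callee does to its arguments.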

      Bugs fixed on the wrong end of the interface just causes more headaches when it eventually is fixed.
      Actually I quite agree. Of course you're SOL until (and if) whoever finally fixes the problem on the right side of the interface. If the two sides are never allowed to see on the other side, there's a good chance it will never be fixed. Might as well quit now before you get even further behind.

      So if you can trust the other guy to fix discrepancies between code and documentation, you have no reason to need the source code.
      Bought any bridges lately?

    5. Re:Happy to hear it... by sql*kitten · · Score: 2

      The inner details are not just neglected; the companies intentionally keep them secret and prosecute people who leak them.

      As a result, software can't be made reliable, not even in principle.


      I don't see how you can get from your first point there to your second. Take the software in a typical phone switch, for example: it's proprietary and rock solid. If a phone switch goes down, it's most likely a hardware fault, in my experience. The same could be said for any amount of embedded code. When did the software in your cellphone last crash? Your VCR?

      The reason commercial software has bugs is very simply because people (i.e. the market, the people spending dollars on software) have indicated by their purchasing behavior that they are willing to accept a level of unreliability in exchange for shorter version cycles. The same purchasing patterns don't apply to hardware or embedded systems. The fact that the software is proprietary or not is neither here nor there.

      I will also point out that in the high-end software market, the customer will hold the proprietary source code in escrow, just in case.

      If *everything* underneath your code is Open Source, then in principle you can examine it and find problems

      In theory, you're right, but in practice, how many bugs have there been in sendmail or bind? Open source lulls people into a false sense of security, because everyone assumes everyone else has checked it, and no-one actually does! But again, the fact that the source code is available is irrelevant to reliability - properly designed and implemented software, whether it's open or closed source, can be made as reliable as you are willing to invest resources (time, money, people, etc) in.

    6. Re:Happy to hear it... by Rogerborg · · Score: 2
      • But as a hardware guy, I am happy to see that software guys are finally going to be held to some sort of standard.

      As a software engineer, I for one would be delighted to see mandatory legal requirements on software engineering.

      That way, when my latest Bungie Boss swings into the office shrieking that we have to forget everything we've been doing and hack the product completely differently for a new potential customer, I can suck my teeth and say "Ooh, love to, but if you change the requirements, I have to go back to design stage and re-write my test suites. It's the law".

      Or when a module turns up from an Indian sweatshop with uncommented code, one character variable names, no design and no test suite or results, I can just laugh softly and send it to /dev/null.

      I would love to have those protections. But until it happens, I'll go right on doing what I've been doing, which is to hack and slash away in good faith, with no requirements, wrong requirements and changing requirements, and building systems that the sales guys insist on giving to customers, sometimes before we've even done unit testing let alone integration or regression. Yes, I do mean that. I mean a sales guy and my Bungie Boss stand over my shoulder, and when the code compiles and passes a basic sanity test, they take it and integrate it, because dammit, they have stock options on the line here.

      Believe me, I really don't want to write bad software, but right now I'm left with two choices: write bad software to meet the artificial short term goals of my latest Bungie Boss, or join the "mobility pool". I'd love to have legal protection for sticking to my guns. Bring it on.

      --
      If you were blocking sigs, you wouldn't have to read this.
    7. Re:Happy to hear it... by duffbeer703 · · Score: 2

      Perhaps if you took a more mature approach in bringing up problems to management you wouldn't have been fired 20 years ago, the bug would have been fixed, and you probably would have found others.

      There are few things more annoying than a techie prima donna.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
    8. Re:Happy to hear it... by Tony-A · · Score: 2

      And what happens if you expect code to work EXACTLY the way it does at one point in time, then it changes?
      Well, .... you could switch to Ada. I'm not at all familiar with it except that the original intent was to stop exactly that kind of thing without overspecifying implementation details.

  17. GPL by lostchicken · · Score: 2, Insightful

    GPL raises this to new levels of concern. You can never know where your code will be used. It might just find itself in a cruise missile.

    --
    -twb
  18. See..none of its caused by Code written in VB by cOdEgUru · · Score: 2, Funny

    So much for poor Visual Basic Programmers :)

    Damn, they never get to do fun stuff like this.

    1. Re:See..none of its caused by Code written in VB by felipeal · · Score: 3, Funny

      Actually, those caused by VB left no survivors to tell the story...

  19. I can relate by TheGreenLantern · · Score: 2, Funny

    This one time I missed closing an /a tag on a post, and missed getting a wicked killer First Post.

    --

    It hurts when I pee.
  20. Debugging by yintercept · · Score: 2

    I hope you don't mind a little nit-picking. The thread is titled "Debug Your Code." A lot of the problems listed in the article were for errors that occurred in situations outside the parameters that the programmers were expecting.

    I personally see debugging as the art of making sure the code works and is fulfilling the logical expectations of the programmer.

    These problems show that there is a need to go way beyond traditional debugging, and do aggressive testing outside the programmer's box. Debugging ain't enough. Those dregs toiling away in the testing department might be worth their skin after all.

    kd

  21. Phoenix by The+Bungi · · Score: 2, Interesting
    It's interesting the list includes the Denver airport baggage system breakdown but not the Phoenix Sky Harbor one. A system designed by I forget which consulting firm over the course of three years at a cost of millions of dollars finally had to be scrapped and replaced with custom software done by IBM.

    It delayed the re-opening of the airport for about seven months or so. After it did finally open the system wasn't working yet so the baggage system had to be operated manually for a couple of months.

  22. Worst I've seen in my work by l810c · · Score: 2, Funny

    I was a consultant at a major bank 3-4 years ago. An FTE made a one-line error in a Cobol program for printing bank statements. Everyone in a small town of about 6000 got the first page of their own statement and pages 2, 3, 4, etc. of someone else's statement.

  23. Also Missing: by goldspider · · Score: 2
    Ultima IX: Ascension

    What other piece of buggy software in history ever caused as much widespread collective apoplexy? :)

    --
    "Ask not what your country can do for you." --John F. Kennedy
  24. 32. Therac-25, X-ray by wiredog · · Score: 3, Interesting

    The Therac-25 was an automated x-ray machine that overdosed patients. Fatally. It was a UI bug rather than a software bug. It's dissected in "Killer Defects" (IIRC) by Negroponte (again, IIRC).

    1. Re:32. Therac-25, X-ray by irix · · Score: 4, Informative
      The Therac-25 was an automated x-ray machine that overdosed patients. Fatally.

      Well, not exactly. It was used for cancer treatments, not x-ray imaging. And not all of the radiation overdoses were fatal.

      It was a UI bug rather than a software bug.

      Again, not exactly. The problems with the Therac-25 included hardware issues and some UI problems that led operators to do some interesting things. They also included some race conditions that were definitely software bugs.

      You can check out a reprint of an IEEE article discussing it in depth here.

      Just for some history: AECL, the Canadian government crown corporation that made the Therac-25, spun off its medical operations into private companies in the 1980s. The first was Nordion, where I worked for a summer as a co-op student, which produces radioisotopes for medical use. Nordion was bought by MDS. The other company was Theratronics, which was responsible for devices like the Therac-25. It went without a purchaser for many years because of the stigma of the Therac-25, but it was eventually (IIRC) bought by MDS as well.

      Both companies are in my hometown, and the fallout from the Therac-25 (like the IEEE article) was front-page news when I worked at Nordion in the early 1990s. I just worked on software to measure how much of a given isotope to dispense to fill an order, but the whole Therac-25 incident was definitely on everyone's mind.

      --

      Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
    2. Re:32. Therac-25, X-ray by ahde · · Score: 3, Funny

      It's not a UI bug, just that some people don't survive the mutation:

      X-RAY METER:
      [Off--Low--Med--High--Glow--Kill--Mutate]

  25. It's Worse: The Patriot Never Worked by GuyMannDude · · Score: 5, Informative
    The Patriot missile defense system never worked -- the bug mentioned in the article is a red herring. The main problem was that the Iraqis had modified the Scud with additional fuel tanks. The resulting missile was unstable and would start to break apart in flight. The Patriot couldn't lock on to the missile because of all the shrapnel. In addition, the Scuds are poor missiles to begin with. When they fly, they do so with a wobble -- like a poorly thrown football. The Patriots had been tested prior to the war on good-quality American missiles which flew in a smooth trajectory. The Patriots simply couldn't deal with a missile that "danced around" in midflight. Bottom line: the Patriots simply do not protect against Scuds because of poor design -- not some floating point error. The floating point explanation is analogous to that Coriolis-effect-causes-water-to-swirl-in-the-toilet myth that you find in so many physics textbooks (the Coriolis effect only works on planetary scales). It looks good on paper but if the "experts" had bothered to perform a test they would see that the explanation is dead wrong. The failure of the Patriots to intercept Scuds (and the fact that the media never mentions this) has grave implications for our anti-ballistic missile shield.

    Don't take my word for it. Do a web search and see for yourself. Here are some references to get you started:

    http://www.fas.org/spp/starwars/docops/rp911024.htm

    http://www.csmonitor.com/durable/1997/09/08/opin/letters.1.html

    GMD

  26. Re:Just a matter of time and growth by Anonymous Coward · · Score: 2, Insightful

    Not in today's world.

    You write a program that designs a building that suddenly collapses, but since you copyrighted your program, and patented the logic, I will make the same mistake, since I can't learn from your mistakes.

    People will "learn from other's mistakes" less and less, as more and more become trade secrets, and suits get settled, etc.

  27. Slashdot bug by HEbGb · · Score: 2

    How about that horrid bug that extends the width of the window data by a factor of 8, making it impossible to read?

    Does Slashdot want me to just waste more time at work or what?!

  28. The buggs that didn't happen by MountainLogic · · Score: 5, Funny

    I'm sure we all have those bugs that we catch in bench testing. Mine was forgetting to add a cancel button to the following dialog box:

    "OK to delete database"

    When I caught that one I had visions of a user who had his/her million dollar database deleted charging into our office with a shotgun and ... well, you read the papers. Glad I caught that one before I released it to test.

    1. Re:The buggs that didn't happen by Tony-A · · Score: 2

      Use something like "Programmer is an idiot" unless you are the only possible target regardless of circumstances.
      Pronoun problem. Who is the "you"? Depends on context. And the code will be relevant in contexts other than when it was initially written, being run for starters.
      O.T. Who is the "My" of "My Computer"? I didn't put it there. It's not mine.

  29. Re:Much is very iffy to beaf up list by jgerman · · Score: 3, Informative

    Go check out the Risks Forum, links are available from the ACM webpage. There is plenty of proof and explanation for hundreds of software related mishaps. You're obviously looking in the wrong places.

    --
    I'm the big fish in the big pond bitch.
  30. Many of these are NOT bugs... by alouts · · Score: 2, Interesting
    A good number of these incidents are NOT due to bugs in software but in faulty assumptions input into that software.

    If I misestimate the mass of a planet, is that a software bug?

    If my software sells stock when a certain threshold is hit and yours does the same, and that leads to a financial industry meltdown, did my software not work as planned, or is the issue more the dynamics of the market being somewhat unpredictable?

    The Tacoma Narrows and London Millennium bridges are both listed here, yet neither one is a software issue - hell, the Tacoma bridge collapsed in 1940!


    That said, it is a pretty interesting list, but calling it a list of software bugs and using it to underscore the importance of regression testing software is a bit of a stretch. If anything, it underscores the importance of editing and proofreading your content.

    1. Re:Many of these are NOT bugs... by rob_from_ca · · Score: 2

      Amen. The term "Bug" is usually meant to mean software errors. Software errors can creep in during requirements gathering (missing or erroneous information about the intended application), specification development (erroneous assumptions), or actual implementation (off-by-one errors, buffer overruns, etc. or for that matter any of the above also).

      All of the process and infrastructure surrounding software development (ESPECIALLY with mission critical apps) is just as important as the coding itself. Until more colleges and education programs start pounding that into students' heads, we are going to continue to have the same problems.

    2. Re:Many of these are NOT bugs... by hughk · · Score: 2
      I have known programmers who faithfully follow the specifications presented to them. Unfortunately, they didn't understand enough to be able to do a sanity check. Often a spec written by someone without a real computer background resorts to hand-waving when it covers an awkward point. A Business Analyst with a real technical background would normally try to dig to the bottom of the issue, but many who are not technically trained would not realise that the problem hasn't been properly defined.

      The end result is that the programmer implements a system that can not work. The system is tested against the faulty spec, and sure they agree. With luck, the bug will be caught in simulation before it goes into production.

      It often isn't.

      --
      See my journal, I write things there
  31. NO! It is called responsibility. by www.sorehands.com · · Score: 3, Interesting
    Why do you think there are 1000 page EULAs?

    Half of it boils down to, We are not responsible for anything bad, even if we were warned about it and have a fix for it that we are holding on to sell as part of an upgrade.

  32. Software bugs...NOT! by T.E.D. · · Score: 5, Insightful
    I'd call it a bad sign when the first two entries on a page that purports to show famous software bugs are not, in fact, software bugs.

    The bug that caused the Ariane explosion was a requirements analysis bug. The Pentium FP bug was a hardware bug.

    A quick skim of the rest nets me at least 6 more non-software software bugs
    • 4. Mars Climate Orbiters, Loss (Mixture of pounds and kilograms, 1999) - Specification bug
    • 27. Distributed denial-of-service attacks - Malicious people
    • 31. Florida Voting Chaos - not a damn thing to do with computers
    • 34. Wall Street Crash, October 1987 (Acceleration of the crash) - computers did precisely what their users wanted them to do
    • 42. Great Concert Disasters - WTF?!
    • 43. Tacoma Bridge (not a computer bug)(collapse, 1940) - he said so himself

    After seeing that, I can't really trust the list on things I don't have a good knowledge about.

    Here's a challenge for someone: Go through the list and find out how many (if any) of the listed software bugs are actually software bugs.
    1. Re:Software bugs...NOT! by tswinzig · · Score: 2

      The Pentium FP bug was a hardware bug.

      Was it really a hardware bug? (I mean, a bug related to the physical hardware?)

      After all, most computer hardware is just frozen software.

      --

      "And like that ... he's gone."
    2. Re:Software bugs...NOT! by Tony-A · · Score: 2

      Errr...no. This problem had nothing to do with how well the software was communicating.
      Seems very much like a problem with how the software is communicating. The software, however indirectly, reads the wrong units. This sounds exactly like a communication problem, and anything else being involved doesn't make the software right. There was a "bug" between the contractor and the contracting agency. That bug caused a bug in the final software. There's more than one bug.
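
      One way to keep a "wrong units" mistake from crossing an interface silently is to make the unit part of the value. The sketch below is only an illustration: the Impulse class and apply_thruster_impulse are hypothetical names, and it assumes the widely reported account that the Mars Climate Orbiter mix-up was pound-force seconds versus newton-seconds (the list summarises it as pounds and kilograms).

```python
from dataclasses import dataclass

# One pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in newton-seconds, whatever unit it arrived in."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert once, at the boundary, so everything downstream sees SI units.
        return cls(float(value) * LBF_S_TO_N_S)


def apply_thruster_impulse(impulse: Impulse) -> float:
    """Hypothetical consumer: refuses bare numbers whose unit is unknown."""
    if not isinstance(impulse, Impulse):
        raise TypeError("expected an Impulse, not a bare number")
    return impulse.newton_seconds


if __name__ == "__main__":
    reading = Impulse.from_pound_force_seconds(10.0)
    print(f"{apply_thruster_impulse(reading):.3f} N*s")
```

      This does not settle whose bug it was, but it moves the unit question from a document nobody reads into a check that fails loudly.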

    3. Re:Software bugs...NOT! by el_chicano · · Score: 2
      * 31. Florida Voting Chaos - not a damn thing to do with computers
      Florida has everything to do with computers.

      How are the votes counted? What exactly is "chad"? If you don't know by now that they are the little holes punched in an IBM computer punch card then you have not been paying very much attention!

      While it is true that the whole Florida situation is not strictly about buggy software, it could have been averted with better hardware (punch card readers that spit out incorrectly punched cards for manual counting) or better training (have election personnel ensure that there are no hanging chad on the voters' ballots). It also pointed out the importance of well designed User Interfaces, because if some of those ballots had been better designed, the number of errors committed by confused voters would have gone way down.

      IMO Florida pointed out some of the major weaknesses of the current computer voting scheme. Computer vote counting is intended to reduce human intervention in order to reduce fraud. If humans have to handle voters' punch cards more or conduct manual counts then it leads to potentially more opportunities for election fraud by unscrupulous election personnel...
      --
      A man who wants nothing is invincible
  33. Patriot and Scud by vondo · · Score: 3, Interesting

    The claim for this one is that a Patriot during the Gulf War failed to intercept a Scud missile and the Scud missile killed 26. Ergo, a software bug killed 26 people.

    Considering that even the military now admits that no Patriot *ever* intercepted an Iraqi scud, this inference is unfounded.

    1. Re:Patriot and Scud by jilles · · Score: 2

      What was going on in this case was that the launching system had a minor but cumulative rounding error in its time measuring. After accumulating for several days, the deviation was big enough to let the launched Patriot completely miss its target (timing is essential when traveling at several times the speed of sound) and slam into the ground at the wrong place.

      Whether it is a direct consequence of the bug is debatable. But it would maybe have hit its target if that bug had not been present and not slammed into the ground on the wrong side of the front line.

      --

      Jilles
    2. Re:Patriot and Scud by Fulcrum+of+Evil · · Score: 2

      What was going on in this case was that the launching system had a minor but cumulative rounding error in its time measuring.

      What's interesting is that this isn't a bug. The system in question was designed for a maximum duration of 48 hours between resets, and they ran it for a week.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    3. Re:Patriot and Scud by Fulcrum+of+Evil · · Score: 2

      A system that can only run 48 hours between resets yet allows launch after more than 48 hours is broken.

      You would rather the missile launcher shut itself down?

      Blaming the user for using the product in a way that the programmer did not intend is bullshit because all interfaces used by humans should be designed for human behaviour!

      In this case, it was the product requirement.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    4. Re:Patriot and Scud by pdqlamb · · Score: 2

      Maybe it would have hit its target without that bug. But the OP's point, I think, is that that Scud would have been the first one intercepted by a Patriot. Unfortunately, it seems there's a lot more than blind luck required to hit a bullet with a bullet. Why weaken the list, and the impact of the "fix your bugs" message, with this highly questionable item?

    5. Re:Patriot and Scud by Rogerborg · · Score: 2
      --
      If you were blocking sigs, you wouldn't have to read this.
    6. Re:Patriot and Scud by duffbeer703 · · Score: 2

      Destruction of targets with proximity fuses is an accepted method of intercepting aircraft and cruise missiles.

      One of the big problems we encountered was that the Patriot fuses tended to direct the force of detonation towards the center of mass of the target. In the case of a SCUD missile, this is a rocket motor. Unfortunately, the warhead typically survives and drops like a rock to the ground.

      The bad thing about this is that somebody is going to have a SCUD warhead falling on his head. The good part is that the missile will not hit its intended target.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
    7. Re:Patriot and Scud by duffbeer703 · · Score: 2

      I should have added that depending on the interception, ballistic missiles which are travelling at mach 6-20 are sometimes able to escape the damaging part of the blast with little damage.

      Missiles in development like THAAD use smaller missiles with no warhead that attempt to score a kinetic kill by striking the warhead. So far, these missiles are not very reliable at all.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
  34. Re:Anyone remember this book? by mccalli · · Score: 3, Funny
    ...this book...mainly addresses the problem of becoming overly dependent on software for real-life, mission-critical applications. Unfortunately the book, published 10 years ago, appears to be out of print

    Ah well you see, that's the problem of becoming overly dependent on paper-based systems for mission-critical applications.

    Cheers,
    Ian

  35. Re:Patriot Scud Time Error by Kintanon · · Score: 5, Informative

    That system just wasn't designed for that purpose. It was VERY well designed for its actual purpose, which was tracking AIRCRAFT going WAY slower than that missile. And it was only rated for 14 hours of continuous usage, not 100. So it wasn't a fault in the program per se, but a misapplication of a system designed for a different use.

    Kintanon

    --
    Check out JoshJitsu.info for Brazilian Ji
  36. Remember the Therac-25? by Len · · Score: 2, Interesting
    This is quite amusing, but a software bug in my field can result in patients' lives being lost.
    And it has! Here's Leveson and Clark's fascinating investigation of the Therac-25 incidents.

    (A former Theratronics employee is standing right behind me, but he denies having worked on the Therac-25.)

  37. Re:Much is very iffy to beaf up list by blamanj · · Score: 3, Informative

    Blatant karma whoring...

    The risks forum is available as a moderated newsgroup, or you can subscribe to the e-mail version. See the Risks info page.

  38. Patriot time bug worse than represented. by Ungrounded+Lightning · · Score: 2

    The Patriot missile failure was blamed on a roundoff error causing an accumulating time error, resulting in a miss.

    But the bug was more fundamental: The missile and radar computers synchronized clocks when the system was booted, then drifted apart. After a hundred hours the drift from the roundoff was enough to make it lose a target.

    But had the missile synchronized its clock upon launch (or better: target acquisition, to give it time to settle), the tiny roundoff error accumulated in flight wouldn't have mattered. Meanwhile, had the calculation been perfect, differential clock speeds still would have caused a drift.
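
    For a sense of scale, here is a rough sketch of how a tiny per-tick roundoff grows with uptime. The specifics are assumptions taken from widely cited public accounts rather than from this thread: a clock counting tenths of a second, the factor 1/10 chopped to 23 binary fraction digits in a 24-bit register, and a Scud speed of about 1676 m/s. It models only the conversion error, not the clock-to-clock drift described above.

```python
from fractions import Fraction

FRACTION_BITS = 23         # assumed: 24-bit register with 23 bits after the binary point
TICKS_PER_SECOND = 10      # assumed: the clock counts tenths of a second
SCUD_SPEED_M_PER_S = 1676  # assumed: approximate speed quoted in public accounts


def chopped_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """Return 1/10 truncated (not rounded) to `bits` binary fraction digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours: float) -> Fraction:
    """Accumulated timing error after `hours` of continuous uptime."""
    ticks = int(hours * 3600 * TICKS_PER_SECOND)
    error_per_tick = Fraction(1, 10) - chopped_tenth()
    return ticks * error_per_tick


if __name__ == "__main__":
    for hours in (8, 48, 100):
        drift = float(clock_drift_seconds(hours))
        print(f"{hours:>3} h uptime: drift {drift:.4f} s, "
              f"about {drift * SCUD_SPEED_M_PER_S:.0f} m of target travel")
```

    Under these assumptions the error per tick is under a ten-millionth of a second, yet after a hundred hours it adds up to roughly a third of a second, which is why resynchronizing at launch or acquisition would have made the roundoff irrelevant.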

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  39. Nice by pete-classic · · Score: 4, Insightful
    The actual article links to http://www.byte.com/art/9509/sec7/art20.htm which says:

    THE BUG THAT KILLED

    1985-1987: At least four people died when they were exposed to lethal doses of radiation from Therac-25 linear accelerator machines (made by Atomic Energy of Canada Ltd.), used for radiation treatment of cancer. Software errors caused the machines to incorrectly calculate the amount of radiation being delivered to the patient. The most tragic incident to date of death or injuries to human beings due to defective computer software, [emphasis mine] this incident is a reminder that, as we entrust human lives and health to computers, the seriousness of eliminating bugs becomes a life-or-death proposition.


    and goes on to say:


    SIN OF OMISSION

    1991: American Patriot missiles were fairly successful. However, the failure of some Patriot missiles to track and destroy Iraqi Scud missiles during the Persian Gulf War may have been due to a software problem of the system. During one such Iraqi missile attack, 28 American soldiers were killed in their barracks in Dhahran, Saudi Arabia.


    seven times the loss of human life, but less of a tragedy? I guess they are soldiers so fuck 'em, eh?

    This story is over two years old, so they have had ample opportunity to correct it. The "comment" button on that page just takes me to the front page. Nice.

    Also on that page, "The DoubleSpace automati hard disk comparision software included in Microsoft MS-DOS 6.0 [. . .]" WTF is "automati"? "Comparision" isn't even a word as far as I know, but it looks a lot like comparison. DoubleSpace is disk compression software.

    Ironic that there are such glaring errors in an article about buggy software.

    Well, I wasn't particularly a fan of Byte before, but now I'm convinced that they suck.

    -Peter
    1. Re:Nice by pete-classic · · Score: 2

      The point isn't the severity of the bug; it's the value of human life. Yes, the soldiers were in harm's way, and the patients were supposed to get better from the treatment, not die, but I think that explicitly saying that the deaths of four cancer patients is a bigger tragedy than the loss of 28 soldiers is pretty sad.

      Note that I didn't "compare the two", Byte did. That's precisely my point.

      -Peter

    2. Re:Nice by pete-classic · · Score: 2

      I see that you are trying to be funny, but the fact is that 1. I am a geek (if the fact that I'm posting on /. isn't enough, how 'bout the fact that I'm an out-of-work UNIX consultant?) who used to suffer at the hands of jocks and 2. I served four years in the US Army, including a stint in the Bosnian combat theater.

      So I hope you can appreciate how unfunny your comment is to me.

      -Peter

    3. Re:Nice by pete-classic · · Score: 2

      When I write "to date" I usually mean "up to the current date as I write this." But the proximity of the "dateline" to the phrase puts this in doubt in this case.

      Damn vague language!

      -Peter

  40. We have a difficult battle ahead ... by jc42 · · Score: 5, Informative

    Some years back, as a grad student, I saw a bunch of colleagues do a rather unnerving experiment. Much of the number crunching was, as usual, done in Fortran. So they instrumented the compiler to silently test for integer overflow, report when it happened, and also report whether the program tested for it.

    Their result was that roughly 50% of the Fortran programs on the mainframe computer produced at least one number in the output that was wrong due to undetected integer overflow.

    This itself would be bad enough. But a bunch of us followed this up by asking Fortran programmers about it. What we did specifically was to point out that, unlike floating point, where there's an interrupt, integer arithmetic required a separate instruction to test the overflow flag. So testing for integer overflow took extra cpu cycles. Then we asked them whether they thought that software should be modified to always test for integer overflow, as is done with floating point.

    The answer was overwhelmingly that if it took extra cpu cycles, the software should not check for overflow.

    When we pointed out that this introduced the risk of programs producing incorrect results, the Fortran programmers invariably said that didn't matter. Faster is better, even if some of the results are wrong.

    I think of this whenever I read about computers used in medical, transportation, or other areas where malfunctioning software could put lives at risk. I don't believe that the "software culture" has changed significantly in this respect since then.
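
    A minimal sketch of the trade-off described above, in Python rather than Fortran. The 32-bit width and the names `wrapping_add` and `checked_add` are assumptions for illustration; this is not the instrumented compiler from the experiment. The first function returns a wrong answer silently, the second spends an extra comparison to refuse.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrapping_add(a: int, b: int) -> int:
    """Add the way unchecked 32-bit two's-complement hardware does: wrap silently."""
    total = (a + b) & 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed value.
    return total - 2**32 if total > INT32_MAX else total


def checked_add(a: int, b: int) -> int:
    """Add, but raise instead of returning a wrapped (wrong) result."""
    total = a + b
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return total
```

    With these definitions, `wrapping_add(INT32_MAX, 1)` comes back as `INT32_MIN` with no warning, while `checked_add(INT32_MAX, 1)` raises `OverflowError`.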

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    1. Re:We have a difficult battle ahead ... by T.E.D. · · Score: 3, Informative
      I think of this whenever I read about computers used in medical, transportation, or other areas where malfunctioning software could put lives at risk. I don't believe that the "software culture" has changed significantly in this respect since then.


      That's precisely why people developing safety-critical apps should be (and quite often are) using Ada, rather than Fortran or C. Not only does the language put in all the checks you mention (and more), but the "software culture" among Ada programmers is significantly better where bugs and safety are concerned.

      Take a look at Praxis' SPARK for a look at how responsible people develop safety-critical software. The approach takes more effort than the typical "hack something together then bash it into shape with the debugger" approach. But in many cases, it is well worth the cost.
    2. Re:We have a difficult battle ahead ... by jc42 · · Score: 2

      Ummm ... I've used a number of java apps in recent years. So I'm not convinced. The people who designed the language clearly had safety and correctness in mind. But a *lot* of the java programmers don't. Or their bosses don't permit it.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    3. Re:We have a difficult battle ahead ... by Tony-A · · Score: 2

      Sir, methinks you are an optimist.
      Faster is better, even if some of the results are wrong.
      IIRC, Burroughs had problems with their hardware running a lot of programs. Seems that a lot of the programs read or wrote memory where the programs shouldn't be accessing anything, and the programs just wouldn't run.

  41. Re:It's Worse: The Patriot Never Worked by Lord+Omlette · · Score: 2

    Christian Science Monitor is usually a reputable paper, if only because they have their own reporters doing the work instead of tacking stuff onto AP/Reuters stuff.

    The spaces are due to the slashdot lameness filter. Yay perl.

    --
    [o]_O
  42. Bugs? Typos? by Tower · · Score: 2

    So is it a webpage bug when you see:
    "Pentium Prozessor" and "Pentium Porcessors" in the writeup... [rant] This is the same kind of sloppy work that causes cars to explode, missiles to veer off course, and a busload of nuns to get blown up by a rogue robot (Nun Soup?) [/rant]

    Oh well :)

    --
    "It's tough to be bilingual when you get hit in the head."
    1. Re:Bugs? Typos? by Tony-A · · Score: 2

      It's the scope of the bug/typo/error/whatever.
      Pentium Porcessors is harmless enough (limited scope) unless I start down some association of processing five hogs in an abattoir.
      Sloppy work causes problems. Thankfully, most are limited in scope.
      I understand your rant, but the real problem is that minor errors have big consequences. There's no silver bullet, but the key is to somehow drastically curtail the bad consequences of errors that will inevitably occur.

  43. Re:The Ariane blowup was especially amusing by T.E.D. · · Score: 4, Informative
    Then to make it funnier, turns out the system engineers had decided that since software is infallible, any exception condition would indicate a hardware failure(!), so instead of a reset they shut the affected computer down altogether.


    Not quite. The software was built for the Ariane 4. On the Ariane 4, it was physically impossible for that value to ever get high enough to overflow. So on the Ariane 4 the assumption that an overflow could only be due to a hardware failure was entirely correct.
    If they had known that years later an Ariane 5 would come along, and those engineers would stupidly reuse the Ariane 4 code without testing it once, then perhaps they would have made a different decision. But I think the blame goes entirely on the Ariane 5 guys, who were *not* the ones who wrote that code.
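
    The operation that failed is usually described as a conversion from a 64-bit floating-point value to a 16-bit signed integer. A hedged Python sketch of two ways to handle such a conversion follows. The function names and any sample values are made up for illustration and are not taken from the flight software, and nothing in this thread establishes that clamping would have been acceptable for that particular variable.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_strict(value: float) -> int:
    """Convert, raising if the value cannot be represented in 16 signed bits."""
    truncated = int(value)  # drops the fractional part, like a float-to-int cast
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} is outside the 16-bit signed range")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert, clamping out-of-range values to the nearest representable limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

    The strict version only moves the question to whoever catches the exception: treating it as proof of a hardware fault is reasonable exactly as long as the input range assumption still holds.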
  44. Coupla Notes by StormyMonday · · Score: 4, Informative
    1. The Patriot time-drift was caused by the system being operated outside of its design parameters. It was designed to operate during a Soviet invasion of Western Europe, and expected to have to relocate every 8 hours or so. The spec, therefore, assumed that the software would reboot every 8-12 hours. From my experience with the military, if a programmer had put in a clock algorithm that would track indefinitely, he or she would have been ordered to take it out. (Been there. Done that. Broke the coffee mug.)
    2. The Yorktown crash was the result of mixing mission-critical and non-mission-critical programs on the same box. Big no-no.

    So we have a specification problem and a system design problem. Neither is a pure "programming problem".

    Software crashes are like airplane crashes -- blame the lowest guy on the totem pole. In air crashes, it's the pilot. In software, it's a coder.

    --
    Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
    1. Re:Coupla Notes by brer_rabbit · · Score: 2
      From my experience with the military, if a programmer had put in a clock algorithm that would track indefinitely, he or she would have been ordered to take it out.


      Any particular reason why? Is it just because the specs assume a reboot every 8-12 hours?

    2. Re:Coupla Notes by Tazzy531 · · Score: 2

      In defense contracting, everything has to be followed to the exact requirements. Any deviation pretty much voids the contract.

      I used to work for a computer store that had an order to build computers to be sent to Bosnia. The specs included stuff such as which slot certain cards had to be inserted in, how the wires must be tied up and which wire to group with which, the location of drives like the CD-ROM, etc.

      --


      _______________________________
      "I'm not Conceited...I'm just a realist..."
    3. Re:Coupla Notes by StormyMonday · · Score: 2

      Two reasons. Minor reason: It Is Assumed that a "better algorithm" will use "more resources", whether it does or not. (Definitions of the terms in quotes are optional.)

      Major reason: Defense contractors cannot make money by building a system strictly as specified. Defense contractors make boxcar loads of money on change orders. The normal sequence here would be:

      1. Build the system strictly to spec.
      2. Write a discrepancy report identifying a potential clock problem and outlining a fix.
      3. Get a change order and a boatload of money from the government.
      4. Do it the way it should have been done in the first place.

      Any reasonably sized system will have thousands of these change orders.

      --
      Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
  45. This one was deadly: by _ph1ux_ · · Score: 2

    "Wrong Starting Estimate of Uranus mass in
    Iteration, Data Compression, 1986"


    Caused wife 1.0 to go into panic and terminate all sex threads for the next three weeks.

  46. dumbass, the patriot didnt kill anyone by benploni · · Score: 2

    a fucking scud did. The patriot bug prevented it from helping, but it didn't kill anyone. Sheesh.

  47. Always works right on my system by nomadicGeek · · Score: 5, Funny

    My software always works perfectly on my system. Zero bugs.

    I have no idea what the hell the users do to it to screw it up.

  48. Reminds me of mine... by thrillbert · · Score: 4, Funny

    After sitting down for 36 hours straight when I first learned to program in C, I wrote a small, but useful, payroll program. By the end, during the function that would print out the check, I added "Press any key to continue, any other key to abort". Lucky for me I never released that program.

    ---
    All comments are not factual unless stated otherwise.

    1. Re:Reminds me of mine... by Tony-A · · Score: 2

      "Press any key to continue, any other key to abort".
      Funny, yeah. But it beats a dialog box with just an OK box.

  49. USS Vincennes Incident was NOT software related by kylef · · Score: 3, Informative

    There were many things that went wrong during the incident, but one of the FEW things that worked correctly was the AEGIS weapons system on board the guided missile cruiser. The error lay in the crew's mistaking the range information reported on the radar screen for altitude information. As a result, the CO thought that the incoming contact was flying straight towards his ship and decreasing in altitude (preparing to attack).

    Blaming a "cryptic display" is hardly a software bug if anyone is familiar with radar screens. That's why we train people to read them!

  50. Re:It's Worse: The Patriot Never Worked by hij · · Score: 2
    It is true that many of the scuds broke apart during re-entry. Some of them did not. Even if all of them did break up, the round-off error is still important. Because the timing system uses a relative time, the truncation error associated with arithmetic on large numbers is still deadly, and it is still important.

    The bizarre thing is that the fix is that a patriot installation literally has to re-boot on regular intervals in order to reset the internal timers. The bug is real, and it has to be dealt with.
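
    A rough worked example of how that error accumulates, using figures from commonly cited analyses of the incident rather than anything stated in this thread: the clock counts tenths of a second, and 0.1 is stored truncated to 23 fractional bits of a 24-bit fixed-point register. The register layout, the 100-hour uptime, and the approximate Scud speed are all assumptions labeled in the code.

```python
from fractions import Fraction

FRACTION_BITS = 23         # assumed: fractional bits kept by the 24-bit register
TICKS_PER_SECOND = 10      # assumed: the clock counts tenths of a second
SCUD_SPEED_M_PER_S = 1676  # assumed: approximate Scud closing speed


def truncate_to_bits(value: Fraction, bits: int) -> Fraction:
    """Chop a non-negative value to the given number of binary fraction bits."""
    scale = 1 << bits
    scaled = value * scale
    return Fraction(scaled.numerator // scaled.denominator, scale)


def clock_drift_seconds(hours_up: int) -> Fraction:
    """Time lost after hours_up hours because 0.1 is stored slightly too small."""
    tenth = Fraction(1, 10)
    error_per_tick = tenth - truncate_to_bits(tenth, FRACTION_BITS)
    ticks = hours_up * 3600 * TICKS_PER_SECOND
    return error_per_tick * ticks


drift = clock_drift_seconds(100)
print(f"clock drift after 100 hours: {float(drift):.4f} s")
print(f"distance a Scud covers in that time: {float(drift) * SCUD_SPEED_M_PER_S:.0f} m")
```

    Under these assumptions the drift is about a third of a second after 100 hours, which is why resetting the timers with a reboot masks the problem: the error is proportional to uptime.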

    --
    Believe nothing -- Buddha
  51. Software bugs...YES! by wurp · · Score: 2

    We have to accommodate our users and unknown environments. When a reasonable user makes your program do something bad, that's either a user training problem or a software bug. You didn't check input data carefully enough, or you didn't provide a good user interface, or your requirements were bad, etc. All of those are software bugs.

    The Pentium bug, I'll agree with. Florida voting, I'll agree with. DDoS is a software bug. Software run in an environment like the web has to accommodate malicious users. Wall Street Crash, software bug. There was a set of conditions in which the software did bad things.

    1. Re:Software bugs...YES! by ejasons · · Score: 2, Insightful

      Wall Street Crash, software bug.

      If the software performs according to its specifications (which I assume this software did), then it's not a "software bug", it's an error in the requirements.
    2. Re:Software bugs...YES! by room101 · · Score: 2

      Actually, if you rtfa (I didn't read the entire thing either), it says that it was an improper for loop used to generate this lookup table. This I think was a software problem: they used software to generate this lookup table. It doesn't really matter what they did with the data (burned it into silicon or printed it on a report).

      --
      room101 -- how much can you stand before they break you?
      (they always break you eventually)
    3. Re:Software bugs...YES! by Tony-A · · Score: 2

      "burned in" != Software.

      ROM BIOS

  52. He forgot my favourite... by felipeal · · Score: 2

    ...the ice cream bug!

  53. Re:It's Worse: The Patriot Never Worked by Wavicle · · Score: 3, Insightful

    Your first link is a translation of a patriotic Israeli article cheerleading the competence of their military. It doesn't necessarily make what they're saying false, but does make it suspect.

    The second link is way low on content, I'm not sure how to judge it. All it says is "we looked at a bunch of videotapes and arrived at this conclusion". And then goes on to mention the bitter dispute between the U.S. and Israeli military over why the system didn't work so well in Israel.

    I'm not sure I'm going to buy either argument. I know enough about flight characteristics to question the assertion that the scuds were so good at jinking and chaff the patriots (which were originally designed to hit jinking, chaff releasing aircraft) couldn't hit them.

    If the scuds were dropping debris because extra fuel tanks made them unstable:

    1) Why wasn't the wobble a pronounced problem at launch when the extra weight would have completely thrown off the trim characteristics of the missile?

    2) Dropping "debris" is a bad thing, and it's only a matter of time before doing so results in an uncorrectable failure of the missile's flight aerodynamics. Why weren't most of them failing earlier?

    3) Missiles don't fly in smooth trajectories nearly as often as you think. They jink to try and make anti missile systems (like say the Phalanx close-in weapons system) miss them or think they are dead and not worth any more attention.

    Even if the patriots did fail, why would that have grave implications for our anti ballistic missile shield? SCUDs are cruise missiles, not ballistic missiles. Why do you think those big computers at Norad can accurately predict where the warheads will hit just after boost?

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)
  54. Re:Millennium Bridge - Kansas City skywalk by victim · · Score: 5, Interesting

    Human effects on bridges are hardly a surprise. Recall in 1981 when the Kansas City Hyatt's skywalk collapsed, killing 114, because the pedestrians were dancing (and the design was altered to ease construction). You'd think that would have been enough of a wake-up call to the Millennium designers to consider human motion. more info

    Armies break cadence when marching across bridges, at least as far back as Napoleon's time. Presumably they learned that the hard way.

    On a more personal note, I have participated in the unintentional destruction of a gymnasium. 80 or so people crowded together in the middle, bouncing up and down, and then "down and down". We fractured the engineered wooden joists. Fortunately it failed gracefully. Just sagged down about 4 feet in the middle.

    What I'm trying to say, not particularly directly, is "don't give the designers of the bridge a pass because this new phenomenon struck their bridge". Chastise them for risking people's lives and wasting resources by neglecting the loads placed on bridges.

  55. something bad from an unexpected input is a bug. by systemaster · · Score: 2, Insightful

    In one of my programming classes an instructor had a phrase that applies to this: "Bullet proof your code". Meaning whatever the user enters, the program should work right.

    Problems often come up in programming for input. Normally you have an expected range of input, and if your program works at all and the input is in the expected range, you get the expected output. BUT what if the user enters the ABSOLUTE maximum value? Is your variable size large enough? What about the absolute minimum, often zero? Does it still work right? You're not going to try to divide by that zero or do something else that will fail? What if a negative number is entered?

    Those are all basic input checking questions... but it's the general idea. Bullet proof your code, or at least try to make it so.
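
    The questions above can be sketched as a single validation function. This is only an illustration: the name `parse_quantity` and the default limits of 1 and 10,000 are hypothetical, chosen so that zero and negative numbers fall outside the accepted range.

```python
def parse_quantity(text: str, minimum: int = 1, maximum: int = 10_000) -> int:
    """Turn raw user input into an integer that is known to be in range."""
    try:
        value = int(text.strip())
    except ValueError:
        # Covers empty input, letters, decimals, and other non-integers.
        raise ValueError(f"not a whole number: {text!r}") from None
    if value < minimum or value > maximum:
        raise ValueError(f"{value} is outside the range {minimum}..{maximum}")
    return value


def cost_per_item(total_cents: int, quantity_text: str) -> float:
    """Safe to divide: parse_quantity never returns zero or a negative number."""
    return total_cents / parse_quantity(quantity_text)
```

    The boundary values themselves ("1" and "10000" here) are the ones worth testing first, along with the values just outside them.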

    --
    LinuxWorx
    Spelling errors are intentional as are gramatical error
  56. The Patriot Problem was well known by AppyPappy · · Score: 2

    During live coverage of a Scud attack, one of the Patriots veered sharply to the left and hit an insurance office. The cause was said to be an error due to a time leak. The longer the Patriots were "online", the more leakage occurred.

    --

    If you aren't part of the solution, there is good money to be made prolonging the problem

  57. Computer-Related Risks by Peter G. Neumann by Malic · · Score: 3, Interesting

    This IS the text on this very sort of thing. I love techno-"oops, that's not right, is it?"-horror stories and this book is filled with them. I REALLY recommend this book! Here's an example of the page after page of entries it contains:

    Making Rupee!

    Due to a bank error in the current exchange rate, an Australian man was able to purchase Sri Lankan rupees for (Australian) $104,500, and then to sell them to another bank the next day for $440,258. (The first banks' computer had displayed the Central Pacific franc rate in the rupee position.) Because of the circumstances surrounding the bank's error, a judge ruled that the man had acted without intended fraud, and could keep his windfall of $335,758.

    Computer Related Risks - Peter G. Neumann - ACM Press - 1995


    The bottom line here is "computing is, in a technical sense, a risk". Actually, technology - of any kind - is a risk. Which I suppose leads us to remember that life is a risk.

    At which point, I'll just stop rambling and point you all to Amazon to buy the book.

    --
    I swear by MacOS X. Although I use to swear *at* MacOS 9...
    1. Re:Computer-Related Risks by Peter G. Neumann by KernelHappy · · Score: 2

      Eh I've seen worse.

      The first piece of code I ever QA'd back in the day, that was interesting. The long and short of the story is my code went in, and people returning stuff were being credited with several million dollars instead of the $20 or $30 they were supposed to get. I had only been working at this job for 2 weeks, and when I heard about this problem I started gathering my stuff into a pile, expecting to be fired on the spot.

      Turns out my code was good. Someone had changed the configuration during the cutover, and the other company's code was so poorly written that it was subtracting the return amount from 0 in an unsigned int and putting that amount back in people's accounts.

      All said and done something along the lines of $3.5 trillion was accidentally refunded to people but they caught it early enough and were able to undo it. I heard that they only lost about $15K in actual funds (not including panic, man hours to fix, etc).
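
      For anyone who hasn't been bitten by this, a small sketch of what subtracting from 0 in an unsigned integer does. The 32-bit width and the use of cents as the unit are assumptions for illustration; the original system's types and units aren't given above.

```python
UINT32_MODULUS = 2**32


def wrapped_credit_cents(return_cents: int) -> int:
    """Mimic `0 - amount` in a 32-bit unsigned int: the result wraps around."""
    return (0 - return_cents) % UINT32_MODULUS


def checked_subtract(balance_cents: int, amount_cents: int) -> int:
    """Subtract, but refuse any result an unsigned type could not hold."""
    if amount_cents > balance_cents:
        raise OverflowError("subtraction would go below zero")
    return balance_cents - amount_cents


credit = wrapped_credit_cents(2000)  # a $20.00 return
print(f"${credit / 100:,.2f}")
```

      Under those assumptions a $20 return wraps to a credit in the tens of millions of dollars, the same order of magnitude as the refunds described above.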

      --
      -- Button up, your ignorance is showing
  58. CUI by Ozan · · Score: 5, Insightful

    I think most of the bugs in software are the result of "Coding Under Influence". Whether it is a strict time limit, ambiguous specifications, no sleep or other disturbances, it leads to blatantly dumb assumptions or similar faults. Everyone knows that driving under the influence is dangerous and can lead to accidents. Why do "software architects" think this is different when someone writes important programs?
    I think part of the problem is that writing software is a rather new craft in comparison to e.g. metalworking. Programmers don't have a union, and they often work under poorer conditions than workers at conveyor belts, if you consider the higher responsibility they have.

    1. Re:CUI by Reziac · · Score: 2

      As part of a software testing team on a private project, I worked very closely with a set of professional coders for a two year stretch. In my observation (and mind you, I personally found 75% of the bugs listed in the changelogs), bugs came mostly in two types:

      1) Plain old mistakes: "Ooops, I didn't notice that!"

      2) Barn-blindness or hypersensitive ego: "You're crazy, I never make mistakes like that."

      The former were usually relatively trivial bugs, or were blatantly obvious. The latter tended to be serious or fatal with extended use of the program, tho were often not visible in a short test.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
  59. more here by 3-State+Bit · · Score: 3, Informative
  60. F15 equator bug by darkonc · · Score: 2
    The web page has it as being the F14, but I remembered a posting from that time that said it was the F15 (and it makes more sense, since the F15 was one of the first fly-by-wire aircraft, while the F14 is, I think, pretty much fly-by-cable).

    In any case, the SERIOUS problem was that when you flew over the equator, the computer would suddenly 'realize' that you were upside down from where you wanted to be and try to immediately turn the aircraft over to the 'proper' orientation. It was said that the aircraft would have survived the maneuver, but the pilot's neck would not.

    Luckily this was found during simulations. If it had happened during a real flight, it could have taken a long time (and lots of fatalities) to figure it out.

    On a lighter note, there is apparently a subroutine -- phonetically referred to.. It was either wait_on_wheels() or weight_on_wheels(). In either case, it was added after some slap-happy test pilot tried retracting the landing gear while sitting on the runway (resulting in millions of dollars of repairs).

    --
    Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    1. Re:F15 equator bug by Nehemiah+S. · · Score: 2, Informative

      The prototype F-22 was also lost due to a sign error in the code which controlled the thrust-vectoring nozzles during landing. Technically it was chalked up to pilot error, since he was supposed to lock the nozzles down before beginning the landing procedure, but it is something that should have been considered in the code.

      From the unclassified accident report:

      "At the time of the crash, Morgenfeld had been carrying out a planned go-around, and he had just switched on his afterburners and had retracted his undercarriage at less than 50 feet off the runway with thrust vectoring active. At a speed of 175 knots, the aircraft began an uncommanded pitchup followed by a severe stick-forward command from the pilot. The aircraft then entered a series of pitch oscillations, with rapid tail and thrust nozzle fluctuations, exacerbated by control surface actuators hitting rate limiters causing commands to get out of synchronization with their execution.

      An investigation later showed that Morgenfeld had ignored a test-card that required that the vectoring nozzles to be locked into position in just such a configuration that he had found himself at the time of the crash. However, most engineers had also ignored this instruction since they thought it to be unnecessary. At the time of the accident, the aircraft had made some 760 flights and had logged 100.4 hours in the air."

      neh
      aero geek :)

      --
      ... and there is no doubt, that one day he will be
      where the eye of his telescope has already been
  61. OT: The Christian Science Monitor by Squirrel+Killer · · Score: 2, Informative
    Note: I am NOT a regular reader of the Christian Science Monitor.
    That's too bad, you should be. The CSM is a highly regarded, non-partisan, non-denominational, very independent paper. It is one of the few sources of quality international news in the US (aside from the internet). While I won't go so far as to say that it is completely unbiased, it certainly is one of the least biased news sources I know of, and their coverage is usually well-balanced. For more info about the paper, check their About the Monitor page. If nothing else, the page is indicative of how independent the paper is of the church.

    -sk

  62. Lots of testing in source installs by A+nonymous+Coward · · Score: 2
    Not saying they are *good* tests, but many source install instructions have the basic steps
    • ./configure
    • make
    • make test
    • make install

    I always run these.

    I had a job writing regression tests. I have not looked into any of these install tests. I doubt they are as thorough as they could be, but they have failed once in a while, and I have always investigated the failures.
  63. Re:Millennium Bridge - Kansas City skywalk by lingsb · · Score: 2, Informative

    Your assumption about the nature of pedestrian motion that caused the bridge wobble is incorrect:

    They did take into account pedestrian movement on the bridge; they didn't take into account pedestrian motion on the bridge locking in to the motion of the bridge:

    1) Pedestrians walk on bridge
    2) Bridge wobbles slightly
    3) Pedestrians adjust their walking to be in phase with bridge
    4) Bridge wobbles more

    This was a new phenomenon, due to the lightness of the construction of the bridge. It is now fixed, by the addition of dampers.

    --

    -BB

  64. Re:It's Worse: The Patriot Never Worked by joib · · Score: 2

    Uh oh. The SCUD is very much a ballistic missile. Therefore most of your points are moot as debris follows the same trajectory as the rest of the missile, disregarding air resistance of course.

  65. Yorktown and divide by zero errors by Fastball · · Score: 2
    I remember reading about the USS Yorktown a couple years back. I laughed so hard I almost came apart.


    I wonder what experiences anyone else has had with divide by zero "glitches." Anybody else have a similar experience?
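
    One way to keep a stray zero from taking everything else down with it is to contain the failure at the record level. The sketch below is generic and hypothetical: the record layout and the names `rate_per_hour` and `summarize` are invented for illustration and have nothing to do with the Yorktown's actual software.

```python
def rate_per_hour(amount: float, hours: float) -> float:
    """Plain division; raises ZeroDivisionError when hours is zero."""
    return amount / hours


def summarize(records):
    """Compute a rate for each (name, amount, hours) record.

    A record with zero hours is set aside instead of aborting the whole run.
    """
    rates, rejected = [], []
    for name, amount, hours in records:
        try:
            rates.append((name, rate_per_hour(amount, hours)))
        except ZeroDivisionError:
            rejected.append(name)
    return rates, rejected
```

    The rejected list still has to be reported to someone; swallowing the error quietly just trades a crash for a wrong answer.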

  66. Re:Millennium Bridge - Kansas City skywalk by igrek · · Score: 5, Funny

    In the old USSR (Stalin times), there was a standard bridge acceptance test:
    1) put project managers, lead architects and engineers under the bridge;
    2) put heavy loaded trucks on the bridge.

    That was real extreme testing.

  67. Re:Just a matter of time and growth by cicadia · · Score: 2, Funny

    You'd better not -- I patented the logic behind those mistakes; if you even think about making the same mistakes, I'll see you in court!

    --
    Living better through chemicals
  68. Re:To shrink word files by gblues · · Score: 2

    Actually, it's a three-step process:

    1. Create your document.
    2. Save as RTF.
    3. Open the RTF in Wordpad and immediately save the file.

    For some reason, Wordpad saves RTF files smaller than Word does. Go figure.

  69. Re:Millennium Bridge - Kansas City skywalk by Captain+Nitpick · · Score: 2, Informative
    Human effects on bridges are hardly a surprise. Recall in 1981 when the Kansas City Hyatt's skywalk collapsed, killing 114, because the pedestrians were dancing (and the design was altered to ease construction). You'd think that would have been enough of a wake-up call to the Millennium designers to consider human motion.

    The Hyatt's skywalk collapsed solely because of the change in design. The design change caused the walkway to fail to meet building code. Some civil engineers who studied the disaster were surprised it could support its own weight, much less the weight of the pedestrians.

    Quoting from a Kansas City Star article.

    The National Bureau of Standards concluded failure was just a matter of time. "The walkways," its probe found, "had only minimal capacity to resist their own weight."

    The dancing people were by and large on the floor below the skywalk, participating in a dance contest.

    The mistake that caused the Hyatt disaster was not one of failing to consider human motion in the design, but failing to consider the effects of seemingly minor changes in design.

    --
    But then again, I could be wrong.
  70. Re:It's Worse: The Patriot Never Worked by YouAreFatMan · · Score: 2

    As a former ADA (Air Defense Artillery) officer in the U.S. Army, I can tell you that the Patriot Missile system was designed primarily to shoot down enemy aircraft. Its ability to kill missiles is a secondary feature.

    --
    Robotiq.com is heavily tested on animals
  71. Re:more phone frolics by TheSquareRoom · · Score: 2, Funny

    In the early 1990s, before this part of the Eastern U.S. had ten-digit dialing, our SCO server would dial out, at 1:00 am, to all the little Pep Boys stores in PA and New Jersey in an attempt to update their inventory tables. Alas, one programmer forgot about the New Jersey area codes, and of course there are some overlapping 7-digit numbers between the two states. Oh, and did I mention that the system was coded to KEEP TRYING every ten minutes until it was successful? Heh, heh...at least it wasn't my phone they were ringing at one am...

  72. Re:Millennium Bridge - Kansas City skywalk by erasmus_ · · Score: 2

    Yikes. I'd call in sick that day.

    Of course the bad side of this is that if it collapses, you've just lost
    1) The bridge
    2) People who were most familiar with the project, and could fix it
    3) The heavy loaded trucks that could've been used for something else

    But I guess the idea was to have this be scary enough so that it would never happen. Ever actually hear of a bridge failing this type of test?

    --
    Please subscribe to see the more insightful version of th
  73. Re:Millennium Bridge - Kansas City skywalk by dillon_rinker · · Score: 3

    1. You'd have lost this anyway - that's the point of the test.
    2. They designed a failed bridge, and you want them to design the new one?
    3. These are cheap when you have hundreds of millions of slaves.

  74. Re:It's Worse: The Patriot Never Worked by Physics+Dude · · Score: 2, Funny
    ... no one has ever died as a result of the Coriolis effect

    What are you saying! The Coriolis effect is one of the main causes of hurricanes!

    ;)

  75. Re:It's Worse: The Patriot Never Worked by Kynde · · Score: 2

    Coriolis-effect-causes-water-to-swirl-in-the-toilet myth that you find in so many physics textbooks (the Coriolis effect only works on planetary scales).

    I know this is offtopic, but "planetary scales"? Please... I certainly hope no one quotes you on that one. A man-sized pendulum is just one example with which one can easily detect the Coriolis effect. Long-range cannons are another case where the Coriolis effect is taken into account, and there are many, many others.

    Just because the bathtub doesn't show the Coriolis effect, you shouldn't jump the gun and start talking about "planetary scales".

    --
    1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
  76. Re:especially important in healthcare.. by ComaVN · · Score: 2, Funny

    Yes, crashing planes and scuds hitting army barracks are funny, but patients losing their lives is not.

    --
    Be wary of any facts that confirm your opinion.
  77. Re:It's Worse: The Patriot Never Worked by Toddarooski · · Score: 2
    Somewhat off topic, I suppose, but... this is exactly what moths do when hungry bats get too close. Apparently, when they're hit by a strong enough sonar signal, they lose control of their nervous system and have little wing seizures. This makes their flight so erratic that the closing-in bat usually misses them. (If you'd like to read more, here's a link for ya)

    Given how the current trend in a lot of scientific fields is to borrow good ideas from mother nature, I'll betcha this is on somebody's drawing board somewhere.

    --

    "Do you expect me to talk?" "No, Mr. Bond. I expect you to die!"

  78. Re:It's Worse: The Patriot Never Worked by Jonathan_S · · Score: 2, Insightful

    Even if the patriots did fail, why would that have grave implications for our anti ballistic missile shield? SCUDs are cruise missiles, not ballistic missiles. Why do you think those big computers at Norad can accurately predict where the warheads will hit just after boost?

    Um, no. The SCUD is a theater ballistic missile, not a cruise missile. It looks like a WWII German V2. See this page for more info.

  79. Some software that works... by gorillasoft · · Score: 2

    Some software that works.

    420,000 lines of code, and only one error found in each of the last three versions.

    In the last eleven versions, 17 errors altogether were found.

    Note how much money it costs to produce software of that quality, and you will see why software usually has bugs - especially when you add in the short development cycles that management wants today. Damn the testing, full release ahead!

  80. Reaction About Reactors by virg_mattes · · Score: 2

    > This small generator can be made pretty damn indestructable (blackbox anyone?)

    This is a nice thought, but we're talking about spacefaring vehicles, not airliners. There isn't an airplane built that goes as high as these vehicles. The problem actually isn't a downrange crash from a failure within launch frame (the casing for the radioactives will withstand this sort of failure), but a reentry-style fall from a failure in the second/third stage. From that height, the generator is going to be falling very fast and very hot, and solid iron has trouble surviving these conditions (as would most black boxes, and anything else not specifically designed to withstand de-orbiting). So, while it's not a very big problem (from sub-orbit, it's hard to hit a populated area), it's still very possible to drop radioactive material in a populated area such that contamination is likely.

    I agree that there has been a lot of fearmongering about using radioactives for spaceships, but we learned in AE classes that using the term "indestructible" in conjunction with anything that leaves the atmosphere is usually a mistake.

    Virg

  81. test suites in free software by brlewis · · Score: 2

    BRL has a test suite, as does Kawa. I don't think test suites in free software are so uncommon.

  82. Re:It's Worse: The Patriot Never Worked by 5KVGhost · · Score: 5, Insightful
    The failure of the Patriots to intercept scuds (and the fact that the media never mentions this) has grave implications for our anti ballistic missile shield.


    I'm pretty sure the media has mentioned this, beyond those two media links you already posted, I mean. The issue has been debated since the first Patriot experiences during the Gulf War.

    But I don't really see how this has "grave implications" for an anti-ballistic missile shield. The effectiveness of the Patriot missile used during the Gulf War era is in doubt, but that does nothing to invalidate the general concept of destroying a ballistic missile with another interceptor missile. It certainly isn't easy to do, and there may be better ways to accomplish the same goal or things more worthy of our limited resources, but to claim that it's somehow physically impossible is both disingenuous and incorrect.
  83. Accidental Funny by virg_mattes · · Score: 2

    > We, as programmers, should make sure that our software are comunicating properly.

    Am I the only one who sees humor in saying, "...software are communicating properly" in a comment about communicating properly? Anyone?

    OK, I'll shut up now.

    Virg

    1. Re:Accidental Funny by Tony-A · · Score: 2

      Am I the only one who sees humor in saying, "...software are communicating properly" in a comment about communicating properly? Anyone?

      Ok, I'll bite.
      "...software is communicating properly" means that (one piece of) the software is communicating (one thing) properly (at the moment). It is laughable that anyone would expect anything more.
      "...software are communicating properly" implies some sort of plurality in what is communicating properly, which is closer to the high end of the continuum between "sometimes does something right" and "never does anything wrong".

  84. Re:Uranus by tswinzig · · Score: 3, Funny

    Wrong Starting Estimate of Uranus mass

    But I thought Uranus is a hole...


    Any hole sufficiently big enough is bound to have some mass in there, somewhere.

    --

    "And like that ... he's gone."
  85. Re:Much is very iffy to beaf up list by jgerman · · Score: 2

    That may have more to do with the source of your facts than the situations themselves. My mistake though, you came across a little cavalier in your attitude toward potential damage due to bugs.

    --
    I'm the big fish in the big pond bitch.
  86. Re:It's Worse: The Patriot Never Worked by GuyMannDude · · Score: 2, Insightful

    I'm pretty sure the media has mentioned this, beyond those two media links you already posted, I mean. The issue has been debated since the first Patriot experiences during the Gulf War.

    I guess I'll have to take your word for it but I think all the mass media has done is "mention" it. Pretty much everyone I tell about the failure of patriots is either in shock or replies with "That's not true! I know they work! I saw them destroying scuds on CNN!"

    It certainly isn't easy to do, and there may be better ways to accomplish the same goal or things more worthy of our limited resources, but to claim that it's somehow physically impossible is both disingenuous and incorrect.

    I never said that it was physically impossible. Four minutes before your post I made a reply to another's comments. I realize that you probably didn't get to see my 2nd post before posting yours. So at the risk of being modded Redundant, here's my answer:

    "My comment about the Patriot failure being a bad sign for our upcoming missile defense shield was to point out that if we can't hit relatively-slow-flying scuds, how are we possibly going to hit speedy ICBMs? We haven't even solved the theatre ballistic missile problem yet. So we're years away from being able to intercept WMD-bearing ICBMs."

    GMD

  87. Also, be sure to debug your debugging.... by deathcow · · Score: 2

    A friend of mine implementing logic in a process control system, overseeing water control for an entire valley, repeatedly dumped over a million gallons of water doing a few test runs.

  88. Re:It's Worse: The Patriot Never Worked by geekoid · · Score: 2

    "the Coriolis effect only works on planetary scales"
    Actually, if you take a round pan, put a plug in the center, fill it up with water, let it stand for about 10 days, then pull the plug, you can see the Coriolis effect. Any body of water that sits still long enough is subject to that force. However, that force is pretty weak, so any vibration or motion will disturb it.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  89. reminiscent of the original Mac bomb dialog by Preposterous+Coward · · Score: 2

    How did it go? The system freezes up and posts a dialog that says something like "Unexpected system error: -72." And there's only an "OK" button...

    --

    "Biped! Good cranial development. Evidently considerable human ancestry."
  90. Re:It's Worse: The Patriot Never Worked by Wavicle · · Score: 2

    Okay, let's say I completely missed that the SCUD is a ballistic missile. I thought it wasn't because I thought SCUD stood for Subsonic Cruise Unarmed Decoy. Apparently there is such an acronym, but it doesn't seem to refer to these missiles. I concede that point.

    So we're back to the SCUD being a ballistic missile. If the missile is ballistic, following a ballistic trajectory, why was it wobbling in flight? The bulk of the missile, sans debris, should have been in a ballistic trajectory. The SCUD has controllable fins, so it does have some measure of cruise guidance. I would think that if the missile were coming apart in flight, the control surfaces would be among the first to go, and then the missile would again be in a purely ballistic trajectory. Why was it wobbling? If it started tumbling end over end, I wouldn't expect it to survive more than 100 miles. Whatever destabilization occurred needed to start while the missile still had thrust, or while the control surfaces were still intact.

    --
    Education is a better safeguard of liberty than a standing army.
    Edward Everett (1794 - 1865)
  91. Re:To shrink word files by danro · · Score: 2

    For some reason, Wordpad saves RTF files smaller than Word does. Go figure.

    Easy, Wordpad was too small to get all the bloat in.
    I'm sure they will have fixed that in the next version of Windows ;-)

    --

    "First lesson," Jon said. "Stick them with the pointy end."
  92. Re:It's Worse: The Patriot Never Worked by danro · · Score: 2

    Well, let's hope they don't shoot during reboot then...

    --

    "First lesson," Jon said. "Stick them with the pointy end."
  93. No need to make new ones... by danro · · Score: 2

    I think there is at least one country that is holding on to old (supposedly outdated) Surface To Sea missiles for this exact reason.

    Tests (where they were presumably used as target practice) showed that they were very difficult to hit due to their erratic flight.
    This "feature" (and now we are talking feature in the MS sense, folks) was considered to compensate for their low accuracy.

    Calculations indicated that they would have a higher kill ratio than a better missile would.
    Weird!

    --

    "First lesson," Jon said. "Stick them with the pointy end."
  94. Re:Millennium Bridge - Kansas City skywalk by danro · · Score: 2

    Trust me, when all involved know of this beforehand, the bridge won't collapse!
    When you have to trust your work with your life, you do a good job.

    No matter how wicked this sounds, it probably worked... I doubt many people were killed.
    (Of course Stalin more than compensated for the lack of bloodshed in other ways...)

    --

    "First lesson," Jon said. "Stick them with the pointy end."
  95. Re:Not Bugs, Time Bombs by Tony-A · · Score: 2

    I had a prof who was stressing someone else's opinion that we change from the word "bug" to the phrase "time bomb" so that people would get a better feeling for what they really were -- incorrect sections of code just waiting to mess you up.
    I'd vote for Booby-Traps. There's lots of boobies waiting to be trapped.
    Possibly, an accident looking for a place to happen.
    But seriously folks, the problem with bugs is that they sometimes get together with other bugs and then produce spectacular results. I've even seen a triple. Three bugs. Any one of them absent, it would be impossible to find anything wrong.

  96. gcc not the only one by Goonie · · Score: 2

    GnuCash does.

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
  97. Re:MOD THIS UP by halflinger_n · · Score: 2, Informative
    There is also

    http://sunnyday.mit.edu/therac-25.html

    Which includes links to the author's other papers and publications.

  98. Scratch Monkey by Ratbert42 · · Score: 2

    Don't forget to mount a scratch monkey.

  99. An excellent link . . . by himi · · Score: 2

    See here

    Not in depth theory, but an excellent explanation . . .

    himi

    --

    My very own DeCSS mirror.
  100. Re:Just a matter of time and growth by Tony-A · · Score: 2

    ...undertaken there will be mistakes. But we learn from each one ...
    Do we?
    With Open Source and well publicized bug lists, maybe, some of the time.
    With Closed Source, surely you jest.

  101. Re:Debugging by Tony-A · · Score: 2

    the program should only do what it was designed to do, not more or less.

    throw data at a program that it shouldn't get
    Exactly. Otherwise it's like a boat that stays afloat only in a harbor on a calm day.
    And a cracker's exploit of a buffer overflow is maybe the kindest way of experiencing the bug. If you don't think so, explore the consequences of a shipping clerk triggering the same bug on a production system. Horrible thought, but the "black hats" may be the only friends you've got in the business.

  102. Re:Debugging by Tony-A · · Score: 2

    And for the life of me when will programmers realize
    that just because the software works on their
    box it may not work on another machine.


    Just installing Visual Studio,Source Safe etc, makes
    testing useless on your main machine.
    Program on your main machine.
    Test on a clean machine.


    Something is horribly broken here.

    If you must develop and test for such an environment, you must test on both a clean machine and a machine with all the "goodies" loaded. It doesn't really work to tell a customer that he needs to get rid of all his other software, reinstall Windows, and load your software.
    And no, it's not easy: you need to test against all combinations of weird stuff to ensure that it will actually run.

  103. There will always be a bigger bullet. by leuk_he · · Score: 2

    You can protect your program against simple input errors.

    I know this as the monkey test:
    Put someone behind the keyboard (just pull a salesperson from the other end of the building). Let him type/click away. He(/she) shouldn't be able to do real damage.
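
    If no salesperson is handy, the same idea can be roughly automated. This is a minimal sketch under stated assumptions: `parse_quantity` is a made-up input handler standing in for whatever your program accepts, and "real damage" is approximated as any exception other than a polite `ValueError`.

```python
import random
import string


def parse_quantity(text):
    """Hypothetical input handler: accept a whole number from 1 to 999."""
    cleaned = text.strip()
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError("not a whole number")
    value = int(cleaned)
    if not 1 <= value <= 999:
        raise ValueError("out of range")
    return value


def monkey_test(handler, rounds=1000, seed=0):
    """Feed random printable junk to handler and collect real crashes.

    A ValueError counts as a polite rejection; any other exception is
    recorded together with the input that triggered it.
    """
    rng = random.Random(seed)  # fixed seed so a failure can be replayed
    crashes = []
    for _ in range(rounds):
        length = rng.randint(0, 20)
        junk = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            handler(junk)
        except ValueError:
            pass
        except Exception as exc:
            crashes.append((junk, exc))
    return crashes


crashes = monkey_test(parse_quantity)
print(f"{len(crashes)} crashes")
```

    Random junk only catches the simple input errors, of course. It says nothing about the bigger bullet.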

  104. Re:Millennium Bridge - Kansas City skywalk by mpe · · Score: 2

    Human effects on bridges is hardly a surprise. Recall in 1981 when the Kansas City Hyatt's skywalk collapsed, killing 114, because the pedestrians were dancing (and the design was altered to ease construction).

    Actually this wasn't a "human effect": the walkways would have collapsed anyway, as built, simply due to the force of gravity, since the bolts holding the upper walkway were taking the load of both. (Also, the bolts went right through where two pieces of metal were welded together. The whole thing simply wasn't built as designed.)
    The Millennium Bridge sway was due to people moving on it, it was, AFAIK, built according to the design. Just that the design was bad.

  105. Re:Millennium Bridge - Kansas City skywalk by mpe · · Score: 2

    In the old USSR (Stalin times), there was a standard bridge acceptance test: 1) put project managers, lead architects and engineers under the bridge;

    More recently a Moscow court sentenced an architect to live in a building he designed. Most likely this is a Russian, rather than Soviet, principle.

  106. Re:Millennium Bridge - Kansas City skywalk by mpe · · Score: 2

    The mistake that caused the Hyatt disaster was not one of failing to consider human motion in the design, but failing to consider the effects of seemingly minor changes in design.

    It certainly wasn't the first case where an apparently minor change made in construction caused a structure to collapse. Nor is it likely to be the last.

  107. Purposeful Accident? by virg_mattes · · Score: 2
    > "...software are communicating properly" implies some sort of plurality in what is communicating properly, which is closer to the high end of the continuum between "sometimes does something right" and "never does anything wrong".

    I have to argue the usage here, because while your argument holds logical merit, his usage was incorrect in this context. His entire sentence read
    We, as programmers, should make sure that our software are comunicating properly.
    This usage is not right, since in this context, if he wants to indicate plurality he should use the plural form "softwares" (which is awkward itself, but it is the only grammatically correct choice for this construct). He could reword the sentence "...software communicates properly" but as he used it, it's not proper communication, hence my detection of irony.

    Virg
    1. Re:Purposeful Accident? by Tony-A · · Score: 2

      I have to agree with you about usage.
      The irony is that correct usage leads to incorrect usage.

  108. Re:It's Worse: The Patriot Never Worked by mpe · · Score: 2

    I am not an expert in aerodynamics, but in general the turbulence above the wings increases the lift but also the drag, so you increase the fuel consumption (you can see that during landing, where the wings are reshaped for lower velocities).

    The purpose of flaps (and leading edge slats) is to lower the stalling speed and increase lift at low speed. Otherwise you'd need much longer runways. (Though probably not quite as long as that at the KSC).

  109. Re:Not really a bug though... by bluGill · · Score: 2

    Works as designed is the excuse for a LARGE number of bugs. Just because it was designed that way doesn't mean it isn't a bug, just that the bug goes deeper than software, into the design.

  110. Re:stairway to heaven by el_chicano · · Score: 2
    I've always considered whipping out lighters during a gig to be a hanging offence.
    IFF you are not sparking up a doobie of course! :->
    --
    A man who wants nothing is invincible
  111. Re:Millennium Bridge - Kansas City skywalk by Paul+Jakma · · Score: 2

    Hmmm... it wasn't that the cable could support only half the tension; it was that the joints (those joining the upper cable to the upper walkway) had to take double the load.

    In the original design, the joints for each level supported only the load of that level (only the roof joints supported the entire weight). By splitting the rod and having two joints on the upper level, the upper-cable-to-upper-level joint had to support the weight of both walkways, which it couldn't do. :(
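
    The load path described above can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming two identical walkways and ignoring everything else about the structure; the function name and the unit weight are made up for the example.

```python
def joint_loads(walkway_weight, rod_split):
    """Return (upper_joint_load, lower_joint_load) for a two-level hanger.

    rod_split=False: one continuous rod from the roof, so each walkway's
    joint carries only that walkway.
    rod_split=True: the lower walkway hangs from the upper walkway, so
    the upper joint carries both walkways.
    """
    lower = walkway_weight
    upper = 2 * walkway_weight if rod_split else walkway_weight
    return upper, lower


# Assumed unit weight of 1.0 per walkway, just to show the ratio.
designed = joint_loads(1.0, rod_split=False)
built = joint_loads(1.0, rod_split=True)
print(f"designed: {designed}, built: {built}")
```

    In both cases the rod at the roof still carries the weight of both walkways. Only the load on the upper walkway joint changes, and it doubles.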

    --
    I use Friend/Foe + mod-point modifiers as a karma/reputation system.
  112. Dependant Upon Meaning by virg_mattes · · Score: 2

    Well, this does depend on how you define "disperse". It's unlikely that the fuel would be thrown all that far when the reactor remains hit the ground, but since a good portion of the shielding would have been peeled off by the fall, the resulting lump of slag would be (both in a radioactive and thermal sense) quite hot, so nobody could stay within a quarter mile of it safely. In a remote location in the Sahara desert, it wouldn't pose much of a threat, but if it fell in a New Jersey suburb, it could displace quite a few people.

    Virg

    1. Re:Dependant Upon Meaning by virg_mattes · · Score: 2

      A quarter mile is the minimum accepted safe distance for radioactive contamination, according to Civil Defense code. This may be a rule from back in the '50s, but I'm pretty sure it's still law. I'll try to find the information, but the last time I saw this was in a printed book we found in an abandoned fallout shelter, so I don't know if it's on the 'Net.

      Virg