Slashdot Mirror


Why Computers Suck At Math

antdude writes "This TechRadar article explains why computers suck at math, and how simple calculations can be a matter of life and death, like in the case of a Patriot defense system failing to take down a Scud missile attack: 'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"
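
The arithmetic in the summary can be checked with a short Python sketch. It assumes (the summary does not say this) that the 24-bit register kept 23 bits after the binary point and that the value was truncated rather than rounded; `FRACTION_BITS`, `TICKS`, and the other names are illustrative only.

```python
from fractions import Fraction

# Assumption: 23 fractional bits, truncated (chopped), not rounded.
FRACTION_BITS = 23
TICKS = 3_600_000  # 100 hours of 0.1-second ticks, as in the summary

exact_tick = Fraction(1, 10)
scale = 2 ** FRACTION_BITS

# Truncate 0.1 to the nearest representable value at or below it.
stored_tick = Fraction(int(exact_tick * scale), scale)

per_tick_error = exact_tick - stored_tick  # seconds lost per tick
total_error = per_tick_error * TICKS       # seconds lost after 100 hours

print(f"per-tick error: {float(per_tick_error):.3e} s")
print(f"total error:    {float(total_error):.4f} s")
```

Under those assumptions the total comes out at about 0.3433 seconds, which matches the figure quoted in the summary.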

626 comments

  1. Poor QA by slifox · · Score: 5, Insightful

    It's pretty pathetic and negligent that software that controls explosive missiles was not tested for over 100 hours of operation. That's a standard Quality Assurance procedure for even the simplest low-budget hardware...

    It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...

    I wonder how much time and money was spent in research and development for this thing
    It doesn't seem like we're getting a quality product for the likely huge sum that was paid for it...

    1. Re:Poor QA by Anonymous Coward · · Score: 0, Troll

      I'm sure had *you* been on the team, this never would have happened eh? All the other problems associated with making a missile do what no other missile had done before, what many, many people said could not be done, you would have solved all those problems too eh?

      What's pathetic are Monday Morning Quarterbacks who get winded just getting up for a beer.

    2. Re:Poor QA by Anonymous Coward · · Score: 5, Informative

      This particular story took place in 1991, and most of the code for Patriot was written in the 70s - needless to say, software QA was a little more lax back then. The fix for this problem was out a couple days after the incident.

    3. Re:Poor QA by betterunixthanunix · · Score: 2, Informative

      I want to know who programmed a system that allowed floating point errors to accumulate over time in a critical calculation. I hope they did not receive a degree in computer science, or that if they did, it was not from my alma mater.

      Seriously, what programmer has not heard of floating point errors? That has to be one of the most common phrases I have ever heard in relation to programming; even the EEs and MEs I have met are familiar with the concept.

      --
      Palm trees and 8
    4. Re:Poor QA by commodore64_love · · Score: 4, Informative

      >>>It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...

      I sorry.

      j/k.

      We had a similar problem with an Aegis design, and it was a major headache for us Hardware engineers to try to convince the Systems Engineers that counting in Binary time was more logical than counting in 0.1 second increments. The SEs kept insisting that their computers at home accurately count in seconds and we hardware engineers should be able to as well. The HE manager and the SE manager were butting heads for about a month over this issue, until finally an upper-level manager handed down a decision in favor of the HE manager and binary-based counting/requirements documentation.

      I guess in the Patriot situation, the decision went in the opposite direction. Hence errors we introduced.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    5. Re:Poor QA by Stele · · Score: 4, Funny

      I guess in the Patriot situation, the decision went in the opposite direction. Hence errors we introduced.

      Ah, so you're the one responsible!

    6. Re:Poor QA by Anonymous Coward · · Score: 5, Interesting

      Hindsight is almost 20/20. Except that the original purpose of the Patriot was to shoot down much slower aircraft, flying parallel to the earth, not ballistic missiles. This new use for Patriot was essentially experimental and had been rushed to war - and in war you run into a lot of unexpected circumstances. For example, conventional doctrine in the 1980's required Patriots to move constantly on the battlefield to avoid air attack. The clock would then reset when repositioned. No one expected a Patriot in air defense mode to stay stationary for 10 hours let alone 100. But in a missile defense role they did. There is a good GAO report on this.

    7. Re:Poor QA by Rising+Ape · · Score: 3, Insightful

      Seriously, what programmer has not heard of floating point errors?

      I had a similar issue with some code of mine for physics analysis. While I had heard of floating point errors, they're a lot more subtle than it first appears, and I ended up falling victim to one. Fortunately I discovered it before it actually led to any serious problems; it just resulted in wasted time.

      Not everyone with a need for programming has a CS background and enough experience to be aware of all the potential problems. You'd hope that someone working on a missile system would have though.

    8. Re:Poor QA by dbIII · · Score: 4, Insightful

      Oh really? The problem with these systems is that they have never worked in anything other than rigged tests and are just silicon snake oil.
      I remember having this same discussion where there was a story here about some sort of Israeli space lasers that could apparently even shoot down artillery shells. Only a few months after that a very large number of thirty year old rockets dumped at discount price by Iran for being obsolete came flying over the border from Lebanon. Since then a lot of even slower rockets came out of Gaza. The success rate of this amazing new space toy matches that of the Patriot - zero.

    9. Re:Poor QA by OeLeWaPpErKe · · Score: 5, Insightful

      Mod parent up ! This idiotic article blames computers for programmers using numerical approximation algorithms ill-advisedly.

      which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"

      So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in hundred days ? And surely the military has better clock synch than a stupid home pc ? This is stupidity, also known as "human error", causing those deaths. It's a case of "the correct answer to the wrong question".

      What is always brought up as a "computer problem" is the crash in Paris of a jet due to infighting between the human pilot and the autopilot. Of course, there the ultimate mistake was the pilot's : he had forgotten to turn off the autopilot to land. It was set for cruising altitude (3km), and the pilot was trying to land. This resulted in ever more desperate attempts by the autopilot to get the plane to gain height, which eventually resulted in a total loss of lift for the plane, which naturally resulted in the plane hitting the ground nose-down and a big fireball. The computer did exactly as instructed, it's just that the pilot's (unintentionally given) instructions were stupid, and the fact that it took the pilot over 3 minutes to realize just how stupid he had been.

    10. Re:Poor QA by Anonymous Coward · · Score: 0

      success rate of this amazing new space toy matches that of the Patriot - zero.

      You're claiming the Patriot's success rate is zero, really? Were you alive during Desert Shield? Take a minute to learn what the fuck you are talking about.

    11. Re:Poor QA by TheRaven64 · · Score: 4, Informative

      Everybody knows that they exist, fewer people know how to avoid them. Lots of early multimedia frameworks, for example, were written using floating point timestamps and developed this exact problem (add some fraction repeatedly for each audio and each video frame, and after an hour the two tracks are noticeably out of sync). Now, they use a numerator-and-denominator form which is simple to add without rounding errors and so you only get them when you convert to floating point for comparison.
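
      A minimal sketch of the drift described above, not taken from any real multimedia framework. It assumes a 30000/1001 frames-per-second video track and emulates a 32-bit float timestamp with `struct`; `to_float32`, `FRAME_DURATION`, and `FRAMES` are illustrative names.

```python
import struct
from fractions import Fraction

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Assumption: NTSC-style video at 30000/1001 frames per second.
FRAME_DURATION = Fraction(1001, 30000)  # seconds per frame, exact
FRAMES = 108_000                        # roughly one hour of video

# Floating-point timestamp: add a rounded fraction once per frame.
step32 = to_float32(float(FRAME_DURATION))
float_ts = 0.0
for _ in range(FRAMES):
    float_ts = to_float32(float_ts + step32)

# Numerator-and-denominator timestamp: exact integer arithmetic.
exact_ts = FRAME_DURATION * FRAMES

drift = float_ts - float(exact_ts)
print(f"exact timestamp: {exact_ts} s")
print(f"float32 drift:   {drift:+.4f} s")
```

      The rational timestamp stays exact however many frames are added, while the emulated 32-bit accumulator picks up a rounding error on nearly every addition. The only rounding in the rational version happens in the final `float(exact_ts)` conversion.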

      Even fewer people realise how compiler and hardware dependent they can be. For example, if you do a sequence of floating point operations on x86 then the values will stay in 80-bit registers until they are stored out to a variable. If you compile the same code for a newer machine with SSE or for another architecture then you will get 32-bit operations on your 32-bit floats and so you'll have less precision. A lot of compilers will even generate different precision between debug and release builds.

      --
      I am TheRaven on Soylent News
    12. Re:Poor QA by OeLeWaPpErKe · · Score: 5, Interesting

      The Iron dome system works perfectly. It's just not capable of protecting any kind of large area. It can, however, make a military base invulnerable to rocket fire, and they're working on making the system mobile, to protect tanks. The only real problem left for doing this is the power requirements.

      For ships, another such system exists, and protected the ships perfectly well from those same rockets fired by hizbullah. Its "protection range" ? In the largest deployment about 200 square meters.

      There is also the problem that a downed missile presents. What is a "downed missile" ? Well it's a large collection of very-high speed pieces of metal that have been heated up by a large explosion that's about to crash into the ground. So far so good.

      So what is "the ground" in the case of a hizbullah or hamas missile launch ? Well it's the center of the city that's controlled by the terrorists. It's their human shields. Markets, schools, you name it. So a successful missile intercept is reported in the press as "Israel fires a rocket into a palestinian kindergarten". That is, by the way, the literal truth, even if the rather important detail of a rocket's presence above said kindergarten is left out. In the deployed missile intercept installations "the ground" is chosen to be something else, like the ocean surface.

      Missile intercept systems are no solution for terrorism. Most unfortunately, the only solution for those rocket attacks is preventing them from being fired in the first place. Which obviously requires either palestinians police their own terrorists, or someone does it for them (that's called "occupation").

      These systems work, they are deployed successfully in the field. They're no silver bullets, and any bullet that's fired, whether a missile or a missile-intercept-missile, will eventually hit the ground at rather high speeds. Which makes their use above urban environments result in civilian casualties.

    13. Re:Poor QA by Uzik2 · · Score: 2, Interesting

      Agreed completely!

      Why did this thing not get designed with continuous feedback on position instead of a closed loop with cumulative errors?

      Also, it's not the computer that sucks at math. It's the guy who decided a cheaper programmer was more cost effective than a good one. Turned out not to be a very wise decision.

      --
      -- Programming with boost is like building a house with lego. It's a cool but I wouldn't want to live in it
    14. Re:Poor QA by Hal_Porter · · Score: 5, Insightful

      There is a good GAO report on this.

      This one?

      http://www.fas.org/spp/starwars/gao/im92026.htm

      Wow. People complain about the US government. Still look at the transparency. The GAO wrote a very readable report for the House Of Representatives and now we can all read it on the web. It's not unreasonable to think that the US's vast military superiority over everyone else on the planet is at least in part due to this sort of thing. I don't think any other government would do this - mistakes in the military would just get covered up as state secrets and anyone who tried to talk about them would get locked up or worse.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    15. Re:Poor QA by iamacat · · Score: 1

      Typical geeks. A tabletop LCD clock is obviously pretty good at counting time in seconds for way longer than 72 hours. You could have just included a 10Hz crystal and told software engineers to store time as the number of ticks rather than ticks * 0.1f. Or any frequency crystal where fire_interval * x / y is a good approximation of 0.1 for some x and y in range of hardware integer arithmetic capabilities. You add x for each tick and increment the time variable/subtract y while the result is greater than y. Problem solved!
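
      The add-x, subtract-y scheme above can be sketched with integers only. The 32768 Hz crystal and the names `CRYSTAL_HZ`, `X`, `Y`, and `count_tenths` are assumptions made for illustration; here X/Y is the number of 0.1-second units per crystal tick, and the comparison is `>=` so that an exact multiple is counted immediately.

```python
# Assumptions: a 32768 Hz crystal and a target unit of 0.1 seconds.
CRYSTAL_HZ = 32_768
X = 10          # added to the accumulator on every crystal tick
Y = CRYSTAL_HZ  # one 0.1-second unit has elapsed once the accumulator reaches Y

def count_tenths(crystal_ticks: int) -> tuple[int, int]:
    """Return (tenths_of_a_second, leftover_accumulator) using integers only."""
    tenths = 0
    accumulator = 0
    for _ in range(crystal_ticks):
        accumulator += X
        while accumulator >= Y:
            accumulator -= Y
            tenths += 1
    return tenths, accumulator

# Ten seconds of crystal ticks should give exactly 100 tenths, nothing left over.
tenths, leftover = count_tenths(10 * CRYSTAL_HZ)
print(tenths, leftover)
```

      Because the remainder is carried in the accumulator rather than discarded, no rounding error builds up over time. The remaining inaccuracy is whatever the crystal itself contributes.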

    16. Re:Poor QA by Shinobi · · Score: 4, Insightful

      To be honest, from working in two specialist fields (HPC system level programming and embedded applications, particularly sensor stuff), I've experienced that CompSci grads are more likely than CompEng or EE grads to make errors like this. A large part of it is simply that CompSci nowadays is too high-level and abstract; many of them don't know very much about how computers ACTUALLY work other than as a theoretical model.

      A common remark is "Why should I need to know that, the compiler will take care of it better than I will anyway", completely forgetting that the compiler is only as smart as the programmer who coded it is. So you can get what I ran into with an odd appliance based around the SH-4 processor I was hired to fix some performance problems with. It ran fixed point integer and decimal math, and was ported over from ARM. But it only reached about 25% of maximum theoretical performance, while the ARM reached around 80%. Turns out GCC was at fault, using a generic method that wasn't suitable for the Super-H architecture. And the CompSci had no clue about such things.

    17. Re:Poor QA by jafiwam · · Score: 1

      It's pretty pathetic and negligent if the Patriot missile system was put online as an anti-missile system.

      It wasn't that in the beginning. It's an anti-AIRCRAFT system, for bombers no less, stuff that isn't changing direction quickly, and certainly isn't going over the speed of sound (not that Soviet shit anyway).

      Missiles fly at a whole different speed category than manned aircraft, where this issue could get tested over and over and not have an impact on a hit.

      If anything, this problem comes from "retool old stuff for new purpose" and the accompanying difficulty, not from testing. The Patriot system worked well for its _designed_ task. Unfortunately, hitting a ballistic missile is not in its _designed_ task.

      If you haven't noticed, intercepting a ballistic missile is not an easy task, or all that new-fangled stuff would have worked.

    18. Re:Poor QA by dave420 · · Score: 2, Informative

      There was no evidence of them hitting a single target. None.

    19. Re:Poor QA by dbIII · · Score: 0

      Your google link takes me to wikipedia and this is exactly what I am talking about:
      "Throughout the war, Patriot missiles attempted engagement of over 40 hostile ballistic missiles. The success of these engagements, and in particular how many of them were real targets is still controversial. Postwar video analysis of presumed interceptions by Prof. Postol suggests that no Scud was actually hit;"
      That is what the F* I am talking about.
      Expensive failures are very embarrassing and result in much PR which may be all that those that do not pay much attention to the news hear.

    20. Re:Poor QA by TCPhotography · · Score: 4, Interesting

      1. The Patriot version used in the Gulf War (round 1) was not designed to be used against Tactical Ballistic Missiles (like SCUDs), but against opposition aircraft. A fighter isn't going to be flying as fast, and thus the error is going to be much smaller, which means the missile would probably still find the plane.

      2. The Patriot has a quite good record against SCUDs (after the software upgrades). Much better than the Soviet SA-2s did against B-52 raids in Vietnam.

      3. Systems don't always work right the first time, and if you do a full on test to start with, and something goes wrong, it's a lot harder to find where the error is than if you test one part at a time.

    21. Re:Poor QA by Antique+Geekmeister · · Score: 1

      Amen. I've had a number of instances in my career where computer science students and graduates were very carefully taught _never_ to look beyond their own little set task, their own little function, to dig deeper for where the errors might be creeping in. The result is not only ignorance of the lower levels of actual digital computation, but a refusal to check results from those other modules or awareness that such errors occur.

      And oh, dear God, if I never have to peel apart another badly written Perl module that's re-inventing the wheel for numeric calculations and getting the rounding wrong or introducing new fencepost errors, I would.... I'd have to stop insulting Perl programmers for at least a week.

    22. Re:Poor QA by Alef · · Score: 2, Insightful

      Even if a flawed design would have worked in the intended usage scenarios as you speculate, given the option of writing a correct program and an incorrect program with no significant difference in effort, why would you ever consciously consider choosing the broken solution from the start? This sounds more like plain and simple incompetence to me.

    23. Re:Poor QA by gyrogeerloose · · Score: 1

      Seriously, what programmer has not heard of floating point errors? That has to be one of the most common phrases I have ever heard in relation to programming; even the EEs and MEs I have met are familiar with the concept.

      Even I am familiar with the concept and I was a freakin' art major. There's no excuse at all for someone working on the code of a mission-critical project to make that sort of mistake.

      --
      This ain't rocket surgery.
    24. Re:Poor QA by Anonymous Coward · · Score: 0

      The patriot missile system was an anti-aircraft system, and at that it did a great job. Protecting against SCUD missiles was a secondary duty shoe-horned in after the fact.

    25. Re:Poor QA by mindstrm · · Score: 1

      It also depends on what their design goals were - perhaps they met them?

      It's easy to look back in hindsight and say "That's obvious" - but if as a previous poster said, they were designed to shoot down slower targets, and as another said, the system was required to re-boot and re-synch every 36 hours, and this was ignored..... that's design.

      Someone signed off on the design and use case.

    26. Re:Poor QA by mindstrm · · Score: 1

      Did they have another system to put in place?

    27. Re:Poor QA by beelsebob · · Score: 1

      Let me correct that for you:
      It's pretty pathetic and negligent that software that controls explosive missiles was not proved correct.

    28. Re:Poor QA by Anonymous Coward · · Score: 0

      "No one expected a Patriot..."

      We had a failure of IMAGINATION, not software, not hardware.

      All of the technical skills existed and the equipment was well capable of meeting a different criterion: being stationary for longer periods.

    29. Re:Poor QA by commodore64_love · · Score: 1

      "Just" include a crystal? Ya know there are differing levels of accuracy for crystals. For an operating time of 5 days you'd need 10 hertz +/- approximately 0.0000001 accuracy to stay within 0.1 second real time. It's been awhile but as I recall we spent $200 for it because the one included with our off-the-shelf CPU board wasn't good enough.

      As for tabletop clocks they use the 60 hertz line frequency, which is regulated by the government, hence the accuracy. Even so, they are not perfectly accurate. My alarm clock loses about 5 seconds every month. My VCR clock loses around 1 second each month.
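
      A rough check of the tolerance estimate above, reading it as a fractional (relative) frequency error, which is an assumption about what the comment means. The function name `fractional_error` and the 30-day month are also assumptions.

```python
SECONDS_PER_DAY = 86_400

def fractional_error(drift_seconds: float, elapsed_seconds: float) -> float:
    """Fractional frequency error that produces the given drift over the given time."""
    return drift_seconds / elapsed_seconds

# Tolerance needed to stay within 0.1 s over 5 days of continuous operation.
needed = fractional_error(0.1, 5 * SECONDS_PER_DAY)

# Error implied by a clock that loses 5 s per month (assumption: 30-day month).
alarm_clock = fractional_error(5.0, 30 * SECONDS_PER_DAY)

print(f"needed for 0.1 s over 5 days: {needed:.2e}")
print(f"clock losing 5 s per month:   {alarm_clock:.2e}")
```

      The required tolerance works out to roughly 2.3e-7, the same order of magnitude as the figure quoted in the comment. That is about eight times tighter than the alarm clock described there.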

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    30. Re:Poor QA by Alef · · Score: 4, Insightful

      I don't think any other government would do this - mistakes in the military would just get covered up as state secrets and anyone who tried to talk about them would get locked up or worse.

      Eh. Forgive me, but do you have any basis whatsoever for this claim, or are you just being arrogant?

    31. Re:Poor QA by Anonymous Coward · · Score: 0

      Very good points. If I could I'd mod this up.

    32. Re:Poor QA by jimicus · · Score: 1

      There may be a glimmer of truth to that. When I was at uni I recall the teaching was all about object oriented design and how wonderful it was - to the extent that at least one lecturer I remember didn't actually understand anything that wasn't OO.

      Well and good, but the whole point of object orientation is that you shouldn't have to think outside your own modules. I don't think I've ever seen a scenario IRL where any given developer could honestly say this was entirely practical.

    33. Re:Poor QA by Anonymous Coward · · Score: 0

      Wow. People complain about the US government. Still look at the transparency. ... I don't think any other government would do this.

      Undoubtedly the USA is the best overall government to live under in the world. Yet, it is FAR from optimal, even under its own supreme, founding law.

      A thing can be the best of its kind and still need major improvement.

    34. Re:Poor QA by KZigurs · · Score: 1

      In fact it was released a couple of days BEFORE the incident. Report states that it failed to be deployed due to logistics issues.

    35. Re:Poor QA by Anonymous Coward · · Score: 0

      Worse yet, the software bug was known and the fixed code had been rushed to the theater of operations. However, at the time of the strike, the Patriot battery at Dhahran had not been upgraded yet. However, the workaround (turn the thing off once in a while to reset the time error) was known and should have been implemented. Oops (twice).

      The only reason the Patriot worked at all was that most operators had EE degrees and could beat the thing into submission.

      The Patriot was, at best, a propaganda weapon. Postol got screwed when he tried to point this out.

    36. Re:Poor QA by Jeppe+Salvesen · · Score: 2, Insightful

      I think the guy has a point (although he's being a bit nationalistic about it): Transparency is key in order to learn from mistakes. You can say many different things about the US of A, but the US of A is good at open hearings.

      --
      Stop the brainwash

    37. Re:Poor QA by slifox · · Score: 1

      If you compare the percentage of poorly-written perl code (versus well-written code) to the percentage of poorly-written C/C++ code, I bet you won't find a statistically significant difference.

      Perl just makes it really easy to publish a module in a centralized location (CPAN), whereas C/C++ code is spread all over the place.
      Just because it's in CPAN, doesn't mean it's quality, nor that it's been tested and is production-worthy.

      Please don't judge perl based on some bad code you've read. It takes a good programmer to write good code -- perl just is less strict about how you must write code, and so the programmers must keep themselves to a proper set of standards. A good programmer can utilize perl's flexibility to produce some very simple and powerful code, without sacrificing quality or maintainability.

    38. Re:Poor QA by noidentity · · Score: 3, Interesting

      Even fewer people realise how compiler and hardware dependent they can be. For example, if you do a sequence of floating point operations on x86 then the values will stay in 80-bit registers until they are stored out to a variable. If you compile the same code for a newer machine with SSE or for another architecture then you will get 32-bit operations on your 32-bit floats and so you'll have less precision. A lot of compilers will even generate different precision between debug and release builds.

      I ran into this when someone was using my library with DirectX. I was initializing a filter kernel and using double-precision calculations, but apparently DirectX put the processor in single-precision mode, so all my double-precision calculations weren't done as such. Same compiled code, just a run-time difference. I took the opportunity to improve the algorithm to work even with single-precision floats, which was probably good to do anyway.

    39. Re:Poor QA by Anonymous Coward · · Score: 0

      It is easy to forget that the Patriot Missile system was not designed to be an anti-missile system at all. It was designed to intercept aircraft.

      It was shoe horned into this new role in the Gulf.

    40. Re:Poor QA by SpinyNorman · · Score: 3, Informative

      Someone posted the actual GAO report on this, which makes a bit more sense than the gibberish TechRadar article.

      http://www.fas.org/spp/starwars/gao/im92026.htm

      The way the system is sure it's tracking the target it was given is by predicting where it should be seen next based on speed and direction, and then only looking for it in a window ("range gate") around that predicted position. The window is a point in space-time and therefore has time coordinates as well as space coordinates, and the problem was that the Patriot system apparently used absolute time since power on to specify the time coordinate, hence the error accumulation. The problem could have been avoided simply by using a time coordinate relative to the last tracked position rather than an absolute one.
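
      A sketch of why a relative time coordinate would limit the damage, illustrating the suggestion above rather than the actual Patriot code or fix. It assumes 0.1 s is truncated to 23 fractional bits and that the target was last tracked 10 ticks ago; `conversion_error` and both tick counts are hypothetical.

```python
from fractions import Fraction

# Assumption: 0.1 s truncated to 23 fractional bits of a 24-bit register.
EXACT_TICK = Fraction(1, 10)
STORED_TICK = Fraction(int(EXACT_TICK * 2**23), 2**23)

def conversion_error(ticks: int) -> Fraction:
    """Seconds lost when `ticks` is converted with the truncated constant."""
    return ticks * (EXACT_TICK - STORED_TICK)

uptime_ticks = 3_600_000      # about 100 hours since power-on
last_track_ticks = 3_599_990  # hypothetical: target last seen 10 ticks (1 s) ago

# Absolute time: the error scales with how long the system has been up.
absolute_error = conversion_error(uptime_ticks)

# Relative time: subtract integer tick counts first, then convert the small difference.
relative_error = conversion_error(uptime_ticks - last_track_ticks)

print(f"absolute-time error: {float(absolute_error):.4f} s")
print(f"relative-time error: {float(relative_error):.2e} s")
```

      Subtracting the integer tick counts before converting to seconds keeps the error proportional to the tracking interval instead of the uptime. Under these assumptions that is under a microsecond, against about a third of a second for the absolute time.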

      The GAO report also blames the 24 bit registers of the 1970's era hardware as limiting accuracy which is just garbage. A good excuse to a politician perhaps, but there was nothing stopping them from using a 64 bit, or whatever, math library if that would have helped.

      Of course the Patriot was being used outside of its original requirements spec when being used to target SCUDs, so it seems someone really screwed up in not reviewing the design beforehand and determining its limitations (and fixing them) rather than finding out after the fact when 28 people are dead as a result.

    41. Re:Poor QA by Anonymous Coward · · Score: 0

      Seriously, what programmer has not heard of floating point errors?

      You'd be surprised. I've worked with people who didn't understand why you can't just use == to check for equality between two floating point numbers... I'll post anonymously to protect my employer's reputation.
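
      A minimal Python illustration of the `==` pitfall mentioned above. The values 0.1, 0.2, and 0.3 are chosen only as a familiar example; `math.isclose` is the standard-library tolerance comparison.

```python
import math

a = 0.1 + 0.2
b = 0.3

# Neither side is exactly representable in binary, and the two
# rounded results differ in their last bits.
print(a == b)              # exact comparison
print(math.isclose(a, b))  # comparison within a relative tolerance

# Near zero a relative tolerance is not enough; pass abs_tol as well.
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))
```

      The exact comparison fails while both tolerance comparisons succeed. What tolerance is appropriate depends on the calculation, so the defaults here are a starting point rather than a rule.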

    42. Re:Poor QA by Hal_Porter · · Score: 2, Informative

      I think the guy has a point (although he's being a bit nationalistic about it)

      I'm actually English and I live in Taiwan. I've got no plans to ever live in the US, so it's not really about nationalism.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    43. Re:Poor QA by Anonymous Coward · · Score: 0

      If you are unsure about something, the best way to get an answer in an internet discussion is to make a bold claim. Someone will correct you if they have grounds to believe you are wrong. In this case, no one has yet come up with a better example outside the US.

    44. Re:Poor QA by Anonymous Coward · · Score: 0

      I tried reading the article and all I can say is why do I get the feeling high level programmers are underpaid.

    45. Re:Poor QA by Antique+Geekmeister · · Score: 1

      Whoa, slow down. I judge Perl, as a language, as teaching poor practices because there are _so many_ ways to do the most simple tasks, and because it tacitly encourages local rewriting of modules that are then woven into other people's modules. For examples, look at the dozens and dozens of modules at CPAN for handling time. Oh, dear, many of those are pitiful, and it's very difficult to decommission them. Bad C and C++ code tends not to _propagate_ this way.

      The result is an incredible waste of time when someone like me has to go clean up the debris. Perl is an extremely powerful scripting language: I wish that bash had a fraction of its flexibility and string handling capability, or that C was remotely as easy to write and test a small module with or to load and review modules.

    46. Re:Poor QA by lq_x_pl · · Score: 1

      Rockets and Mortars? Meet the MTHEL. :-)

      --
      An internal system operation returned the error "The operation completed successfully.".
    47. Re:Poor QA by Jeremi · · Score: 5, Insightful

      The computer did exactly as instructed, it's just that the pilot's (unintentionally given) instructions were stupid, and the fact that it took the pilot over 3 minutes to realize just how stupid he had been.

      Sounds like a user interface problem to me. Given the potential consequences of that particular user error, the fact that the autopilot was still engaged should have been made more obvious to the pilot. (e.g. when the plane computer sees that a struggle is going on between the autopilot and the manual controls, it should prompt a loud, un-maskable synthesized voice shouting "THE AUTOPILOT IS ENGAGED, YOU IDIOT!")

      --
      I don't care if it's 90,000 hectares. That lake was not my doing.
    48. Re:Poor QA by tuxicle · · Score: 1, Flamebait

      Most unfortunately, the only solution for those rocket attacks is preventing them from being fired in the first place. Which obviously requires either palestinians police their own terrorists, or someone does it for them (that's called "occupation").

      This is exactly what's wrong with you and most of the Israeli government. First, you call it "unfortunate" that the only solution is to prevent the rockets being fired. Bit of a clue there. In any case, the only way that can be done in any permanent manner is to not give Hezbollah any reason to fire rockets in the first place. Not "occupation".

    49. Re:Poor QA by Hal_Porter · · Score: 2, Informative

      I don't think any other government would do this - mistakes in the military would just get covered up as state secrets and anyone who tried to talk about them would get locked up or worse.

      Eh. Forgive me, but do you have any basis whatsoever for this claim, or are you just being arrogant?

      In the UK people have been locked up for breaching the official secrets act. Fair enough you may say, but many of them seem to have been guilty more of embarrassing the government than releasing information which hurt national security.

      http://news.bbc.co.uk/2/hi/uk_news/216868.stm

      Now the UK is not particularly bad at this sort of thing. In far less free societies like China people have been executed because they "might" have commented on the health of senior leaders and quoted information which was publicly available. In fact most of the charges against them are never even released -

      http://fairuse.100webcustomers.com/itsonlyfair/latimes0243.html

      The nonconfidential version of the verdict released to the family March 24 reveals only two of eight "top secret" charges, any of which could result in the death penalty. One relates to charges from a witness that Wo "might" have intentionally passed on information about the health of senior leaders to Taiwan, Chen and Michael Rolufs said.

      A second alleges that Wo collected technical information on missiles for the Taiwanese. The other six such charges were not revealed. The verdict also claims Wo received $400,000 from Taiwan.

      Chen seriously doubts that Wo had access to confidential information on senior leaders' health status and notes that the verdict's use of "might" suggests a lack of certainty. On the more serious charge of obtaining technical information on Chinese missiles, the verdict suggests Wo got information from magazines. But these were all from a publicly accessible library, Chen said.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    50. Re:Poor QA by Anonymous Coward · · Score: 4, Insightful

      So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in hundred days ?

      Do you want to be the one to explain to the generals why their stand-alone, truck-based mobile air protection system needs a hard-line network connection to work?

      The real idiocy is here:

      Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register

      Taken charitably, the article writer has oversimplified to the point of obscuring the point. It's perfectly possible to represent a 0.1-second tick in a 24-bit register. There's an overflow about once every 19 days. The problem is doing calculations *with* that number, and that takes knowing what the hell you're doing. Given the problem the system designers were trying to solve with Patriot, this should not have been a problem.

      And surely the military has better clock synch than a stupid home pc ?

      You'd be surprised how hard clock accuracy is to get right, *especially* under military conditions. A drift of 0.3433 seconds over 100 hours works out as an accuracy of 1 part in a million, give or take. Besides, the problem here wasn't clock drift, so it's irrelevant.
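
      A quick back-of-the-envelope check of the two figures above (the overflow interval and the one-part-in-a-million ratio). This is only a sketch: it assumes an unsigned 24-bit tick counter, and the variable names are made up for illustration.

```python
# Figures taken from the article: 0.1 s ticks, a 24-bit register,
# and a 0.3433 s error after 100 hours of uptime.
TICK_SECONDS = 0.1
REGISTER_BITS = 24          # assumed to hold an unsigned tick count

# An unsigned 24-bit counter wraps after 2**24 ticks.
ticks_until_wrap = 2 ** REGISTER_BITS
days_until_wrap = ticks_until_wrap * TICK_SECONDS / 86_400

# Relative size of the quoted error over 100 hours of uptime.
uptime_seconds = 100 * 3600
relative_error = 0.3433 / uptime_seconds

print(f"counter wraps after {days_until_wrap:.1f} days")
print(f"relative error: {relative_error:.2e}")
```

      Both numbers come out where the comment says: a wrap a little past 19 days, and a ratio just under 1e-6.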

    51. Re:Poor QA by Anonymous Coward · · Score: 1, Insightful

      nobody bothered to run "ntpdate" even once in hundred days?

      If I understand this correctly, running ntpdate or something similar would not have helped - the data type used to store the time since system power up is floating point, and the smallest representable unit just gets bigger with every second the system is up. After 100 days, things apparently get so bad that the best you can get is 0.34s, and I suspect that the error would approach 1s if you let the system just sit there for a year. I suspect that at some point, the system clock would just (appear to) stop, because the increments are below the representable precision.

    52. Re:Poor QA by owlstead · · Score: 1

      The error was something that should have been caught by the system designers. I'm in no doubt that the Patriot system is a large project, and the designers could have been more careful - even given the limited initial target.

      Computer scientists are required when creating a floating point API. Computer Scientists are those whom this article was for.

      The developers (programming is only a part of a development process) are those who should have used the API created by the Computer Scientists, using the specs created by the designers.

      In extreme cases one person can do multiple roles of course, but for larger projects it might be inadvisable; you'll need persons that are professionals in their particular field.

      That said, my role has been designer/developer for 8 years, and I agree that I'm severely underpaid.

    53. Re:Poor QA by quickOnTheUptake · · Score: 1, Insightful
      No he didn't. Had you finished the article you might have seen these lines:

      But all of today's computers are universal computing machines, which means that they can solve any problem involving logic and maths.
      So if a processor's internal instructions can't operate on large enough integers or on floating point numbers with sufficient precision, it's always possible for the programmer to implement arithmetic routines that will.

      So computers might suck at maths, but there's always a solution available to circumvent their inherent weaknesses. And in that case, it's probably more accurate to say that computer programmers suck at maths – or at least some of them do.

      --
      Mod points: Guaranteed to remove your sense of humor.
      Side effects may include gullibility and temporary retardation
    54. Re:Poor QA by khallow · · Score: 1

      No one expected a Patriot in air defense mode to stay stationary for 10 hours let alone 100.

      The point is that they should have. This isn't a matter of hindsight, but of easy to anticipate operation modes.

    55. Re:Poor QA by russotto · · Score: 1

      It's pretty pathetic and negligent that software that controls explosive missles was not tested for over 100 hours of operation.

      Said missiles were possibly not intended (or specified) for 100 hours of continuous operation.

    56. Re:Poor QA by russotto · · Score: 1, Insightful

      In any case, the only way that can be done in any permanent manner is to not give Hezbollah any reason to fire rockets in the first place. Not "occupation".

      Yeah, like Hezbollah needs a reason to fire rockets. Not much the Israelis can do -- besides cease to exist -- to eliminate reasons for Hezbollah to fire rockets at them.

    57. Re:Poor QA by Anonymous Coward · · Score: 0

      While poor QA plays a part, this is a case of computers being bad at TIME not math. Computers suck at dealing with time in ALL cases, and you can actually prove that this is true.

      This is a case of POOR engineering. There are very good ways of dealing with time now for this kind of computer hardware. Devices like FPGAs can be used to eliminate this kind of error.

    58. Re:Poor QA by russotto · · Score: 1

      We had a similar problem with an Aegis design, and it was a major headache for us Hardware engineers to try to convince the Systems Engineers that counting in Binary time was more logical than counting in 0.1 second increments. The SEs kept insisting that their computers at home accurately count in seconds and we hardware engineers should be able to.

      There's nothing wrong with 0.1 second increments, so long as they're represented as an integer number of deciseconds (or centiseconds or milliseconds) -- just like those computers back home.
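
      To make that concrete, here is a minimal sketch of keeping time as an integer number of deciseconds. The function names are hypothetical and have nothing to do with the actual Patriot or Aegis software; the point is only that integer subtraction never rounds.

```python
# Hypothetical sketch: uptime kept as an integer count of 0.1 s ticks.
TICKS_PER_SECOND = 10

def elapsed_ticks(start_ticks: int, end_ticks: int) -> int:
    """Exact elapsed time in deciseconds; integer subtraction never rounds."""
    return end_ticks - start_ticks

def split_seconds(ticks: int) -> tuple[int, int]:
    """Split a tick count into (whole seconds, leftover deciseconds)."""
    return divmod(ticks, TICKS_PER_SECOND)

uptime_ticks = 100 * 3600 * TICKS_PER_SECOND   # 100 hours = 3,600,000 ticks
later_ticks = uptime_ticks + 7                 # 0.7 s later

print(elapsed_ticks(uptime_ticks, later_ticks))  # 7
print(split_seconds(later_ticks))                # (360000, 7)
```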

    59. Re:Poor QA by Anonymous Coward · · Score: 0

      Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register - as used in the Patriot system - it's out by a tiny amount.

      This statement is so vague as to be essentially meaningless.

      Of course 0.1 seconds can be expressed accurately as a binary number.

    60. Re:Poor QA by Alef · · Score: 1

      You want examples. By all means, here are a few incident reports I came up with after a quick search. Unfortunately they are all written in Swedish. More are available at this site.

      I'm not saying the US is bad at transparency, but the assumption that no other country is is simply wrong.

    61. Re:Poor QA by Anonymous Coward · · Score: 0

      He may be arrogant, but that doesn't make him wrong.

    62. Re:Poor QA by Arthur+Grumbine · · Score: 1

      And we finally have commodore64_love's true identity. Yoda!! Of course, we should have seen it all along, you have to be ancient to still be in love with the Commodore 64. I keed, I keed...

      --
      Now that I think about it, I'm pretty sure everything I just said is completely wrong.
    63. Re:Poor QA by Skeptic+Al · · Score: 1

      Sorry, that's idealistic crap. Engineering is all about making tradeoffs based on the intended usage. Over-engineering costs time and money. Most of the time, there is no way to design for every IMAGINABLE use case.

      For example, there is no way to build a completely earthquake-proof building. You might be able to design one (if you knew for certain that there is an upper bound for the magnitude of an earthquake), but no one would have enough money to build it.

      By taking your comment to the extreme, one should be able to use pebbles fired from slingshots to shoot down ballistic missiles. Why hasn't anyone IMAGINED doing that?

    64. Re:Poor QA by Anonymous Coward · · Score: 0

      Hindsight is almost 20/20. Except that the original purpose of the Patriot was to shoot down much slower aircraft, flying parallel to the earth, not ballistic missiles. This new use for Patriot was essentially experimental and had been rushed to war - and in war you run into a lot of unexpected circumstances. For example, conventional doctrine in the 1980's required Patriots to move constantly on the battlefield to avoid air attack. The clock would then reset when repositioned. No one expected a Patriot in air defense mode to stay stationary for 10 hours let alone 100. But in a missile defense role they did. There is a good GAO report on this.

      Nice to see that someone remembered about the drastic change in the role of the Patriot system from a mobile system to a stationary system and pointed it out.

      People tend to forget that things on battlefields tend to get the shit shot out of them if they sit still for very long (well to be truthful they get the shit shot out of them if they move too much as well.)

    65. Re:Poor QA by maxume · · Score: 1

      Most clocks have a quartz crystal in them. Especially the ones that have battery backup (I can't find an alarm clock in my house that doesn't have a spot for a battery...).

      --
      Nerd rage is the funniest rage.
    66. Re:Poor QA by Tacvek · · Score: 4, Informative

      Yes. The issue here sounds like they had a system clock counter that was an integer, that counted the number of 0.1 second clock ticks. Then they wanted to convert this to a floating point number in 24 bit IEEE format. They simply multiplied 0.1 by the integer in the register. Of course, that still sounds like too large an error to have occurred from just that, but let's pretend it did.

      There are several issues here. For missiles travelling at such speeds, using a system clock counter based on 0.1 second ticks sounds terribly coarse to me. Second, since 0.1 seconds are the baseline resolution of the system, the system should have been using floating point numbers where '1' corresponds to a decisecond rather than a second. Then the time counter would be exactly expressible in the floating point format.

      Lastly, if the floating point format really needed to be in units of seconds, rather than deciseconds, the time counter should have been loaded in, having an exact representation, then it should be divided by 10, which has an exact representation. This is all pretty basic to anybody who has even a limited understanding of floating point. If you understand the inherent precision of every operation even better than I do, even more improvements would be possible.

      But to be honest, I'm not sure why floating point was used at all here. It sounds to me like fixed point may have worked just fine for most of these problems. (Of course, fixed point has its own set of rules ensuring maximal accuracy.)
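
      A rough sketch of the arithmetic being discussed. It assumes (this is not stated anywhere in the thread) that 0.1 was stored as a binary fraction chopped to 23 bits after the binary point. Under that assumption, the per-tick error times the article's 3,600,000 ticks lands on the quoted 0.3433 s, while the divide-by-10 route stays exact. All names are illustrative only.

```python
from fractions import Fraction

FRACTION_BITS = 23                      # assumed: bits kept after the binary point
TICKS = 3_600_000                       # 100 hours of 0.1 s ticks, per the article

exact_tenth = Fraction(1, 10)
# Chop (truncate, not round) 0.1 to FRACTION_BITS binary places.
stored_tenth = Fraction(int(exact_tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)

per_tick_error = exact_tenth - stored_tenth
total_error = per_tick_error * TICKS

# The alternative suggested above: keep the integer tick count and divide by 10.
seconds_by_division = Fraction(TICKS, 10)

print(float(per_tick_error))   # roughly 9.5e-08 s lost per tick
print(float(total_error))      # roughly 0.3433 s after 100 hours
print(seconds_by_division)     # 360000
```

      Exact rationals are used here only so that the sketch's own rounding does not muddy the result being measured.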

      --
      Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
    67. Re:Poor QA by am+2k · · Score: 1

      Agreed, although it's still a human error, since the user interface designer is human, too (supposedly).

    68. Re:Poor QA by Anonymous Coward · · Score: 0

      the only solution for those rocket attacks is preventing they're fired in the first place.

      Correct.

      Which obviously requires either palestinians police their own terrorists, or someone does it for them (that's called "occupation").

      You missed the third option, which is for the motivation behind the firing of rockets to be removed.

      You know, remove all those illegal settlements, dismantle the wall that has effectively stolen the real estate property of thousands of innocent civilians, and allow refugees to return to their homes.

      But I suppose it's easier just to send the tanks in again, and then wonder why the rockets keep on coming.

    69. Re:Poor QA by Anonymous Coward · · Score: 1

      Actually, I call horse poo on your comment. I believe that you are in the minority of people who pay particular attention to these details, and actually understands the interaction between hardware and software. Bravo!

      As a CompSci and EE grad myself, who for many years TA'd and tutored CompSci, CompEng and EE courses in systems software, I found that the students on the engineering side of the fence cared much less about the quality of work and thoughtfulness which went into their assignments. A lot of them suffered from logic errors due to careless operand comparisons, careless typecasting to "make it work" and an overall lack of clean structure.

      The reason behind this? A lot of engineering students thought that "I don't care to write code - that's not why I am here... I will have a programmer write it for me." A definitely wrong, and dangerous attitude.

    70. Re:Poor QA by Anonymous Coward · · Score: 0

      My bad. I failed to imagine how stupid some slashdot readers are! I'm learning.

      The fact that the hardware in question could handle this situation if programmed correctly lays the blame 100% on the designers who utterly failed to see the potential uses of their product and by their oversight (or negligence if you wish) failed to accomplish their goal.

      The imagination in question was not a huge leap. This was (I believe) an arbitrary criterion put on the device of operating while stationary for only a short time. The technology to allow for longer durations was only a half thought away. Indeed, I'm surprised they didn't stumble into this while just testing the damned thing.

      As for your other comments: OF COURSE people can make 100% earthquake-proof buildings, given, as you pointed out, an upper bound for magnitude, and OF COURSE it can be afforded. The fact that people don't is not an indication of a failure of imagination, it's an indication of risk management. (Thankfully, we know the builders of the Patriot missile system did not think like this.)

      As for slingshots, you idiot, you just imagined it! Now build it.

    71. Re:Poor QA by Eli+Gottlieb · · Score: 1

      I'm sorry, when did a story about computers sucking at math become another Palestine thread?

    72. Re:Poor QA by Eli+Gottlieb · · Score: 1, Informative

      You do realize that Israel hasn't been occupying a square inch of Lebanon and therefore Hizballah doesn't actually have any reason to fire rockets other than "hey let's kill some goddamn Jews" (which is their stated reason for it)?

    73. Re:Poor QA by Entropius · · Score: 2, Insightful

      The US's "vast military superiority over everyone else on the planet" is due to us spending an equally vast amount of cash on our military.

    74. Re:Poor QA by OeLeWaPpErKe · · Score: 4, Insightful

      You missed the third option, which is for the motivation behind the firing of rockets to be removed.

      http://www.youtube.com/watch?v=iNrCMdFoZqQ

      So who do we allow to settle there ?

      The "kingdom of Egypt" (the state of the Pharaohs) ? (exterminated to the last man by muslims)
      The Hittite Empire ? (exterminated by the Greeks, Romans, Persians)
      The kingdom of Israel ?
      The Assyrian Empire ?

      Which of these do we restore ? (note that the palestinians, or to be more exact, the arabs only come into play about 4500 years after the Assyrian Empire)

      Which do we restore ? And why do they have more rights than all the others who conquered that piece of land ?

      Note the obvious truth : the Jews controlled Israel about 4300 years before the arabs even left their tiny province ...

      What if some Greek starts firing rockets at the Arabs ? Will you tell them to leave ? He has at least as much right to Israel as they do ? What if the Jews start firing rockets into Jordan (territory that was part of the kingdom of Israel) ?

      And of course, you shouldn't count out yourself. You're an Indo-European living in America. It seems hypocritical in the extreme to tell others to leave conquered lands. Your province of origin is northwestern Iran, every other place on this earth indoeuropeans live (including Europe), is obviously conquered from someone else.

      So when will you give the good example ?

    75. Re:Poor QA by Pikkebaas · · Score: 1

      Hezbollah are Palestinian?

    76. Re:Poor QA by 93+Escort+Wagon · · Score: 1

      Congratulations - I think that was the single most annoying Slashdot post I've ever tried to read.

      --
      #DeleteChrome
    77. Re:Poor QA by Sir_Lewk · · Score: 1

      Hezbollah are Lebanese, Hamas are Palestinian.

      But honestly, they are both assholes so practical differences for someone not concerned about the intricacies of their politics are irrelevant.

      --
      "linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
    78. Re:Poor QA by Alef · · Score: 1

      Apologies. I started out trying to make a bullet list, but Slashdot's filter wouldn't let me submit it.

    79. Re:Poor QA by 19thNervousBreakdown · · Score: 2, Informative

      Yup. This, combined with the parent's note about high-resolution timing shouldn't have even gotten past the first programmer to write the code. The instant they wrote a line of code that depended on timekeeping that precise, there should have been a review of the time system, or rather before, that should have been thought of in the design phase. And as for floating-point errors, any programmer that isn't aware of those issues needs to be writing ... fuck, I don't even know. Something that doesn't use floating-point numbers I guess. Why the Christ they were repeatedly adding floating-point numbers is beyond me, 99.1% of the time it's possible to either do a direct calculation or do a "resync" of some sort, and for the 0.899999998% of the rest of the time, you can use an arbitrary-precision number library, or a rational number library, or (very) carefully look at the rounding algorithm, or any number of other ways around the issue.

      Anyway, this should have been caught at pretty much every layer, and whoever missed it shouldn't be in the business. Blech.
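
      For what it's worth, the difference between repeated addition and a direct calculation is easy to demonstrate. The sketch below uses Python's double-precision floats, so the drift is far smaller than anything a 24-bit format would show; it illustrates the principle and is not a reconstruction of the Patriot code.

```python
import math

TICKS = 3_600_000            # 100 hours of 0.1 s ticks

# Repeated addition: every += rounds, and the rounding errors pile up.
accumulated = 0.0
for _ in range(TICKS):
    accumulated += 0.1

# Direct calculation from the integer tick count: a single rounding step.
direct = TICKS / 10

# One of the "other ways around the issue": a correctly rounded sum.
compensated = math.fsum(0.1 for _ in range(TICKS))

print(f"repeated addition drift: {accumulated - direct:.3e} s")
print(f"fsum drift:              {compensated - direct:.3e} s")
```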

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    80. Re:Poor QA by Jeremy+Erwin · · Score: 2, Insightful

      Are you sure that the computer was even capable of IEEE floating point? Wikipedia suggests that the computer used a 24-bit word.

      Although an IEEE 754 single-precision float uses a 23 bit stored mantissa, 8 bit exponent and a sign bit, the WCC might well have used a proprietary scheme.
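
      The layout described above (1 sign bit, 8 exponent bits, 23 stored mantissa bits) is that of IEEE 754 single precision. A small sketch for pulling those fields apart follows; the helper name is made up, and none of this says anything about what the Patriot's computer actually used.

```python
import struct

def float32_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, stored fraction) of value as IEEE 754 binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF          # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(0.1)
print(sign, exponent, f"{fraction:023b}")
```

      The 32 bits of that format would not fit in a 24-bit word, which is consistent with the suspicion that some other scheme was in use.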

    81. Re:Poor QA by Neoprofin · · Score: 3, Informative

      One of the other results (the first one that comes up for me actually) claims that in testimony presented to Congress Postol's methodology was called out as flawed based on the fact that three or eight Patriots were launched at every incoming missile and his video analysis is done per interceptor fired completely ignoring the massive odds against more than one interceptor making a hit. The Israelis' independent analysis puts the success rate at 50%.

    82. Re:Poor QA by Ironsides · · Score: 1

      So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in hundred days ? And surely the military has better clock synch than a stupid home pc ? This is stupidity, also known as "human error", causing those deaths. It's a case of "the correct answer to the wrong question".

      It was 100 hours and this was back in the first gulf war. I'm also pretty sure the computers involved didn't have ntpdate. The incident being referenced was in 1991. Unfortunately, the software fix wasn't available until the next day.

      --
      Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
    83. Re:Poor QA by iamacat · · Score: 1

      You don't think the government can splurge somewhat more than $200 for a key targeting component of a missile defense system?

    84. Re:Poor QA by The+Master+Control+P · · Score: 1

      (1/10)^n for any positive integer n has no terminating expansion in base 2 (it is rational, but its binary digits repeat forever), so the truncation was unavoidable.

      The real WTF is that they used a 24 bit mantissa where a one part per million error would be fatal, particularly where truncation would accumulate at a linear rate. We've known since the days when "calculator" meant someone doing menial math to carry "somewhat more than twice as many digits as desired in the final result."
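
      A short sketch of why any finite register has to cut 1/10 off somewhere: generating its binary digits by repeated doubling shows the pattern never settles to zeros, unlike a fraction whose denominator is a power of two. The helper name is illustrative.

```python
from fractions import Fraction

def binary_digits(x: Fraction, count: int) -> str:
    """First `count` binary digits of x after the point, for 0 <= x < 1."""
    digits = []
    for _ in range(count):
        x *= 2
        bit = int(x)          # 0 or 1
        digits.append(str(bit))
        x -= bit
    return "".join(digits)

print(binary_digits(Fraction(1, 10), 24))   # 000110011001100110011001
print(binary_digits(Fraction(1, 8), 24))    # 001 followed by zeros
```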

    85. Re:Poor QA by danlip · · Score: 4, Insightful

      The computer did exactly as instructed, it's just that the pilot's (unintentionally given) instructions were stupid, and the fact that it took the pilot over 3 minutes to realize just how stupid he had been.

      Sounds like a user interface problem to me. Given the potential consequences of that particular user error, the fact that the autopilot was still engaged should have been made more obvious to the pilot. (e.g. when the plane computer sees that a struggle is going on between the autopilot and the manual controls, it should prompt a loud, un-maskable synthesized voice shouting "THE AUTOPILOT IS ENGAGED, YOU IDIOT!")

      Or if the pilot is pushing hard on the stick the autopilot should disengage (with loud alarms).
      If I tap on the brakes in my car the cruise control disengages, it does not fight me.
      - Dan

    86. Re:Poor QA by Anonymous Coward · · Score: 0

      Or if you have a lot of time and insomnia, read volume II of Knuth on Floating Point, or everything you always wanted to know about floating point
      but were afraid to ask. (Of course Knuth is at too fundamental a level for today's programmer)

    87. Re:Poor QA by andereandre · · Score: 1

      Well, as you say, it was used out of spec. So I imagine they could deploy and hope it would work or not deploy at all. With the latter option the lives would be lost anyway. Not saying they could not have done better (but who is "they"). If I remember correctly there was a lot of phooha around those Patriot batteries, my country (NL) sent one to Israel. That's politicians and their following media making a stance and not being concerned about facts.

    88. Re:Poor QA by Anonymous Coward · · Score: 0

      Actually, all archaeological evidence points to the fact that the Jews are Palestinians. There is no difference between ancient Jewish settlements and Canaanite settlements, even down to the presence of idols, save that the Jewish settlements are smaller and not as cosmopolitan as cities like Jericho. There is no evidence that supports the notion that all of the Jews left Palestine to go to Egypt either. Your claim again?

    89. Re:Poor QA by PPH · · Score: 1

      It's way too late to solve problems like this with QA. It's a system architecture problem. Time base accuracy, drift and subsequent errors were never addressed in the preliminary design. Had they been, some method of ensuring clock synchronization could have been employed. Either highly accurate time bases or some method of synchronizing system time. A study of the tradeoffs of each approach would be done and the best approach selected.

      The Patriot example appears to indicate that the issue was never even considered. You can write the best algorithms given some inputs. But if it's not your job to look at the big picture, you've got no way of knowing whether those inputs are valid or just garbage. That requires attention at the 'big picture' level of the system. And that's a large problem in military programs, or vendors who have that compartmentalized mindset. For security reasons, everyone gets their own little piece of the problem to solve. And each little piece gets designed and built pretty well. But managing the system requirements and subsystem interfaces leaves something to be desired. It's supposed to be the responsibility of the general contractor. But too many outfits performing this function want to broker the design work rather than actually do it.

      --
      Have gnu, will travel.
    90. Re:Poor QA by bidule · · Score: 1

      Invasion and colonisation has been a big no-no for a few centuries. All the colonial empires have been dismantled since then. I don't consider living in the past a sane thing, why should you?

      If you knew more about the Middle Ages, maybe you'd understand that the population did not move so much as changed religion over the centuries.

      So in a way, this is a war between returning exiles and their old neighbors who stayed behind.

      Sometimes I'd wish the whole region was nuked down to a glass surface, that'd take care of the problem. OTOH, look at how the Northern Ireland fester is resorbing itself.

      --
      ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
    91. Re:Poor QA by Anonymous Coward · · Score: 0

      Yeshua (Jesus to folks unaware of Maschiach's Jewish heritage) will be coming to answer your question. There is a book that already answers this question but all too many people mock and doubt it, and deny the existence of Adonai ("god").

      All in all it's going to be exciting to watch. Glorious for many, but horrible and terrifying for many, many more.

      For what it's worth, I think Israel will be ultimately restored, and it will occupy the whole land between the Tigris and Euphrates and the Red Sea, up to the mountains in the north, and southward to the sea.

    92. Re:Poor QA by Anonymous Coward · · Score: 0

      There's a Native American headed to your front porch. He wants to kick your balls in, take your wallet and burn your house down. Don't bother calling the police, as it's his land now - GTFO. Who gives a FUCK who was there 4300 years ago?

    93. Re:Poor QA by 93+Escort+Wagon · · Score: 1

      My comment was made somewhat tongue-in-cheek, so I hope you realize I was sort of annoyed yet laughing at the same time. It's not like you were trying to make things difficult!

      I understand why Slashdot has to list the domain after each link, given what a few people on here insist on linking to... but it did make reading your post almost impossible. It was a bit like trying to solve a puzzle, or interpreting a hidden message using one of those kid's decoder rings.

      --
      #DeleteChrome
    94. Re:Poor QA by Anonymous Coward · · Score: 0

      The "kingdom of Egypt" (the state of the Pharaohs) ? (exterminated to the last man by muslims)

      That's quite a claim there, I have no idea how you've been modded up...

      Needless to say; citation please.

    95. Re:Poor QA by OeLeWaPpErKe · · Score: 1

      Great to hear your idea of justice being retreating from conquered lands. When do you leave for Iran ? (that's where Indo-Europeans come from)

      One thing's for sure : you won't be posting anything from there, so I guess we'll know.

      And of course, if you're not willing to do this, how can you expect others to ??? Perhaps indians should start firing rockets at your family. That seems to be the way to acquire "justice". Would you find that acceptable ? Indians attempting to eradicate you ? Would you leave ?

    96. Re:Poor QA by itsybitsy · · Score: 1

      Well any calculation of how much time and therefore how much money has been spent in research and development for accurate time systems or systems with failed time systems must itself rely upon an accurate count of time and given that the Patriot system failed in this regard and failed worse and worse as time went marching forward it's likely and probable that any such calculations won't be accurate and will be less accurate as time goes on as it always seems to.

      As for a quality product, it's no worse than Windows, simply reboot it in the lulls between missile launches and it'll be more accurate.

      [:)]

    97. Re:Poor QA by John+Hasler · · Score: 1

      > Did they have another system to put in place?

      No. That's why they deployed it, despite the fact that the antimissile software was still beta. It shot down some Scuds, which is better than the alternative: nothing. It also would have come in handy had Saddam surprised the hell out of everyone and managed to pull off a bomber attack.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    98. Re:Poor QA by Ihmhi · · Score: 1

      If the Patriot Missile System was running Windows, there would not have been a problem.

      "Your missile defense system was recently updated!"

    99. Re:Poor QA by benjamindees · · Score: 1

      Second, since 0.1 seconds are the baseline resolution of the system, the system should have been using floating point numbers where '1' corresponds to a decisecond rather than a second.

      Integer you mean?

      --
      "I assumed blithely that there were no elves out there in the darkness"
    100. Re:Poor QA by John+Hasler · · Score: 1

      A requirement which would add decades to development time without necessarily eliminating bugs such as this one.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    101. Re:Poor QA by Jane+Q.+Public · · Score: 1

      Everybody here is blaming the programmers when the Patriot issue was a hardware, not a software issue! The error of ~ 0.3 seconds was not due to software rounding or anything of the sort; it was accumulated error due to imprecision in the hardware clock itself.

      It is exactly the same situation as if the clock on your wall is adjusted improperly and gains or loses a few seconds or so a day. Software has nothing to do with the imprecision of your wall clock... and in exactly the same manner, software has nothing to do with accumulated error in a clock register.

      A more precise hardware clock should have been used, and a larger clock register. But again those are hardware issues and have nothing whatever to do with software. Don't blame programmers in this case. It was a clear case of using hardware that was inappropriate for the job.

    102. Re:Poor QA by Anonymous Coward · · Score: 0

      what about the philistines they exterminated in the name of god to take control of the promised land?

    103. Re:Poor QA by postbigbang · · Score: 1

      Obligatory: Yeah, that's the Microsoft approach.

      Your point #1 seems to be different than history is portrayed.

      Your point #2 also seems to lack evidence, as well.

      Your point #3 flies in the face of 28 people being killed in the citation article. It also betrays upthread QA problems also noted. If this is a realtime system, then it's realtime in a parallel universe. The only saving grace is that as mentioned, there've been some 17 years to fix it. It ought to be able to be run by your average mobile phone. Who knows..... maybe google maps + a side button.

      --
      ---- Teach Peace. It's Cheaper Than War.
    104. Re:Poor QA by Anonymous Coward · · Score: 0

      Probably you are an Israeli Terrorist, specialized in robber the Palestinians Land, only Zionists and their media calls the poor Palestinian people as terrorists.

    105. Re:Poor QA by Jane+Q.+Public · · Score: 1

      Again someone suggests using a software solution (64-bit math library) when the problem was clearly in the hardware.

      A 24-bit clock register is lightweight indeed. If we presume a 0.1-second clock rate (as mentioned in the article), the register would overflow in less than 20 days. I don't know how long they kept these powered up but 20 days is not a very long time.

      But the main issue here was the imprecision of the clock itself. Math libraries had nothing to do with it... the clock rate was simply not precise enough, and that error accumulated. No amount of software massaging is going to correct that, unless you synch with an external clock somewhere.

      They simply chose inappropriate hardware for the job: (1) The hardware clock was simply off. It did not keep time to the requisite degree of accuracy. (2) The resolution of the clock should have been much better than merely 0.1 second. And (3) the clock register was indeed too small to make (1) and (2) practical.
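
      The "less than 20 days" figure above is easy to check. A quick sketch, assuming an unsigned 24-bit counter that is bumped once per 0.1 s tick (the variable names are mine):

```python
# Back-of-the-envelope check of the "less than 20 days" figure.
# Assumption: an unsigned 24-bit counter incremented once every 0.1 s.
REGISTER_BITS = 24
TICKS_PER_SECOND = 10      # one tick per 0.1 s
SECONDS_PER_DAY = 86_400

max_ticks = 2 ** REGISTER_BITS                 # counter wraps after this many ticks
seconds_to_wrap = max_ticks / TICKS_PER_SECOND
days_to_wrap = seconds_to_wrap / SECONDS_PER_DAY

print(f"{max_ticks} ticks -> {days_to_wrap:.2f} days")   # 16777216 ticks -> 19.42 days
```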

    106. Re:Poor QA by OeLeWaPpErKe · · Score: 2, Insightful

      If you knew more about the Middle Ages, maybe you'd understand that the population did not move so much as changed religion over the centuries.

      No offence, but you really should read a bit about Arab history, and pay attention to just how many ethnic cleansings these people committed. The population of Europe, you are correct, did indeed merely change religion ("mostly", as there was certainly no shortage of armed conflicts, though they declined over time. Slowly). The population of the middle east was eradicated, several times in fact. Everywhere, muslims have always created conflicts along ethnic lines, even with "fellow muslims" (google "Sudan" or "Darfur", and note just how racist any brotherhood islam supposedly provides really is. And to tell the truth, just walk into a European city and look for a few Turks and a few Moroccans, and note how much they like each other. See for yourself).

      After researching arab/islamic history, any reasonable person would seriously ask himself what exactly is so terribly remarkable about this German guy from WWII (and don't google "aymin al-husseini", it will not improve your view of these people).

    107. Re:Poor QA by Jane+Q.+Public · · Score: 0, Troll

      They didn't. Puhleeeeeze read the article, people. Even though the article itself appears to (inappropriately) blame the programmers, the timing error was due to a clock that did not keep accurate time! It was NOT a rounding error or floating-point error. The problem was in the design of the system.

    108. Re:Poor QA by Lars+T. · · Score: 1

      Be fair, if even the guys starting the missiles don't know where the things will come down, how should the space laser?

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    109. Re:Poor QA by Anonymous Coward · · Score: 0

      So weak explanation, the fact is the Jews are robbering the Palestinian land.

    110. Re:Poor QA by commodore64_love · · Score: 1

      Don't hate me just because I only see the web in 16 colors.

      And yes my resume may be pixelated, but the content is still there.

      Besides the Commodore=64 makes one hell of a game machine. It's like an NES, but since somewhere around 5000 games were released for it, I could spend the rest of my life before I play them all. And they are all free. :-)

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    111. Re:Poor QA by wiredlogic · · Score: 1

      Everybody likes to slam the lack of success with the PAC1 missiles during Desert Storm but nobody seems to want to remember that they were never designed for that application in the first place. They were anti-aircraft SAMs which were rushed into the theater and hurriedly modified to have something that could be thrown up against the SCUD threat. It's a wonder that they worked as well as they did. This timing flaw is significant but it wouldn't have been as easily caught with the slower targets it was designed for.

      --
      I am becoming gerund, destroyer of verbs.
    112. Re:Poor QA by Anonymous Coward · · Score: 0

      Years ago, Control Data Corp. used 60 bit words, on computers that actually worked in decimal. (The internal software did all the adjustments to the mathematically challenged binary).

      We only have this problem, because the current standard is the X86 processor, and poorly designed and executed firmware and math engines.

      The problem is fixable, but, not on the X86 as designed.

      Years ago, I worked on a very expensive system that did not have math processors (really expensive, believe me). The math functions were all written by programmers. There were no error factors. The double precision, floating point and other numbers were correct, and the programs checked to make sure. But that was because of the standards of quality, not the 'what we can get away with' attitude.

      Since everyone is willing to accept it, that is what they sell.

      Don't blame the company, either the maker of the processor or the software. Blame the market for looking at all the geegaws and trinkets, but not the meat and potatoes. This problem has been known for decades. Everyone that knows, ignores it. Everyone that doesn't is not smart enough to ask.
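
      For what it's worth, decimal arithmetic done in software can be tried on any machine. A small illustration using Python's standard `decimal` module (my choice of tool for the sketch, not what the systems described above used):

```python
from decimal import Decimal

# Ten 0.1-second ticks summed in binary floating point vs. decimal arithmetic.
binary_sum = sum([0.1] * 10)               # each 0.1 is already slightly off in base 2
decimal_sum = sum([Decimal("0.1")] * 10)   # 0.1 is exact in base 10

print(binary_sum == 1.0)             # False
print(decimal_sum == Decimal("1"))   # True
```

      The decimal version is exact here only because 0.1 happens to be a short decimal fraction; something like 1/3 would still need rounding.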

    113. Re:Poor QA by budgenator · · Score: 1

      High-resolution timekeeping has been accomplished pretty successfully already...

      I wonder how much time and money was spent in research and development for this thing

      If memory serves me correctly, the Patriot (the missile portion of the system) uses an 80186 CPU, so I'm not too sure the algorithms for hi-res time would even run on it. Of course, my knowledge of it comes from back when even the phrase "phased array antenna" was classified and we called it SAM-D. Some of the birds it replaced used electron tubes.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    114. Re:Poor QA by Anonymous Coward · · Score: 0

      Did YOU read the article..?

      It states, rightly or wrongly, that the system did not keep an accurate tick count BECAUSE of rounding errors in adding 1/10 second ticks to a register.

    115. Re:Poor QA by Anonymous Coward · · Score: 0

      You forgot Egypt. The Jews lived there for a long time as slaves, why do not conquer the Egypt as your people only land? Your people live in US for a long time, why not conquer the US as your only land?
      The fact is simple you are cowards, only conquer Palestinian people because they are not armed like your people, you are cowards because Hezbollah expulses your people from the south Lebanon. You are cowards because you are afraid of death, and Hezbollah fights no matter if they will be death.

    116. Re:Poor QA by budgenator · · Score: 1

      So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in hundred days ? And surely the military has better clock synch than a stupid home pc ? This is stupidity, also known as "human error", causing those deaths. It's a case of "the correct answer to the wrong question".

      First, the system predates the internet by a decade or two, so no NTP, and secondly, that part of the system doesn't care what the time is; it's only interested in a measured interval.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    117. Re:Poor QA by Anonymous Coward · · Score: 0

      You missed the third option, which is for the motivation behind the firing of rockets to be removed.

      http://www.youtube.com/watch?v=iNrCMdFoZqQ

      So who do we allow to settle there ?

      The "kingdom of Egypt" (the state of the Farao's) ? (exterminated to the last man by muslims)
      The Hittite Emptre ? (exterminated by the Greeks, Romans, Persians)
      The kingdom of Israel ?
      The Assyrian Empire ?

      Which of these do we restore ? (note that the palestinians, or to be more exact, the arabs only come into play about 4500 years after the Assyrian Empire)

      Which do we restore ? And why do they have more rights than all the others who conquered that piece of land ?

      Note the obvious truth : the Jews controlled Israel about 4300 years before the arabs even left their tiny province ...

      What if some Greek starts firing rockets at the Arabs ? Will you tell them to leave ? He has at least as much right to Israel as they do ? What if the Jews start firing rockets into Jordan (territory that was part of the kingdom of Israel) ?

      What are you talking about? If one power invades and occupies a country, and the rest of the world complained, the argument is "someone invaded this country a really long time ago, so we think it's okay if we have a go at it this time"? Or even "we owned this country a really really long time ago, so we think it's time to give it back"?

      History is history, we can't change that. But we can change what is being done now, in our time.

      And of course, you shouldn't count out yourself. You're an Indo-European living in America. It seems hypocritical in the extreme to tell others to leave conquered lands. Your province of origin is northwestern Iran, every other place on this earth indoeuropeans live (including Europe), is obviously conquered from someone else.

      So when will you give the good example ?

      Again, you're using history to defend current actions, as if that's going to make it ethically correct. It's like saying the Jews can wipe out the Palestinians from the face of the earth because some German guy tried to do the same with the Jews. What the hell kind of reasoning is that?

    118. Re:Poor QA by Quothz · · Score: 1

      The "kingdom of Egypt" (the state of the Farao's) ? (exterminated to the last man by muslims)

      Eh? The Kingdom of Egypt was Islamic. The Revolution of 1952 got rid of the notion of state religion, although of course the people of Egypt are still predominantly Muslims.

      Farouk abdicated after the coup, which was not about religion, but corruption and the popular feeling that he was a British puppet. It wasn't a bloodless uprising but it was short, the death toll was light, and AFAIK the only civilian casualties were police who fought in opposition. That revolution paved the way to the modern, relatively liberal Egypt we know today.

    119. Re:Poor QA by Garridan · · Score: 1

      The title of this story should be, "Why people suck at computers". Computers are perfectly good at math, and they can even do math perfectly... but you have to pick your problems carefully if you don't want to overrun the memory or spend ridiculous amounts of time in your computations. For example, if there were 8 or 16 ticks per second instead of 10, the tick length would be exactly representable in binary, there would never be a rounding error, and your clock would operate perfectly (modulo overflows).
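
      A quick way to see the 8-or-16-versus-10 point, sketched with Python's `fractions` module (converting a float to a `Fraction` recovers the exact value the binary float holds):

```python
from fractions import Fraction

# Fraction(float) recovers the exact value the binary float actually holds,
# so comparing it with the intended fraction shows whether rounding happened.
for ticks_per_second in (8, 10, 16):
    tick = 1 / ticks_per_second
    exact = Fraction(tick) == Fraction(1, ticks_per_second)
    print(ticks_per_second, "exact" if exact else "rounded")
```

      Only the 10-ticks-per-second case comes out as "rounded", because 1/8 and 1/16 are powers of two.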

    120. Re:Poor QA by Lars+T. · · Score: 1

      There is also the problem that a downed missile presents. What is a "downed missile" ? Well it's a large collection of very-high speed pieces of metal that have been heated up by a large explosion that's about to crash into the ground. So far so good.

      So what is "the ground" in the case of a hizbullah or hamas missile launch ? Well it's the center of the city that's controlled by the terrorists. It's their human shields. Markets, schools, you name it. So a successfull missile intercept is reported in the press as "Israel fires a rocket into a palestinian kindergarten". That is, by the way, the literal truth, even if the rather important detail of a rocket's presence above said kindergarten is left out. In the deployed missile intercept installations "the ground" is chosen to be something else, like the ocean surface.

      Insightful? He is actually claiming that an anti-rocket-missile started by the Israelis would intercept the rocket within seconds after the start, instead of a few seconds before the impact.

      The real reason his beloved Iron Dome is useless is the very short flight time of the rockets.

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    121. Re:Poor QA by William+Stein · · Score: 1

      (1/10)^n for integer n is irrational in base 2 and the truncation was unavoidable.

      Whether or not a number is irrational does not depend on the base. The number (1/10)^n is rational in any base. By irrational, maybe you meant "has no finite binary expansion"?

      Unrelated: The article starts with the example 599999999999999 - 599999999999998 = 0 in Google. Fortunately some software gives the correct result by default.
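
      On the finite-expansion point: a fraction terminates in a given base exactly when every prime factor of its reduced denominator divides that base. A minimal sketch (the function name is made up for illustration):

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(value: Fraction, base: int) -> bool:
    """True if `value` has a finite expansion in `base`.

    That happens exactly when every prime factor of the reduced
    denominator also divides the base.
    """
    denominator = value.denominator   # Fraction is always in lowest terms
    common = gcd(denominator, base)
    while common > 1:
        # Strip out every factor the denominator shares with the base.
        while denominator % common == 0:
            denominator //= common
        common = gcd(denominator, base)
    return denominator == 1

print(terminates_in_base(Fraction(1, 10), 10))  # True
print(terminates_in_base(Fraction(1, 10), 2))   # False
print(terminates_in_base(Fraction(1, 8), 2))    # True
```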

    122. Re:Poor QA by Anonymous Coward · · Score: 0

      You're an atypical engineer if you got both the architecture and the compiler stuff. But in the average case, the CS student is more likely to take a compiler class than an engineering student. In fact, the stuff you deride as "too high-level and abstract" is the real meat of a CS degree and turns out to be extremely important in compiler design. Please don't compare the worst of the CS grads to the best of the engineering grads, it's not really fair :)

      Or perhaps you're making the common /. mistake of comparing actual Computer Science people with mere programmers. Any computer scientist is also a programmer as a side effect, but not all programmers are computer scientists. It's entirely possible to get a programming cert without even getting as much real knowledge as a fourth semester CS student...

    123. Re:Poor QA by beelsebob · · Score: 1

      1) Yes it would eliminate this bug, this is pretty much exactly the kind of bug it'll find.
      2) No, it won't add decades, with some half good computer scientists it will add a small amount of time, mostly because of the amount of time saved testing and fixing bugs.
      3) Who cares if it adds time – it saves lives!

    124. Re:Poor QA by geoskd · · Score: 1

      Not everyone with a need for programming has a CS background and enough experience to be aware of all the potential problems. You'd hope that someone working on a missile system would have though.

      I would like to think that it would be a requirement... Furthermore, this would have been easily identified by a proper third party quality assurance program, but it would seem that lining Raytheon's pockets was the most important goal of the Patriot system, not actually intercepting enemy rockets.

      In this case, I think that the compensation system for Raytheon was at fault. The contract should have included Raytheon only getting paid for those missiles that successfully hit their target. You would then find the contract bid process to be very telling indeed.

      -=Geoskd

      --
      I wish I had a good sig, but all the good ones are copyrighted
    125. Re:Poor QA by Jane+Q.+Public · · Score: 1

      Did YOU read the article..? It states, rightly or wrongly, that the system did not keep an accurate tick count BECAUSE of rounding errors in adding 1/10 second ticks to a register.

      It does so wrongly. That's not the way they work. The register is a simple counter, and it increments the count once per tick. There is no rounding involved. The ONLY rounding that occurs is when the tick count is later converted to decimal. So the error is simply in the tick count itself... which is a hardware, not software issue.
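
      To put numbers on the conversion step: a rough sketch of turning an integer tick count into seconds when the 0.1 factor itself is stored in a short binary register. The 23-fractional-bit, chopped layout is an assumption made for illustration; the summary only says "24-bit register".

```python
from fractions import Fraction

# Assumption: the 0.1 s factor is held with 23 fractional bits and chopped
# (not rounded). The summary only says "24-bit register".
FRACTION_BITS = 23
TICKS = 3_600_000   # about 100 hours of 0.1 s ticks, per the summary

scale = 2 ** FRACTION_BITS
stored_tenth = Fraction(int(Fraction(1, 10) * scale), scale)   # chopped 0.1
error_per_tick = Fraction(1, 10) - stored_tenth

drift_seconds = float(error_per_tick * TICKS)
print(f"per-tick error: {float(error_per_tick):.3e} s")
print(f"drift after {TICKS} ticks: {drift_seconds:.4f} s")
```

      Under that assumption the drift comes out at about 0.3433 s, which matches the figure quoted in the summary. The tick counter itself stays an exact integer in this sketch.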

    126. Re:Poor QA by Lars+T. · · Score: 1

      2. The Patriot has a quite good record against SCUDs (after the software upgrades). Much better than the Soviet SA-2s did against B-52 raids in Vietnam.

      Let's pretend that that record isn't heavily contested, and that the SA-2 isn't 20 years older than the Patriot - The SCUDs didn't have massive ECM systems, while the B-52s had.

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    127. Re:Poor QA by Teancum · · Score: 1

      This wasn't even poor Q/A as I'm sure this software likely passed all of the necessary reviews.

      This is a failure to understand the problem domain and realize what the limits of software like this might be. The design specs on the clocks that were used in this situation likely had their operational parameters well defined by the manufacturer, but the engineer who built this system completely ignored that they would only be accurate for a narrow time window.

      This is a specification error, and something that would not have been caught in a Q/A performance review. Perhaps a complete engineering review from top to bottom by a very skilled engineering manager who doesn't have to worry about things like making a profit, but such reviews are rare because they are so incredibly expensive to perform. Only on the most critical kinds of equipment, such as medical devices or something where somebody's life depends on it working will such a review be done.

      This is debatable with military equipment, and often it will meet such standards. However this is something that likely wouldn't have to pass the same level of review as a medical device, and the contracting company involved would have gone broke to engage in that sort of review on all of their products.... where this was a system that was awarded based on the lowest bid.

      Yes, I'd say this was a sloppy system designer as well, and somebody who should be demoted/fired for not properly expressing what the operational limits of this weapons system would be (uptime of 5 hours of operation or whatever it may be). If those operational limits don't meet specifications, then something should have been redesigned in this situation. Odds are that those specifications weren't listed in the first place.

      This also seems to be a difference between American and Asian engineers (culturally... I don't care about skin color here). American engineers tend to think about the actual application domain and "fill in" specifications that are missing. Most Asian engineers (from my own experiences... your experience may vary) tend to be much more exacting to the delivered specifications and don't "think outside of the box" too much. That the Asian engineers tend to deliver product faster is also true... making the Americans seem lazy for even trying to make the effort to understand the problem a little bit better.

      Japanese tend to be thinking more like Americans from more recent experience... so this isn't one size fits all to all Asian countries either. Of course the Japanese have had more time to interact with western nations as well. The Chinese culture is very definitely "do as you are told and don't question the boss, even if he is wrong."

    128. Re:Poor QA by Shihar · · Score: 2, Interesting

      The problem with the Palestine / Israel issue is that Israel is not working towards any solution. What is Israel's long term solution? Have sovereign absolute rule over a few million people in a prison that their citizens can, at will, and with army backing, snatch up pieces for settlement? Oh yeah, that is going to work out. Palestinians either need to be sovereign or citizens of Israel. Israel needs to pick one because the method of keeping a ghetto of nationless people isn't working.

      Don't get me wrong. I am sympathetic to Israel in many regards, but they have fucked up the Palestinian issue with epic skills since 1967 onwards. Instead of immediately developing and executing a plan to 'deal with' the conquered land either through integration or by creating a sovereign democracy they opted to basically imprison a few million people from now until the end of time. It should come as a "no shit Sherlock" that 40+ years later these nationless people are pissed.

      Israel needs to rip a page out of the American handbook on imperial power. If you flatten another nation you have three options.

      1) You can integrate them into your nation as citizens and give them some level of enfranchisement as they did to Native Americans, Hawaiians, and Mexicans. This is not a basket of roses method, but as pissed as Hawaiians might occasionally be, I haven't seen many draw weapons or plant bombs.

      2) You could commit to rebuilding a conquered nation more or less in your own image, as the US did in Germany, Japan, South Korea, and Bosnia. This is expensive, but when it works everyone leaves the table more or less happy.

      3) Leave, stop trying to kick over their government with bombs, and accept the fact that these people are going to hate you for what you have done for a while and that only time is going to heal. This was done with Mexico, Vietnam, Haiti, North Korea, Lebanon, half of south America, etc.

    129. Re:Poor QA by Bigjeff5 · · Score: 1

      Actually it's a case of re-purposing an anti-aircraft weapon for anti-missile purposes. The system time error likely doesn't factor in to the equation much at all, because the original design would almost certainly have used the system time as a starting point for the tracking clock, meaning it has at most a minute or two for the error to accumulate.

      The fact is the Patriot missile is a poor design for an anti-missile defense, and as such there are exactly zero verified SCUD kills with the Patriot system. The Patriot uses a proximity sensor to detonate near the target, which then sprays shrapnel at the missile to destroy it. The delay on the sensor is fine for destroying an airplane - it has a very large target the shrapnel can hit and still destroy the aircraft. The delay is not so good for a SCUD missile traveling at Mach 5, however. Chances are the shrapnel, if it even hits the SCUD traveling that fast, will hit the rear of the missile which leaves a live warhead still falling toward earth and likely still on target, depending on the size of the target.

      The poor performance as an anti-SCUD has been known in the military for a long time now; the Israelis in particular were very dissatisfied with its inability to stop Iraqi SCUDs during the Gulf War. Basically the Patriot system works on aircraft and large, slow missiles. It does not work well at all on fast missiles, never has and probably never will.

      In other words, the problem here was with the management (i.e. the US Military) for shoe-horning the Patriot system into a role it was not designed for, not the designers or programmers who created the system. Sure, we can look back and wish it was designed to be more flexible, but that does not mean you should bash the programmers for creating a system that worked very well for its intended purpose.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    130. Re:Poor QA by megaditto · · Score: 1

      Are you sure this was clock drift? My reading of it was that the precision of the clock deteriorated, not the accuracy.

      They used a limited number of bits to store an unlimited integer, so the precision got worse as the least significant digits were rounded.
      A crude example, if you only have 4 digits to store uptime in seconds, as the time progresses:

      0.000 to 9.999 s -- precise to 1 millisecond, 0.5 ms rounding error
      100.0 to 999.9 s -- precise to 100 milliseconds, 50 ms rounding error
      1000. to 9999. s -- precise to 1000 milliseconds, 500 ms rounding error

      Presumably, at some point the error in distance, (time rounding error) X (target speed), would place the target outside the targeting envelope of the radar, and hence the target disappears.
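
      The crude four-digit example can be played with directly. A sketch using Python's `decimal` module, with a hypothetical store that keeps only 4 significant digits (this illustrates the example, not how the Patriot register actually worked):

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Hypothetical 4-significant-digit store, as in the crude example above.
four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)

for uptime in ("1.23456", "123.456", "1234.56"):
    exact = Decimal(uptime)
    stored = four_digits.create_decimal(exact)   # rounds to 4 significant digits
    print(f"{exact} s stored as {stored} s, error {abs(exact - stored)} s")
```

      The absolute error grows by a factor of ten each time the uptime gains a digit, which is the deterioration in precision described above.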

      --
      Obama likes poor people so much, he wants to make more of them.
    131. Re:Poor QA by mdarksbane · · Score: 1

      Meh, I've seen people write bad code from both disciplines. It really comes down to what the individual's background is and how much they decide to care about low level stuff. If you're doing business logic and database code, you'll never have to care about any of this, and you won't pay attention in the one required class that mentioned it.

      Of course, adding to this whole problem is the fact that low-level problems like this are a) much less likely to be the cause than your own mistakes and b) sometimes are fairly hard to quantify. Even if you know some of what you're doing, there's a big step between "I think this might be caused by a precision issue" and knowing exactly how big of a problem can be attributed to precision, and which calculations are affected by it. In one project I worked on, we had a visual jitter for months that we just assumed was us pushing against the limits of our available precision (storing locations in meters as floats on a state-sized map) that turned out to be another calculation altogether. Knowing something exists and knowing how big it is are very different problems.
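
      One way to turn "I think this might be a precision issue" into a number is to round-trip a value through a 32-bit float and see what survives. A sketch, assuming single-precision storage and a position about 500 km from the map origin (both figures are made up for illustration, not details of the project mentioned above):

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float through a 32-bit IEEE float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical position: 500 km from the map origin, plus a 1 cm nudge.
position_m = 500_000.0
nudged = as_float32(position_m + 0.01)

print(nudged == as_float32(position_m))    # True: the 1 cm move is lost
print(as_float32(position_m + 0.03125))    # 500000.03125, the next representable step
```

      At that distance a 32-bit float can only step in increments of about 3 cm, so anything finer simply disappears.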

    132. Re:Poor QA by mdarksbane · · Score: 1

      Of course, it doesn't help that for the most part instead of trying to integrate with the new leadership of the country, Palestinian leadership left the country and immediately started plotting the demise of their new rulers. Their arab neighbors weren't exactly a help - instead of integrating them into their own society, they kept them in refugee camps for years while promising that Israel would soon be destroyed so they could go home. The whole thing is a giant fuckup.

    133. Re:Poor QA by Anonymous Coward · · Score: 0

      "You are an ignorant, don't you?"

      Wow, that's quite a fucked up sentence. Almost Engrish-like in its meme potential.

    134. Re:Poor QA by Hognoxious · · Score: 1

      I don't have the reference to hand, but I've heard of at least one case where a fatal crash occurred because the pilot accidentally moved the controls and therefore unknowingly disengaged the autopilot.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    135. Re:Poor QA by Xenographic · · Score: 1

      It's not the computers which suck at math, it's us! We all learn real or complex arithmetic, but computers only work with a subset of the rational numbers (and *which* subset they work with depends on the machine and perhaps even the compiler if it changes floats and doubles around).

      The two aren't the same at all and you get a lot of crazy errors if you don't know what to avoid. That said, your link is a very good one. Lots of people think there's nothing wrong with if (a == b) where a & b are some kind of float...
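
      The classic demonstration of the if (a == b) trap, sketched in Python (whose floats are binary doubles); the tolerance passed to `math.isclose` is an arbitrary choice for the example:

```python
import math

a = 0.1 + 0.2
b = 0.3

print(a == b)                             # False
print(a.as_integer_ratio())               # the rational number actually stored
print(math.isclose(a, b, rel_tol=1e-9))   # True
```

      What tolerance is appropriate depends entirely on the calculation, which is rather the point.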

    136. Re:Poor QA by Colin+Douglas+Howell · · Score: 1

      And of course, you shouldn't count out yourself. You're an Indo-European living in America. It seems hypocritical in the extreme to tell others to leave conquered lands. Your province of origin is northwestern Iran, every other place on this earth indoeuropeans live (including Europe), is obviously conquered from someone else.

      While your basic point is valid, this statement is bogus. Most speakers of Indo-European languages are not descended from the speakers of Proto-Indo-European. Languages spread not by biological descent but by people learning to speak them. Yes, the ancestor languages of the Indo-European family were spread partly by conquest, but that usually meant a small group of elite warriors taking over a larger population and bringing their language with them, in the same way that William the Conqueror's Norman knights brought their vocabulary to England. Most of the conquered people simply adopted the new languages.

      But yes, most humans are living on land conquered from other humans. And once you consider non-human inhabitants, every human becomes a descendant of interlopers. For that matter, all the non-human inhabitants are themselves usurpers from earlier ones. There's just no getting away from that.

    137. Re:Poor QA by jonadab · · Score: 1

      > And as for floating-point errors, any programmer that isn't
      > aware of those issues needs to be writing ... I don't even
      > know. Something that doesn't use floating-point numbers I guess

      Application-level networking code. Web, email, DNS, anything like that, you pretty much never see a floating point number, and if you do the precision doesn't matter because it'll be going through sprintf "%0.2f" momentarily.
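
      A tiny illustration of why a two-decimal format hides the problem, using Python's printf-style formatting as a stand-in for sprintf:

```python
value = 0.1 + 0.2            # 0.30000000000000004 as a binary double
print("%0.2f" % value)       # 0.30 (the tiny error vanishes at two decimals)
```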

      --
      Cut that out, or I will ship you to Norilsk in a box.
    138. Re:Poor QA by Anonymous Coward · · Score: 1, Informative

      The problem is the base mathematical principles that computers are also using.

      Any fraction that results in a repeating decimal or is an irrational number is going to have an inherent error due to rounding or truncating--the computer will never be using an exact value.

      Any calculation using pi or e is going to have error due to rounding or truncation for the same reason--the computer will never be using an exact value.

      Even respecting significant digits, there will still be error in any calculation that has to round or truncate a fractional result.

      Note that this problem is also inherent to pencil and paper calculated mathematics--at the point the calculation has to round or truncate a number with values after the radix point (so that problem can occur in ANY whole number base, not just base 10 or base 2), an exact value is not being used at that point and so that and any subsequent calculation always has error associated with it.
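
      The same effect in base 10, sketched with Python's `fractions` and `decimal` modules. The 4-digit truncating context stands in for pencil-and-paper work and is an arbitrary choice:

```python
from decimal import Context, Decimal, ROUND_DOWN
from fractions import Fraction

# Exact rational arithmetic: 1/3 * 3 is exactly 1.
print(Fraction(1, 3) * 3 == 1)                   # True

# Pencil-and-paper style: keep only 4 digits, truncating the rest.
paper = Context(prec=4, rounding=ROUND_DOWN)
third = paper.divide(Decimal(1), Decimal(3))     # 0.3333
print(paper.multiply(third, Decimal(3)))         # 0.9999
```

      Once 1/3 has been cut to 0.3333, every later step carries that error along, whatever the base.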

    139. Re:Poor QA by MicktheMech · · Score: 1

      My ninth grade science teacher used to tell us "All error is human error." He was right too.

    140. Re:Poor QA by Anonymous Coward · · Score: 0

      So weak explanation, the fact is the Jews are robbering the Palestinian land.

      You lack somewhat in the area of making a solid argument.

      You might want to improve on that...

      (To be clear: Not saying that you're right or wrong, just saying that you suck at making an argument. As in: Nobody, and I do mean nobody, will be convinced by what you're saying.)

    141. Re:Poor QA by Anonymous Coward · · Score: 0

      Who gives a FUCK who was there 4300 years ago?

      You don't, apparently. Then again, nobody cares about you.

    142. Re:Poor QA by D'Sphitz · · Score: 1

      If you would have actually read the article you'd know he made it quite clear the errors in the examples he cited were avoidable, other than the hardware error in the Intel chip.

    143. Re:Poor QA by RzUpAnmsCwrds · · Score: 1

      Well, to be honest, I'm mixed about your comment.

      On one hand, I think that understanding the hardware is still an important part of producing code that runs well. On modern high-end CPUs, it's more about things like memory accesses than it is about optimizing out every last instruction, but it's still critical.

      On the other hand, it's unreasonable to expect that someone with an undergraduate degree would understand how a modern out-of-order CPU works. At best, we might hope that students get some understanding of data dependencies and simple out-of-order concepts like Tomasulo's algorithm and register renaming. Of course, every student should understand how caches work, why having unpredictable branches is bad, and just how badly you're screwed when you start hitting main memory frequently.

      The thing is, compilers already do a pretty good job with these kinds of optimizations. You can do brain-dead things like access multidimensional arrays in the wrong order, and the compiler will often (but not always) save your butt. You can put function calls all over the place, and many of them will get inlined. You can use constant variables instead of hardcoding values, and it won't matter.

      So, yes, if you're running on a system that's esoteric, or doing something that's unusual, you can sometimes dramatically improve performance by understanding the machine code and the hardware. But when you're running on a platform that's well known (x86 or ARM, for example), using a good compiler (GCC counts), you can be surprisingly dumb and still get good performance.

      If everyone were a systems expert, we would probably all be using VLIW systems right now.

    144. Re:Poor QA by Torodung · · Score: 1

      I believe it starts at a point where the people who own the homes, and have a deed to prove it recognized by an impartial and agreed upon authority, have the right to the land. The "individual land ownership" solution. Some call it partition.

      The "problem" with Israel began when Arabs sold them the land, and then other Arabs decided they didn't like their new neighbors. Wars ensued shortly afterward, because the basic premise of land ownership was not respected.

      Israel continues this proud tradition. No matter what arguments you may make about how intractable the problem is.

      Nothing excuses the kind of violence being perpetuated, by both sides, to decide the issue. It is a civil matter, to be settled in a court, not a military one. All landowners in the area would do much better to assent to an agreed upon third party and find a way to settle the civil matter. This hysteria about which "empire" to restore is the source of the trouble. We should be more concerned about which "family" will be forcibly removed from its home in the name of "empire" next.

      The other option is to leave. Go somewhere where civil law is respected. Lots of folks have done that.

      Those are the only options I see. No empire. No single state solution. The people of the region must create impartial civil courts that recognize individual land ownership and live with one another in an uneasy peace. This will almost certainly require a third party.

      The other options are to leave or wait to get blown up, one by one. No establishment of an "empire" will help a damned thing. Jews are not going to find the "promised land" through a military solution, no matter what the book of Joshua says. Those days are gone. The only option is respect of personal property.

      And, of course, the enormous problem with that is that it would require some kind of unified government, which would probably annul the concept of an ancestral Jewish state. If a Jewish state is worth more to the Jews living there than peace, then they are on the correct path, though they haven't killed nearly as many people as they need.

      Setting up an ancestral state has always required mass murder when there is someone already living there. Check the Torah.

      And all who fell that day, both men and women, were 12,000 - all the people of Ai. For Joshua did not withdraw his hand... until he had utterly destroyed all the inhabitants of Ai.

      The less folks are allowed to deny that basic historical fact, the more they'll be willing to negotiate for something else.

      --
      Toro

    145. Re:Poor QA by Macman408 · · Score: 1

      Actually, it sounds like they were using a 24-bit floating point register for the clock, and just adding 0.1 every clock tick. Doing a bit of math, you can see that the article's error of .3433 seconds matches with a floating point format with 17 to 20 bits.
      0.1 (base 10) == 0.00011001100110011001100... etc (base 2)
      There are 3600000 0.1-second periods in 100 hours (base 10)
      Floating point formats vary, but 0.1 (base 10) would be stored as approximately 1.10011001100110011 * 2^-4 in binary. That's 18 bits of mantissa, though some formats (like IEEE) make the leading 1 implicit, so it could be 17 bits. The next two bits are zero, so we can't really tell if they're used or not, so the format could have up to 20 bits in the mantissa. The remainder of the 24 bits would be used for the sign and exponent.
      Now, the mantissa of that binary number is exactly 209715/131072 (base 10), so with the 2^-4 applied the stored value is 209715/2097152. Multiply that by 3,600,000 (the number of clock ticks), and you get exactly 5898234375/16384 (base 10). Subtract out 360,000 (the actual number of seconds), and you get (presto!) -0.343323 (inexact) seconds of error.

      On the other hand, if you count the clock ticks as an integer, then multiply by 0.1 any time you need the time, you're off by at most the least significant bit in the mantissa (or about 2^-18 or so). That would be a much better way of implementing the clock mechanism.
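
      The arithmetic above can be checked with exact rationals. A minimal sketch in Python, assuming the 18-bit mantissa quoted above (the Patriot's real register layout is not given in this thread); the variable names are made up for illustration:

```python
from fractions import Fraction

# 0.1 truncated to the bits quoted above: 1.10011001100110011b * 2^-4.
# The 18 mantissa bits are read as an integer and scaled by 2^17.
mantissa = Fraction(int("110011001100110011", 2), 2**17)
stored_tick = mantissa / 16          # apply the 2^-4 exponent

ticks = 3_600_000                    # 100 hours of 0.1-second ticks
elapsed = stored_tick * ticks        # what the truncated constant yields
error = elapsed - Fraction(ticks, 10)

print(mantissa)                      # 209715/131072
print(elapsed)                       # 5898234375/16384
print(float(error))                  # about -0.343323 seconds
```

      Using Fraction keeps every step exact, so the only approximation is the truncated constant itself.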

    146. Re:Poor QA by Idiomatick · · Score: 4, Interesting

      Uhh hezbolah was created to defend lebanon in the 80s after israel killed thousands of lebanese and occupied a good chunk of it.

      The last fight between them happened in 2006. Hezbolah kidnapped a few SOLDIERs to trade for PoWs (a common thing since israel has a shit ton of prisoners).

      Israel responded by sending in an army many 100s of times larger than lebanon's they bombed many buildings including hospitals, school, UN bunkers and apartment buildings. Hezbolah fired rockets back to show resistance.

      In the end Israel killed 1200 civilians, 300 soldiers, and a significant percentage of the country's economy. Hezbolah killed 120 soldiers, 40 civilians. Notice the fucking difference in ratios. Oh and the whole time hezbolah conducted rescue missions, gave out food and helped transport people to safety. So fuck off.

      Also: "Hezbollah is now also a major provider of social services, which operate schools, hospitals, and agricultural services for thousands of Lebanese Shiites, and plays a significant force in Lebanese politics.".

      Also hezbolah states that they distinguish between zionists and jewish. Their stated reason for firing rockets is continued resistance against israeli attacks and to put an end to any colonial entity within lebanon. NOT kill jews.

      How the fuck parent got modded up is beyond me. Every single point is a verifiable falsehood.

    147. Re:Poor QA by mbkennel · · Score: 1

      "The computer did exactly as instructed, it's just that the pilot's (unintentionally given) instructions were stupid, and the fact that it took the pilot over 3 minutes to realize just how stupid he had been"

      It is most unequivocally a major software design error when increasingly desperate attempts by the autopilot to get the plane to gain height end in a total loss of lift for the plane.

      All cruise controls on cars shut off when you tap the brakes.

      Recently four people died in San Diego, including the highway patrol officer driving. A late model Lexus got a stuck accelerator input somehow and crashed at 120+ MPH. There was no key; it was a pushbutton (and the transmission was some newfangled thing that didn't have a conventional 'neutral' position). The driver was borrowing the car and didn't know to use the entirely unintuitive "hold for three seconds" to turn off the engine. The engine was at full open throttle and, being a potent modern Lexus, it totally overwhelmed the brakes (witnesses said that the brake discs were glowing red hot and the brake pads were on fire). At high RPMs the brake vacuum goes down.

      The software design error is of course not shutting down the fuel injectors in a "full brake" situation.

    148. Re:Poor QA by Anonymous Coward · · Score: 0

      Note the obvious truth : the Jews controlled Israel about 4300 years before the arabs even left their tiny province ...

      Cite your source please. And ancient holy books are not sufficient.

    149. Re:Poor QA by Anonymous Coward · · Score: 0

      That's weird, because I, a CSC major at a community college, realize that floating point can be a problem after only being in class for a couple of months.

    150. Re:Poor QA by Jane+Q.+Public · · Score: 1

      The article strongly implied that the clock was accumulating rounding errors, but it doesn't actually say that. The fact is that nobody, then or now, has ever built system clocks that way. The system clock is nothing but a binary counter that is incremented every clock tick. The only way it can be off by any significant amount is if the clock frequency is not sufficiently precise.

      So "rounding errors" do not accumulate. The clock can be off, but only in the sense that your wall clock can be off: it simply isn't keeping proper time. Because the count is occurring purely in a hardware counter, software rounding or other kinds of software or even firmware errors -- including floating-point math -- have absolutely nothing to do with it.

    151. Re:Poor QA by TapeCutter · · Score: 1

      "How the fuck parent got modded up is beyond me."

      You'll see that all of the random ass-headed cruelty of the world will suddenly make perfect sense once we go inside the monkeysphere.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    152. Re:Poor QA by Anonymous Coward · · Score: 0

      Do you want to be the one to explain to the generals why their stand-alone, truck-based mobile air protection system needs a hard-line network connection to work?

      Especially when the thing is undoubtedly wired to GPS, which uses a nuts on atomic clock. Timesync should be no big deal.

    153. Re:Poor QA by waddleman · · Score: 1

      The Patriot has at least 8 confirmed kills. Granted, all occurred in the second Iraq war, but we should stop spreading lies.

      http://www.acq.osd.mil/dsb/reports/2005-01-Patriot_Report_Summary.pdf

    154. Re:Poor QA by Siffy · · Score: 1

      So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in hundred days ?

      100 hours. 4 days-ish. 3,600,000 deciseconds.

    155. Re:Poor QA by Anonymous Coward · · Score: 0

      The computer did exactly as instructed, it's just that the pilot's (unintentionally given) instructions were stupid, and the fact that it took the pilot over 3 minutes to realize just how stupid he had been.

      Sounds like a user interface problem to me. Given the potential consequences of that particular user error, the fact that the autopilot was still engaged should have been made more obvious to the pilot. (e.g. when the plane computer sees that a struggle is going on between the autopilot and the manual controls, it should prompt a loud, un-maskable synthesized voice shouting "THE AUTOPILOT IS ENGAGED, YOU IDIOT!")

      Actually the more intuitive solution for most pilots (heck most people) would be for the autopilot to disengage when the pilot inputs are detected. This should be accompanied by the appropriate autopilot disconnect annunciation. Unfortunately in this specific case the airplane / avionics designers decided that the pilots needed to perform a specific task to take control instead of exerting control through the normal interfaces. Human factors engineering and design philosophies matter. So do clear specifications of expected performance.

      It is easy to monday morning quarterback decisions made years ago. Most SW developers I know pride themselves on their software and algorithm development skills but very few actually fully grasp the domain they are working in. Similarly the domain experts rarely understand all the implications and limitations of the computing system. This is not a slam on anyone but the rather obvious result of experts in differing highly specialized fields working to solve a problem. Each knows what they mean but not what the other means.

      Also keep in mind that the systems being discussed are very, very limited in capability compared to what most of us use regularly. These systems have been around for a long time (several desktop processor generations) and were nowhere near state of the art when initially fielded.

    156. Re:Poor QA by jrumney · · Score: 1

      Yes. The issue here sounds like they had a system clock counter that was an integer, that counted the number of 0.1 second clock ticks. Then they wanted to convert this to a floating point number in 24 bit IEEE format. They simply multiplied 0.1 by the integer in the register. Of course, that still sounds like too large an error to have occurred from just that, but let's pretend it did.

      It sounds more like they compounded the error by storing the ticks as floating point rather than an integer. So each time they added 0.0999999997 to the tick count, the error becoming more and more significant as time went on.
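
      A rough sketch of the difference between the two approaches, in Python. It assumes IEEE 754 single precision (24-bit significand) rather than whatever format the Patriot actually used, and it runs for one hour of ticks instead of 100 to keep the loop short; `f32`, `accumulate` and `multiply` are hypothetical helper names:

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(ticks):
    """Keep the time as a float and add 0.1 on every tick."""
    step = f32(0.1)
    total = 0.0
    for _ in range(ticks):
        total = f32(total + step)   # rounding happens on every addition
    return total

def multiply(ticks):
    """Keep the tick count as an integer and convert once when needed."""
    return f32(ticks * f32(0.1))    # a single rounding at the end

ticks = 36_000                      # one hour of 0.1-second ticks
exact = ticks / 10
acc_err = accumulate(ticks) - exact
mul_err = multiply(ticks) - exact

print("accumulated error:", acc_err)
print("multiplied error: ", mul_err)
```

      The accumulated version rounds once per tick, so its error keeps growing with uptime; the multiplied version rounds only once per conversion.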

    157. Re:Poor QA by Anonymous Coward · · Score: 0

      That's it. You're getting locked up!

    158. Re:Poor QA by Arthur+Grumbine · · Score: 1

      Besides the Commodore=64 makes one hell of a game machine. It's like an NES, but since somewhere around 5000 games were released for it, I could spend the rest of my life before I play them all. And they are all free. :-)

      Who am I to criticize? I choose to play Runescape instead of WoW or LotR Online. The (gamer's) heart wants what it wants...

      --
      Now that I think about it, I'm pretty sure everything I just said is completely wrong.
    159. Re:Poor QA by DamienNightbane · · Score: 1

      I'd moderate this +1 OH SNAP! if I could.

    160. Re:Poor QA by dave87656 · · Score: 1

      So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in hundred days ?

      Eh, I don't think these Scuds were connected to the internet at the time.

    161. Re:Poor QA by dave87656 · · Score: 1

      Ooops, sorry, I meant Patriot rockets, not Scuds.

    162. Re:Poor QA by youngburnsy · · Score: 1

      PEBKAC = Problem Exists Between Keyboard And Chair

    163. Re:Poor QA by Rocketship+Underpant · · Score: 1

      "Which do we restore ? "

      Homes and villages to people who were forced out of them would be a good start. Homes that have been bulldozed to make a political point would be another.

      If you think ancient history is an excuse for people's actions in this generation, you're sadly mistaken.

      --
      He who lights his taper at mine, receives light without darkening me.
    164. Re:Poor QA by Anonymous Coward · · Score: 0

      I don't think any other government would do this - mistakes in the military would just get covered up as state secrets and anyone who tried to talk about them would get locked up or worse.

      Eh. Forgive me, but do you have any basis whatsoever for this claim, or are you just being arrogant?

      Pull your head out of your ass. See: Soviet Union Weapon Systems development circa 1945-1992. The sources are numerous, and all blank. If you don't get this, then you should stick your head back into your ass...

    165. Re:Poor QA by Hognoxious · · Score: 1

      if software is designed to use an unreliable clock blindly without any form of checking or verification then yes, it's a software error.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    166. Re:Poor QA by Lehk228 · · Score: 1

      more importantly, even if it did work well, it will always be cheaper to fire a cluster of tinfoil wrapped decoy estes style rockets with every real attack missile, then instead of a radar system that only has to look for "hot, fast, and metal" the system has to do target prioritization and discrimination in the same amount of time. add to that the fact that the type of incoming ordnance isn't known until detection and you have to discriminate two or more different signatures, analyze to see if they are a real threat, and then engage. while making sure not to accidentally judge a mixed wave of 2 or more types of real rockets as rockets and decoys.

      --
      Snowden and Manning are heroes.
    167. Re:Poor QA by Hal_Porter · · Score: 1

      Ok, Sweden has the openness. Still they weren't much help in either WWII or the Cold War. Nor really could they have been even if they'd wanted to. Which they didn't.

      You need somewhere that combines an open society with somewhat more adventurous foreign policy, and a lot more heft.

      Sweden is a bit like the Tollan in SG1 - it's an advanced society but one that is too aloof to be much help when you're about to get overrun by the bad guys.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    168. Re:Poor QA by FireFury03 · · Score: 1

      I don't have the reference to hand, but I've heard of at least one case where a fatal crash occurred because the pilot accidentally moved the controls and therefore unknowingly disengaged the autopilot.

      Aeroflot flight 593. The autopilot disconnected its control of the ailerons (whilst still keeping on controlling everything else) and didn't provide any warning that it had done so. In fact, autopilots doing unexpected things seems to be behind a lot of accidents.

    169. Re:Poor QA by Hognoxious · · Score: 1

      But if you're predicting where a missile will be, it doesn't matter whether the uptime is accurate or not. When you calculate the velocity you're going to subtract the times anyway, so any accumulated difference is lost.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    170. Re:Poor QA by Alef · · Score: 1

      Ok, Sweden has the openness. Still they weren't much help in either WWII or the Cold War. Nor really could they have been even if they'd wanted to. Which they didn't.

      And I am sure there are other countries with similar openness as well. As for the situation in the Cold War it had more to do with being squeezed in between the NATO block and the Soviet Union, with a country full of natural resources and a very strategic location controlling the Baltic Sea providing access to the Atlantic.

      Publicly picking either side would have been disastrous, resulting in imminent invasion from the other side. Nevertheless, behind the scenes Sweden actually conducted intelligence gathering against the Russians on behalf of the western block.

      Sweden is a bit like the Tollan in SG1 - it's an advanced society but one that is too aloof to be much help when you're about to get overrun by the bad guys.

      Foreign policy of any country has more to do with game theory than good versus evil. You'll be hard pressed to find any government that doesn't either act in (what it thinks is) its own best interests or the best interests of its people (depending on how democratic it is). This is perhaps not what is said in the public rhetoric, but then we are at the stage of convincing and motivating the population.

    171. Re:Poor QA by jrumney · · Score: 1

      I'm astounded that such profound ignorance of the Sheeba Farms issue is widespread enough for you to be modded as informative rather than the flamebait I suspect you intended to be.

    172. Re:Poor QA by Hal_Porter · · Score: 1

      Foreign policy of any country has more to do with game theory than good versus evil.

      Hmm, that's odd. Normally Swedes lecture me how evil the US and the UK are, but carefully avoid mentioning the much more serious evil of their opponents. It reminds me a bit of this Orwell comment -

      Pacifist propaganda usually boils down to saying that one side is as bad as the other, but if one looks closely at the writings of younger intellectual pacifists, one finds that they do not by any means express impartial disapproval but are directed almost entirely against Britain and the United States. Moreover they do not as a rule condemn violence as such, but only violence used in defence of western countries.

      http://www.orwell.ru/library/essays/nationalism/english/e_nat

      Mind you Sweden as a country clearly considers itself above taking any side whatsoever, just in case the bad guys win and (being bad guys) will want revenge. Still that only works so long as the Nazis, Commies and so on get defeated by someone else. It's hard to imagine Sweden surviving in a Europe totally dominated by Hitler and/or Stalin for example. As you say, Sweden has a lot of resources and a good location. It was also very rich post World War II, and would have been a tempting target for both dictators.

      Once you understand there is a game and it's not good (to say the least) to lose, you are on the way to enlightenment. Well maybe not, but it certainly makes it clearer who your allies are.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    173. Re:Poor QA by Eli+Gottlieb · · Score: 1

      Sheba Farms is claimed by Syria as well as Lebanon, and Israel can make a valid claim on top of that. Note that AFAIK Sheba Farms is an Israeli civilian zone rather than a military-occupied zone such as the West Bank. You can call me flamebait (no, I actually intended to provide relevant information) when you get all three of those countries with different legitimate claims to sort out who the hell the Farms belong to.

    174. Re:Poor QA by Anonymous Coward · · Score: 0

      So, you think a system controlling a missile is running Linux and is on the Internet??? You know, not all programming is like on your PC in your mommy's basement. NO, ultimately the mistake was the incompetent programmer who didn't know how to write real-time systems. For one thing, you don't use floating-point representations. You use fixed point, which can be represented exactly.

    175. Re:Poor QA by Eli+Gottlieb · · Score: 1

      The last fight between them happened in 2006.

      When Lebanon was completely unoccupied. Why hasn't Hizballah's war of so-called freedom against Israel ended with the Israeli occupation of Lebanon?

      Israel responded by sending in an army many 100s of times larger than lebanon's they bombed many buildings including hospitals, school, UN bunkers and apartment buildings. Hezbolah fired rockets back to show resistance.

      I like how you attempt to portray military competence as immoral and military inability as righteous.

      Also hezbolah states that they distinguish between zionists and jewish.

      You claim that about an organization whose leader has said the following:

      If they (Jews) all gather in Israel, it will save us the trouble of going after them worldwide.

      If we searched the entire world for a person more cowardly, despicable, weak and feeble in psyche, mind, ideology and religion, we would not find anyone like the Jew. Notice, I do not say the Israeli.

      and of course:

      The Lebanese refuse to give the Palestinians residing in Lebanon Lebanese citizenship, and we refuse their resettlement in Lebanon. There is Lebanese consensus on this...we thank God that we all agree on one clear and definite result; namely, that we reject the resettlement of the Palestinians in Lebanon.

      This is not a man who wants to liberate "Palestine", nor to protect his own country from occupation. This is a man and this is an organization built on the idea that all Jews everywhere must die in order to please God.

      Stop pretending otherwise just because their miserable failure provokes pity.

    176. Re:Poor QA by Alef · · Score: 1

      I'm not sure why you think I should have to respond to what other Swedes supposedly have said to you, or what the straw man about young pacifists has to do with foreign policy, so let's just leave it where we are.

    177. Re:Poor QA by smallfries · · Score: 1

      Is that the same William Stein who writes Sage commenting in an article about numerical impression? I'm impressed, slashdot still has style... :)

      Personally I would have used 1/8 second ticks instead of a Python symbolic algebra package, but I'm old school like that.
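
      The appeal of a 1/8-second tick is that 1/8 is a power of two, so it has an exact binary representation while 1/10 does not. A quick check in Python using the standard `fractions` module (the variable names are just for illustration):

```python
from fractions import Fraction

# Fraction(float) recovers the exact value the float actually stores.
eighth = Fraction(0.125)
tenth = Fraction(0.1)

print(eighth == Fraction(1, 8))    # True: 1/8 is stored exactly
print(tenth == Fraction(1, 10))    # False: 0.1 is only approximated
print(tenth.denominator)           # a power of two, so it can never be 10
```

      With an exactly representable tick, multiplying the tick count by the tick length introduces no representation error at all.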

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    178. Re:Poor QA by Anonymous Coward · · Score: 0

      2. The Patriot has a quite good record against SCUDs (after the software upgrades). Much better than the Soviet SA-2s did against B-52 raids in Vietnam.

      It really depends on how you define success.
      Operation Linebacker II

    179. Re:Poor QA by smallfries · · Score: 1

      So first you berate people for not reading the article, then when they point out it contradicts you suddenly it's wrong? You have to be a troll....

      Ignore my other reply about accumulated error, it turns out that I was wrong. The software problem was actually as simple as the loss of precision in the fixed-point multiply. Very simple but complete description is available here.
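
      The fixed-point version of the calculation can be sketched as follows. This assumes the widely repeated account in which 0.1 is chopped to 23 fractional bits inside the 24-bit register; that bit layout is an assumption here, not something stated in this thread, and the names are illustrative:

```python
from fractions import Fraction

FRAC_BITS = 23                       # assumed number of fractional bits
SCALE = 1 << FRAC_BITS
TENTH_FIXED = SCALE // 10            # 0.1 chopped (not rounded) to fixed point

def uptime_seconds(ticks):
    """Convert a tick count to seconds using the chopped constant."""
    return Fraction(ticks * TENTH_FIXED, SCALE)

ticks = 100 * 60 * 60 * 10           # 100 hours of 0.1-second ticks
drift = Fraction(ticks, 10) - uptime_seconds(ticks)

print(TENTH_FIXED)                   # 838860
print(float(drift))                  # roughly 0.3433 seconds
```

      The drift is proportional to the tick count, so it is zero at startup and grows linearly with uptime.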

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    180. Re:Poor QA by smallfries · · Score: 1

      Of course that is the solution they used that caused the problem.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    181. Re:Poor QA by DrVxD · · Score: 1

      Hindsight is almost 20/20

      Hindsight should be far better than that; 20/20 is average, not perfect visual acuity

      --
      Not everything that can be measured matters; Not everything that matters can be measured.
    182. Re:Poor QA by Anonymous Coward · · Score: 0

      > So first you berate people for not reading the article, then when they point out it contradicts you suddenly it's wrong? You have to be a troll....

      Ah, but Jane is sure that the design of the Patriot is different than the article describes, because Jane's experience of OTHER systems has been different.

      Jane is not a troll, because a troll is only motivated to stir up debate and get replies. The accurate term for Jane is 'blowhard.'

    183. Re:Poor QA by Bigjeff5 · · Score: 1

      Everybody here is blaming the programmers when the Patriot issue was a hardware, not a software issue!

      You are absolutely correct, but not for the reasons you think you are.

      The Patriot missile system was designed to shoot down aircraft, not missiles. The design of the system, even if it were operating at perfect precision, would not have a high success rate for destroying high-speed missiles.

      The article is bullshit for the same reason, because the Patriot system was successful at destroying aircraft and slow-moving missiles. Frankly, being off by 600 feet is going to blow your chances of killing an airplane traveling at or near Mach 1 almost as much as it will blow your chances of killing a missile traveling at Mach 5.

      The biggest problem here is the Patriot system was designed to spray shrapnel in a cone at aircraft. There are any number of places the shrapnel can hit and successfully down the aircraft. The explosive is a proximity explosive and it has a known delay. This is completely acceptable when downing aircraft traveling at or below Mach 1. When dealing with a missile traveling at Mach 5, however, you aren't very likely to get a solid hit on the missile body with the shrapnel and it is almost impossible to destroy the warhead. In case you don't know how big missiles work, the warhead is the part that goes boom when it lands, and it is very important that your missile defense system destroys or disables that warhead.

      So what would be necessary to make a successful SCUD killer? Probably either a really big warhead and a tight proximity sensor to create a shockwave big enough to destroy the missile entirely, or a multi-warhead missile that could split and cover a large enough area to reliably destroy the missiles. Of course, everything works better with a more precise tracking system, as the closer you can get the better your chances of blowing it up.

      Seriously, bad floating point math, if that was a true factor here, is way down on the list of reasons the missiles failed.

      And if you think computers are bad at math, try calculating Pi or Phi and you'll discover that decimal math has the exact same problems binary math has, they simply occur in different places. Some concepts that seem simple in decimal (like 0.1) are hard in binary, and vice versa. Frankly, people suck at binary math so we never see any advantages.
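
      The decimal side of that claim is easy to demonstrate. A small Python sketch using the standard `decimal` module, with the precision pinned to 28 significant digits for the sake of the example:

```python
from decimal import Decimal, getcontext

getcontext().prec = 28             # 28 significant decimal digits

third = Decimal(1) / Decimal(3)    # 1/3 has no finite decimal expansion
back = third * 3                   # so multiplying back does not give 1

print(third)                       # 0.3333333333333333333333333333
print(back == Decimal(1))          # False
```

      This is the decimal counterpart of 0.1 in binary: the value is cut off at the available digits, and the lost tail never comes back.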

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    184. Re:Poor QA by Anonymous Coward · · Score: 0

      Discussing this ad nauseam is retarded. There was a stupid implementation decision by a novice programmer that was not caught during reviews. Anyone worth a grain of salt would know that repeatedly adding "0.1" to a floating point would become a problem, just like asking you to perfectly record the value of 1/3 in decimal. It could have been fixed in dozens of ways.

      Talking about computers sucking at math is pure hyperbole by people looking for sensationalism. Computers are awesome at math. They can compute literally billions of calculations per second absolutely perfectly. If you tell them to compute the wrong thing, then that's your fault. They are machines with strengths and weaknesses, and using them improperly is operator error. Your car can't swim, your microwave doesn't display what's on tv, your vibrator is not a comb, a saw is not a hammer or a drill, etc. You trying to make them so is not their problem.

      This was a bad programmer writing a piece of code badly.

    185. Re:Poor QA by phoenix321 · · Score: 1

      - Military vehicles usually have military-grade GPS.
      - GPS uses atomic clock signals on synchronized satellites to effectively triangulate a receiver's position.
      - GPS works, as the name suggests, pretty much everywhere around the globe
      - All GPS calculations need clock signals of a high precision.
      - Military-grade GPS even has an extremely precise clock signal.

      Now why on earth should a Patriot launcher (unit price several million) NOT include a military-grade GPS receiver to synch to the satellite network's clock signal? With a GPS receiver, you'd have the world's best NTP source available as long as you can see a bit of sky.

      JDAM missiles can do it and they're one use only at two million quid a piece. So what was the stupid reason they left out that piece on Patriot components?

    186. Re:Poor QA by tcolberg · · Score: 1

      That's because the Patriot missile system was deployed in the early 1980s and developed in the '70s. GPS wasn't fully operational until the 1990s.

      As for the example of the Patriot missiles missing the Scud missiles, that was during the first Gulf War. It wasn't really possible to have a network connection to those mobile launch platforms to do a time sync.

    187. Re:Poor QA by ckaminski · · Score: 2, Interesting

      What I find deplorable is how the US propagandizes about supporting Democratic regimes, but when the Palestinians elected Hamas to power, we refused to have any dealings with them. Talk about fucking hypocritical (spoken as a lifelong US citizen).

    188. Re:Poor QA by Anonymous Coward · · Score: 0

      Uhh hezbolah was created to defend lebanon in the 80s after israel killed thousands of lebanese and occupied a good chunk of it.

        The last fight between them happened in 2006. Hezbolah kidnapped a few SOLDIERs to trade for PoWs (a common thing since israel has a shit ton of prisoners).

      Israel responded by sending in an army many 100s of times larger than lebanon's they bombed many buildings including hospitals, school, UN bunkers and apartment buildings. Hezbolah fired rockets back to show resistance.

      In the end Israel killed 1200 civilians, 300 soldiers, and a significant percentage of the country's economy. Hezbolah killed 120 soldiers, 40 civilians. Notice the fucking difference in ratios. Oh and the whole time hezbolah conducted rescue missions, gave out food and helped transport people to safety. So fuck off.

      Also: "Hezbollah is now also a major provider of social services, which operate schools, hospitals, and agricultural services for thousands of Lebanese Shiites, and plays a significant force in Lebanese politics.".

      Also hezbolah states that they distinguish between zionists and jewish. Their stated reason for firing rockets is continued resistance against israeli attacks and to put an end to any colonial entity within lebanon. NOT kill jews.

      How the fuck parent got modded up is beyond me. Every single point is a verifiable falsehood.

      Shitbag cocksucker. Zionism = Judaism. Fuck you and each and every one of your arab apologists. The prime obstacle to peace with the muslim extremist hordes is the muslim extremist hordes.

      How the fuck something that should have been left as a cumstain on jo mamma's mattress and instead ended up being a misguided poster on slashdot is a damned mystery..

    189. Re:Poor QA by Shinobi · · Score: 1

      Actually, I'm not an engineer by education. My major is in information sciences with a minor in psychology. But I've always been programming, and always had an engineer's mindset: how to get things to work in real life, with the least waste of resources etc.

      The keyword there was TOO abstract and high-level. Note that many comp.sci graduates nowadays believe that everything can be done in Python. The problem is, the complexity grows with the abstraction when you get past the superficial ease of use. It is, ironically, harder to write deterministic code in Python than it is in C, and C is bad enough, especially with an optimizing compiler. Assembler is easier. Erlang also makes it easier, but there you have the problem of the creators being adamantly opposed to threads, failing to realize that there are problem domains where threads are the best solution.

      Another difference between working with Comp.Sci and Comp/Electrical Eng programmers is that comp.sci thinks, for example, the Cell processor is difficult to work with. Comp.Eng/EE go "So it's a multi-core DSP, with a fat RAM pipe and decent DMA? Awesome, let's have fun". There's also the issue of experience with working with low-powered hardware. I know the Comp.Engs from KTH, LiTH etc in Sweden have to do some learning on CPUs like Dragonball, or even 68020/386 era CPUs, just so they get some orientation on what embedded programming is like.

    190. Re:Poor QA by Shinobi · · Score: 1

      The problem with that is, it makes programs generalized blobs, and the excessive trust in the compiler leads to less deterministic code, and harder to debug, no matter if you have the source or not. The more tricks the compiler pulls, the more it deviates from what you may have intended to do.

    191. Re:Poor QA by mikep554 · · Score: 1

      Yes, that was an absurd statement. Our military has tried just as hard as anyone else's to keep its failures under wraps. The classic example is the air force bomber that crashed back in the 50's, killing the entire crew. The military said they couldn't release the crash report because the plane was carrying super-secret experimental instruments. About 15 years ago, the report was finally made public. It turns out the plane crashed due to a known faulty system. It was known that the particular plane had not been upgraded, but the decision was made by the brass to continue flying the planes that had not been upgraded. Oh, and the plane was on a completely routine training flight, and was not carrying any secret equipment or cargo. As the Cylons said, this has all happened before and it will all happen again. Our government is no less guilty than any other government.

    192. Re:Poor QA by treeves · · Score: 1

      Well, this sort of thing has cropped up before and it has always been attributable to human error. I hope you're not concerned about this, Dave.

      --
      ...the future crusty old bastards are already drinking the Kool-Aid.
    193. Re:Poor QA by dave420 · · Score: 1

      Like when that US crew shot down an RAF Tornado. Brilliant.

    194. Re:Poor QA by Hognoxious · · Score: 1

      Don't think it was that - I'd have remembered that the pilot was letting his kids fly the plane.

      I was probably thinking of this one.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    195. Re:Poor QA by Anonymous Coward · · Score: 0

      And was the solution to rewrite everything in Super-H ASM, taking months, or to request that a CS enhance gcc, taking probably a week or less?

      Just keep in mind the discipline of Computer Science invented the Compiler.

    196. Re:Poor QA by LordKazan · · Score: 1

      Simple solution to the land line problem: they have a GPS device, so use its time! It requires accurate time to get a location fix, accurate time with even more sensitivity than the missile system.

      And seriously... 32-bit signed floating point? Have they never heard of a "double" in C++? Every compiler I've ever seen does 64-bit doubles.

      --
      If you cannot keep politics out of your moderation remove yourself from the Mod Lottery.. NOW!
    197. Re:Poor QA by LordKazan · · Score: 1

      it's called system upgrades. there are B-52's flying around with modern GPS units replacing their old radio-based positioning system. the rest of the aircraft's hardware knows nothing about the change, just sees more accurate data

      --
      If you cannot keep politics out of your moderation remove yourself from the Mod Lottery.. NOW!
    198. Re:Poor QA by vertinox · · Score: 1

      Also: "Hezbollah is now also a major provider of social services, which operate schools, hospitals, and agricultural services for thousands of Lebanese Shiites, and plays a significant force in Lebanese politics.".

      I don't mean to Godwin this, but so did the Nazis and other fascist political parties throughout history prior to coming into power. I mean, a lot of the National Socialist party members participated in social projects in the 1930s as volunteer work in response to the Weimar government's inability to fix the economic situation of the depression.

      Hell... Even Al Capone ran soup kitchens...

      But that didn't make them much morally better obviously.

      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
    199. Re:Poor QA by thickdiick · · Score: 1

      What part of CONQUERED don't they understand? It was conquered, and now doesn't belong to them anymore.

    200. Re:Poor QA by Anonymous Coward · · Score: 0

      So who do we allow to settle there ?

      The "kingdom of Egypt" (the state of the Pharaohs) ? (exterminated to the last man by muslims)
      The Hittite Empire ? (exterminated by the Greeks, Romans, Persians)
      The kingdom of Israel ?
      The Assyrian Empire ?

      Which of these do we restore ? (note that the palestinians, or to be more exact, the arabs only come into play about 4500 years after the Assyrian Empire)
      Which do we restore ? And why do they have more rights than all the others who conquered that piece of land ?

      This is the 21st century. Oughtn't we to have outgrown the idea that a particular ethnic group ought to "control" a particular piece of land?

    201. Re:Poor QA by metamatic · · Score: 1

      Yeah, if I had a nickel for every time a programmer has used floating point inappropriately, I'd have $319.19999999999997.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    202. Re:Poor QA by Idiomatick · · Score: 1

      Oh I agree. But often people portray middle eastern conflicts to be very black and white. It isn't. Were one side pure evil and the other side righteous it would make it a lot easier but it certainly isn't that way.

      Also, it totally is a Godwin. Hitler DID do good things; helping poor people or whatever are GOOD things. And using Hitler to show that it doesn't prove anything is the definition of a Godwin. The only reason your argument has weight is because people find it difficult to think of Hitler as anything other than a cartoon villain.

    203. Re:Poor QA by LanMan04 · · Score: 1

      If I tap on the brakes in my car the cruise control disengages, it does not fight me.

      Computer: Resuming computer control of Icarus II.
      Cassie: Negative, Icarus. Manual control.
      Computer: Negative, Cassie. Computer control. Returning vessel to original rotation.
      Cassie: What? Icarus, override computer to manual control.
      Computer: Negative. Mission in jeopardy.

      Sometimes, maybe you *do* want the computer to fight you...

      Sunshine

      --
      With the first link, the chain is forged.
    204. Re:Poor QA by BranMan · · Score: 1

      Very good post. As someone who worked on PATRIOT at one time (Transmitter Maintenance Upgrade project - for about 3 1/2 years) - there were a lot of factors. Some of which escape everyone here.

      1) It was designed and built in the late 60s. Any idea of the state of the art for computers back then? After factoring in that the military does not use the latest - it must be proven and ruggedized - i.e. Mil spec.? The PATRIOT 'brain' if you will was a processor built across a whole cage full of circuit cards, programmed in assembler.

      2) It was mostly with the advent of our TMU project entering production that the PATRIOT was likely to operate for more than 100 hours at a time. They are finicky beasts - and repairing them before the TMU was a nightmare.

      3) It was extremely expensive to operate and fire. So much so that most crews that operate one get one or two live fires at White Sands, and that's it. They just weren't on 24/7 all the time - the transmitter tubes themselves have a limited life and were about 1/4 million dollars apiece - so this was not seen.

      4) They were meant to move around constantly - built for the cold war. If the enemy knew where your AA was located, it was dead. So again, it wasn't run constantly enough for people to see the flaw.

      5) The flaw was found in Israel, and a software patch was just a couple of days too late in the mail to prevent this disaster.

      6) The poor performance was simple economics, nothing more. There were more Scud missiles in existence than PATRIOT missiles. Pretty soon after the shooting started and everyone realized that, they were restricted to one PATRIOT for one Scud - period. The PATRIOT was 'tweaked' to 'miss' so that it disabled the warhead, but the PATRIOT missile is real small compared to a Scud. It could disable it, but nothing will prevent tons of Scud missile from continuing on a ballistic trajectory.

    205. Re:Poor QA by Anonymous Coward · · Score: 0

      I'm so glad that the "stated" reason for randomly launching multiple missiles at population centers isn't to kill jews. That must be just a happy accident.

    206. Re:Poor QA by Idiomatick · · Score: 1

      Yeah we oppose monetary donations of values over 1000$ or w/e it is towards political parties for fear it might fuck with the democratic process. On the other hand its cool to say: "Hey, if you elect these guys you will enjoy the lesser chance of us bombing the shit out of you and get a half billion dollars extra in trade.". Yaaay.

      Another nice hypocritical situation. Jewish people complaining about the holocaust almost daily using it as a shield or excuse. (around 6million dead, 70yrs ago). Comparatively ethnic cleansings in africa trot out similar death tolls every year. Yet they aren't afforded a shred of sympathy compared to the jewish cause. And I realize this point combined with my earlier points will probably make me look like an anti-semite. I'm honestly not, I am just anti murdering millions of people for no good reason.

    207. Re:Poor QA by Anonymous Coward · · Score: 0

      “Hezbolah KIDNAPPED to trade” – Hmm, did someone say “I rest my case”? Just imagine if Panama or Grenada did something like that to the US...

      “Israel RESPONDED by sending in an army".
      "Hezbolah fired rockets back to show resistance.”

      And while showing the resistance, Hezbolah, in the best tradition of terrorist human shields, fired rockets from various “buildings including hospitals, school, UN bunkers and apartment buildings”.
      Now, after causing the war, destruction and the death of civilians used as human shields, Hezbolah is the hero and provides services, the destruction of which they caused in the first place, and “plays a significant force in Lebanese politics.”
      I wonder why.

      “Also hezbolah states that they distinguish between zionists and jewish” –
      Yes, and our belief in that meaningful statement grows stronger with every act of terror against the Israeli population. Every casual missile launched at northern Israeli cities, every exploded bus, every café and every victim of a suicide bomber attack strengthens our belief in the high concern of Hezbolah and the like for human life and their high standard for human rights and values.

      “In the end Israel killed 1200 civilians, 300 soldiers, and a significant percentage of the country's economy. Hezbolah killed 120 soldiers, 40 civilians. Notice the fucking difference in ratios.”
      Noticed... Please stop firing rockets into densely populated civilian areas, pretty please, with sugar on top; do not activate explosives in teenage night clubs, cafés and buses; also, restraint from the acute desire to kidnap soldiers or civilians would be highly appreciated.

    208. Re:Poor QA by megaditto · · Score: 1

      Rounding errors do not accumulate, but the rounding error becomes larger as the count increases. Once you accept that, the rest follows.

      Say your software calculates a time interval as uptime2 minus uptime1 (rounded to 5 significant figures):

      t1 = 0.90000
      t2 = 0.94000

      t2-t1 = round(0.94000) - round(0.90000) = 0.04000

      If the same calculation is performed 30+ minutes later, you fuck up:

      t1 = 2222.90000
      t2 = 2222.94000

      t2-t1 = round(2222.94000) - round(2222.90000) = 2222.9 - 2222.9 = 0
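
      The same effect can be sketched in Python, assuming "round" above means rounding each timestamp to 5 significant figures before subtracting. The helper names round_sig and interval are made up for this illustration, and the decimal module is used only so the digits are easy to follow; this is not a claim about how the Patriot software worked.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN


def round_sig(value: str, digits: int = 5) -> Decimal:
    """Round a decimal string to `digits` significant figures."""
    # Context.create_decimal applies the context precision while converting.
    return Context(prec=digits, rounding=ROUND_HALF_EVEN).create_decimal(value)


def interval(t1: str, t2: str, digits: int = 5) -> Decimal:
    """Difference of two timestamps after each one has been rounded."""
    return round_sig(t2, digits) - round_sig(t1, digits)


if __name__ == "__main__":
    # Early in the run the 0.04 s interval survives the rounding.
    print(interval("0.90000", "0.94000"))
    # About 37 minutes in, both timestamps round to 2222.9 and the interval is lost.
    print(interval("2222.90000", "2222.94000"))
```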

      --
      Obama likes poor people so much, he wants to make more of them.
    209. Re:Poor QA by Jane+Q.+Public · · Score: 1

      I admit that if the explanation is correct then I was wrong about the error in the clock itself. I felt that was implicit in my next reply. If I owe the first person to whom I was replying an apology, so be it. But my statements from that point on stand.

      I read the explanation from umn.edu, and it changes nothing. So they were using a 24-bit register to keep time, but they were accumulating that number by each clock tick in software, which is a real bonehead thing to do.

      To quote the umn.edu statement: "The effect of this inaccuracy on the range gate's calculation is directly proportional to the target's velocity and the length of time the system has been running." The ONLY way that could happen is if (1) they were indeed accumulating rounding errors by accumulating EACH clock tick, which nobody worth their salt as a programmer would do, or (2) the clock itself was off. The explanation you linked to states that they did the former.

      If you assume that the umn.edu explanation is correct, it was still a poor choice of hardware design because they were only using a 24-bit register, and rather than converting the ACCUMULATED time from the register into a floating-point number (which would only result in one rounding error, and which is the way any sane person would do it), they were indeed accumulating errors from converting to floating-point at each clock tick. And that is really funny, because I repeat: nobody in their right mind does it that way. It is just plain stupid. My actual problem is that I assumed they were not stupid, so I assumed they would not do it that way. (Read the rest of the thread, and see how many times I have stated that.)

      But even if all that is so, and the explanation is accurate, they STILL made bad hardware decisions, because that 1/10th system clock is still simply not adequate to their needs, nor is the 24-bit resolution.

    210. Re:Poor QA by Jane+Q.+Public · · Score: 1

      "Jane is not a troll, because a troll is only motivated to stir up debate and get replies. The accurate term for Jane is 'blowhard.'"

      Even blowhards are right sometimes. Please see my other replies in this thread.

      If you think you are actually adding something intelligent, why don't you log in, instead of being an Anonymous Coward? Ah, that's right... I keep forgetting about the "Coward" part.

    211. Re:Poor QA by Jane+Q.+Public · · Score: 1

      "Ah, but Jane is sure that the design of the Patriot is different than the article describes, because Jane's experience of OTHER systems has been different."

      No, Jane was sure the design of the Patriot was different than the article describes, because doing it the way described would have been stupid. And, according to the GAO report and the umn.edu article that other people linked to, guess what? They did indeed design the software stupidly.

      According to the text at those links, they made mistakes that not even most first year programming students would make, which they eventually caught and corrected.

      So, pardon the hell out of me for assuming that people who design missiles do not do stupid things. I stand corrected.

    212. Re:Poor QA by smallfries · · Score: 1

      Your description didn't sound like the link that I'd posted, so I've just had to go and refresh my memory of it. Yes, you are in fact largely correct. It is a design flaw, and you could call it a hardware problem as they have picked the wrong hardware for the job.

      When you split the possibilities into two cases, you are making the assumption (as I did when I first read it) that they are describing a floating-point register, and not a fixed-point register. In a fixed-point register there is no need to do the accumulation each step in order to increase the error. When the two multiplicands are scaled prior to the multiplication they are truncated into the same range. This is the opposite to floating-point where they are scaled without truncation, and any truncation happens after the operation.

      So a difference in magnitude between the constant value and the linearly increasing (integer) value creates an increase in both absolute and relative error. Beautiful - it's almost as if they were designing it to fail.
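
      A rough sketch of that arithmetic in Python. The thread does not give the register layout, so the 23 fractional bits below are an assumption, picked because chopping 0.1 at that point reproduces the roughly 0.3433 second figure quoted in the article summary. The names truncated_tenth and clock_drift are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23              # assumed layout, not stated in the thread
TICKS_AT_100_HOURS = 3_600_000  # tick count quoted in the article summary


def truncated_tenth(fraction_bits: int = FRACTION_BITS) -> Fraction:
    """0.1 chopped (not rounded) to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    # int() truncates toward zero, which models chopping the extra bits.
    return Fraction(int(Fraction(1, 10) * scale), scale)


def clock_drift(ticks: int, fraction_bits: int = FRACTION_BITS) -> Fraction:
    """Seconds by which tick_count * stored_constant falls behind real time."""
    per_tick_error = Fraction(1, 10) - truncated_tenth(fraction_bits)
    return ticks * per_tick_error


if __name__ == "__main__":
    # The per-tick error is tiny, but it is multiplied by a growing integer.
    print(float(clock_drift(1)))
    print(float(clock_drift(TICKS_AT_100_HOURS)))
```

      Under these assumptions nothing is accumulated tick by tick: the error comes from a single multiplication of the integer tick count by the chopped constant, and it grows linearly with uptime.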

      Bad hardware decisions indeed. I would have gone for either floating-point, or a large enough fixed-point register to tolerate a larger difference in magnitude.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    213. Re:Poor QA by Jeremi · · Score: 1

      My ninth grade science teacher used to tell us "All error is human error." He was right too

      Nah... I've seen animals make mistakes.

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
  2. Curse of binary floating point by Carewolf · · Score: 5, Insightful

    Use decimal floating point or simply switch to fixed point. Fixed point is not used as often as it should be, and many developers don't know how difficult ordinary floating point really is.

    1. Re:Curse of binary floating point by RichardJenkins · · Score: 2, Insightful

      Indeed, this seems more like naive design decisions than computers sucking at math.

    2. Re:Curse of binary floating point by stonefoz · · Score: 0, Troll

      Fixed point? What would that accomplish? Rounding creep happens, irregardless of data type, every time it rounds the last digit, fixed or float.
      Fixed point is never a good idea, bad idea or not, it does speed up things on limited hardware. A missile isn't "budget" though.
      Yes, floats are difficult: every operation moves it farther from the original guess; it's just guessing the last digit. The only solutions are to not do fractional math at all, or to reload and adjust values periodically. Timekeeping, however, is a subject that has already been well researched. Any embedded platform I've seen has at least a dozen app-notes and a dozen different ways to keep accurate time.

      --
      I think I just cashed out all my cool points.
    3. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      How does fixed point help you on this? The only reason to use fixed point is to speed up calculations on slow embedded systems that are making lots of realtime calculations and using floating point calculations would be too slow. It definitely does not get you out of rounding errors, in fact in many cases it would be far worse than doing floating point in terms of accuracy.

    4. Re:Curse of binary floating point by Carewolf · · Score: 5, Informative

      Fixed point never rounds when operating in the range and precision for which it is designed. In this case they needed a precision of .1, using INT/10 would be 100% accurate and never give them any rounding errors for this use case.

      So, in other words: You are wrong, and should probably consider using fixed point more.

    5. Re:Curse of binary floating point by NeoStrider_BZK · · Score: 0

      Difficult and imprecise. Lots of developers have bugs in their code that they don't even imagine are due to floating point errors. I'm maintaining an ARM codebase with lots of floats and now I can see this in action (beforehand I maintained mostly fixed-point apps).

      There are lots of simple things you can do with floats to improve accuracy that most people are unaware of and that could have saved lives.
      (OK, same goes for fixed-point.)

    6. Re:Curse of binary floating point by Carewolf · · Score: 3, Informative

      With fixed point you can choose the base of the fraction part. A binary fixed point would not help them, but a decimal fixed point of /10 or /100 would. The algebra of fixed point is the same no matter what base you choose. This means it is the fastest way to get decimal-based fractions instead of binary fractions (decimal floating point is best with hardware support).

    7. Re:Curse of binary floating point by noidentity · · Score: 4, Insightful

      Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register -- as used in the Patriot system -- it's out by a tiny amount.

      Sorry, 0.1 seconds can be represented EXACTLY in such a system. It doesn't even need floating-point. Here is how such a system could represent the durations of 0.1 seconds, 25.7 seconds, and 123.4 seconds: 1, 257, and 1234. So like you say, fixed-point works here. No need for anything beyond integers in this case.
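
      As a small illustration of that representation, here is a Python sketch that stores durations as an integer count of 0.1-second units. The function names to_tenths and format_tenths are invented for the example, and it only handles non-negative values with at most one digit after the point (extra digits are dropped).

```python
def to_tenths(seconds_text: str) -> int:
    """Parse text such as '25.7' into an integer count of 0.1 s units."""
    whole, _, frac = seconds_text.partition(".")
    # Keep exactly one fractional digit; a missing digit counts as 0.
    frac = (frac + "0")[:1]
    return int(whole) * 10 + int(frac)


def format_tenths(tenths: int) -> str:
    """Render an integer count of 0.1 s units back as seconds."""
    return f"{tenths // 10}.{tenths % 10}"


if __name__ == "__main__":
    for text in ("0.1", "25.7", "123.4"):
        print(text, "->", to_tenths(text))
    # 100 hours of ticks is still just an exact integer.
    print(format_tenths(3_600_000 * to_tenths("0.1")))
```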

    8. Re:Curse of binary floating point by PhilHibbs · · Score: 4, Informative

      Well, in this specific instance a decimal system would have been ok, but it isn't a general answer. The general answer is "make sure your increments are divisible into your number base", if they had used 1/8th or 1/16ths of a second, or even 3/32 of a second, as their timer increment then they would not have had this problem. There's no reason why 1/10th of a second has any magic properties.

      In general terms, all number bases have other number bases with which they are incompatible. The inability of binary to represent 1/10 accurately is just the same as the inability of decimal to represent 1/3 accurately. It's only because we use decimal all the time that we overlook decimal's shortcomings (or instinctively compensate for or avoid them) and then blame computers for binary's incompatibility with decimal.
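
      Both halves of that point can be checked in Python, as a sketch: whether a binary float holds exactly the intended fraction, and whether decimal arithmetic can round-trip 1/3. The helper names are made up, and the decimal check relies on the module's default context of 28 significant digits.

```python
from decimal import Decimal
from fractions import Fraction


def is_exact_in_binary(value: float, intended: Fraction) -> bool:
    """True if the float stores exactly the intended rational value."""
    # Fraction(float) converts the stored bits exactly, with no rounding.
    return Fraction(value) == intended


def decimal_third_round_trips() -> bool:
    """True if (1/3) * 3 comes back as exactly 1 in decimal arithmetic."""
    return (Decimal(1) / Decimal(3)) * 3 == Decimal(1)


if __name__ == "__main__":
    print(is_exact_in_binary(0.125, Fraction(1, 8)))     # power-of-two step
    print(is_exact_in_binary(3 / 32, Fraction(3, 32)))   # also binary-friendly
    print(is_exact_in_binary(0.1, Fraction(1, 10)))      # decimal-friendly only
    print(decimal_third_round_trips())                   # decimal's own blind spot
```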

    9. Re:Curse of binary floating point by Beale · · Score: 1

      Using this case as an example: If you use an integer variable for your tick, it's never rounded. Whenever you use it to calculate a time, then you can multiply it by 0.1 to get a much more accurate number than one obtained with the cumulative error of adding on a rounded floating point 0.1 to a rounded floating point sum every tick.
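
      A minimal Python comparison of the two approaches. Python floats are 64-bit IEEE 754 doubles, so the errors here are far smaller than anything in a 24-bit register; the sketch only shows the difference in behaviour, and the function names are hypothetical.

```python
def accumulated_seconds(ticks: int, step: float = 0.1) -> float:
    """Add a rounded 0.1 to a rounded running sum on every tick."""
    total = 0.0
    for _ in range(ticks):
        total += step  # each addition can round again
    return total


def converted_seconds(ticks: int, step: float = 0.1) -> float:
    """Keep an exact integer tick count and convert once, when needed."""
    return ticks * step  # a single rounding


if __name__ == "__main__":
    for ticks in (10, 3_600_000):
        print(ticks, accumulated_seconds(ticks), converted_seconds(ticks))
```

      Even at ten ticks the running sum no longer equals 1.0, while the single multiplication does; over longer runs the gap generally widens.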

    10. Re:Curse of binary floating point by ceoyoyo · · Score: 2, Interesting

      Or just keep track of things in increments that make sense in binary. 0.1 seconds is arbitrarily chosen to be a nice number in decimal. They should have chosen an arbitrary time interval that is a nice interval in binary, the base they were actually using.

      This article isn't about how computers suck at math, it's about how people suck at math.

    11. Re:Curse of binary floating point by pz · · Score: 2, Insightful

      Well, in this specific instance a decimal system would have been ok, but it isn't a general answer. The general answer is "make sure your increments are divisible into your number base" . . .

      Close. Very close. The general answer is no matter what base you select for time, distance, or any other metric that might accumulate errors, be certain to (a) perform a careful error analysis, (b) include some additional safeguard to control the error if there are potentially large downstream effects.

      Just because these computers counted in, say INT/10, and therefore could represent 0.1 seconds exactly does not mean, for example, that the timebase used to drive that counting was accurate and stable. Errors could still accumulate, although probably in a different modality.

      Kids, long-term error analysis is HARD. Errors creep in through unlikely paths, even when you think you've been super careful as suggested by the parent post. While selecting a good numeric representation helps in controlling error accumulation, it is not a panacea.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    12. Re:Curse of binary floating point by Carewolf · · Score: 1

      Just to correct myself a bit. Yes, fixed point is used way too little. People who know fixed point get the point, but to anyone unfamiliar with fixed point I would just like to point out: floating point is the default for a reason; for generic math it is much safer and more accurate. Fixed point is best for addition, subtraction and small multiplications, and math where you have well-known ranges and precision requirements. It has absolute precision when used right, but division and large multiplication require some know-how to do right. If you need large multiplications, need to divide by variables or use arbitrary numbers in general, then floating point is easier and safer.

    13. Re:Curse of binary floating point by ezzzD55J · · Score: 1

      The wording was a little inaccurate. They meant to say that the number 0.1 can't be represented exactly, of course. Which is true, of course.

    14. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      The inability of binary to represent 1/10 accurately is just the same as the inability of decimal to represent 1/3 accurately. It's only because we use decimal all the time that we overlook decimal's shortcomings (or instinctively compensate for or avoid them) and then blame computers for binary's incompatibility with decimal.

      Using a programming language that can handle these cases automatically would help. Lisp, for example, has no issue with 1/3 or even complex (2+3i) numbers.

      Part of the problem is we're trying to shoehorn solutions into the mainstream languages that may not be up to the task.
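
      The comment above is about Lisp, but for comparison Python's standard library offers something similar. This sketch uses fractions.Fraction for exact rationals and the built-in complex type; the function names are invented for the example.

```python
from fractions import Fraction


def third_times_three() -> Fraction:
    """1/3 kept as an exact rational, so multiplying by 3 gives exactly 1."""
    return Fraction(1, 3) * 3


def complex_product() -> complex:
    """(2+3i) times its conjugate, using the built-in complex type."""
    return (2 + 3j) * (2 - 3j)


if __name__ == "__main__":
    print(third_times_three())
    print(complex_product())
```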

    15. Re:Curse of binary floating point by sulimma · · Score: 3, Interesting

      I believe that the problem was not that 0.1s could not be represented. After all, the article states that there were 0.1s ticks and they likely counted ticks as integers. No problem there.
      However, I guess that 0.1s was not an integer multiple of the system clock period. If, for example, the tick should occur after 6,666,666.67 clock cycles, the system likely emitted a tick after 6,666,667 clock cycles. Such a system would accumulate 3.3 clock cycles of error each second.

      The solution is to keep an explicit error term: use Bresenham's line drawing algorithm. Imagine drawing a line where X is the clock cycles and Y is the ticks. Minimum-error integer algorithms for this problem have been known for decades, and Bresenham's is a very elegant one.
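
      A sketch of the error-term idea in Python. This is a Bresenham-style accumulator rather than the line drawing algorithm itself, and it assumes the 6,666,666.67 figure stands for exactly 20,000,000/3 cycles per tick. The function name tick_lengths is hypothetical.

```python
from typing import Iterator


def tick_lengths(cycles_num: int, cycles_den: int, count: int) -> Iterator[int]:
    """Yield whole-cycle tick lengths whose average is cycles_num / cycles_den.

    The leftover fraction is carried in an integer error term, so the
    total never strays more than one cycle from the ideal.
    """
    base, remainder = divmod(cycles_num, cycles_den)
    error = 0
    for _ in range(count):
        error += remainder
        if error >= cycles_den:
            error -= cycles_den
            yield base + 1  # pay back the accumulated fraction
        else:
            yield base


if __name__ == "__main__":
    lengths = list(tick_lengths(20_000_000, 3, 30))
    print(lengths[:3])
    print(sum(lengths))        # cycles used by 30 ticks with the error term
    print(30 * 6_666_667)      # cycles used if every tick is rounded up
```

      With the error term, 30 ticks take exactly 200,000,000 cycles; always rounding up takes 10 cycles more, which matches the 3.3 cycles per second mentioned above.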

    16. Re:Curse of binary floating point by noidentity · · Score: 1

      The wording was a little inaccurate. They meant to say that the number 0.1 can't be represented exactly, of course. Which is true, of course.

      Which is false as well. Your posting itself represents the number 0.1 precisely, and that is stored in plain ASCII. The integer 1 also represents it precisely, where the unit is 0.1.

    17. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      You might want to check your own English before correcting someone else's English. I'll fix it for you:
      Also, "irregardless" is not a word.

      Regardless of that, has it not been argued on /. enough how language evolves? Sometimes, it doesn't make sense. In this case, Wikipedia has a decent article concerning the word.

    18. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      mods, don't judge too harshly. this idiotic post was the result of a meatware rounding error.

    19. Re:Curse of binary floating point by Binder · · Score: 1

      This is why writing software is more than simply learning a language. You also have to learn how the machine works.

      Also, this isn't a problem with binary representation. You have the same issue with decimal representation. Try dividing 10/3 and then performing a bunch of math on it.

    20. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      Has little to do with the decimal vs. binary base. If there is a fixed granularity, you don't need floating point. Storing the value as an integer representing tenths of a second (rather than a float representing seconds) removes periodic fractions and rounding errors completely.

    21. Re:Curse of binary floating point by slackergod · · Score: 1

      Or, failing that, measure time in .125 (1/8th) second increments instead of 0.1, and then it will align with the binary floating point representation. Voila, no error.

    22. Re:Curse of binary floating point by Rockoon · · Score: 1

      Just about every financial institution has software designers that will patently disagree with you, and they'd be right.

      You use the right tool for the job. A base-2 encoding (such as all those IEEE floating point formats) of 0.1 doesn't work, but a base-10 encoding (such as the datatypes used by financial institutions) does.

      Most financial institutions are 100% accurate to at least 5 decimal digits beyond the decimal point, and in most transactions, spare accuracy is rounded to this limit.
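
      As an illustration only, here is what a base-10 datatype with a five-digit limit might look like using Python's decimal module. The rounding mode (half-even) and the name to_ledger are assumptions made for the sketch, not something any particular institution is known to use.

```python
from decimal import Decimal, ROUND_HALF_EVEN

FIVE_PLACES = Decimal("0.00001")  # five digits beyond the decimal point


def to_ledger(amount: str) -> Decimal:
    """Parse a decimal string and round it to five decimal places."""
    return Decimal(amount).quantize(FIVE_PLACES, rounding=ROUND_HALF_EVEN)


def ten_tenths() -> Decimal:
    """Ten additions of a base-10 0.1, which stay exact."""
    return sum(Decimal("0.1") for _ in range(10))


if __name__ == "__main__":
    print(to_ledger("1.234567891"))
    print(ten_tenths() == Decimal("1"))
```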

      --
      "His name was James Damore."
    23. Re:Curse of binary floating point by pz · · Score: 1

      Has little to do with the decimal vs. binary base. If there is a fixed granularity, you don't need floating point. Storing the value as an integer representing tenths of a second (rather than a float representing seconds) removes periodic fractions and rounding errors completely.

      Sorry, wrong. What happens if you need to use these values in calculations? Either you will have gross errors due to forcing a particular granularity, or you will have accumulating errors due to repeated rounding. Most likely, you will have both. Using INT/10 in this case is going to fix the problem of accumulating error in the representation of time, but not in any calculations that contain it for exactly the same reasons that a too-short float didn't work in the original design.

      As stated in another reply, error analysis is HARD. Simple solutions, even saying that you'll just use quad precision floats everywhere, rarely work as well as anticipated.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    24. Re:Curse of binary floating point by Miamicanes · · Score: 1

      Why did it even HAVE to be represented as an exact multiple of 0.1 seconds? As opposed to something a bit more binary-friendly, like 1/8 or 1/16th of a second? Was this just a case of 1980s Waterfall Design Paradigm, where somebody who didn't necessarily understand the precise characteristics of the hardware involved pulled the 1/10 second specification out of a metaphorical hat because it sounded good, then everyone below that point was expected to just unquestioningly implement it instead of asking WHY it had to be an exact multiple of some error-prone value early in the design process?

    25. Re:Curse of binary floating point by PhilHibbs · · Score: 1

      Just because these computers counted in, say INT/10, and therefore could represent 0.1 seconds exactly does not mean, for example, that the timebase used to drive that counting was accurate and stable.

      Ah, but that's a hardware problem, I'm a software guy. :)

    26. Re:Curse of binary floating point by RegularFry · · Score: 2, Informative

      Ok, now go and read the article. The Patriot bug was a problem with fixed point maths. The Ariane bug was integer overflow. The Intel FPU bug was caused by a production error with nothing to do with the arithmetic actually being performed.
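
      The 0.3433-second figure from the story can be reproduced with exact arithmetic. This is only a sketch: it assumes the 24-bit register kept 23 bits after the binary point and chopped the rest, an assumption picked because it matches the quoted number, not something stated in the summary.

```python
from fractions import Fraction

FRACTION_BITS = 23   # assumption: bits after the binary point in the 24-bit register
TICKS = 3_600_000    # about 100 hours of 0.1 s ticks, as in the story

exact = Fraction(1, 10)
# Chop (truncate) 0.1 to the available fractional bits.
stored = Fraction(int(exact * 2**FRACTION_BITS), 2**FRACTION_BITS)

per_tick_error = exact - stored
total_error = per_tick_error * TICKS

print(float(per_tick_error))   # error per tick, in seconds
print(float(total_error))      # accumulated error after 100 hours, in seconds
```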

      --
      Reality is the ultimate Rorschach.
    27. Re:Curse of binary floating point by 4D6963 · · Score: 1

      Or simpler yet, count the ticks, and every time run a ticks_to_time() function. Arguably less efficient than that Bresenham thing, although at that point that's peanuts we're talking about, and this simpler way it's more robust and less error/confusion prone.

      --
      You just got troll'd!
    28. Re:Curse of binary floating point by 4D6963 · · Score: 1

      Really, there's nothing hard about this problem. Don't try to keep track of the seconds directly; keep track of the ticks, then convert them into the time units you need when needed. This way it doesn't matter what impossible fraction of a second each tick represents, no errors will creep in except your usual non-cumulative rounding error when converting (but at this point it's really trivial to make sure you get enough precision out of it).
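
      A minimal sketch of the idea, with hypothetical names (TICKS_PER_SECOND, ticks_to_seconds) and Python floats standing in for whatever the real system used:

```python
TICKS_PER_SECOND = 10  # hypothetical constant: one tick per 0.1 s


def ticks_to_seconds(ticks: int) -> float:
    """Convert an exact integer tick count to seconds with a single rounding."""
    return ticks / TICKS_PER_SECOND


# Approach 1: accumulate the (inexact) tick length; the error builds up.
accumulated = 0.0
# Approach 2: count ticks exactly and convert only when a time is needed.
ticks = 0
for _ in range(10):
    accumulated += 0.1
    ticks += 1

print(accumulated)                   # drifts away from 1.0
print(ticks_to_seconds(ticks))       # one rounding step only
print(ticks_to_seconds(3_600_000))   # 100 hours of ticks
```

      The integer count never drifts; the only rounding happens once, at conversion time.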

      --
      You just got troll'd!
    29. Re:Curse of binary floating point by OldTOP · · Score: 1

      There's more to it. Computers make math processing so cheap that we tend to just throw in the numbers without thinking about what's going to happen. TFA mentions that in the last paragraph, but unfortunately most of it is misleading sensationalism.

      When you perform subtraction, you generally lose precision. 1234568 - 1234567 = 1. You start with 7 digits of precision and end up with 1. Look at it this way: if the first number was off by one, an error of about one part in a million, your answer would become 0 or 2 instead of 1 -- an error of one part in one.
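
      The same effect in numbers, as a sketch with two 7-digit values that differ by 1 (the particular values are just illustrative):

```python
from fractions import Fraction

a, b = 1234568, 1234567
a_off = a + 1                       # first operand off by one

# Relative error going in: about one part in a million.
input_rel_error = Fraction(abs(a_off - a), a)

true_diff = a - b                   # 1
bad_diff = a_off - b                # 2

# Relative error coming out: one part in one.
output_rel_error = Fraction(abs(bad_diff - true_diff), true_diff)

print(float(input_rel_error), float(output_rel_error))
```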

      When you are designing a control system, you have to understand what precision you need in your final answer, and you have to know whether each step in your algorithm maintains the necessary precision. If you don't know how to do that, you're not qualified to design the algorithm. Unfortunately, it's quite possible that neither you nor your bosses know that, so you'll design it anyway.

      Handing people a copy of Excel and saying, "Here, this thing will do math for you -- it multiplies and divides and does a whole lot of statistical functions" is like putting them in an airplane and saying "Here, this thing has controls to make it go up and down, left and right -- go fly it."

      It is true that computer software tends to hide the internals of how arithmetic gets done, and as a result it's particularly easy to get into trouble. The problems have been understood since before computers were invented. What's changed is that you used to have to study to find out how to do mathematical computation, and in the process you might learn enough to avoid the problems. Now the software tends to be distributed without even small print warnings that there are problems and you can get into serious trouble if you don't understand how things work.

      --
      The universe was intelligently designed. Unfortunately God was in a hurry so he coded it in Java.
    30. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      Ironically, fixed point multiplication/division is actually SLOWER on today's hardware than floating point. You need to do the actual multiplication with extra precision and then shift afterward. There is one reason to use it and one reason only: guaranteed 100% accuracy.
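
      For anyone who hasn't seen the "multiply wide, then shift" step, here is a sketch using a 16.16 format (the format and helper names are assumptions for illustration). Python integers never overflow, so the widening is implicit; in C the product would need a 64-bit intermediate.

```python
FRAC_BITS = 16           # assumed 16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Scale a real value to the nearest fixed-point integer."""
    return int(round(x * ONE))


def fixed_mul(a: int, b: int) -> int:
    """The raw product carries 32 fractional bits; shift back down to 16."""
    return (a * b) >> FRAC_BITS


def to_float(a: int) -> float:
    return a / ONE


product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
print(to_float(product))
```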

    31. Re:Curse of binary floating point by stonefoz · · Score: 1

      So clocks only move at .1 intervals and I never have to use division? What wonderful world do you live in? I expressed it exactly for fix, it's a speed hack that still carries all the problems of float. Now if there is something to be measured that does in fact only happen in quantified increments, then yes, count on those. Time is not one of those things.
      So you are quite wrong about fixed point not rounding. 10/3 is always going to be estimated without storing it in a large-number, fractional representation. The last number is always a guess, else it isn't a measurement. Measurements are always estimated, and counts can be held in exact amounts in binary. Time is not a count however, as time does not only change at specified increments.

      --
      I think I just cashed out all my cool points.
    32. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      In your case, you are not storing the fractional values .1, 25.7 and 123.4. You are storing the integers 1, 257 and 1234 and then dividing by ten after retrieving them from the system.

    33. Re:Curse of binary floating point by Entropius · · Score: 1

      This, incidentally, is a point that sophomores who have been writing C code for a month and a half can figure out. (My students have.)

    34. Re:Curse of binary floating point by Tacvek · · Score: 1

      Sure, but aren't financial systems using fixed point data types just about everywhere? Additions and subtractions are the most common operations on financial numbers as far as I know, with the next most common operations being multiplications/divisions within a limited range of values (that is, multiplications by more than 1000 are rare, and multiplications by less than one thousandth are also rare). Seems fairly ideal for the use of fixed point representations. If floating point representations are used anywhere in a financial system, I'd hope they would be BCD encoded floating point operations, as the various rounding errors encountered in the use of such a system generally match human expectation fairly closely, since they generally match the types of rounding errors found in doing the same calculations with paper and pencil.

      --
      Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
    35. Re:Curse of binary floating point by DiegoBravo · · Score: 1

      The discussion in itself is off-topic. In the real world no continuous magnitude can be exactly measured nor represented. There is nothing as "exactly 0.1 seconds". Any physical quantity is an approximation (like the numbers in digital computers.) The problem was in the system design that apparently (and incredibly) didn't account for that.

    36. Re:Curse of binary floating point by Rockoon · · Score: 1

      The key point is not fixed vs floating point, but rather base-10 vs base-2.

      One common standardized datatype is known as Currency in older Microsoft languages (such as VBA), which is a 64-bit fixed point representation using a base of 1/10000th. Another older type is the floating point BCD you mentioned (there are several different common implementations, but they are not interchangeable due to encoding specifics, such as how sign and decimal point are represented)

      Fixed-point in and of itself doesn't solve the problem. It's not a magic bullet. Typically fixed point is a power-of-two in every other area of usage, for obvious efficiency reasons. Many CPUs (especially DSPs) even have instructions for working with these directly.

      --
      "His name was James Damore."
    37. Re:Curse of binary floating point by noidentity · · Score: 1

      And? You are storing precise representations of the values 0.1, 25.7, and 123.4, just as the text of this message also precisely represents those values. What would it mean to store the "real" value, rather than just a representation? Anything in the physical world will be a "mere" representation of concepts like the value 25.7.

    38. Re:Curse of binary floating point by SharpFang · · Score: 1

      1/10 of a second has the same magic properties as 1/50 and 1/100 of a second, if obtained from 50Hz power grid: every device plugged into the same grid gets the same number of ticks. The frequency may float a little up or down, but remains consistent throughout the whole grid, meaning no costly, unreliable and difficult to implement synchronization subsystems.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    39. Re:Curse of binary floating point by Ironsides · · Score: 1

      Based on what I've seen elsewhere, here's probably what happened: The radar system that went with the patriot battery was already in existence and that is what they used. They wouldn't be able to modify all the systems, so they used them as was. The designers of the patriot battery had the interface specification of the radar system and implemented the patriot software as they chose. While this may not have been a problem for the initial use of shooting down bombers, which fly considerably slower, it became a problem for SCUDS which fly much much faster.

      Also, from here we see that the system wasn't originally intended to be turned on for more than a few hours at a time. So, the added functionality of being able to shoot down SCUDS was due to a waterfall design (which is still in use today), but the knowledge of the clock error was probably lost at that point.

      --
      Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
    40. Re:Curse of binary floating point by pz · · Score: 1

      Really, there's nothing hard about this problem. Don't try to keep track of the seconds directly; keep track of the ticks, then convert them into the time units you need when needed. This way it doesn't matter what impossible fraction of a second each tick represents, no errors will creep in except your usual non-cumulative rounding error when converting (but at this point it's really trivial to make sure you get enough precision out of it).

      Yes, but now you're assuming that the timekeeping of clock ticks is 100% accurate. I am not familiar with the hardware in question, so while it is entirely possible that it could be using an ultra-accurate timebase that would drift less than seconds over the projected lifetime of the equipment, given the reported quality of the software, I have my doubts.

      In other words, even if the timebase counting is made completely accurate (which it previously was not), that doesn't mean it is veridical. Errors can come in from other sources. From the sound of it, this was a software timebase (huge mistake without closed-loop correction), which means that it's susceptible to hardware clock drift, missed or delayed interrupts in the software, numerical conversion errors when going from hardware to software clocks, etc.

      Just because the proposed correction makes the clock count increment by precisely 0.1 does not mean it's therefore truthful without additional system analysis.

      Like I said, error analysis is HARD.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    41. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      You are wrong; you said so yourself "for which it is designed". That's why you have to check for range overflows in critical software. Also his point is that there are other points of failure other than simple arithmetic.

    42. Re:Curse of binary floating point by 4D6963 · · Score: 1

      True, the clock might be unreliable. But you're supposed to know how unreliable.

      --
      You just got troll'd!
    43. Re:Curse of binary floating point by Darinbob · · Score: 1

      Or switch to an interval that doesn't have roundoff problems... Though more issues may arise from that.

      However I don't like the title or implications of the article. The problem isn't that computers are bad at math, it is that some of the engineers are either bad at math or bad at verifying the code. The computer was undoubtedly performing the requested operations perfectly.

    44. Re:Curse of binary floating point by PhilHibbs · · Score: 1

      Patriot anti-missile systems run off the mains?

    45. Re:Curse of binary floating point by jbolden · · Score: 1

      There is another solution, arbitrary precision systems. There you get the safety of fixed and floating combined in exchange for lots of speed.

    46. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      I'm still trying to back out why the time at one hour is 3599.9966 instead of 3600.0 (per appendix II of the GAO report, off by 0.0034 sec). That's about 14 least significant bits bigger than the mantissa LSB for a single precision float holding 3600 as a value. So OK, per the report time is kept as ticks of 0.1 sec in an integer. If that's true, then one hour of ticks means that the integer would reach 36000. If you introduce an error of one floating point LSB during the conversion, that's 35999.9961 cast as a single float, or off by 0.0039 deci-seconds { i.e., in MATLAB: single(36000) - eps(single(36000)) is 35999.9961 }. Hmmm, that's 0.0039 which is something close to 0.0034. But how could you convert deci-seconds to seconds without also shrinking the error to 0.00039, ten times smaller? Subtracting at the wrong time? Having an intermediate cast to an int?
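
      For those without MATLAB, the single-precision spacing at 36000 can be checked with a short Python sketch. The helper prev_float32 is hypothetical and only valid for positive finite values; it steps down one float32 by decrementing the raw bit pattern.

```python
import struct


def prev_float32(x: float) -> float:
    """Next representable float32 below a positive finite x (hypothetical helper)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]


below = prev_float32(36000.0)
spacing = 36000.0 - below    # one float32 LSB at this magnitude

print(below, spacing)
```

      The spacing comes out as 2**-8, i.e. the 0.0039 deci-seconds mentioned above.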

    47. Re:Curse of binary floating point by SharpFang · · Score: 1

      I wouldn't be surprised the least bit if it was the recommended power source with battery backup.

      100h on battery power is a lot for any device. It would run either off mains or off a car-based generator, into which the whole battery of missiles would be plugged, providing the same synchronization.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    48. Re:Curse of binary floating point by Falconhell · · Score: 1

      I know I will probably sound dumb, but could someone briefly explain the difference between floating point and integer?

      I am really good at fixing things, but not so good at maths.....

    49. Re:Curse of binary floating point by Anonymous Coward · · Score: 0

      I know I will probably sound dumb, but could someone briefly explain the difference between floating point and integer?

      http://lmgtfy.com/?q=difference+between+floating+point+and+integer

    50. Re:Curse of binary floating point by ezzzD55J · · Score: 1

      Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number

      Is what they're bloody well talking about. As a floating point number. I am well aware how 0.1 could be represented perfectly, but that is not what the post is talking about.

  3. Computers can do math by Anonymous Coward · · Score: 0

    Mathematica. 'nuff said.

    The IEEE specification stuff is literally designed knowing there will be calculation errors. Don't use this; create your own number system like Mathematica does for 100% accuracy always.

    1. Re:Computers can do math by daveime · · Score: 1

      Fine, using Mathematica, I challenge you to accurately represent and display 1/3rd as a decimal number.

      Any number that is irrational to a certain base can never be stored accurately using that base.

      There will always be some form of rounding, perhaps not in the internal representation (i.e. using numerator, denominator stored as two arbitrary size integers, 1/3 is easy), then at least when you come to display it back to the user.
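
      The numerator/denominator idea looks roughly like this in Python, using the standard fractions module as a stand-in for whatever Mathematica does internally (which is an assumption here):

```python
from fractions import Fraction

third = Fraction(1, 3)        # stored exactly as the integer pair (1, 3)
round_trip = third * 3        # exact arithmetic: gives back 1

# Rounding only appears when a decimal display is requested.
shown = f"{float(third):.10f}"

print(third, round_trip, shown)
```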

      Another interesting one is Pi. How exactly do you store accurately a number that has no known exact formula to generate it, and can only be generated using successive approximations which are "closer" than the last, but can never be 100% accurate ?

    2. Re:Computers can do math by mindstrm · · Score: 1

      There are several formulas to generate PI, to any accuracy required.

      In every situation - there is a limit to the accuracy required. Significant digits and all that.

      You only need PI to the same accuracy as the other measurements you are computing against.
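
      One such formula is Machin's: pi = 16*arctan(1/5) - 4*arctan(1/239). A sketch with plain integer arithmetic follows; the function names and the choice of 10 guard digits are assumptions made for illustration.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale via the Taylor series, using integers only."""
    term = scale // x          # holds scale / x**(2k+1)
    total = term
    x2 = x * x
    n = 1
    sign = -1
    while term:
        term //= x2
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_digits(digits: int) -> int:
    """Return floor(pi * 10**digits) using Machin's formula."""
    guard = 10                 # extra digits to absorb truncation error
    scale = 10 ** (digits + guard)
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    return pi_scaled // 10 ** guard


print(pi_digits(10))
```

      Asking for more digits only costs more loop iterations, which is the sense in which pi is available "to any accuracy required".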

    3. Re:Computers can do math by daveime · · Score: 1

      Yes, but the OP's point was that using Mathematica, any number can be stored with 100% accuracy ... while that may be true, my point was regarding irrational numbers that while *can* be stored accurately, will always involve some form of rounding / precision when it comes to displaying it.

    4. Re:Computers can do math by mark-t · · Score: 1

      Fine, using Mathematica, I challenge you to accurately represent and display 1/3rd as a decimal number. Any number that is irrational to a certain base can never be stored accurately using that base.

      Okay, I'm sorry... but I'm going to be a math nazi. You will not believe how often I correct people on this issue.

      1/3 is not irrational. That in base 10 the number happens to go on forever does not make the number irrational, it simply means that it doesn't have a last digit.

      Irrational numbers are real numbers that cannot be exactly represented as a division operation where the operands to the division are integers. That's it. This has nothing to do with how much space it would take to write up the number, or the base you represent it in, or anything else having to do with repeating decimals. Heck, if a number has a repeating decimal, then it is DEFINITELY rational.

      Also... just break the word apart.... "irrational"... inside that word is the root word "ratio"... and so would, if taken completely literally, imply that it describes something that is not a ratio. A fraction *IS* a ratio, so this literal interpretation of the word works to some extent... (although in practice, the mathematical definition is a bit more precise, having the requirement that the numerator and denominator in such a fraction also be integers).

      Oh, while I'm on the subject (because it is often cited by people when I try to correct them on the above issue), 22/7 is *NOT* pi. It's an approximation to pi that happens to be accurate to two decimal places, which is pretty good for most purposes, and is only nominally more accurate than the commonly used decimal value of 3.14.... adding one more decimal digit to the decimal number, and using 3.141, is roughly twice as accurate as the fraction 22/7. The 22/7 fractional representation of pi is used a lot because it is easy to remember and can be faster to use when doing math manually, not because it's such a good approximation.
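
      The sizes of these errors are easy to check numerically. A quick sketch, taking Python's math.pi as the reference value:

```python
import math

err_22_7 = abs(22 / 7 - math.pi)
err_3_14 = abs(3.14 - math.pi)
err_3_141 = abs(3.141 - math.pi)

print(f"22/7  is off by {err_22_7:.6f}")
print(f"3.14  is off by {err_3_14:.6f}")
print(f"3.141 is off by {err_3_141:.6f}")
print(f"error ratio, 22/7 vs 3.141: {err_22_7 / err_3_141:.2f}")
```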

    5. Re:Computers can do math by fbjon · · Score: 1

      > "Another interesting one is Pi. How exactly do you store accurately a number that has no known exact formula to generate it, and can only be generated using successive approximations which are "closer" than the last, but can never be 100% accurate ?"

      No problem, use base pi. Like so:

      10 (base pi)

      I.e. 10 in base pi, which translates to:

      pi^1 * 1 + pi^0 * 0 = pi (base 10)

      Obviously this doesn't make things much easier, but then playing around with bases rarely does.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    6. Re:Computers can do math by xouumalperxe · · Score: 1

      Here's an experiment. Go to wolframalpha and ask it to calculate sqrt(2) *sqrt(2). It'll tell you that the result is 2. Just to make sure you're not looking at clever truncation of floating point numbers, it'll also say that the result is "two". That is, it "knows" that the result is 2 algebraically, rather than numerically. Internally, it's working symbolically rather than numerically. If instead you try "N[sqrt(2)]* N[sqrt(2)]" (N is the numerical approximation function), it'll return "2.". That is, it's approximately 2. It'll gladly do the same with Pi. Hell, try fiddling with variations on e^(i pi) and you'll see what I mean. At the end of the day, what this means is that no approximations take place until you have to make them (or explicitly ask for them). Sure, it's slower. But it's also arbitrarily precise.
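
      The symbolic vs. numeric distinction can be mimicked with a toy sketch. The Sqrt class below is purely hypothetical and handles only this one case; a real computer algebra system is far more general.

```python
import math


class Sqrt:
    """Toy symbolic square root of a non-negative integer (hypothetical)."""

    def __init__(self, n: int):
        self.n = n

    def __mul__(self, other):
        # sqrt(n) * sqrt(n) is exactly n; no number crunching needed.
        if isinstance(other, Sqrt) and other.n == self.n:
            return self.n
        return NotImplemented

    def approx(self) -> float:
        """Numerical approximation, in the spirit of N[...]."""
        return math.sqrt(self.n)


exact = Sqrt(2) * Sqrt(2)                        # symbolic: exactly 2
numeric = Sqrt(2).approx() * Sqrt(2).approx()    # floats: close to 2, not equal

print(exact, numeric)
```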

  4. Why bad programmers suck at math by Anonymous Coward · · Score: 0

    So, in other words, the programmers for this piece of "mission-critical" software were not aware of floating point arithmetic and error propagation? What does that have to do with "computers" in general?

  5. Fixed point numbers? by Big_Mamma · · Score: 5, Insightful

    Use fixed point numbers? You know, in financial apps, you never store things as floating points, use cents or 1/1000th dollars instead!

    Computers don't suck at math, those programmers do. You can get any precision mathematics on even 8 bit processors, most of the time compilers will figure out everything for you just fine. If you really have to use 24 bits counters with 0.1s precision, you *know* that your timer will wrap around every 466 hours, just issue a warning to reboot every 10 days or auto reboot when it overflows.
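
    The 466-hour figure follows directly from the counter width. A quick sketch of the arithmetic (the constant names are made up for illustration):

```python
COUNTER_BITS = 24
TICKS_PER_SECOND = 10                       # one tick per 0.1 s

wrap_ticks = 2 ** COUNTER_BITS              # distinct counter values
wrap_hours = wrap_ticks / TICKS_PER_SECOND / 3600
wrap_days = wrap_hours / 24

print(f"{wrap_hours:.1f} hours, {wrap_days:.1f} days")
```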

    1. Re:Fixed point numbers? by DarkOx · · Score: 2, Insightful

      yea because the missile counter measures failed to fire because the system was doing its scheduled reboot is so much better than the missile counter measures failed to fire because of timer precision

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    2. Re:Fixed point numbers? by dr_wheel · · Score: 2, Interesting

      yea because the missile counter measures failed to fire because the system was doing its scheduled reboot is so much better than the missile counter measures failed to fire because of timer precision

      The OP's suggestion for scheduled reboots could be solved by having redundant systems, no? System X comes up at 0 hour mark, System Y comes up at 233 hour mark. System X switches to System Y and reboots at 466 hour mark; System Y only has 233 hours uptime.

    3. Re:Fixed point numbers? by hitmark · · Score: 1

      how about this then, have two systems, set up so that for an amount of time each 36 hours they process the tasks in parallel, so that the most recently rebooted can take over control while the other reboots.

      --
      comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
    4. Re:Fixed point numbers? by RobertLTux · · Score: 1

      There is the concept of "Fail safe" if the system is down (for reset reloading or whatever) then the folks up in C&C know that its down and can do other things but if its "working perfectly" but is in fact not then the folks in C&C don't know this (and land up going boom).

      besides a properly built EMBEDDED MILITARY GRADE system should not take more than a couple minutes to "reboot" so you have a couple F16s in the air patrolling to watch for incoming "stuff"

      --
      Any person using FTFY or editing my postings agrees to a US$50.00 charge
    5. Re:Fixed point numbers? by Bacon+Bits · · Score: 1

      Humans have had to handle what happens when your second and minute counters overflow for centuries. Teaching a machine to do it can't possibly be that hard. How about you simply write your code to handle resetting the clock's counter back to zero as necessary? You know, like we already do every 12 hours? (Or 24 for military time.)

      As long as you don't overflow, you're not confused about what the clock timer means. Rarely does a precision counter like this really need to know when it started or how long it's been running except in the short term. In this case, from the time the target is detected to the time the interceptor is detonated. You might be able to initialize and start the precision clock as late as when the target is detected. Or perhaps use the sign bit to signify a rollover or something. You can't tell me this isn't a problem that's already been solved hundreds of times in dozens of acceptable ways.
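
      One of those well-worn ways is modular subtraction. A sketch, assuming a free-running 24-bit tick counter and intervals shorter than one full wrap (the names are hypothetical):

```python
COUNTER_BITS = 24
MASK = (1 << COUNTER_BITS) - 1


def elapsed_ticks(start: int, now: int) -> int:
    """Ticks from start to now, correct even if the counter wrapped once."""
    return (now - start) & MASK


before_wrap = MASK - 4        # five ticks before the counter rolls over
after_wrap = 10               # ten ticks after it rolled over

print(elapsed_ticks(before_wrap, after_wrap))
```

      Only the difference between two readings matters, so the rollover itself never needs special handling.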

      --
      The road to tyranny has always been paved with claims of necessity.
    6. Re:Fixed point numbers? by ceoyoyo · · Score: 1

      They had much the same recommendation for the floating point system - just reboot it every 36 hours. The operators decided to ignore that recommendation because there was a bit of a war going on.

    7. Re:Fixed point numbers? by Anonymous Coward · · Score: 0

      Wow - you're suggesting buying an entire redundant system to avoid making a little software fix.

    8. Re:Fixed point numbers? by dword · · Score: 0, Troll

      What the fuck is this? Who the hell even THOUGHT of putting this on Slashdot? It's common knowledge among slashdotters, because most of us are programmers or have dealt with programming in our past and by "most" I mean 999/1000 (no joke intended). You can find this in absolutely any book that explains how computer math works. You can find this information all over the Internet and in tons of books. This is not news, it's just a reminder for beginner programmers who've used computers for only a few months. In school we used to do jokes like these, showing our class mates that the Calculator in Windows was broken and the teachers always explained to us why they were broken. My parents who only use Yahoo! Messenger at home and Excel at work know this.

      I know that journalism is sensationalism, but this story just plain sucks.

      Fuck you and you shitty news, Slashdot! I've had enough of your crap and I'm OUT OF HERE. Seriously, how do I delete my Slashdot username?

    9. Re:Fixed point numbers? by Florian+Weimer · · Score: 1

      Use fixed point numbers? You know, in financial apps, you never store things as floating points, use cents or 1/1000th dollars instead!

      Excel uses floating point, so the "never" part doesn't appear to be completely true.

    10. Re:Fixed point numbers? by Florian+Weimer · · Score: 1

      Eh, Excel uses binary floating point. Financial apps should use decimal floating point or fixed point, though. Float vs fixed doesn't matter that much as long as you've got a sufficiently large mantissa, binary vs decimal is key.

    11. Re:Fixed point numbers? by Anonymous Coward · · Score: 0

      Wow - you're assuming the only benefit to a redundant system is protection against a single software fix.

    12. Re:Fixed point numbers? by RegularFry · · Score: 1

      The systems should be redundant anyway, because routine maintenance needs to happen to every system ever invented.

      Besides, you're underestimating exactly how complicated this "little software fix" is, given the hardware constraints.

      --
      Reality is the ultimate Rorschach.
    13. Re:Fixed point numbers? by Anonymous Coward · · Score: 0

      it would be cheaper to use half-interval arithmetic

    14. Re:Fixed point numbers? by RegularFry · · Score: 1

      Use fixed point numbers? You know, in financial apps, you never store things as floating points, use cents or 1/1000th dollars instead!

      Computers don't suck at math, those programmers do. You can get any precision mathematics on even 8 bit processors, most of the time compilers will figure out everything for you just fine. If you really have to use 24 bits counters with 0.1s precision, you *know* that your timer will wrap around every 466 hours, just issue a warning to reboot every 10 days or auto reboot when it overflows.

      The Patriot designers did precisely this (except it was supposed to be reset every 36 hours, not 10 days), and at least 28 people died as a direct result.

      --
      Reality is the ultimate Rorschach.
    15. Re:Fixed point numbers? by 4D6963 · · Score: 1

      If you really have to use 24 bits counters with 0.1s precision, you *know* that your timer will wrap around every 466 hours, just issue a warning to reboot every 10 days or auto reboot when it overflows.

      Or better yet if you're worth your salt make sure the rollover is no problem, and make sure the rollover is tested.

      --
      You just got troll'd!
    16. Re:Fixed point numbers? by maxume · · Score: 1

      Of course, you mean that 28 people were not saved by the system. They died as a direct result of an enemy missile exploding near them.

      --
      Nerd rage is the funniest rage.
    17. Re:Fixed point numbers? by Anonymous Coward · · Score: 0

      So the solution is to build 3 redundant systems to overlap while the other reboots?
      please. Just build a system that doesn't have to be rebooted because they made a programming mistake.

    18. Re:Fixed point numbers? by nomel · · Score: 1

      Or, you could do what you should have learned in your first ASM class, use more bytes! 64-bit math is perfectly fine on an 8-bit microcontroller, just a little slower. Extend the counter so it's in the range of months or years or tens of decades. Just make sure it's a little longer than some other required scheduled maintenance, like 2 election cycles. You should never reboot a MISSILE DEFENSE SYSTEM because of software, that's just TERRIBLE design. I can't believe you people!
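
      The byte-at-a-time technique looks roughly like this. It is a Python sketch of what an 8-bit CPU would do with add-with-carry instructions; the little-endian layout and function name are assumptions.

```python
def increment(counter: bytearray) -> None:
    """Add 1 to a little-endian multi-byte counter, one byte at a time."""
    for i in range(len(counter)):
        counter[i] = (counter[i] + 1) & 0xFF
        if counter[i] != 0:      # no carry out of this byte, so stop
            return


counter = bytearray(8)           # 64-bit counter held in eight 8-bit bytes
counter[0] = 0xFF
counter[1] = 0xFF
increment(counter)               # carry ripples through two bytes
value = int.from_bytes(counter, "little")

# How long a 64-bit count of 0.1 s ticks lasts before wrapping.
years_to_wrap = 2 ** 64 / 10 / (365.25 * 24 * 3600)

print(value, f"{years_to_wrap:.2e} years")
```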

    19. Re:Fixed point numbers? by cathector · · Score: 1

      or better, write software that can deal with the timer wrapping around. it's not that hard.

    20. Re:Fixed point numbers? by syousef · · Score: 1

      Computers don't suck at math, those programmers do.

      THANK YOU. I thought I had lost my mind reading post after post that went into details, suggested fixed point etc. but not made the simple statement you did. The story's statement that "computers suck at math" is drivel. It's not worthy of a 10 year old let alone a computer professional - which many of us supposedly are.

      I am HORRIFIED by how poorly understood floating point is among programmers. The idea that someone would blame the computer for this sort of thing smacks of an unprofessional idiot blaming his tool (when he has simply been using the wrong tool all along).

      --
      These posts express my own personal views, not those of my employer
    21. Re:Fixed point numbers? by sohp · · Score: 1

      You'd think after so many decades of business programming this would become common knowledge. But I can personally attest to a system written (using Java) in the early 2000s that initially used float for money. When I was brought on to the project I pointed this flaw out. Neither the technical lead nor the programmer who had a math degree could be convinced it was a problem. The other programmers, chair-warmers with a COBOL background on their first Java project, ought to have known but they'd just leaned on Money to watch their backs.

      Naturally the rounding errors eventually showed up as undeniable errors and I attribute the eventual abandonment of the project in part to the loss of time caused by having to backtrack and fix all the floats to use Java's BigDecimal.
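
      The kind of error involved is easy to reproduce. The sketch below uses Python's decimal module as an analogue of BigDecimal (it is not the project's code), with the amount 2.675 chosen as a well-known example:

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary float: 2.675 is actually stored slightly below 2.675,
# so rounding to cents goes down.
as_float = round(2.675, 2)

# Decimal: 2.675 is stored exactly, so half-up rounding gives 2.68.
as_decimal = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(as_float, as_decimal)
```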

    22. Re:Fixed point numbers? by RegularFry · · Score: 1

      Reading back, yes. If the trajectory prediction had been correct, the interceptor would have been launched, giving the dead a chance to have been saved.

      --
      Reality is the ultimate Rorschach.
  6. Old news on an ancient design by Anonymous Coward · · Score: 1, Informative

    1) This problem was covered in Risks Digest years ago.
    2) Design and production phase was completed in 1980.

    http://catless.ncl.ac.uk/Risks/10.82.html#subj1

    is a good start for "Why the hell are we using this weapons system the way we are?"

    As memory serves the fix is to restart the system periodically.
    As memory also serves that's been part of the operating procedure for a very long time.

    1. Re:Old news on an ancient design by Lonewolf666 · · Score: 1

      Also keep in mind that the development of that kind of system takes years. And once they are in service, they tend to stay in service for decades. When I served my 15 months of military service in Germany in 1987, I encountered equipment, including some electronics, that was older than me.

      The architectural decisions on the Patriot missile system were probably made before 32 bit computers in a size/price range appropriate for mobile systems were available. Hence, the 24 bit register with the poor accuracy.
      A few years later, the designers would have deserved a beating for not using something like the Motorola 68000 and storing the time in a 32 bit integer. In 1980, it might still have been a good idea to do that in software but I'm not entirely sure if the performance would have been sufficient.

      --
      C - the footgun of programming languages
  7. Why the author sucks at math... by allcaps · · Score: 1

    Shouldn't we focus on the fact that without computers, even MORE people would die? This article seems to make the conjecture that somehow these instruments are worthless, but it appears the writer of it sucks at math as well.

    # ppl who would die without computers -MINUS- # ppl who die with computers = # of lives SAVED by computers.

    That second # isn't bad, it was already there before computers came along!

    1. Re:Why the author sucks at math... by betterunixthanunix · · Score: 1

      Except that people tend to rely on computers, and take risks they would not have otherwise taken. I am not saying that the number of deaths resulting from computer errors is going to be higher than other deaths, but that it is not as simple as "every death caused by a computer error is a death that would have happened before computers." If you knew your enemy was launching missiles at you, and you had no missile defense, what would you do to protect yourself? What would you do if you did have missile defense?

      --
      Palm trees and 8
    2. Re:Why the author sucks at math... by iamacat · · Score: 1

      In case of Patriot system, it has always been known to be effective only for a fraction of incoming missiles. So you would presumably run to the bunker in either case.

  8. Stupid article, too by hellfire · · Score: 5, Insightful

    Translation: computers are only as smart as the people programming them... and there's plenty of stupid people out there.

    We knew this. This is no great revelation. So why is this news?

    --

    "All great wisdom is contained in .signature files"

    1. Re:Stupid article, too by Anonymous Coward · · Score: 0

      It's news because this particular brand of stupid got some people killed.

      The obvious solution is to stop bickering over dirt and oil and all the other silly shit we do. But that will never happen. Because of another particular brand of stupid.

      Welcome to the future!

    2. Re:Stupid article, too by Anonymous Coward · · Score: 0

      because "interval arithmetic" as a part of math still needs to be discovered by some - after 40 odd years of its existence...

    3. Re:Stupid article, too by h00manist · · Score: 1

      I had the same reaction - stupid news. Old technical problem with old solutions, but someone thought they had a "catchy headline". And conclusion. It allows for radicalizing - involves missiles, war, deaths, a lot of money, national pride, terrorism, religion, politics, corruption... sex lies and videotape. No big news, all the same, some engineers screw up, some people do wars and weapons, and some people die. If the same error was in a videogame, or in an Intel CPU or Excel spreadsheet calculation error, it would be boring. A thousand monkeys typing, last anyone checked, will not produce any decent code. Or politics.

      --
      Build your own energy sources from scratch. http://otherpower.com/
    4. Re:Stupid article, too by Blakey+Rat · · Score: 1

      Even worse it seems to ignore the fact that, while those 28 people may have "paid with their lives" because of the Patriot error, hundreds or even thousands of others were saved in the cases where the software worked. Is he arguing that if the Patriot system never existed, those 28 people would be alive? No, they'd still be dead, and so would hundreds of others who are alive now.

      "Perspective" seems to be in short supply.

    5. Re:Stupid article, too by Anonymous Coward · · Score: 0

      because lots of people think that computers are precise, accurate computational devices. It is easy to forget that someone sat down and wrote lines and lines of code that allowed the computer screen in front of them to very quickly spit out an answer to a mathematical question. To many users, the computer is literally solving the equation by itself accurately.

      I'm not saying anything, I'm just sayin'.

    6. Re:Stupid article, too by d_54321 · · Score: 1

      Although this is an old story, the message is timeless (pardon the pun).
      The news here is for the stupid people who read this as either an important reminder, or first time exposure to a lesson in what not to do.

    7. Re:Stupid article, too by Anonymous Coward · · Score: 0

      A thousand monkeys typing, last anyone checked, will not produce any decent code. Or politics.

      But they will produce job offers for other monkeys! At least, if our HR department's behavior during the dot com boom is a reasonable example.

  9. What?! by jointm1k · · Score: 5, Insightful

    of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register

    All they had to do is use integers, where a value of 1 represents 0.1 s.

    --
    You know it makes sense, a little reminder from jointm1k.
    1. Re:What?! by slonik · · Score: 1

      Mod the parent up! It is the best common sense solution every software engineer must know.

    2. Re:What?! by 91degrees · · Score: 1

      Except it's never that simple. Other components rely on the data being in seconds. There's all sorts of hardcoded values in seconds. The specification states 0.1 seconds so it's impossible to change it to something more convenient without restarting at the specification stage. You'll end up looking at the code and asking "who the hell wrote this!?"

    3. Re:What?! by UltraAyla · · Score: 1

      GP was saying they should have done this to start with, not that they should go back and do this. But this software should go back anyway - it's broken.

    4. Re:What?! by TheRaven64 · · Score: 1

      Not a problem. You keep the tick in tenths of a second. You convert to seconds by dividing by ten in a floating point value when it's needed. This introduces a small rounding error, and the next time you do it then it will also introduce a small rounding error. These errors, however, are independent of each other. You can then document the error range of the second counter.
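
      A small sketch of that scheme, under the assumption that the tick count is an exact integer and that seconds are only ever derived from it (the function names are illustrative). Each conversion is rounded once, so its relative error stays within half a unit in the last place no matter how long the system has been up:

```python
from fractions import Fraction

TICKS_PER_SECOND = 10


def ticks_to_seconds(ticks: int) -> float:
    """Convert an exact integer tick count to seconds with one rounding."""
    return ticks / TICKS_PER_SECOND


def relative_error(ticks: int) -> Fraction:
    """Exact relative error of the float result against the true value."""
    exact = Fraction(ticks, TICKS_PER_SECOND)
    return abs(Fraction(ticks_to_seconds(ticks)) - exact) / exact


if __name__ == "__main__":
    for ticks in (1, 36_000, 3_600_000):
        print(ticks, ticks_to_seconds(ticks), float(relative_error(ticks)))
```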

      --
      I am TheRaven on Soylent News
    5. Re:What?! by owlstead · · Score: 1

      If you go this way, use a library that implements fixed point arithmetic. This way you can always see what actual type the variable represents and - in many languages - it can prevent overflows. For Java, you could for instance use BigDecimal. Only if such a class is too resource intensive you could switch to a normal integer, but in that case make sure that the name of the variable is well chosen and document its use everywhere.

      Can you feel the bugs appearing already?

    6. Re:What?! by fermion · · Score: 1
      The corollary to this is that computers neither suck nor don't suck at math. Computers don't really know how to do anything much. Look at the 6502, it can ADC, ASL, BRK, CMP, AND, INC, and a few other things. AFAIR, it did all of these things wonderfully. Modern processors basically have the same commands, fancied up a bit, and AFAIK, they do these commands wonderfully.

      So, as always, the only things that suck at math are the people programming the computers. These people also probably suck at many other things, which is why they are essentially working for the government instead of building devices that have to compete on the free market.

      But, seriously, these issues are well known, with pretty good solutions, such as scaling which is what the parent was talking about. There are other solutions such as creating a cross referenced code for certain conditions. There are books to help programmers who have not yet acquired the skill to work with numbers, such as the Numerical Recipes series. There is software to help programmers who do not want to or cannot learn how to work with numbers, such as the IMSL library, which, if memory serves, allows the user to include a target precision. Then there are thousands of basic introductory computer books that will help the novice software developer understand rounding errors and how to control them. For instance, one does not do += on floats, or use == for that matter.
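
      The advice about += and == on floats can be seen in a few lines. This is only a sketch with Python's standard `math` module and 64-bit floats, not anything taken from the Patriot code:

```python
import math

# Repeated += on a float: each addition can round, and the roundings pile up.
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)              # False
print(math.isclose(total, 1.0))  # True: compare with a tolerance instead

# math.fsum tracks the lost low-order bits and returns a correctly rounded sum.
print(math.fsum([0.1] * 10) == 1.0)  # True
```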

      So the only issue left is the resolution of the hardware. If .1 seconds is not enough, then that is the responsibility of the people who specified the hardware. I know many clocks that accurately increment up to 83 times per second.

      --
      "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    7. Re:What?! by RegularFry · · Score: 1

      I would say to RTFA, but it's so badly written that it doesn't make it clear that this is precisely what they did.

      The problem is that the system clock was counting in 0.1 second increments, but the targeting maths was being done in units of 1s, and the conversion from one to the other was done with insufficient precision for the operating conditions.

      There are more details here.
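
      The figure in the summary can be reproduced with exact rational arithmetic. The sketch below assumes that the constant 0.1 was truncated to 23 bits after the binary point in the 24-bit register; that detail is not stated in this thread, and all names are illustrative:

```python
from fractions import Fraction

# Assumption: 0.1 is truncated to 23 bits after the binary point.
FRACTION_BITS = 23
SCALE = 1 << FRACTION_BITS

true_tick = Fraction(1, 10)
stored_tick = Fraction(int(true_tick * SCALE), SCALE)  # truncation toward zero
error_per_tick = true_tick - stored_tick

ticks = 100 * 3600 * 10  # about 100 hours of 0.1-second ticks
clock_error = error_per_tick * ticks

print(float(error_per_tick))  # roughly 9.5e-08 seconds lost per tick
print(float(clock_error))     # roughly 0.3433 seconds after 100 hours
```

      Under that assumption the per-tick error is tiny, but it is multiplied by the full uptime each time the tick count is converted to seconds, which is why the drift grows linearly with hours of operation.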

      --
      Reality is the ultimate Rorschach.
    8. Re:What?! by Anonymous Coward · · Score: 0

      Yeah, they just suck at programming. And they're blaming the computers for all those deaths.
      I blame whatever idiots programmed it, as I'm a 15 year old who programs for fun, and I know better than to make that mistake.

    9. Re:What?! by 91degrees · · Score: 1

      You're assuming competently designed software. What if it's accessed by a pointer in several hundred places, or you make a copy of it somewhere? That's another variable to keep track of.

    10. Re:What?! by jbolden · · Score: 1

      That's the basic idea of object oriented programming.

      tentime # a ten time uses a tenths of a second counter
            long int clicks # clicks represent the tenths
      tentime -> out = clicks
      tentime -> seconds = round (div (clicks,10) + ((mod(clicks,10) > 4 ? 1 : 0) ...
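
      One way the pseudocode above might look as runnable code. This is a sketch, not the poster's implementation: the class name `TenthsClock` and the `tick` method are invented, and only the integer counter and the round-half-up conversion come from the pseudocode:

```python
from dataclasses import dataclass


@dataclass
class TenthsClock:
    """Time kept as an integer count of 0.1-second clicks."""

    clicks: int = 0

    def tick(self) -> None:
        """Advance the clock by one tenth of a second."""
        self.clicks += 1

    @property
    def seconds(self) -> int:
        """Whole seconds, rounded half up, using integer arithmetic only."""
        whole, tenths = divmod(self.clicks, 10)
        return whole + (1 if tenths > 4 else 0)


if __name__ == "__main__":
    clock = TenthsClock()
    for _ in range(15):
        clock.tick()
    print(clock.clicks, clock.seconds)  # 15 2
```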

  10. retrospective technological excuses by Anonymous Coward · · Score: 0

    'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register -- as used in the Patriot system -- it's out by a tiny amount.

    But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m
    '

    Nonsense, it's perfectly possible to design a computer that can accurately tell the time. What caused Patriot to fail was that over an extended period, the clocks went out of sync between the various dispersed sub-systems, as Patriot wasn't designed to be switched on for so long.

    Regardless, what isn't possible is to design a system that can accurately track and shoot down missiles in flight. As the Patriot defence system so patently demonstrated. As I recall, it succeeded less than 50 % of the time. Which begs the veracity of the starwars SDI project. Just another excuse to spend billions on the defence budget.

    1. Re:retrospective technological excuses by david+duncan+scott · · Score: 5, Insightful

      Regardless, what isn't possible is to design a system that can accurately track and shoot down missiles in flight. As the Patriot defence system so patently demonstrated.

      You're right. Just as the failure of Samuel Langley's aircraft demonstrated that man would never fly, the failure of an anti-aircraft missile to destroy more than half of the ballistic missiles (targets moving at what, twice the speed of the targets it was designed to destroy?) demonstrates that ABMs will never work.

      --

      This next song is very sad. Please clap along. -- Robin Zander

    2. Re:retrospective technological excuses by TheRaven64 · · Score: 1

      Which begs the veracity of the starwars SDI project. Just another excuse to spend billions on the defence budget.

      No, the point of the SDI project was to make the Russians spend billions of dollars more on their defence budget than they could afford and bankrupt the country. It and the war in Afghanistan were major contributors to the fall of the USSR.

      --
      I am TheRaven on Soylent News
    3. Re:retrospective technological excuses by dave420 · · Score: 1

      It didn't destroy any during the first gulf war. That's why they sent in guys on the ground to destroy the mobile launchers instead of relying on a missile defence system that didn't do squat.

    4. Re:retrospective technological excuses by david+duncan+scott · · Score: 2, Insightful
      OK, we'll go with 0% success. My point is that the failure of any one implementation does not invalidate the concept. Edison tried hundreds of wrong ways to make a light bulb, none of which demonstrated that the light bulb was unworkable.

      Oh, and the Scud hunting in Gulf One was largely an air exercise, as I recall, and of course they went after the launchers. It's always preferable to destroy the enemy on the ground (or in harbor, or asleep in barracks) than when they're incoming. The Japanese didn't bomb Pearl Harbor because it's impractical to sink ships at sea--it's just easier to hit slow- or non-moving targets.

      --

      This next song is very sad. Please clap along. -- Robin Zander

    5. Re:retrospective technological excuses by Entropius · · Score: 0, Flamebait

      In the process Reagan bankrupted us by spending billions of dollars more on our defense budget than we could afford. We've just not maxed out our credit cards from the Bank of China yet, so it's not as obvious.

    6. Re:retrospective technological excuses by BranMan · · Score: 1

      Actually, they did just fine - given the limitations. But, there were thousands of Scud missiles, hundreds of PATRIOT missiles - math was not on our side. BUT, destroy the launchers, and they'd have thousands of paperweights. That was the reasoning.

  11. Didn't read TFA but... by Anonymous Coward · · Score: 0

    Found this article matching the criteria only dated February 28, 1991.

    It just didn't seem plausible did it... How this correlates to modern computer FP calculations is beyond me.

    1. Re:Didn't read TFA but... by peragrin · · Score: 3, Interesting

      because military computers are 20 years out of date to start with. Heck even the awesome modern land warrior hardware, is 10 years out of tech date. Heck they could probably shave 5 pounds off of the hardware by using modern chips, and displays.

      Military Spec is only good at rugged. up to date with the best is far behind.

      --
      i thought once I was found, but it was only a dream.
    2. Re:Didn't read TFA but... by Wonko+the+Sane · · Score: 1

      Military Spec is only good at rugged. up to date with the best is far behind.

      In the year 2000 the US Navy still had a submarine that used a vacuum tube based system to monitor and control the nuclear reactor.

    3. Re:Didn't read TFA but... by Amanitin · · Score: 1

      Critical mission hardware is always far behind in performance, be it military or civilian. The avionics of the 787 will not run on i7, either. The systems are tested for faultless operation until their noses bleed and that takes time.

    4. Re:Didn't read TFA but... by mirix · · Score: 1

      Yeah, what do these run anyways? I've got cash riding on ancient RCA 1802 / COSMAC. The mil seemed to like them, I suppose because they were available in sapphire / rad hard, iirc.

      --
      Sent from my PDP-11
  12. Practical Analysis by mseeger · · Score: 2, Informative

    The problem seems to be right out of the textbook for "Practical Analysis" (not sure if this is the correct translation for the German "Praktische Analysis"). This was a mandatory course for every computer science degree during my university time (20 years ago). Don't know if this is still the case. It was an eye opener to see how correct formulas and a perfectly working computer could yield absurd results. Several times I was asked for help by people claiming their Excel was broken due to such mistakes.

    CU, Martin

    1. Re:Practical Analysis by Anonymous Coward · · Score: 0

      Also during my time at uni a few years ago. A course in numerics was mandatory to get a degree in computer science and it started right away with discussing those errors.

  13. Because they are programmed by morons by Anonymous Coward · · Score: 0

    The computer is not at fault here. The problem is the moron who thought floating point representation is a good choice for a fixed point value.

    These problems are just too common. Like some game company discovered weird display in their game. Found out that floating point numbers are not very precise when far away from 0, like in a huge seamless world.

  14. That is the programmer sucking by BoneFlower · · Score: 1

    Any first year compsci student should know that this happens, and should know to choose data types that can represent the data to the needed degree of accuracy.

    A simple struct { int integral_part; int decimal_part; }; would do the job for this. Or since you care exactly about .1 second increments, you could even use integral values in the first place. With 24 bits, you can cover 19 days before it overflows, and almost half a day on top of that to provide a buffer if bad guys show up right as the scheduled reset comes up.
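
    The 19-day figure is easy to check. A quick sketch, assuming an unsigned counter that advances ten times per second (the function name is made up for illustration):

```python
TICKS_PER_SECOND = 10
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_rollover(bits: int) -> float:
    """Days an unsigned counter of the given width can count 0.1 s ticks."""
    return (2 ** bits) / TICKS_PER_SECOND / SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(days_until_rollover(24), 2))  # 19.42
```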

    100 hours = 3,600,000 ticks? Wait, summary math is wrong. One hour = 60 minutes. Each of those 60 minutes is 60 seconds. 60 sets of 60 seconds is 60 * 60 = 3,600 seconds per hour. 100 hours means 100*3,600 = 360,000. Either they missed a digit and the system was online for 1,000 hours straight or they added one to the final result.

    1. Re:That is the programmer sucking by Anonymous Coward · · Score: 0

      multiply that number by ten as there are 10 ticks per second.

    2. Re:That is the programmer sucking by kmsigel · · Score: 1

      There are 10 ticks per second.

    3. Re:That is the programmer sucking by p_millipede · · Score: 1

      3,600,000 ticks, not seconds. You forgot to times by 10 ticks per second.

    4. Re:That is the programmer sucking by T-Bone-T · · Score: 1

      The summary math isn't wrong. Try reading the part about ticks per second again.

    5. Re:That is the programmer sucking by Anonymous Coward · · Score: 0

      Forgot to "times by"??? Are you six years old?

    6. Re:That is the programmer sucking by Anonymous Coward · · Score: 0

      Aren't we glad he didn't work on those systems?

  15. "User error"? by wisebabo · · Score: 4, Informative

    I actually read about this specific incident once; I seem to remember (though honestly not sure) that the design flaw was known and the user manual indicated that the computer needed to be reset every 36 hours. However, in wartime, under attack (there were frequent Scud intercepts), the crew controlling the missile battery opted against shutting it down even for a short time. Maybe even though the manual said it SHOULD be rebooted it did not explain WHY or what the consequences would be.

    1. Re:"User error"? by betterunixthanunix · · Score: 5, Insightful

      So they designed a system that accumulated rounding errors over time, and their solution was to ask the system's users to reboot the system every so often? Somehow, that does not add to my sympathy for these programmers...

      --
      Palm trees and 8
    2. Re:"User error"? by dbIII · · Score: 1

      there were frequent Scud intercepts

      Hang on, do you have a link to a documented case anywhere of a successful intercept in the field by such a system?

    3. Re:"User error"? by Hurricane78 · · Score: 1, Troll

      Yep. That was an epic fail.

      The rule is: If a user *can* do something wrong, he *will*!

      How can they not know that?

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    4. Re:"User error"? by Joce640k · · Score: 4, Insightful

      I'm calling "Horsepoo" on the whole story.

      a) If they knew enough about it to put "reboot every 36 hours" in the manual they knew enough to fix it.

      b) According to the summary, 36 hours would still be a complete miss (a third of 687 meters is still 229)

      c) A fixed point integer (32 bits) can mark tenths of seconds with complete accuracy for over 13 years.

      d) Leaving aside a,b and c, the story still doesn't make any sense. The system would start the calculation the moment it saw the missile, not 100 hours before it appeared on the radar.

      Now ... at the speed of a scud missile (mach 5 if google serves me), it may be that an accuracy of 1/10th second isn't enough to compute the trajectory accurately enough to intercept it. At that speed you might need 10,000th second resolution or whatever. *That* would be believable (but unlikely - the designers would have to be complete idiots).

      The rest of the article? Yawn. It's the same old recycled story we've been seeing since the 1970s (those of us who are old enough).

      --
      No sig today...
    5. Re:"User error"? by hedwards · · Score: 1

      Strikes me as a bit odd that they would've not realized that the rounding error was going to propagate. Especially considering how these days you would do financial calculations in the integer number of cents in a transaction to avoid this funny stuff.

      At the very least they probably should've calculated the period of time after which the rounding errors would become larger than a tick and just skipped one to get caught up again.

    6. Re:"User error"? by TCPhotography · · Score: 1

      http://en.wikipedia.org/wiki/MIM-104_Patriot#Success_rate_vs._accuracy

      Postol can be ignored, because he is a known liar. He's a very good liar, but a liar nonetheless.

      As for ABM, we were doing skin-skin kills of ICBMs in the 60s with NIKE systems.

    7. Re:"User error"? by Reziac · · Score: 1

      If the problem was "being without our missiles for the time it takes them to reboot" why not just do a rolling reboot, a few or even one at a time?? If you need all your missiles at once, you're already in more trouble than you can get out of anyway.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    8. Re:"User error"? by mabhatter654 · · Score: 1

      because financial based systems (like AS400/system i) use "integer" math to store financial numbers to specific decimal places and just like in math/science class programmers have the standard "paper math" rounding rules thumped in early on. (but mostly because floating point was expensive when these were set up)

    9. Re:"User error"? by dbIII · · Score: 1

      I've read that, but I'm talking about successful missile intercepts and not Tornado aircraft from the same side that do not expect hostile missile fire.
      I also remember the "97% success" outright lie.

    10. Re:"User error"? by at_slashdot · · Score: 1

      I think you misrepresent the facts. Common sense tells me that they designed the system as well as they knew how, then they discovered the problem and somebody came up with the rebooting hack to keep the system effective. It doesn't make things better if told like this, but it doesn't sound like the designers were as callous as you seem to portray them; they probably never said or thought... "it works fine just reboot it every 30 hours or so"

      --
      "It is our choices, Harry, that show what we truly are, far more than our abilities." -- Prof. Dumbledore
    11. Re:"User error"? by Anonymous Coward · · Score: 0

      But that's the "Microsoft mentality" that's been drilled into everyone's heads: Don't repair it if it can be corrected by a reboot. Nobody appreciates uptime anymore.

    12. Re:"User error"? by TheKidWho · · Score: 1

      These missiles don't actually *hit* the target, they explode very close to it to destroy it.

    13. Re:"User error"? by Brickwall · · Score: 1
      The rule is: If a user *can* do something wrong, he *will*! How can they not know that?

      Is that why we call them "lusers"?

      --
      What was once true, is no longer so
    14. Re:"User error"? by RegularFry · · Score: 1

      It's not really a hack so much as required maintenance. I'd be surprised if the tolerance wasn't designed in because processor A with its 24-bit fixed point unit came in under budget whereas processor B with its 48-bit (or whatever) unit didn't. There would still be a required reboot time for processor B, it would just be a longer period.

      I should make that a little clearer, perhaps: no matter what the design, there would have to be a periodic resync, and if the quickest and easiest way to do that in the field is a reboot, then I don't see anything wrong with designing that in from the start as long as it's effectively communicated to the users.

      --
      Reality is the ultimate Rorschach.
    15. Re:"User error"? by Anonymous Coward · · Score: 0

      Yeah, assuming the facts of the story are true, then the designer just didn't know what he was doing. There are so many ways to not have this problem, from basing the system on a fixed point type (think TI) to simply not designing the overall system to use such a gross timestep. I mean, really?

    16. Re:"User error"? by Dr.+Evil · · Score: 2, Interesting

      When you write programs which deal with time like this, you never use floating point math. If your required precision is 1/10 of a second, your units are in 1/10 of a second. You do not resort to floating point. I'd probably use 1/100 or go to 64 bit and use 1/10000 of a second. With a high level language, there are better ways to do it of course.

      The reboot hack is a reasonable workaround in the field, as long as the downtime is documented and understood by leadership, and as somebody mentioned, the severity of the problem needs to be communicated to the field. Ship an alarm clock with the launcher, with clear instructions to reboot it and reset the unit when the alarm clock says so.

      The *requirement* of this kind of field maintenance from overstressed people in the field is a bad idea. When writing disaster recovery instructions for fieldwork in normal systems, I like to remind my coworkers...

      "...these instructions are for the *least* qualified admin, three years from now, at 2:00 in the morning, on Christmas, to be able to do this without assistance, with second line management yelling at them, while everyone else is on vacation, partying, or utterly unreachable. They need to be able to find the instructions, and execute them, with a minimum of stress or doubt as to the accuracy of the documentation."

      I've never done military work, but I can just imagine...

      ...the new guy doing shift rotation on the Patriot system at 2am on Christmas, he never got proper training, isn't sure if the last guy rebooted the system, realizes his cell is dead... now there's been talk of heightened awareness. An alarm goes off. There's a sticker next to it. "For the love of all that is holy, Press this button when this alarm goes off!" Does he hit the button?

    17. Re:"User error"? by Anonymous Coward · · Score: 0

      Really, this sounds a lot like the average computer user from the late 1980's to about ... oh, now, when we're still dealing with a system that accumulates errors over time and the designers have asked us to reboot the system, and even reload the OS, every so often. Granted, the Patriot system is a little more critical than the average user's system.

    18. Re:"User error"? by Yetihehe · · Score: 1

      A fixed point integer (32 bits) can mark tenths of seconds with complete accuracy for over 13 years.

      0.1s (error 0.0000001) + 0.1s (error 0.0000001) = 0.2s (error 0.0000002).
      representing time with complete accuracy for 13years doesn't mean anything when errors accumulate. And 24bit floating point has less accuracy than 32bit fixed point, especially as 0.1 is NOT easily represented in binary.

      --
      Extreme Programming - Redundant Array of Inexpensive Developers
    19. Re:"User error"? by Anonymous Coward · · Score: 0

      > representing time with complete accuracy for 13years doesn't mean anything when errors accumulate

      Dude, did you READ what you just wrote??

      "Complete accuracy" = no errors

    20. Re:"User error"? by Sir_Lewk · · Score: 3, Insightful

      Integer arithmetic does not accumulate error, only floating point does that. Now they may have been using floating point, but his point is they should have been using integer arithmetic.

      Had they been doing so, it could have run for 13 years with absolutely no accumulated error.
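
      A rough demonstration of the contrast, using Python's 64-bit floats as a stand-in (this is not the Patriot's arithmetic, and the tick count is arbitrary). The integer counter stays exact, while the float accumulator drifts slightly:

```python
TICKS = 1_000_000

# Integer counter: one exact increment per tick.
tick_count = 0
# Float accumulator: adds the nearest double to 0.1 on every tick.
float_seconds = 0.0

for _ in range(TICKS):
    tick_count += 1
    float_seconds += 0.1

print(tick_count == TICKS)         # True
print(float_seconds - TICKS / 10)  # small but nonzero drift
```

      For scale, 2^32 ticks at ten per second is roughly 13.6 years, which is in line with the figure quoted above.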

      --
      "linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
    21. Re:"User error"? by SharpFang · · Score: 1

      The number provided the area of the sky in which to aim the radar, which then provided exact tracking for the missile. 229 meters would surely be a far miss with the missile but quite enough for the radar cone. 687 not quite so.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    22. Re:"User error"? by JollyT · · Score: 1

      The error causes the targeting radar to miss the target, not the Patriot missile. While the targeting radar is a narrow beam, it is still much larger than the missile. Looking at the numbers I would guess 250 meters in diameter or so. After the search and acquisition radar picked up the incoming Scud, the targeting radar would be turned on. This radar would be aimed in the wrong place and not detect the Scud. Therefore, no Patriot missile would be launched by the system because there was no target. The error has nothing to do with the missile missing its target as it was never launched.

    23. Re:"User error"? by whoever57 · · Score: 1

      The reboot hack is a reasonable workaround in the field, as long as the downtime is documented and understood by leadership,

      That depends on the time to reboot and the level of redundancy of equipment. What would you say if a missile hit something because the officers had told the operators to reboot the system just at that time?

      --
      The real "Libtards" are the Libertarians!
    24. Re:"User error"? by Anonymous Coward · · Score: 0

      You mean just like Windows?

    25. Re:"User error"? by Hurricane78 · · Score: 1

      Eeem, this is no troll. Learn to understand the comment and what mod points are, before you moderate, would ya?

      This is really a well-known fact! Come with me, to any arbitrary company. Say a small shop of 50 people, doing heating installations, or something alike. And we'll prove it.
      Why this example? Because I worked at such companies. And it was impressive, how telling them things 10000 times, would change exactly *nothing*! They re-did the wrong things over, and over, and over, and over, and over. It was astonishing, how they always found new ways to use the software in a wrong way.

      How can someone be serious, and say that this is not exactly how it is? I guess troll moderations are for those who disagree, but don't even have enough of an argument to back it up, to fill a single comment.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    26. Re:"User error"? by Dr.+Evil · · Score: 1

      What if a missile hit something because shipping was delayed to retool and repeat testing?

      I agree it's always better to fix it properly, but it depends on time lines.

    27. Re:"User error"? by dbIII · · Score: 1

      Thanks, the secondary links from there and the link from Neoprofin did the job.
      I was caught between the obvious and blatant political lie of 97% success back when nothing was shot down and the 100% failure as stated by Postol and had not been able to find anything else. It appears the problem was not so much with the patriots but with the propaganda that said they were perfect. Since it was obvious that they are not perfect Postol's views had a lot of traction.

    28. Re:"User error"? by TwoBit · · Score: 1

      >> b) According to the summary, 36 hours would still be a complete miss (a third of 687 meters is still 229)

      It's not linear. Read the paper: http://www.fas.org/spp/starwars/gao/im92026.htm

    29. Re:"User error"? by BZ · · Score: 1

      > or go to 64 bit and use 1/10000 of a second

      We're talking about hardware and software designed and built in the 70s. Good luck with that 64 bit thing.

    30. Re:"User error"? by RegularFry · · Score: 1

      The thing is, we're already in an environment where the users are trained that if they don't do things right, Bad Things Happen, most likely to them. We're not dealing with untrained users, they have a real incentive to pay attention. Just as an example, look at any photo of a Patriot launch. There's a *huge* flame plume out behind the launcher. Any team member standing there, or any fuel stored there, will have a really, really bad day if Newbie McNewberson pushes the button at the wrong time. So, there must be procedures, and the procedures must be trained-in.

      Yes, in an ideal world, the whole thing would be idiot-proof. However, in this case, you don't really have a choice. You need to resync every so often. You can't force a reboot automatically, because it only takes that happening once during a high-stress situation for the crew to lose all confidence in the system. The alternative is to show a timer, so that the crew knows that they need to resync at some point in the next 12 hours, say, or that they're 10 hours overdue. Once you've made that decision, unless you adequately communicate to the crew the consequences of not obeying the timer, you've created precisely the conditions that actually happened.

      --
      Reality is the ultimate Rorschach.
  16. Computers don't suck at math, some programmers do by YA_Python_dev · · Score: 1

    The problem is the programmer, they should simply have maintained a count of the ticks in an integer and then multiplied it by 0.1 when necessary. Even better, use a proper data type, not a suckish 24-bit float in a freaking weapon, unless they understand very well what they are doing.

    Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal, getcontext
    >>> n = 0
    >>> tick = Decimal('0.1')
    >>> for i in range(3600000): n += tick
    ...
    >>> n
    Decimal('360000.0')
    >>> Decimal(1) / Decimal(7)
    Decimal('0.1428571428571428571428571429')
    >>> getcontext().prec = 50
    >>> Decimal(1) / Decimal(7)
    Decimal('0.14285714285714285714285714285714285714285714285714')

    And, yes, I know that Decimal in Python 2.6/3.1 is slow. Will be faster in 2.7/3.2. And there are similar libraries in Java and other languages.

    --
    There's a hidden treasure in Python 3.x: __prepare__()
  17. Worst article ever. by Anonymous Coward · · Score: 0

    Talk about misleading headline. All was at the programmers' fault. The computer did no "bad math".

    Stupid humanist journalists should not be writing technical articles.

    1. Re:Worst article ever. by RegularFry · · Score: 1

      Ariane was a management problem. Patriot was operator error. The Intel bug happened in production.

      What was that about programmers again?

      --
      Reality is the ultimate Rorschach.
  18. Computers are great... when used correctly. by thesandbender · · Score: 1

    The author seems to imply that computers can't do simple base 10 math without errors. That's not entirely true if you have a fixed precision. You use an integer and shift it so there is no decimal portion; in this case you would make your base 1/10th of a second instead of 1 second. Addition, subtraction and multiplication will be error free. You'll still have a problem with division and other operations but in this case that doesn't sound like their primary issue. It wasn't the computer's fault that the designers did not account for the fact that 2.0/2.0 != 1 on almost all FPU's today. It usually just equals a really good approximation of 1 that's "close enough".

    1. Re:Computers are great... when used correctly. by noidentity · · Score: 2, Insightful

      2.0/2.0 != 1 on almost all FPU's today.

      Say what? Citations please. Me thinks one of those 2.0 values isn't really 2.0. Hint: printing a value isn't a good way to get its actual value, because the printing function most likely rounds it to fewer digits than it's actually stored as.

    2. Re:Computers are great... when used correctly. by Anonymous Coward · · Score: 0

      Well, that is not the best example. With binary FP formats 2.0 is 1.0B+01 so it has a perfectly accurate representation, and 2.0/2.0 will be done as (1.0/1.0)B(1-1) and so will be a perfect 1.0 (1.0B+00). A better example would be 1.0/10.0.

    3. Re:Computers are great... when used correctly. by TheRaven64 · · Score: 1

      2.0/2.0 != 1 on almost all FPU's today

      Uh, what? In any binary floating point representation, 2.0 will be represented as 1x2^1. Dividing 1x2^1 by 1x2^1 will involve (integer) dividing the mantissas of the first by the second, giving 1/1 = 1. Then the exponent of the second will be (integer) subtracted from the first, giving 1-1 = 0. The result will be 1x2^0, which is already normalised so the final normalisation step will not do anything. Designing an FPU that would give the wrong answer for that calculation would be very difficult...
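
This is easy to check from Python, whose floats are IEEE binary64 on essentially every platform. A small sketch (the `describe` helper is made up here; note that `math.frexp` normalises the significand to [0.5, 1), so 2.0 shows up as 0.5 x 2^2 rather than 1 x 2^1, which is the same value):

```python
from fractions import Fraction
import math


def describe(x):
    """Return (significand, exponent, exact stored value) of a float."""
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    return m, e, Fraction(x)      # Fraction(x) is the value actually stored


if __name__ == "__main__":
    for value in (2.0, 1.0, 0.1):
        m, e, exact = describe(value)
        print(value, "=", m, "* 2 **", e, "stored as", exact)
    print(2.0 / 2.0 == 1.0)                    # True
    print(Fraction(0.1) == Fraction(1, 10))    # False
```

2.0 and 1.0 are stored exactly, so the division is exact; 0.1 is the one that is stored as a nearby binary fraction.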

      --
      I am TheRaven on Soylent News
    4. Re:Computers are great... when used correctly. by Rockoon · · Score: 1

      the fact that 2.0/2.0 != 1 on almost all FPU's today.

      Facts are supposed to be true.

      You have taken a little bit of knowledge (that you shouldn't normally compare floats for equality) but have raced to the wrong conclusion.

      2.0/2.0 is equal to 1.0 and no modern IEEE-compliant FPU will tell you any different, either.

      If we changed your argument to, say, (0.2*10/2.0) != 1.0 then you would be right, if the programming language specified that these calculations were to be performed in IEEE floats or some other base-2 type (because 0.2 cannot be accurately represented)

      --
      "His name was James Damore."
    5. Re:Computers are great... when used correctly. by Eudial · · Score: 1

      Eh, both 2.0 and 1.0 have exact representations in binary floating point, so there is really no reason why you should see an error.

      2.1 / 0.7, however (both lacking exact representations), is likely to produce a significant error.

      --
      GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
  19. don't blame the computer for bad programming by frovingslosh · · Score: 5, Insightful

    It is absurd to blame the computer (or worse, all computers) for what is bad programming. Computers can store a 1/10 of a second perfectly accurately, as long as it is stored in a variable that counts tenths of seconds rather than seconds. It can easily be stored as an integer that way, avoiding any floating point rounding errors.

    There certainly are cases of bad math in computers, particularly Intel computers. But this isn't such an example. This is just a lazy and stupid programmer who didn't understand what he was really doing who should take the blame for the failure that killed people, not the computer.

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:don't blame the computer for bad programming by Anonymous Coward · · Score: 0

      If computers suck at math, why don't we have some humie computing trajectory paths?

    2. Re:don't blame the computer for bad programming by Anonymous Coward · · Score: 0

      Best Post Ever

      Computers and the software running on them are made by people, and therefore real people are responsible for any avoidable errors that may occur. This attempt to shift fault from developer to product is like a grocery store blaming the steak for being contaminated, and shows just how inept and delusional those who authorized this decision were.

  20. This problem has been solved since the 1960s by tjstork · · Score: 3, Informative

    I remember this from a numerical methods class in the 1980s. To deal with situations like this, you can do one of three things :

    a) Have a function that you sample as a function of t, so you don't get accumulated error.
    b) Have enough bits so that error won't be an issue. This is actually hard to do because floating point errors do stack up pretty quick if you are not careful.
    c) Or, you can have an error term which you can use to make adjustments along the way to account for a lack of precision. Bresenham's line algorithm does more or less exactly that when it draws lines. That's why you had "stair stepping" as the algorithm corrected itself along the way.
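
Option (c) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: time is kept as an integer count of 1/2^23-second units, so one 0.1 s tick is 838860 units plus 8/10 of a unit that a plain integer add would throw away.

```python
SCALE = 2 ** 23                       # time units per second (assumed format)
STEP, REMAINDER = divmod(SCALE, 10)   # 0.1 s = 838860 units + 8/10 of a unit


def naive_clock(ticks):
    """Add the chopped step each tick; the lost 8/10 unit is never recovered."""
    total = 0
    for _ in range(ticks):
        total += STEP
    return total


def corrected_clock(ticks):
    """Carry the lost fraction in an error term, Bresenham-style."""
    total = 0
    error = 0                         # accumulated tenths of a unit
    for _ in range(ticks):
        total += STEP
        error += REMAINDER
        if error >= 10:               # a whole unit has built up: pay it back
            total += 1
            error -= 10
    return total


if __name__ == "__main__":
    one_hour = 36000                  # ticks
    print(corrected_clock(one_hour) == 3600 * SCALE)           # True
    print(corrected_clock(one_hour) - naive_clock(one_hour))   # 28800
```

The corrected clock is never more than one unit behind the true time, while the naive one falls further behind with every tick.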

    If the OP was correct, then PATRIOT failed because it did none of them. My bet is in reality, they simply underestimated the actual error term, but did everything else correct. This could be because of discrepancies in flight control instrumentation or some sensor, or, they were simply trying to save money on bits and didn't really do the calculation as to how far the missile could be off in an error term length seconds of flight at a particular phase in its flight profile.

    Bottom line is, the engineering discipline exists to solve this problem and is really no different than error handling in any guidance system. Putting a man on the moon, launching an ICBM at a target, shooting down a missile, are all essentially the same computer science problem from an error management perspective. The PhDs already nailed this decades ago. There's not a fundamental limitation to computing, in this case, merely, a failure or inability of engineers on this project to apply the correct known answer to this problem.

    --
    This is my sig.
    1. Re:This problem has been solved since the 1960s by Anonymous Coward · · Score: 0

      No, it's not because of some bug, or some software guy being stupid. It's because the procurement specification said something like "shall operate for 12 hours". So, among other countless design decisions, there was one that said, "let's do this, because it meets the requirement, and it would take significantly more effort to do better AND TEST THAT IT WORKS". Remember, if your design is perfect to say, 100 hours, then you need to test to 100 hours. If your 100 hour design is only tested for 12 hours, then it's really no better than the original 12 hour design. In today's dollars, that extra 80 hours of test is going to be $10-20k, just for one person to sit there. Now multiply that by 100s of little decisions of the same scope and magnitude. This is how you get massive overruns and missed schedules.

      So it's really a requirements issue. And, of course, the fact that the Patriot is designed to shoot down AIRPLANES not missiles. It was being used way out of its design space in the first place.

      Time really is money, and on every product, like it or not, there is an explicit budget for both dollars and calendar time that you have to meet. Would you rather have your aircraft defense system need a reboot every 12 hours OR would you like your defense system delivered 2 years later, while the programmers put in all sorts of enhancements to make it just a little bit better?

    2. Re:This problem has been solved since the 1960s by ceoyoyo · · Score: 1

      a is a decent solution. b and c, not as much. There are two other good solutions though:

      d) instead of trying to use a nice number in base 10 as your increment, use a nice number in binary as your increment. Since you're actually doing math in binary and all....

      e) use a counter so you keep track of the number of increments that have passed instead of trying to count in seconds (this one is similar to your a).
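
A quick sketch of (d) in Python, with made-up helper names: a step is "nice in binary" when its denominator is a power of two, and only then does repeated float addition stay exact.

```python
from fractions import Fraction


def exact_in_binary(step):
    """True if the rational `step` has a finite binary expansion."""
    d = Fraction(step).denominator
    return d & (d - 1) == 0          # denominator is a power of two


def accumulate(step, count):
    """Add float(step) to a running float total `count` times."""
    total = 0.0
    for _ in range(count):
        total += float(step)
    return total


if __name__ == "__main__":
    sixteenth, tenth = Fraction(1, 16), Fraction(1, 10)
    print(exact_in_binary(sixteenth), exact_in_binary(tenth))   # True False
    print(accumulate(sixteenth, 16 * 3600) == 3600.0)           # True
    print(accumulate(tenth, 10) == 1.0)                         # False
```

An hour of 1/16 s ticks sums to exactly 3600.0, while ten ticks of 0.1 already miss 1.0.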

    3. Re:This problem has been solved since the 1960s by Anonymous Coward · · Score: 0

      Or you can stop overcomplicating the problem and just use fixed point math.

    4. Re:This problem has been solved since the 1960s by RegularFry · · Score: 1

      If the OP was correct, then PATRIOT failed because it did none of them. My bet is in reality, they simply underestimated the actual error term, but did everything else correct.

      Read this, and take into account that the Patriot system was designed to be reset once every 36 hours to protect against arithmetic drift, but the operators didn't want to switch them off in case a Scud flew over while they were rebooting.

      The engineers didn't fail. The manual writers, or the trainers did.

      --
      Reality is the ultimate Rorschach.
  21. Terrible programming by Anonymous Coward · · Score: 0

    That is just an example of a terrible programmer(s)...if you ever programmed in assembly before floating point processors, especially on 8 bit machines, you'd be very comfortable extending your number of bits using fixed point math. It's work, but not hard...terrible that people died because of a lazy or uneducated programming team.

  22. Designed by who? by miffo.swe · · Score: 1

    Why on earth didn't they have a clock source other than the standard one? There are numerous sources of correct time like GPS, radio, NTP, clock servers, atomic clocks or add-in cards. The worst possible clock source is a standard PC. This system was probably faulty by design since the simple clock hardware in a normal server isn't made for keeping exact time.

    --
    HTTP/1.1 400
    1. Re:Designed by who? by cheros · · Score: 1

      Actually, clocks are the biggest challenge in especially complex systems.

      It starts with the question "whose clock do we use?", which in earlier days used to be the gunner's watch. Now it's generally an atomic reference, but even that has its own challenges when things move really fast - at Mach 2, measuring in 0.1 second resolution you could be off by 60 meters (it's possibly more as 0.1 is actually "somewhere between 0.0 and 0.2"), and that's assuming you acquire target when it's at Mach 2 (as far as I could Google it's closer to Mach 5 on impact, a Scud is a ballistic missile). You've got various sources of latency which you have to incorporate (acquisition, transmission, processing, verification etc.).

      Having said this, such deltas can be detected and compensated for, as proven by successful Patriot anti-SCUD shielding.

      Before I forget: in your PC you actually have TWO clocks. You have a hardware clock which the system uses to set the software clock on boot up. Clever people then use a pool NTP server to keep the software clock accurate.

      Not sure how it is in Windows, but if you run an NTP daemon under Unix it will initially sync like crazy with the NTP sources you have supplied until some statistics build up, after which the checks and corrections taper off. The reason for this is that the corrections aren't just to set the clock from an external NTP source, the daemon also tunes the internal software clock to run as close as possible in sync with the external sources. That way, if the external reference disappears the clock can continue as accurately as possible until access to the external reference returns.

      I have found the whole Network Time Protocol a fascinating example of how to create self correcting references - it's very clever stuff.

      --
      Insert .sig here. Send no money now. Owner may sue, contents will settle. Batteries not included.
    2. Re:Designed by who? by Anonymous Coward · · Score: 0

      What are you on about? They weren't running Patriots with IBM PC's.

      How much of that stuff was available 10 or more years ago? How much of it was available at a reasonable cost (even for military usage)?

      Don't judge if you don't know the circumstances surrounding the design.

  23. Are there any symbolic math Libraries? by wisebabo · · Score: 1

    I'm not a serious developer and certainly not one that works on mission critical systems but I have a question:

    Are there any symbolic math libraries that allow a program to compute and store its interim values symbolically until the final result is needed? (Like, as an AC mentioned earlier, Mathematica?). Of course there would be a memory overhead (but surely the entire Mathematica kernel wouldn't be needed) and performance might be much MUCH slower than current "binary math" libraries but surely in a day of gigabyte RAM chips and gigaflop CPUs (and teraflop GPUs) the added precision would be worth it?

    So does anything like this exist? Would it be hard to develop (that's a challenge for you out there!)

    1. Re:Are there any symbolic math Libraries? by Anonymous Coward · · Score: 0

      Python
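
To flesh that out a little: the Python standard library does not do symbolic algebra, but its `fractions` module keeps rational intermediate values exact until you ask for a float, which covers the tick-counting case. A minimal sketch (`elapsed_seconds` is a name invented for this example):

```python
from fractions import Fraction


def elapsed_seconds(ticks, tick_length=Fraction(1, 10)):
    """Exact elapsed time as a rational number; nothing is rounded here."""
    return ticks * tick_length


if __name__ == "__main__":
    exact = elapsed_seconds(3_600_000)
    print(exact)                              # 360000
    print(Fraction(1, 3) + Fraction(1, 7))    # 10/21
    print(float(exact))                       # 360000.0
```

For genuinely symbolic work (square roots, pi and so on kept unevaluated) a third-party package such as SymPy would be the place to look.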

  24. Your tax dollars at work by Herger · · Score: 2, Insightful

    This is not an example of computers sucking at math.

    This is an example of engineers and developers failing to draw up valid requirements, failing to develop to specification, and failing to test against real-world use cases.

    Management undoubtedly shares an equal if not greater portion of the blame here. This is typical military-industrial complex, lowest-bidder contractor mentality at work, just another form of corporate welfare if the government doesn't turn around and punish shortfalls like this.

    1. Re:Your tax dollars at work by Anonymous Coward · · Score: 0

      Someone mod this guy up!

  25. Why Computer Suck At Keeping Time by rocketPack · · Score: 1

    Sounds like the computer did the math just fine, but with a flawed clock.... That's classic GIGO!

  26. May I suggest that it's not the computers... by Anonymous Coward · · Score: 0

    ...that suck at maths, but some programmers?
    OK, that was an easy shot, but really, don't you agree that today's academic courses in science at large are becoming so specialized so soon that good sense stemming from scientific culture cannot be expected any more?

  27. And this is why... by elnyka · · Score: 1

    And this is why it is a good idea to take a Numerical Analysis course or an Assembly course that lets you play with floating-point arithmetic as part of your CS electives. As much as I'd like to blame today's Java/.NET-oriented CS curricula (which seem to be fashionable now in many universities), it's been quite a while that many universities barely pay any attention (if any) to the details of floating point arithmetic.

    1. Re:And this is why... by SpinyNorman · · Score: 4, Insightful

      It's the reporting that's garbage. It makes no sense at all. A system tracking missiles travelling at Mach 3 is keeping track of time to 0.1 sec accuracy?! Do you really believe that? Wanna buy a bridge?

      0.1 sec at Mach 3 is 100m, so you wouldn't have a hope in hell of ever hitting a 3m long target.

      The problem isn't the people working for the defence company, who are hard-core PhDs with some very serious domain knowledge. The problem is people like yourself who are so math illiterate as not to be able to fact check a piece-of-shit story!

    2. Re:And this is why... by Anonymous Coward · · Score: 0

      ++

      ++

      and more ++, to use a tired idiom.

      And as a part of that education, there needs to be a few assignments that force people to calculate out, by hand preferably, in thirds and in sevenths. Hopefully something interesting enough to not be dismissed out of hand as a make-work assignment. Something that gives actual "this is why it fails because I've seen it fail with my own eyes" experience.

      And then hammer the point home, that, to computers, tenths is like thirds and sevenths is to us.

      And follow that up with ideas on how to handle accumulations. For example, storing quantities in units of 1/10 (or 1/100 if money) as an integer, instead of in floating-point. Or, if there's a choice in how a clock is designed, use 1/16ths of a second. Or, increment the floating-point each time and every 10 invocations increment an integer and assign that integer to the floating point. Or any of a myriad of other techniques. And point out that these techniques work for any fixed denominator, not just decimals.
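
The third technique (let the float drift for ten ticks, then snap it back to an exact integer) might look like this in Python. It is only a sketch of the idea, with an invented function name and a 0.1 s tick assumed:

```python
def resynced_clock(ticks):
    """Float clock that is reset to an exact whole second every 10 ticks."""
    whole_seconds = 0        # exact integer count of elapsed seconds
    t = 0.0                  # float time handed to the rest of the system
    for n in range(1, ticks + 1):
        t += 0.1             # may round slightly
        if n % 10 == 0:
            whole_seconds += 1
            t = float(whole_seconds)   # discard whatever error built up
    return t


if __name__ == "__main__":
    print(resynced_clock(36000) == 3600.0)   # True
```

The error can never grow beyond what nine additions produce, no matter how long the clock runs.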

      No need for a full-blown numerical analysis course. Just a good dose of awareness of the issue, ideally coupled with assignments that visit the issue from time to time. I've always been a big fan of assigning floating-point binary-decimal conversion as one of the programming projects. Almost no one gets it right the first time. But don't grade that one, collect the results, and discuss in class and assign a revision for next week, and this one will count. It works.

    3. Re:And this is why... by Plekto · · Score: 1

      0.1 sec at Mach 3 is 100m, so you wouldn't have a hope in hell of ever hitting a 3m long target.

      The missiles range from 10.7 m to 12.29 m long. It "works" because it gets near the missile and then explodes. They figure that the resulting debris cloud will damage the incoming missile and cause it to destroy itself or knock it sufficiently off-course. It doesn't actually "hit" the missile itself.

      To be honest, it's a kludge of a system hardly any better than flak guns were from WWII at hitting enemy planes. The real problem, as someone stated, was to not have overlapping systems that fired multiple missiles at the same targets.

      Still, even some amount of success rate would be better than not having it at all.

    4. Re:And this is why... by SpinyNorman · · Score: 1

      OK, so 10m rather than 3m, but if your accuracy is 100m it doesn't make much difference.

      Do you really believe that a realtime missile tracking system is relying on a 0.1 sec time granularity?!

      The real real problem with the patriot is that it was built as an anti-aircraft missile not an anti-missile missile and it simply doesn't have enough speed, so all it can do is try to predict the position of an incoming missile and be in the same place when the missile gets there.

    5. Re:And this is why... by Anonymous Coward · · Score: 0

      The Patriot system was never designed to take out targets traveling at Mach 3. It was designed for taking out low-speed aircraft (the design is from the 1960/70's).

      So it seems you are the illiterate one unable to fact check a story.

    6. Re:And this is why... by mybecq · · Score: 1

      The article/summary states: The radar looked in the wrong place to receive a confirmation.
      The radar had the 0.1s accuracy, not the Patriot missile that had to hit the 3m target that was never launched due to the radar defect.

    7. Re:And this is why... by Anonymous Coward · · Score: 0

      Assuming that it explodes at a particular point, not on contact....

      What is the blast radius of the patriot missile?
      100m or less?
      What is the velocity of the exploding debris?

    8. Re:And this is why... by Anonymous Coward · · Score: 0

      The videos of the missiles hitting airplanes that you see in the Hollywood blockbuster movies, while spectacular, have very little to do with the way most of the modern SAM missiles actually work.
      Even a small object at high speed has a huge amount of kinetic energy. It is quite possible for a modern warhead to create a cloud of debris on the flightpath of the target aircraft that will cause the said aircraft to "malfunction". Oh, and don't forget that the military never fires a single SAM missile at a target. The standard procedure is 2 or more.

      Let's not fool ourselves, modern SAMs such as Patriot are quite deadly.

    9. Re:And this is why... by Anonymous Coward · · Score: 0

      The question here is not how fast the objects move, but how fast they can change direction. You don't want to know where the missile is, but where it will be.

      If the designers were not stupid as fuck they use the measurements to extrapolate the position and get a trajectory of each object. To get a good extrapolation it is important how many measurements you have, not how fast you do them. The question is can the missile change its trajectory faster than you can recognize with 0.1 sec intervals? And I'd say since we are talking about long range missiles and not TIE fighters it should suffice.

    10. Re:And this is why... by pruss · · Score: 3, Interesting

      I could see designing the system to synchronize both launch times and observations with a timer tick (it wouldn't be surprising if the whole system was driven by the timer interrupt), and then you're not going to have an error due to the spacing between ticks.

      I am a bit more dubious about the 24 bit thing, though. Was it fixed-point or floating-point?

      I don't think it was a float. What would that be? Maybe 16 bit mantissa, 1 bit sign and 7 bit exponent would seem to be the likeliest bet for a 24 bit float. If so, then after about two hours doing t += 0.1 would stop changing t, and the error would be much bigger.
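
That stall is easy to simulate. The sketch below is in Python and rests entirely on the guess above: a float with 16 significant bits and no hidden bit, rounding to nearest. The helper names are made up, and none of this is the Patriot's actual format.

```python
import math


def round_to_bits(x, bits):
    """Round x to the nearest float with `bits` significant binary digits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                      # x == m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** bits), e - bits)


def stall_time(bits, step=0.1):
    """Value of t (seconds) at which `t += step` stops changing t."""
    step = round_to_bits(step, bits)
    t = 0.0
    while True:
        nxt = round_to_bits(t + step, bits)
        if nxt == t:                          # the addition was rounded away
            return t
        t = nxt


if __name__ == "__main__":
    stalled = stall_time(16)
    print(stalled, "seconds =", stalled / 3600, "hours")
```

Under these assumptions the clock sticks at t = 8192 s, a little over two hours, once the spacing between representable values reaches 0.25 s. It also runs fast on the way there, because each 0.1 gets rounded up to the next representable step.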

      So presumably it was fixed point. But if you're doing it fixed point, instead of storing x, you store nx in an int, for some appropriate scaling factor n. But if you're going to do that, surely you'll choose n in a smart way, and in this case the obvious choice, as pointed out by many posters, is n=10. This is not only the obvious choice because it gets you more precision, but it's the obvious choice because the easiest, most obvious and most standard way of coding timers is to just increment a register with each tick. It would be silly, for instance, to let n=2^8, and then increment a register with 0.1*2^8 = 25.6, rounded to 0x1A. It would be a very unlikely assembly language programmer who would have put an add reg,1Ah opcode in interrupt handler code when inc reg would have worked.

      Now maybe at some point the timer value would get converted to a float for computations. But that surely wouldn't be a 24-bit float.

      So maybe the article has mangled things and it was not a 24-bit register, but a 32-bit float, with 24-bit mantissa, 7 bit exponent and 1 bit sign, and the "24" in the article came from the mantissa. That's a much more realistic choice. Still, the standard way to handle timers is to just increment a timer variable. So what I could see happening is this. There is a timer system variable t at full 0.1 second precision incremented on interrupt. (That's how PCs used to work--maybe still do--except the timer resolution was about 1/18.2 sec.) Then for their launch calculations, they do: (float32)t / 10. And now they're going to get nasty roundoff errors as the mantissa gets filled up. At the 36 hour point, t is already about 21 bits long. So when you do a float divide by 10, you'll certainly have roundoff problems. But you're still not going to be more than one tick (0.1 sec) off, because each tick still adjusts the mantissa, while the article says they were 0.34 seconds off.

      So I think something got mangled in the article. Or we had a really unlikely assembly language programmer who had floating point code executed with every tick of a timer interrupt. But even if the interrupt is only at 10hz, that's just completely contrary to the instincts of an assembly language programmer. And this would have been done back in the hey-day of assembly language programming, when one would try to optimize every clock cycle one could. (And, yes, I've worked with timer interrupt handlers, both on the Z80 and the 8086.)

    11. Re:And this is why... by Rockoon · · Score: 1

      32-bit floats have 23-bit mantissas + the implied bit (always 1) = 24 bits of precision
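
One way to see the 24 bits from Python is to round-trip values through the `struct` module's 32-bit `f` format (the `to_float32` helper is just a name chosen here):

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    print(to_float32(2.0 ** 24 - 1) == 2.0 ** 24 - 1)   # True: fits in 24 bits
    print(to_float32(2.0 ** 24 + 1) == 2.0 ** 24 + 1)   # False: needs 25 bits
```

Every integer up to 2^24 survives the trip; 2^24 + 1 is the first one that does not.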

      --
      "His name was James Damore."
    12. Re:And this is why... by Anonymous Coward · · Score: 0

      . . . These systems don't "hit" a target so much as explode in front of it, letting it fly into a shockwave and shrapnel cloud. I am not disagreeing that .1 sec seems really really slow, but it's only really tracking speed and trajectory. If 5-10 seconds of tracking are required to determine future course, .1 sec accuracy might be a real figure. Or at least it seems plausible enough an accuracy to get the cloud of shrapnel in the path of the incoming threat.

    13. Re:And this is why... by SharpFang · · Score: 1

      You don't need more precision to get the missile into the radar cone (which was the problem here).
      You don't need much more precision if your missile doesn't hit the other missile, but creates a 50m wide cloud of shrapnel on its route

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    14. Re:And this is why... by Anonymous Coward · · Score: 0

      Jeez, did you even read the story? It said the 0.1 sec time resolution was needed to get the initial radar lock on the missile. Keep in mind that the radar system employed in the interceptor rocket probably had a very narrow pattern to increase range and/or reduce power requirements. Who cares that the missile is only 3m in length? If your radar beam can detect the missile within the 100m radar swath, it sees it. If the missile isn't within that 100m swath, the interceptor does not see it, and does not fire.

    15. Re:And this is why... by Anonymous Coward · · Score: 0

      Yes, I can. Google it; the report is on a .mil site as a PDF. They are not tracking at that point to shoot it down. They are tracking it to make sure its signature is a SCUD's. Hence the reason for 0.1: if you track any faster, you run into errors distorting the signature. It is trying to make sure it is a SCUD, and if it finds the object within a certain boxed three-dimensional space, it knows it must be a SCUD.

    16. Re:And this is why... by Anonymous Coward · · Score: 0

      The SCUDs actually traveled at about Mach 5. Faster by far than I suspected before reading this article. But even then, 0.3433 seconds does not create 687 meters of travel. When I looked this up on Wikipedia it actually had a direct and less-confused quote from TFA than the OP managed to produce:

      "The discrepancy sounds tiny, but over four days it built up to about a third of a second. In combination with other peculiarities of the control software, the inaccuracy caused a miscalculation of almost 700 meters in the predicted position of the incoming missile. Twenty-eight soldiers died..."

    17. Re:And this is why... by Plekto · · Score: 1

      The real real problem with the Patriot is that it was built as an anti-aircraft missile, not an anti-missile missile, and it simply doesn't have enough speed, so all it can do is try to predict the position of an incoming missile and be in the same place when the missile gets there.

      Exactly as I said. It's basically a WWII era flak shell with a big rocket and some guidance systems strapped on it to get it up to speed and near the target. They should shoot a dozen at each target to make sure. Note: this get-close-and-blow-up approach works well for much much slower aircraft which it can easily catch, as you also mentioned. But not for something going Mach 4-5+ on descent.

    18. Re:And this is why... by Anonymous Coward · · Score: 0

      Per the GAO report, the 0.1 sec update is the GROUND RADAR's whole-sky search rate and tracking position estimate update. It looks like it has to keep track of multiple targets. The ground radar periodically sends info to the onboard system telling it where to point the missile's onboard "little" radar so it can see its target (per Wikipedia anyway). The GAO report shows there are 4 or so data points needed per sample: altitude, azimuth, range, @ time. It's hard to tell from the report if these are expressed as positions or as positions and velocities; if it's positions, you'll need at least 2 samples to estimate new positions. The missile's flight control interrupt rate is likely more than ten times faster than the guidance data point rate. The real-world problem here is similar to looking through a zoom lens on a camera: if I tell you to point your 300 mm zoom lens camera at a certain spot at a certain time to see a distant speeding car and my instructions are slightly off in time, you will look at the right place at the wrong time and see nothing. According to the GAO report, the ground radar keeps giving updates but they were always off by enough to put the target outside the missile's view.
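
      The zoom-lens point can be sketched in a few lines. This is only an illustration with made-up numbers (a target moving along one axis at 2000 m/s, two position samples 0.1 s apart, and a clock that is 0.3433 s off); the function name is hypothetical and nothing here is taken from the actual Patriot software.

```python
# Hypothetical numbers throughout: one-dimensional motion at 2000 m/s,
# samples 0.1 s apart, and a predicting clock that is 0.3433 s off.

def predict_position(p0, t0, p1, t1, t):
    """Linearly extrapolate position at time t from two (position, time) samples."""
    velocity = (p1 - p0) / (t1 - t0)
    return p1 + velocity * (t - t1)

# Two samples: 0 m at t=0.0 s and 200 m at t=0.1 s.
samples = (0.0, 0.0, 200.0, 0.1)

correct = predict_position(*samples, t=5.0)
skewed = predict_position(*samples, t=5.0 + 0.3433)  # same query, skewed clock

# The predicted look-point moves by velocity * clock error.
print(f"look-point shifts by about {skewed - correct:.0f} m")
```

      The shift depends only on the target speed and the clock error, not on how good the position samples are, which is why perfectly good radar data can still point the "lens" at empty sky.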

  28. Kind of old news isn't it? by Interoperable · · Score: 2, Insightful

    The article contains some interesting examples but all of which have been in programming texts and courses for years. I'm not really sure why it's on /.

    --
    So if this is the future...where's my jet pack?
    1. Re:Kind of old news isn't it? by Anonymous Coward · · Score: 0

      Yes, there have been compilations of such examples... why indeed is this news?

      Nothing to see here, move along...

    2. Re:Kind of old news isn't it? by grouchomarxist · · Score: 1

      Not everyone has read those texts or taken those courses. Many people on /. are self-taught. It would probably be good for them to learn and everyone else to get reminded of the issues involved.

  29. WHY MAGAZINE EDITORS SUCK AT MATH: by Anonymous Coward · · Score: 0

    Let's take the double precision floating point representation as an example. It uses 64 bits to store each number and permits values from about -10308 to 10308 (minus and plus 1 followed by 308 zeros, respectively) to be stored.
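
    For reference, the range the quoted passage is reaching for can be read straight from the runtime. A small sketch using Python's standard `sys.float_info`, assuming the usual IEEE 754 double behind Python's `float`:

```python
import sys

# Largest finite double: about 1.8 x 10**308 (ten to the power 308, not "10308").
largest = sys.float_info.max
print(largest)

# Going past the top of the range does not wrap around; it overflows to infinity.
print(largest * 10)

# The range is symmetric, so the most negative finite value is simply -largest.
print(-largest)
```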

  30. Flamebait by Anonymous Coward · · Score: 0

    The whole story and most of the posts are one giant Flamebait.

    Nice how all the Slashdot geniuses seem to think they could have done a better job had they *only* been there 20-30 years ago, before most of these would-be heroes were even born.

    Then there are morons who get on their high horse about corporate welfare bullshit. Sure, no one at Raytheon gave a shit about our soldiers, they just wanted to make a buck.

    What a disgusting way to start a Saturday.

  31. Ridiculous. Patriots always win. by writermike · · Score: 3, Funny

    Look, you guys can talk trash all you want, but when you say this:

    >>Patriot defense system failing to take down a Scud missile attack

    You're just lying to yourself. The Patriots defense is awesome this year. I mean, was there really ANY point for the Titans offense to show up a couple of weeks back?

    And the Scuds? C'mon man. They let go their best man two seasons ago. The QB can't hit the broadside of a barn and their entire wide-receiver corps has Jello hands anyway. The missile attack is a gadget play, pure and simple. Belichick sees right through that and you know it.

    Haters need to stop all the hatin' and get on the Pats bus!!!!! GO PATRIOTS!

    --
    If Nalgene water bottles are outlawed, only outlaws will have Nalgene water bottles.
    1. Re:Ridiculous. Patriots always win. by gclef · · Score: 1

      Football? Dude, we didn't leave you...you left us. Your card, turn it in.

    2. Re:Ridiculous. Patriots always win. by ikedasquid · · Score: 1

      Dude, this is /. Nobody understands the words coming out of your mouth.

  32. The old Patriot-stories, again.... by Anonymous Coward · · Score: 0

    Screw this. During Gulf-war I, or whatever it's called, Patriot did not 'miss' any target that they fired at - period. The system was never designed to destroy missiles or get direct hits, however. The original missiles were to destroy planes by going off in close proximity to the target - which they do, very successfully. Missiles like Scuds, however, are not always destroyed by this. They tend to just break up, sending the intact warhead off track, slightly. From what I know, this happened in the case mentioned.
    I happen to have worked with this system in the mid-nineties and this was a hot topic, back then. Why the total uptime of the system would mess up tracking is beyond me. The system will track what it either sees or is told to look for. This has nothing to do with rounding errors in time. Our system back then had been online for many days without impaired ability to track anything.

  33. Error in the author's math by ronaldg · · Score: 0

    Quote: total error of 0.3433 seconds, during which time the Scud missile would cover 687m.............. This would mean the SCUD would be traveling at almost 71 million miles per hour! I don't think so............
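
    For anyone who wants to redo the division, here is a short sketch of the speed implied by the summary's two figures. The conversion constant is the standard 1609.344 m per mile; the function name is just for illustration.

```python
# Speed implied by the summary's figures: 687 m covered in 0.3433 s.
METRES_PER_MILE = 1609.344

def implied_speed_mph(metres: float, seconds: float) -> float:
    """Convert a distance/time pair into miles per hour."""
    metres_per_second = metres / seconds
    return metres_per_second * 3600 / METRES_PER_MILE

mph = implied_speed_mph(687, 0.3433)
print(f"{687 / 0.3433:.0f} m/s, or about {mph:.0f} mph")
```

    Run as written, this comes out at roughly 2000 m/s, a few thousand miles per hour rather than millions.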

  34. GMP by tepples · · Score: 1

    The mpz module in the LGPL library GMP (not to be confused with a bitmap image editor) does arithmetic on large integers, and its mpq module represents rational numbers exactly as ratios of mpz integers. For example, 3.14 would become "157/50".
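
    The same idea can be tried without installing GMP. The sketch below uses Python's standard `fractions.Fraction` as a stand-in for mpq (an assumption made for convenience; it is not GMP's API), since it also stores a number as an exact ratio of arbitrary-size integers.

```python
from fractions import Fraction

# 3.14 parsed as an exact ratio, reduced to lowest terms.
approx_pi = Fraction("3.14")
print(approx_pi)                      # 157/50

# One tenth is exact as a ratio, so ten of them make exactly one.
tenth = Fraction(1, 10)
print(sum([tenth] * 10) == 1)         # True

# The binary floating-point equivalent famously is not exact.
print(0.1 + 0.2 == 0.3)               # False
```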

    1. Re:GMP by TheRaven64 · · Score: 1

      GMP is not a symbolic mathematics library, it's an arbitrary-precision library. It does not, for example, let you do square roots or trigonometric calculations and then their inverses later and cancel out the operations guaranteeing that you get the same output as input, which something like Mathematica does.
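
      A tiny illustration of the round-trip problem, using ordinary Python doubles rather than GMP (so this shows the general effect, not GMP's behaviour specifically): taking a square root and then squaring it does not hand back the original input.

```python
import math

root = math.sqrt(2)        # already rounded to the nearest double
squared = root * root      # rounding error is carried through

print(squared)                    # 2.0000000000000004
print(squared == 2)               # False
print(math.isclose(squared, 2))   # True: close, but not identical
```

      A symbolic system avoids this by keeping "sqrt(2)" as an expression and cancelling it against the square, instead of ever evaluating it to digits.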

      --
      I am TheRaven on Soylent News
  35. It's a weapon. by tjstork · · Score: 1

    I don't disagree with what you wrote. One thing is, though, that requirements are very fluid and you have to ask if perhaps the problem is that 10 hours and reboot is a ridiculous requirement from the get go. Soldiers aren't going to sit in a middle of a war zone and turn off the shields.

    Arguably, when speccing out systems like this, the solution is probably not to build them because they are really too complex to test for battlefield conditions. But that's crazy. So.. what was the outcome? You put a system out there, make it as good as you can, and the outcome, in this case, was that the system did intercept some missiles, did save some lives and did pioneer missile interception in a war.

    28 people died because the system isn't perfect, to be true, but how many people lived because the system worked at all?

    --
    This is my sig.
    1. Re:It's a weapon. by dave420 · · Score: 1

      During the first Gulf War, no one lived because it worked, as the system was not recorded to have hit any incoming Scuds whatsoever.

    2. Re:It's a weapon. by Moridineas · · Score: 1

      Having read about that, an MIT professor claims that either no or only a few missiles were intercepted. Others (US Govt) claim much higher hit rates.

    3. Re:It's a weapon. by Skeptic+Al · · Score: 1

      ... perhaps the problem is that 10 hours and reboot is a ridiculous requirement from the get go.

      Or, it could have been a lack of requirement. If there was no spec about the tolerable down time (5 nines? 4 nines? less?), it simply would not have been designed in. The "reboot every 10 hours" could be a workaround added after the design was completed.

    4. Re:It's a weapon. by Anonymous Coward · · Score: 0

      Having read about that, an MIT professor

      The MIT professor that wrote it is ideologically opposed to research in missile defense, for his own reasons, so therefore he says missile defense will never work and can't work pretty much no matter what.

  36. They do exactly that! by gbutler69 · · Score: 2, Interesting

    Each battery has overlapping coverage with its nearest neighbors. A proper deployment has overlapping fields of fire in both depth and breadth. Surface-to-Air missile defense involves multiple layers of different systems, each specializing in different ranges: Short Range - things like Stingers, Medium Range - things like HAWK, Long-Range - things like Patriot. A proper tactical deployment never relies upon a single battery to provide the sole coverage. The problem here was primarily one of tactical deployment. The technical issues can be argued, but the real failure was a failure to deploy in a tactically correct fashion. They sent a battery or two as a "Show of Force", probably overriding the tactical expertise of the officers involved for political expediency. You have jack-asses like Rumsfeld and Cheney (and their ilk) making military tactical decisions when they are not qualified to do so. The REAL failure here is one of politics.

    --
    Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
    1. Re:They do exactly that! by Anonymous Coward · · Score: 0

      Cheney was doing political meddling with tactics, maybe, but Rummy was a businessman during Desert Storm. (This is not to say that he's not an idiot, too.)

  37. tech failure vs people failure by doug141 · · Score: 1

    It was written up as a tech failure (and not a people failure) because newsmen who call their sources stupid lose their sources. As others have pointed out, the answer to your question of why this is news is because of the system failure resulting in death.

  38. "must-be-lit-majors" by John+Hasler · · Score: 1

    The authors of the article? So it would seem.

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  39. Bad design anyway by rnelsonee · · Score: 1

    Programmers' errors/naivete aside, if an error of 0.3433 seconds can mean the target aperture is 687m off, then a resolution of 0.1 seconds - even when working properly - could still be 200m off.

    And I see other comments about using fixed-point. I wonder why they couldn't just use an integer and use deciseconds as their base time?
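
    A minimal sketch of the difference, assuming a clock that ticks every 0.1 s (both function names are hypothetical): adding 0.1 as a binary float drifts, while counting whole deciseconds stays exact for as long as the integer does not overflow.

```python
def elapsed_float(ticks: int) -> float:
    """Accumulate time by repeatedly adding 0.1 as a binary float."""
    total = 0.0
    for _ in range(ticks):
        total += 0.1
    return total

def elapsed_deciseconds(ticks: int) -> int:
    """Keep time as a whole number of deciseconds: one tick, one unit."""
    return ticks

print(elapsed_float(10) == 1.0)           # False: ten tenths miss 1.0
print(elapsed_deciseconds(10) == 10)      # True: integers do not drift
print(elapsed_deciseconds(3_600_000))     # 100 hours, still exact
```

    The catch is the conversion step: the moment the integer count is multiplied by an inexact 0.1 to get seconds, the representation error comes back, scaled by the count.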

  40. Alternate Headlines by bipbop · · Score: 1

    Slashdot: Why Programmers Suck at Math

    Okay, that'd be misleading too--I suppose it'd be more accurate to write "How a few incompetent programmers who built a weapon got people killed because they suck at math". Not very headliney? Okay, how about "Military Moron Makes Murderous Machine, But Beginner's Bug Betrays Billions"? I rounded up from 28 to billions, so it should still be inaccurate enough for Slashdot. As a bonus, you still can't tell what the hell the article is about from the headline :-)

    1. Re:Alternate Headlines by Jesus_666 · · Score: 1

      "Dozens Dead Due To Missile Defense System's Faulty Firmware". That headline has the advantage of conveying exactly the wrong image.

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
  41. Seriously flawed reporting by SpinyNorman · · Score: 3, Interesting

    There's no way a real-time missile tracking system is going to be dealing with time at an accuracy of 0.1 sec.

    A Patriot missile travels at about Mach 3 (~1000 m/sec) so a rounding error of 0.05, even without any error accumulation, means you'd be off by 50m in position.

    Who knows what the real story is vs the garbage that was reported, but even if there was a cumulative error that's the fault of the programmer rather than a lack of a computer's ability to do math. You do your error analysis and use whatever accuracy needed to keep the errors in a tolerable range.

    The part about the system running for 100 hours was pure gibberish. Yes, we can all divide that by 0.1 sec, but what on earth does that have to do with a real-time tracking system tracking a target it acquired a few minutes ago?!

    A better title for the story rather than "computers can't do math" would be "we can't do tech reporting".

    1. Re:Seriously flawed reporting by gardyloo · · Score: 1

      The part about the system running for 100 hours was pure gibberish. Yes, we can all divide that by 0.1 sec, but what on earth does that have to do with a real-time tracking system tracking a target it acquired a few minutes ago?!

      Yes! This was what I was wondering, too. Who the hell cares how long the system had been running already? All that matters is the most recent sequence of (time, position) [ or maybe (time, position, velocity) ] readings. Does anyone have a reasonable explanation whatsoever?

    2. Re:Seriously flawed reporting by mybecq · · Score: 2, Insightful

      A Patriot missile travels at about Mach 3 (~1000 m/sec) so a rounding error of 0.05, even without any error accumulation, means you'd be off by 50m in position.

      Perhaps the tracking radar has a 500m field of view at a range of X km (enough distance to launch a Patriot missile). It doesn't look at the target through a keyhole and just has to be in the general vicinity to detect/confirm the incoming Scud.

      How about if you realized that there are two systems in this story?
      1) Radar (0.1 s accuracy)
      2) Patriot missile (launched after target confirmation by Radar)

    3. Re:Seriously flawed reporting by dcoe · · Score: 1

      Read the GAO report: http://archive.gao.gov/t2pbat6/145960.pdf

      Or, so you can get back to uninformed ranting as quickly as possible, skip to page five and ignore the part about the system's internal clock.

      --
      "If you ain't got a camel, you ain't Shiite."
    4. Re:Seriously flawed reporting by Agripa · · Score: 1

      There's no way a real-time missile tracking system is going to be dealing with time at an accuracy of 0.1 sec.

      I suspect the terms accuracy and precision were confused. The tracking system could very well be designed with a precision of 0.1 seconds (with the associated programming error causing an offset after hand-off which drifts over time) but maintain an accuracy much better than that. While it is unusual, it is not unknown for conversions between analog and digital domains to have integral non-linearity errors much much less than the resolution.

  42. Car analogy by ledow · · Score: 1

    This TechRadar article also explains why cars suck at math, too.

    The timing belt was manufactured to be a few mm too short. But over the course of several thousand revolutions, those mm add up to a massive error, which causes the pistons to strike metal. Thus the car was a write-off.

    It's no fairer to blame the computer than it is the car - some ABSOLUTE PILLOCK didn't design, implement or test their system properly. And *they* caused the 28 deaths, not the computer (and it can't be overstated just how elementary a mistake this is, especially in a military system, and should have been caught by basic code review and testing at every stage).

    I hate stories like this because then you get deep mistrust of computerised systems where they *can* be incredibly useful, and without an adequate substitute. Every time a car won't start because the electronic ignition wasn't designed properly, every time a home computer crashes because someone didn't bother to isolate the apps from the OS well enough, every time something like this happens, people distrust "computers" more and more when what they should be distrusting is damn crappy programming.

    A computer is as close as you can practically get to being perfect. Short of hardware failure (Intel FDIV bugs, bad RAM, corrupt drives etc.), computers do not make mistakes. If they crash, it's because they've been *told* to crash (the fact that you even *see* a blue screen or kernel panic means that the computer is still just blindly following orders).

    There's no excuse for this - it's basic, elementary mathematics and binary manipulation. Some pillock threw a cheap CPU clock and a standard library at a time-critical, life-dependent military problem without even thinking. The programmers should be sacked, the testing teams should be sacked and ANYTHING they've ever created or reviewed should be overhauled to make sure they haven't made even worse mistakes.

    1. Re:Car analogy by John+Hasler · · Score: 1

      You assume that the article is not a "complete pillock".

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    2. Re:Car analogy by Ancient_Hacker · · Score: 1

      I can't tell if you're being serious or sarcastic:

      >The timing belt was manufactured to be a few mm too short. But over the course of several thousand revolutions, those mm add up to a massive error, which causes the pistons to strike metal. Thus the car was a write-off.

      A timing belt has TEETH, which make it a precise digital computer, dividing the revolution of the crankshaft by the right integer in order to drive the camshaft.

      The exact length of the belt is irrelevant-- any extra length is taken up by the tension adjuster pulley's position.

      Nothing to do with floating-point ops at all.

    3. Re:Car analogy by RegularFry · · Score: 1

      ABSOLUTE PILLOCK didn't design, implement or test their system properly.

      Bear in mind that the system was engineered to require a reset once every 36 hours to eliminate arithmetic drift, but the operators failed to do so.

      There's no excuse for this - it's basic, elementary mathematics and binary manipulation. Some pillock threw a cheap CPU clock and a standard library at a time-critical, life-dependent military problem without even thinking. The programmers should be sacked, the testing teams should be sacked and ANYTHING they've ever created or reviewed should be overhauled to make sure they haven't made even worse mistakes.

      Um, no. At best, the trainers and manual writers need re-education. It's their fault for not passing on the equipment maintenance requirements to the end users, who through incorrect action caused the gear to silently fail.

      --
      Reality is the ultimate Rorschach.
    4. Re:Car analogy by wonkavader · · Score: 1

      > A timing belt has TEETH, which make it a precise digital computer, dividing the revolution of the crankshaft by the right integer in order to drive the camshaft.

      No, that's the difference between a timing belt (like my first two cars) and a timing chain (which all my later cars have used).

      You're right on all points with respect to the chain, but you're thinking that cars are/were less crappy than they are/were.

    5. Re:Car analogy by Anonymous Coward · · Score: 0

      I hate stories like this because then you get deep mistrust of computerised systems where they *can* be incredibly useful, and without an adequate substitute.

      A computer is as close as you can practically get to being perfect.

      I hope you're trolling. Computer systems are not as close as you can get to perfect. Computers are fragile, immensely complicated things. Sure digital systems can be more precise or faster than analog ones. But in practice computer systems are not the most reliable. What other systems produce errors due to cosmic rays? Would you place your life in the hands of a computer system that had to have no software flaws in any of its components or suffer a hardware malfunction while you were on its watch? "Stories like this" may or may not be fud, but they also bring up case after case where your beloved computers were not as reliable as you would think they are.

    6. Re:Car analogy by ledow · · Score: 1

      "Would you place your life in the hands of a computer system that had to have no software flaws in any of its components or suffer a hardware malfunction while you were on its watch?"

      Yes, quite happily. Provided I knew the software was designed properly. In fact, not only that, but I do it *EVERY SINGLE DAY*. If my car's ABS decides to go loopy, it could easily kill me. Same for traction control. That's controlled by a black-box computer system in every car that's fitted with it. Fuel mixtures, the fuel pump itself, even the traffic lights. The only question is "has the system been engineered to a life-support-system level, rather than knocked up by an amateur?".

      If I go to hospital, everything from the life support machines to the blood pressure monitor is a highly engineered computer with professionally tested software. If I go into London, the trains are computer-controlled (some of them, anyway, on the Docklands Light Railway) and travel at stupid speeds. If I travel on an aeroplane, my life is in a computer's hands much more so than the pilots. The fact that professional pilots even *allow* these systems onto their planes reassures me. Computers save and secure my life, silently, all day long. I trust the computers implicitly if they have been engineered to the correct standard (where the failure mode is safe too). Anyone who doesn't trust computers in those situations shouldn't be driving at all (not just in modern cars, but because of the traffic control systems, etc.), should never travel by air, go on a cruise ship, or a million and one other things. Do you expect / rely on your home telephone to contact the emergency services? You just placed your life in the hands of a computer.

      Your post is *exactly* what I'm talking about. You rely on a computer to wake you up, get you to work at 70mph+ without dying, do your work, get you back home without dying, cook your dinner without irradiating you (own a microwave?), etc.etc.etc. But one of those computers that *isn't* critical (your home PC) goes wrong and suddenly computers are unreliable. It's bullshit, and due to inconsiderate thinking you've tarred all these highly-controlled systems with the "My home PC crashes" brush.

    7. Re:Car analogy by Ancient_Hacker · · Score: 1

      Think. An engine going around at 3,000 RPM, if the belt is 1mm too long, is going to slip the timing by 3 meters every minute. And that's assuming no slip.
      Timing belts have teeth. Those belts you see on the outside of the engine are not timing belts, they drive the water pump, alternator, and AC.

    8. Re:Car analogy by wonkavader · · Score: 1

      On my 1967 vw bug, there were no teeth. This is why one checks the timing with a strobe light attached to a spark plug. You tension the timing belt and check the timing, then you twist the distributor cap to get the timing JUST RIGHT.

      You're ancient, but are you ancient enough?

  43. I'm not impressed ... by golodh · · Score: 1
    I remain very much unimpressed by the article, due mainly to its rather sensationalist focus on missile systems and Ariane but also to its apparent ignorance of a now 50-year-old branch of applied Mathematics: Numerical Analysis (see e.g. http://en.wikipedia.org/wiki/Numerical_analysis) and its failure to distinguish between the root causes of both system failures. The Ariane failure (see http://en.wikipedia.org/wiki/Ariane_5_Flight_501 ) was interesting in that the software itself was Numerically sound, but it only failed to watch for overflow:

    Efficiency considerations had led to the disabling of the software handler (in Ada code) for this error trap, although other conversions of comparable variables in the code remained protected. This led to a cascade of problems, culminating in destruction of the entire flight.
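
    As a rough sketch of what a protected conversion looks like: the Flight 501 failure involved converting a 64-bit floating-point value to a 16-bit signed integer. The Python below is an illustration of a range-checked conversion under that description, with a hypothetical function name; it is not the Ada code that flew.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result

print(to_int16_checked(123.9))     # 123

try:
    to_int16_checked(40000.0)      # outside the 16-bit range
except OverflowError as exc:
    print("rejected:", exc)
```

    The check costs a comparison or two per conversion, which is the kind of overhead the quoted passage says was traded away.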

    The Patriot case was simply unsound from a numerical point of view because it used an approximation which accumulated errors to the point where they seriously compromised the end result, which is a different thing altogether (and mathematically speaking much simpler and more fundamental).
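
    The 0.3433-second figure from the summary can be reproduced in a few lines. This sketch assumes, as the commonly cited analysis does, that 0.1 was chopped (not rounded) to 23 binary digits after the point in the 24-bit register; that layout is an assumption here, and the function names are made up for illustration.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: 24-bit register, 23 bits after the binary point

def chopped_tenth(bits: int) -> Fraction:
    """0.1 truncated to `bits` binary fractional digits, as an exact ratio."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)

def clock_error_seconds(ticks: int, bits: int = FRACTION_BITS) -> float:
    """Error in ticks * 0.1 when the stored 0.1 is the chopped value."""
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(error_per_tick * ticks)

ticks = 100 * 3600 * 10    # 100 hours of 0.1 s ticks = 3,600,000
print(round(clock_error_seconds(ticks), 4))   # 0.3433
```

    The per-tick error is under a ten-millionth of a second; it only matters because it is multiplied by a tick count in the millions.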

    Numerical analysis is basically about "How can we make sure that a computer algorithm on such-and-such hardware will always produce an answer to this-and-this mathematical problem with such-and-such error bounds.". This really isn't something like "coding well", but it can require complicated and careful mathematics to get right, which is something programmers usually haven't a clue about. Instead, and provided the effort is warranted by the application, one needs to have a competent Numerical Analyst (a fancy title for a Mathematician specialized in this particular field) check (if not actually design) the software. Coders can then do the rest, provided there is sufficient communication between the architect (the numerical analyst) and the builders (the coders) about all the quirks of the hardware and how they are accounted for and dealt with.

    Every CS graduate is supposed to know that advanced numerical work with computers (like those in the Patriot system, where the 0.3 second error is a fine example of negligence) falls under the domain of Numerical Analysis and requires specialist attention. This is why some jobs should be undertaken by software engineers, not coders.

    1. Re:I'm not impressed ... by amorsen · · Score: 1

      Ariane is also special in that the programmers thought about the error, asked the right people, got the (correct) answer about the values they could possibly get from that particular sensor. They proved that they could safely disable the error trap. Indeed, their code performed to specification (as far as I know) in every single Ariane 4 flight.

      Then the code was moved to the Ariane 5, and part of the reverification process was apparently skipped... It's impossible to put the blame for this on the original programmers.

      --
      Finally! A year of moderation! Ready for 2019?
  44. In one sentence? by Hurricane78 · · Score: 1

    As Paul Lockhart said: Math is about creativity!

    There. I saved you a hell of a lot of time! ^^

    --
    Any sufficiently advanced intelligence is indistinguishable from stupidity.
  45. Fixed, but a day late by Anonymous Coward · · Score: 0

    Fixed, but a day late. A 2-week turnaround from when this was fault-isolated to when a fix was fielded in SW Asia is fast for government work. Sadly, 28 American soldiers died. The computer found a possible ABT. When it verified the track, it wasn't where its programming told it to be. Track was dropped.

  46. The Real Problem by Anonymous Coward · · Score: 0

    The real problem is all the math teachers around the world that teach students that all decimals after the tenths or hundredths digit are, and I quote, "insignificant". In this case, what you basically have is a bunch of programmers who grew up learning that anything after the tenths digit is "insignificant". It's as simple as that.

  47. Sorry, but no. by golodh · · Score: 1
    The "advancedness" of military hardware has absolutely nothing to do with the problems sketched in the article.

    As long as the hardware has basic floating point support it's possible to design software that will get the right answer, and usually fast enough. It's all down to the software.

    1. Re:Sorry, but no. by RegularFry · · Score: 1

      Well, it probably does in Patriot's case. I'm sure the designers would have liked not to insist on a reboot every 36 hours, and if they'd had a 32-bit register to do their time calculations in they probably would have been able to push it out to at least a couple of weeks (although I can't be bothered to work out the precise details right now).
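
      For the curious, the details can be roughed out under some loud assumptions: that 0.1 is chopped to one bit fewer than the register width (23 fractional bits for 24-bit, 31 for a hypothetical 32-bit), that ticks are 0.1 s apart, and that roughly 0.3433 s of clock error is the point of failure. None of this describes the real hardware.

```python
from fractions import Fraction

TICKS_PER_HOUR = 36_000  # 0.1 s ticks

def hours_until_error(limit_seconds: str, fraction_bits: int) -> float:
    """Hours of uptime before ticks * chopped(0.1) is off by limit_seconds."""
    scale = 2 ** fraction_bits
    error_per_tick = Fraction(1, 10) - Fraction(scale // 10, scale)
    ticks_needed = Fraction(limit_seconds) / error_per_tick
    return float(ticks_needed / TICKS_PER_HOUR)

hours_24 = hours_until_error("0.3433", 23)   # 24-bit register
hours_32 = hours_until_error("0.3433", 31)   # hypothetical 32-bit register

print(f"24-bit: about {hours_24:.0f} hours")
print(f"32-bit: about {hours_32:.0f} hours ({hours_32 / hours_24:.0f}x longer)")
```

      Under those assumptions eight extra bits shrink the per-tick error by a factor of 256, so "at least a couple of weeks" looks conservative.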

      The fact that they only had a 24-bit register to work in says a lot about how advanced the gear they were allowed to work with was.

      --
      Reality is the ultimate Rorschach.
  48. 24-bit registers? by s_p_oneil · · Score: 1

    As a programmer, I can confirm that these programmers screwed up, but I would bet money on the fact that the management forced them to. There's no way programmers working on physics software would choose a processor limited to 24-bit registers unless that was the only choice they were given, so that decision must have been forced upon them by their bosses. I'm also certain that the decision that it was "good enough" to ship was not made by the programmers. Here's an interesting quote:

    From: http://www.corpwatch.org/article.php?id=11110
    "As usual with the Pentagon, cost is no object. But the Patriot is very expensive system and it's getting costlier all the time. Raytheon and Lockheed originally promised to deliver the new Patriot system for $3.7 billion dollars. Now the cost has soared to $7.8 billion. Each Patriot missile unit costs about $170 million. In the first Gulf War, an average of four missiles were launched against a single incoming Scud."

    Even if that's grossly inaccurate, they saved a few bucks per multi-million dollar unit. That's like being penny wise but several million pounds foolish. While I agree it's not that hard to work around the 24-bit limitation, the decision to use such a limited processor was probably a major contributing factor to the schedule slips and cost overruns. Any time a project slips that badly, management will step in and force them to rush it out the door before it's ready. My bet is that the developers knew the problem was there, but they didn't have time to even look at it because they had bigger fish to fry when they were trying to get it out the door.

    1. Re:24-bit registers? by Anonymous Coward · · Score: 0

      Processor register width has nothing to do with math accuracy. If you need more accuracy you use a math library.

      Gosh, my 6502 only has 8 bit registers, so I must have been dreaming about doing floating point 30 years ago.

      Boo hoo, my x86 CPU is 32 bit and when I use long double variables it explodes with confusion.

      Want a billion digits of accuracy, regardless of register width?

      http://gmplib.org/pi-with-gmp.html

    2. Re:24-bit registers? by nedlohs · · Score: 1

      I don't understand how they can accumulate errors in the first place.

      You store the numbers of ticks in a 24 bit register, that gives you almost 19.5 days before it loops to 0 - so you either handle the "the timer looped" or it is required to be reset every 19 days (make it 14 for a safety margin).

      You only ever multiply by 0.1 when you need the value in seconds (and why would you even need that, just work in ticks as the time unit instead of seconds).

      I can't see how you could possibly accumulate floating point rounding errors. Sure if you tracked time as a floating point number and added 0.1 to it each tick. But that would be retarded in the first place for many reasons, not just because you would obviously accumulate errors.
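
      The wrap-around arithmetic above is easy to check, and a wrapping counter can still give correct intervals if the difference is taken modulo the counter size. A small sketch, assuming a 24-bit counter of 0.1 s ticks; the function names are hypothetical.

```python
COUNTER_BITS = 24
WRAP = 2 ** COUNTER_BITS           # 16,777,216 distinct tick values
SECONDS_PER_DAY = 86_400

def days_until_wrap(ticks_per_second: int = 10) -> float:
    """How long a free-running tick counter lasts before returning to zero."""
    return WRAP / ticks_per_second / SECONDS_PER_DAY

def elapsed_ticks(start: int, now: int) -> int:
    """Tick difference that stays correct across a single counter wrap."""
    return (now - start) % WRAP

print(round(days_until_wrap(), 2))     # 19.42
print(elapsed_ticks(WRAP - 5, 3))      # 8: the counter wrapped in between
```

      Note that this keeps everything in integer ticks; no 0.1 is ever multiplied in, so there is nothing to round.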

    3. Re:24-bit registers? by RegularFry · · Score: 1

      Ok, now do that inside the time budget.

      --
      Reality is the ultimate Rorschach.
    4. Re:24-bit registers? by s_p_oneil · · Score: 1

      I doubt the article explained the true nature of the "bug" correctly. I work on 3D modeling and rendering on large scales (http://sponeil.net/), so I am acquainted with a number of types of precision problems. These developers are definitely working with 3D modeling and physics (and probably rendering for simulations). This means they're using 3D vectors and 4D matrices/quaternions, which accumulate precision errors surprisingly fast. The physics equations require approximations that introduce even larger errors, and the slower the processor, the lower the quality of approximations. A 0.1-second time step provides pitiful precision for physics calculations, and if they'd had a better processor, they would've taken advantage of it to take more samples. There's also the fact that the timer on the processor may be a tiny bit off, and that error will accumulate over time. Even if the timer were perfect, they may need to multiply floats by the "current time". As that number gets larger, the result loses bits of precision during the multiply. There are any number of similar problems they could be running into (and this problem is probably a combination of more than one).
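      To illustrate the "current time gets larger, precision gets smaller" point, here is a rough sketch that uses IEEE 754 single precision as a stand-in (an assumption; the Patriot reportedly used a 24-bit fixed-point format, not float32). The helpers to_float32 and float32_spacing are hypothetical names.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x):
    """Gap between a positive x (rounded to float32) and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)


for hours in (1, 10, 100):
    seconds = hours * 3600.0
    print(f"{hours:>3} h uptime: float32 resolution is {float32_spacing(seconds):.6f} s")
```

      The resolution of a timestamp held in such a format gets coarser every time the value crosses a power of two, so the same code is less precise after 100 hours than after one.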

      Of course all of these problems can be solved in any number of ways, but this is a complex piece of software, it sounds like the processor is extremely limited, changes to the math require large amounts of testing (very expensive tests that at some point involve firing real missiles), and given how late the project was, you can be sure they had management breathing down their necks trying to get them to rush it out the door. Of course there are plenty of poor programmers out there, and I'm sure that was a factor, but probably not the largest factor. It sounds like they knew about this problem and management decided it wasn't worth the testing effort to try to fix it. After all, they had a "reboot every hour or so" workaround, which I agree is ludicrous, but the farther a project slips, the more management will start to think that certain problems are acceptable.

    5. Re:24-bit registers? by nedlohs · · Score: 1

      Obviously there's something to it, since I'm sure the programmers weren't complete idiots. I just can't see it.

      You detect an incoming threat, and then N ticks later you look again to determine the trajectory. Why would you care about the actual time? You just care how many ticks have passed between two events. Unless you have a database of scheduled friendly flights by time (and I would think an IFF system would be more likely than that), but that would be independent of the trajectory check.
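      A small sketch of "just count the ticks between two events", assuming a free-running 24-bit counter that wraps to 0. COUNTER_BITS and elapsed_ticks are made-up names for illustration.

```python
COUNTER_BITS = 24
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def elapsed_ticks(start, now):
    """Ticks from `start` to `now` on a free-running counter that wraps at 2**24.

    Correct as long as fewer than 2**24 ticks separate the two readings.
    """
    return (now - start) & COUNTER_MASK


# Two radar looks that straddle a rollover of the counter.
first_look = COUNTER_MASK - 2      # three ticks before the counter wraps to 0
second_look = 5                    # five ticks after the wrap
print(elapsed_ticks(first_look, second_look))
```

      Because the subtraction is done modulo 2**24, the interval comes out right however long the system has been up, and no conversion to seconds is needed.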

      Reboot every hour or so means you need an additional system, to catch an incoming threat during the reboot of the other device. I'm sure management selling those systems didn't see that as too big a deal :)

      And of course the fatalities mentioned are really the fault of the guy firing the scud, then the operator of the patriot who didn't follow the specifications and left it running for 100 hours, and finally the idiotic software. Since the guy doing the firing wanted it to kill people I'd say the blame lands at the feet of the operator...

      But as I said I'm sure there's some detail of the system explaining it.

    6. Re:24-bit registers? by toddestan · · Score: 1

      It may not necessarily be the number of bits in the register, if your counter is not being ticked up at exactly 0.1 seconds. I think this may be what the article was trying to say. For example, what if the embedded system ran at 33 and 1/3 MHz, and therefore needed to increment the counter every 3,333,333 and 1/3 clock cycles? Except that there is no such thing as 1/3 of a clock cycle to the software running on the system, so the counter is incremented every 3,333,333rd clock cycle instead. Now you've just introduced a tiny accumulating error of 10ns per tick (about 100ns per second), which can start to add up after many hours.
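      The arithmetic for this hypothetical 33 1/3 MHz clock can be checked with exact rationals. This is only a sketch of the scenario described here; CLOCK_HZ, TICK_SECONDS and drift_seconds are illustrative names, and nothing suggests the real hardware used this clock rate.

```python
from fractions import Fraction

CLOCK_HZ = Fraction(100_000_000, 3)      # the hypothetical 33 1/3 MHz clock
TICK_SECONDS = Fraction(1, 10)           # intended tick length

ideal_cycles = CLOCK_HZ * TICK_SECONDS   # 3,333,333 1/3 cycles per tick
actual_cycles = ideal_cycles.numerator // ideal_cycles.denominator  # whole cycles only

# Each tick is short by the fractional cycle that had to be dropped.
error_per_tick = (ideal_cycles - actual_cycles) / CLOCK_HZ


def drift_seconds(uptime_hours):
    """Total drift after the given uptime, assuming one tick is counted every
    actual_cycles clock cycles while software treats each tick as exactly 0.1 s."""
    real_tick = actual_cycles / CLOCK_HZ
    ticks = Fraction(uptime_hours * 3600) / real_tick
    return ticks * error_per_tick


print(f"error per tick: {float(error_per_tick) * 1e9:.1f} ns")
print(f"drift after 100 h: {float(drift_seconds(100)) * 1e3:.1f} ms")
```

      Dropping a third of a cycle costs 10 ns on every tick, which works out to roughly 36 ms over 100 hours in this made-up configuration.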

  49. The real reasons... by Baldrson · · Score: 1
    1. In the mid 70s a programmer was being sued for damages caused by an error in his code but the judge ruled that programming was not a profession the way, say, civil engineering is, so there was no liability implied.
    2. Intel came out with the 8086 and doomed semiconductors to millions of monkeys banging on keyboards as they got degrees in "computer science".
    3. The network externality of software interoperability gave Microsoft a natural monopoly in setting software standards.
    4. The network externality of, ahem, network standards gave the government a natural monopoly in setting networking standards.
    5. Abandon all hope...

    Oh, I suppose I should say something more proximate:

    The failure to unify optimizing JIT compilers with memoized (encached, tabled, etc.) demand driven (lazy) computations so that we can express our maths independent of precision without performance penalties. This, of course, is directly related to the failure to maintain dependency graphs so that when under continuous demand (observation) demand driven computation unifies with data driven (data flow) computation -- and when no longer under demand (observation), memoizations (encachments, tabled entries, etc.) can be voided until the next demand requires recomputation.

    I'll get around to working on it one of these days. It's just that, like many other things, I thought it was obvious enough 25 years ago that someone who had some serious money would have backed that kind of programming environment. I told Ray in 1985 Microsoft would do lots of damage, but he didn't believe me and even I didn't think it would be this bad...

  50. Fixed-point math by ArsenneLupin · · Score: 2, Informative

    We had a similar problem with an Aegis design, and it was a major headache for us Hardware engineers to try to convince the Systems Engineers that counting in Binary time was more logical than counting in 0.1 second increments. The SEs kept insisting that their computers at home accurately count in seconds and we hardware engineers should be able to.

    And the software engineers would have been right. The error was not about counting in 0.1 second increments versus 1 second increments or whatever, but it was in using floating point representation where fixed point (basically, scaled integer) would have been more appropriate.

    And come to think of it, that is more or less what most desktop and server OSes do: they count number of milli, micro, or nanoseconds, and store that as an integer.

    Similar issue arises in finance: you don't encode dollar amounts as floating point. Instead you store number of cents (or mils) as integer. Every programmer of financial software knows about this (... or should know about this...)

    Floating point is really only appropriate to represent values which are not known precisely anyways (measurement results), where the little additional rounding error wouldn't matter. For all else, use fixed-point.
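    A minimal sketch of the scaled-integer idea for money, assuming non-negative amounts written as plain decimal strings. to_cents and format_cents are hypothetical helpers, not part of any library.

```python
def to_cents(amount):
    """Parse a non-negative decimal string such as '19.99' into integer cents."""
    dollars, _, cents = amount.partition(".")
    cents = (cents + "00")[:2]           # pad or truncate to two decimal places
    return int(dollars) * 100 + int(cents)


def format_cents(cents):
    """Render a non-negative integer number of cents as a dollar string."""
    return f"{cents // 100}.{cents % 100:02d}"


prices = ["0.10"] * 10

float_total = 0.0                        # binary floating point, added one by one
for p in prices:
    float_total += float(p)

fixed_total = 0                          # scaled integers (cents)
for p in prices:
    fixed_total += to_cents(p)

print(float_total == 1.0)                # the float total misses 1.0 by one rounding step
print(format_cents(fixed_total))
```

    The same pattern applies to time: store a count of ticks, milliseconds or nanoseconds as an integer and only scale it for display.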

    1. Re:Fixed-point math by commodore64_love · · Score: 1

      The battle was Not between Hardware* and Software Engineers. We agreed.

      If you re-read what I wrote I said the hardware group and the SYSTEMS engineers were the ones battling it out. In my experience systems engineers rarely understand the intricate details of the projects they work upon, which is fine. It's not possible to know everything. The frustration is when the systems engineers assume that hardware engineers and programmers are dunces, and refuse to listen to us.

      That was the root problem. We hardware/software persons knew binary counting would provide the best accuracy, but the systems engineers refused to let us do that.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    2. Re:Fixed-point math by ArsenneLupin · · Score: 1
      Sorry, I misread SE as Software Engineer, instead of Systems Engineer.

      My point about (decimal) fixed point still stands.

    3. Re:Fixed-point math by jbolden · · Score: 1

      That's a general problem of creating a hierarchy without listening to the people below them. As a technical architect, my rule is that if I say something that draws strenuous disagreement, then either I'm expressing myself poorly or I'm dead wrong, and I'm not moving forward till I figure out which.

  51. Maxima would have been overkill by tepples · · Score: 1

    [GMP] does not, for example, let you do square roots or trigonometric calculations

    I know that. I recommended GMP because the article is about improper handling of rationals, not square roots or trig (or even Trig Van Palin for that matter), and Maxima would have been overkill.

    1. Re:Maxima would have been overkill by TheRaven64 · · Score: 1

      GMP is overkill for fixing the problem in TFA; a simple fixed-point value would have been fine. That's not to say that GMP isn't nice; I use it in my Smalltalk implementation for BigInts, and it's really nice watching things that were integer overflows in Objective-C become huge numbers in Smalltalk.

      Maxima isn't really what the grandparent was after either, although it does have much the same functionality.

      --
      I am TheRaven on Soylent News
  52. may not exactly be the programmers by astar · · Score: 1, Offtopic

    Disclosure: I am a programmer.

    I had a conjecture that the Patriot missile was a Raytheon project. Not a particularly well-based speculation, but I did RTFA for confirmation. For some reason, they did not mention the manufacturer. They felt free to mention Intel and Google, but I guess the manufacturer was an advertiser.

    As it happens, 40 years ago I was an end user of a Raytheon air-defense missile system called the Hawk. There was a common derogatory phrase about Raytheon. I guess we might call it a meme now.
    It was cute and perhaps relevant to this article, but it has been so long, I can not reproduce it.

    Anyway, whoever manufactured the Patriot, I sort of doubt that the first cause was a bad programmer.

    A war story. This is not all Raytheon's fault, but it makes a nice slander.

    At Kassel, there was a NSA antenna farm with a hawk battery next to it. It was noteworthy, but not unusual, for Migs to buzz the antenna farm. I guess it happened every few months. Go figure.

    1. Re:may not exactly be the programmers by RegularFry · · Score: 1

      Anyway, whoever manufactured the Patriot, I sort of doubt that the first cause was a bad programmer.

      Amen to that. This strikes me as a conscious engineering decision followed by a failure to impress on the end users the consequence of not correctly maintaining their gear.

      --
      Reality is the ultimate Rorschach.
    2. Re:may not exactly be the programmers by BranMan · · Score: 1

      I worked on it - PATRIOT was indeed Raytheon designed and built. The missiles were dual-sourced - Raytheon and Boeing both made them. PATRIOT was indeed made to replace Hawk. Little anecdote about the two systems: Hawk was not reliable enough, so every component had to be doubled - so everything had a spare and you could keep the system up. PATRIOT passed the reliability test so that it didn't have to have everything doubled - but supposedly was designed to support that. Damned if I knew how they were going to do that - the Transmitter in particular was pretty darned packed. I shudder to think of working on it with twice the gear stuffed into it - and a megawatt of 208 3 phase power running through it.

    3. Re:may not exactly be the programmers by astar · · Score: 1

      Huh, I did not know that Hawk was that redundant. I estimate the batteries were fully up maybe 70% of the time. I had a van full of discrete transistor computer to simulate an attack, and with an operational battery, it took a lot of extra tweaking of the battery to get the battery to function with the simulator.

      Anyhow it did not make much difference. I am not really sure that any US military unit anywhere ever fired a hawk in anger. And the very optimistic expectation in the European theater was that if the flag went up, the battery would survive 30-60 seconds, and get off one or maybe two missiles. As you probably know, this stuff all comes under combined forces doctrine. So the cannon cockers would protect us from artillery attack. Unfortunately, the way the cannon cockers figured it was that they might not be able to fire, but most of their units were capable of retreating. :)

      The draftees did not particularly care about the mig overflights, but they would have been happy to shoot down a mig just for grins. Some of the officers were gung-ho, but they did not have the authority to fire. They had to call battalion, and battalion did not have the authority to fire. So battalion called group. But group did not have the authority to fire. Eventually, it got up to V Corp. who had the authority, but I never heard of them giving authorization to fire, and by the time the answer came all the way back, the mig was long gone.

      Oh well, two wasted years.

  53. Reboot? by jellyfrog · · Score: 1

    How does restarting the operating system kernel, reinitializing drivers for all the hardware and restarting every running program help?
    I know that's not what you meant, and the operating system in use is probably not Windows (I hope, at least). Still, is it that hard to just deal with the problem, instead of starting from nothing and doing a whole lot of unrelated stuff? Reboots should generally not be required.

  54. Misplaced Blame by Matey-O · · Score: 1

    Sure, don't actually blame the _scud_ for the deaths.

    --
    "Draco dormiens nunquam titillandus."
  55. Didn't read TFA but...EMP. by Anonymous Coward · · Score: 0

    As opposed to a modern system susceptible to EMP.

    1. Re:Didn't read TFA but...EMP. by Entropius · · Score: 1

      EMPs propagate well underwater?

    2. Re:Didn't read TFA but...EMP. by Wonko+the+Sane · · Score: 1

      It certainly won't propagate through a few inches of solid steel in any case.

  56. Fixed point sucks, too by FranTaylor · · Score: 1

    Really what you want is to store numbers "rationally" as a numerator and a denominator.

    You get all the advantages of fixed point, and you can also represent fractional numbers exactly so that (1/3)*3 == 1

    If you use a proper language like Scheme for your calculations, it's just built in.
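    The same idea can be tried without Scheme. As a rough sketch, Python's standard fractions module stores a numerator and a denominator, so a tenth of a second is held exactly:

```python
from fractions import Fraction

tick = Fraction(1, 10)                   # exactly one tenth of a second
print(tick * 3_600_000 == 360_000)       # 100 hours of ticks, with no rounding anywhere
print(Fraction(1, 3) * 3 == 1)           # the (1/3)*3 == 1 case mentioned above
print(0.1 + 0.2 == 0.3)                  # binary floats, for comparison
```

    The first two comparisons hold exactly; the float comparison does not, because none of 0.1, 0.2 and 0.3 has a finite binary expansion.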

    1. Re:Fixed point sucks, too by Anonymous Coward · · Score: 0

      OK, Einstein, so what do you do when you want to use an irrational number like Pi in your calculations?

      How do you suppose a time-critical system is meant to use arbitrary precision math that takes arbitrarily long to perform calculations?

      Doh!
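      The cost objection can be made concrete. As a hedged sketch (Newton's iteration for the square root of 2 is just a convenient example, not something from the article), exact rationals roughly double in size on every step, so the time per operation is not bounded:

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Run Newton's iteration x -> x/2 + 1/x for sqrt(2) in exact rationals."""
    x = Fraction(1)
    history = [x]
    for _ in range(steps):
        x = x / 2 + 1 / x
        history.append(x)
    return history


for i, x in enumerate(newton_sqrt2(6)):
    print(i, x.denominator.bit_length(), "bits in the denominator")
```

      A hard real-time loop needs a fixed worst-case cost per calculation, which is why fixed-width integers or fixed-point values are the usual choice there.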

  57. Don't blame the computer... by Anonymous Coward · · Score: 0

    ... blame the programmer who tried to stuff 0.1 into binary, and then used the resulting erroneous binary number as if it were correct.

    .

    In this case, the computer didn't suck at math; the programmer sucked at programming, and should have been kept far away from computers controlling armament.

  58. Waterjet robot errors by homb · · Score: 1

    Many years ago I was asked to look at a waterjet robot that was behaving abnormally. The robot's task was to cut plastic sheets into square tiles as they went through it.
    The problem is that after 30 minutes of activity the square tiles weren't so square any more, and it kept getting worse. The software engineers from the manufacturer came and went a number of times, and failed to solve the problem.

    It was obvious to me that it was a compounding rounding error, so I looked at the robot's program. It said (simplified):
    1- start at the set 0.0 coords
    2- turn on jets
    3- go forward 30cm
    4- stop
    5- go left 30cm
    6- turn off jets
    7- go right 30cm
    8- goto 2

    Essentially it never went back to the 0.0 coords and kept adding the errors of going left and right 30cm. It took about 30 minutes to get to the code, find the problem and solve it.
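    A toy simulation of that failure mode, with made-up numbers: the gains below are an assumed scale error on the 30cm moves, and re-homing to the absolute 0.0 coordinate each cycle is one plausible fix, not necessarily the one actually applied.

```python
def run_cycles(cycles, rehome, left_gain=1.0005, right_gain=1.0):
    """Lateral offset (cm) at the start of each cut for a relative-move program.

    left_gain/right_gain are made-up scale errors on the 30 cm moves; with
    rehome=True the head returns to the absolute 0.0 coordinate every cycle.
    """
    x = 0.0
    offsets = []
    for _ in range(cycles):
        offsets.append(x)
        x -= 30.0 * left_gain     # "go left 30cm", slightly overshooting
        x += 30.0 * right_gain    # "go right 30cm", relative to wherever we are
        if rehome:
            x = 0.0               # absolute move back to the set 0.0 coords
    return offsets


drifting = run_cycles(1000, rehome=False)
fixed = run_cycles(1000, rehome=True)
print(f"offset before cut 1000 without re-homing: {drifting[-1]:.2f} cm")
print(f"offset before cut 1000 with re-homing:    {fixed[-1]:.2f} cm")
```

    With purely relative moves a tiny per-move error grows linearly with the number of cycles; an absolute reference resets it every time.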

  59. Re:Computers don't suck at math, some programmers by Antique+Geekmeister · · Score: 1

    Slow up. While the overall concept is reasonable, mil-spec computers are _tiny_ in resources. They have to be: getting them mil-spec approved is a lengthy process, and radiation hardening CPUs and microprocessors is very difficult. The bigger the chip in resources, or the smaller the traces, the more radiation vulnerable. And for an interception missile, the available payload to carry shielding for the electronics is minuscule if it exists at all. So competent military programmers learn to be very, very parsimonious indeed in their code.

    They also tend to write in C or even assembly, for optimization to their very limited hardware. There have been attempts to use all sorts of other languages for such processors, but they keep coming back to C.

  60. Is this really a rounding error ? by Cochonou · · Score: 1

    You have to look at the typical accuracy of time references. Military crystal oscillators are usually accurate down to 0.5 or 1 ppm, but that's about the best you can usually get. For instance, look at Q-Tech offerings, which is standard technology for avionics. So let's check... 100 hours * 3600 seconds * 1 ppm = 0.36 seconds. Even without rounding errors, the error would have been the same if the system was running from a 1 ppm crystal oscillator. It seems to me that the real problem is that the time references of the different radars were not synchronized more often.
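    The estimate is easy to reproduce. A quick sketch, where oscillator_drift_seconds is just an illustrative helper name:

```python
def oscillator_drift_seconds(uptime_hours, ppm):
    """Worst-case clock drift for an oscillator with the given ppm tolerance."""
    return uptime_hours * 3600 * ppm * 1e-6


for ppm in (0.5, 1.0):
    print(f"{ppm} ppm over 100 h: {oscillator_drift_seconds(100, ppm):.2f} s")
```

    At 1 ppm the oscillator alone could account for an error of the same order as the 0.3433 seconds quoted in the summary.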

  61. "The radar looked in the wrong place" (SIC) by beatsme · · Score: 1

    While we're on the topic of computers performing calculations and misguided notions, how about we rid ourselves of these unnecessary anthropomorphisms which lead to the idea that a computer is even "doing math" with our number system, or that the radar is "looking" anywhere at all.

    1. Re:"The radar looked in the wrong place" (SIC) by owlstead · · Score: 1

      You are of course correct. "performing calculations" and "pointing in the wrong direction" would be more to the point. That said, I don't think "we" are the problem here, you are preaching to the wrong church. You'd better contact the author of the article instead.

  62. Patriot wasn't designed to shoot down missiles by argent · · Score: 1

    1. Patriot was an anti-aircraft system, originally, and that it was operating outside its design parameters.

    2. No system should depend on a non-synchronized free-running clock, anyway. That's not an arithmetic problem, that's a design problem. If an absolute time base is needed, you have to actually create one, not assume that everyone's clocks are going to magically stay in sync.

  63. Old news by Anonymous Coward · · Score: 0

    This is a case in some text book I had to get for university a few years ago. According to it, this was in 1991.

    The book is Computer Architecture and Organization [an Integrated Approach] by Miles Murdocca and Vincent Heuring (you can find the case on page 51).

  64. Technically.... by CFD339 · · Score: 2, Funny

    ....I believe the aerobatic maneuver you describe is called the "Lawn Dart", and while it has been done many times in aviation history, few pilots have ever succeeded in doing it twice.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  65. Why the computer sucks at stopping... by Ostracus · · Score: 1

    "Except that people tend to rely on computers, and take risks they would not have otherwise taken."

    So in other words, computers are the anti-lock brakes of the electronics world? See I knew I could get a car analogy in there.

    --
    Shai Schticks:"You don't make peace with friends, you make peace with enemies"
  66. hawgwash by ifeelswine · · Score: 1

    my computer sucks at math because it doesn't apply itself. period. end of discussion.

  67. Patriot success rate was likely extremely inflated by neapolitan · · Score: 4, Informative

    I know that I'm arguing with a trolling AC, but for the other readers of slashdot, you should know that the grandparent's post refers to the controversy regarding the analysis of the Patriot system during the first Gulf war. There was a huge propaganda machine behind the Patriot's "successes" which turned out to be very near zero indeed. This was covered in a series of hearings in the early 90's...

    http://www.fas.org/spp/starwars/docops/pl920908.htm

    You can also read up on this from transcripts from the hearings after the war.

    In the interests of fairness, here is a rebuttal / review.

    http://www.fas.org/spp/starwars/docops/zimmerman.htm

    I remain unconvinced -- from reading this (almost 20 years ago) I concluded that at best, the military did not know for sure that these worked well.

    --
    Slashdotter, ID #101. UIDs are in binary, right?
  68. I thought they didn't work anyway by PJ6 · · Score: 1

    Last I heard the US military wildly inflated the Patriot Missile success rate to 95% from "possibly 0%", and tried to cover up scores of civilian deaths directly caused by them. And Raytheon couldn't get even one hit under controlled conditions. Presumably these missiles work now if they're being bought and sold, but I still haven't seen any proof. Has any non-US affiliated party released test results?

  69. Re:Poor Research by Anonymous Coward · · Score: 0

    Subject changed to reflect Parent's misinformation. If you look a little more carefully, the Patriot did stop Scuds later on, I assume after the bugfix mentioned went in. While not 100%, it did eventually earn its salt, with its score a lot higher than "zero".

    As to the Israeli system... I guess they haven't worked the bugs out yet.

  70. Simple solution by owlstead · · Score: 1

    They should of course have used Swiss-built computers instead.

  71. major fail by Syntroxis · · Score: 1
    I was working for a small aerospace startup. We had a project to test molecular beam epitaxy on a project to fly on the shuttle. I built most of their PCs, ran the network, provided support, etc. Our platform, at the time, was the 486 processor. This new-fangled processor, the 586, came out, and the lil PhD that did the orbital dynamics of the project just had to have one. Having been the victim of several arrows in the back, I had a rather adamant aversion to using version 1.0 of anything. I argued, and won, we would not use the 586 yet.

    Welllllll, they snuck one past me as a piece of "test" equipment.

    The project was placed on the shuttle, and when it came time to fly, it was lifted from the payload bay by the arm, and activated. It could not lock guidance. It would only wobble. Long story short, after several attempts, re-uploading flight software, and a second flight on the shuttle, the project was scrapped.

    Several weeks after the second flight, I made sure that the articles about the Intel 586 floating point error were in everyone's in-box. About two years later, the company went under. Doubt it was because of the problem with the 586.... it was more because the company was a threat to the traditional way NASA does business.

    --
    Wherever you go, there you are.
  72. Incorrect Evaluation of the Problem by Anonymous Coward · · Score: 0

    Computers are quite good at math, and do so with amazing accuracy.
    The problem is the people applying the math computations to the
    computer system. The problem existed long before computers and
    produced an area of mathematics called numerical analysis.
    Slide rules produced greater errors, and hand calculations even more
    so, yet this can be accounted for.

    Non-math people should not be doing this on their own but should
    be requesting the help of mathematicians when they don't understand
    how to compensate for such errors or know when the errors have
    reached a level that compromises the integrity of the calculations
    and thus requires a new starting point.

    Programmers should not pretend to be mathematicians or engineers
    or anything but what they are.

  73. Computers don't suck at math by kenh · · Score: 1

    This headline, while captivating, is inaccurate - computers excel at math, and can do complex calculations faster and better than any other device I know of. BUT, the issue here was the fundamental design which pitted the software design against the hardware limitations. The author of the post above (with the benefit of hindsight) was able to describe the problem in just a handful of words, begging the obvious solution - the sampling should have been done in increments that suited the 24 bit registers the values would reside in for calculations - they should never have left the system to "round" any values. Design flaw, plain and simple.
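    A rough way to see which tick periods would "suit" a binary register: a period is stored without rounding only if it is a whole multiple of the register's smallest step. The 23 fractional bits below are an assumption for illustration, and exactly_representable is a hypothetical helper.

```python
from fractions import Fraction


def exactly_representable(period, fraction_bits=23):
    """True if `period` (seconds) is a whole multiple of 2**-fraction_bits."""
    return (Fraction(period) * 2 ** fraction_bits).denominator == 1


for period in (Fraction(1, 10), Fraction(1, 8), Fraction(1, 16)):
    print(period, exactly_representable(period))
```

    A tick of 1/8 or 1/16 of a second would have been stored exactly, whereas 1/10 cannot be, however many bits are available.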

    --
    Ken
    1. Re:Computers don't suck at math by EmagGeek · · Score: 1

      Exactly right. This is a design flaw, not some imagined inherent inability for computers to do math. The engineers designed in the error, plain and simple.

    2. Re:Computers don't suck at math by rod · · Score: 1

      Yes, you're both correct. Humans suck at almost everything. When things go right, humans are good at creating meta-whatever, but never good at being precise. This is stupid public workers (or hired to stupid people) boring work, and would end up wrong.

      Computers are great. Humans made mistakes.

      tsk

  74. read Goldberg's paper by Anonymous Coward · · Score: 0

    Most people know neither how integer math is supposed to be done nor how floating-point math works. Read Goldberg's paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic".

    People that *think* they know about that subject are probably the worst offenders: do you know what it takes to correctly write the following method (camel case, long method name, just to get to the point):

    areEqualsOrAlmostEquals(float a, float b, float maxAbsOrRelError) {...}

    you give a float which specifies the maximum absolute or relative error and your method simply returns true or false.

    Do you *really* know what it takes to write such a method? Hint: you probably don't and I can write test cases making your solution fail.
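    For reference, one common starting point, sketched in Python rather than the Java-style signature above and not claimed to survive every adversarial test case. It treats the single tolerance as both an absolute and a relative bound, and how NaN and infinities are handled here is a design assumption.

```python
import math


def are_equal_or_almost_equal(a, b, max_abs_or_rel_error):
    """True if a and b agree within an absolute OR relative tolerance."""
    if math.isnan(a) or math.isnan(b):
        return False                      # NaN never compares equal to anything
    if a == b:
        return True                       # covers equal infinities and +0.0 == -0.0
    if math.isinf(a) or math.isinf(b):
        return False                      # an infinity is never "almost" a finite value
    diff = abs(a - b)
    if diff <= max_abs_or_rel_error:      # absolute test, needed near zero
        return True
    # Relative test, scaled by the larger magnitude.
    return diff <= max_abs_or_rel_error * max(abs(a), abs(b))


print(are_equal_or_almost_equal(0.1 + 0.2, 0.3, 1e-9))
print(are_equal_or_almost_equal(1.0, 1.1, 1e-3))
```

    Python's standard library offers math.isclose, which takes separate rel_tol and abs_tol arguments; keeping the two tolerances separate is usually the safer design.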

    Goldberg's paper is pure gold ;)

    I've had it for years, it's 80 pages long and it's a bible on the subject.

    Most people don't understand integer math nor floating-point math, it's a fact.

  75. Implementing meaningful timestamps is not trivial! by janwedekind · · Score: 1

    Anyone who thinks that measuring time is simply a matter of using accurate floating point values should have a closer look at the definition of time

    • There is the universal time (UT) where there is noon when the average sun crosses the meridian at Greenwich.
    • There is the ephemeris time (ET) which is a constant time measure which was defined last century. Due to the earth's rotation slowing down ET-UT is more than one minute these days.
    • There is UTC which is a constant time measure which is aligned with UT (running at the speed of ET). Before UTC-UT can reach 0.9 seconds a leap second is introduced (when needed, at the end of June or December).
    • Then there are time zones and all the summer time / winter time nonsense
    • A year has more than 2**34 milliseconds. The Julian Date count starts at 4713 BC. In order to save digits time-signal services use the Modified Julian Date which starts in November 1858.

    And depending on the problem you are working on you may have to take into account relativistic effects caused by the gravitational potential of the earth and the sun.
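    Two of the numbers in the list can be checked with a few lines. This is only a sketch: it uses a 365.25-day year, whole-day MJD values and Python's proleptic Gregorian calendar, and bits_needed and modified_julian_date are illustrative names.

```python
from datetime import date

MS_PER_JULIAN_YEAR = 365.25 * 86_400 * 1000     # 31,557,600,000 ms
MJD_EPOCH = date(1858, 11, 17)                  # Modified Julian Date 0 begins at 00:00 on this day


def bits_needed(n):
    """Smallest number of bits that can hold the non-negative integer n."""
    return int(n).bit_length()


def modified_julian_date(day):
    """Whole-day MJD for a proleptic Gregorian calendar date."""
    return (day - MJD_EPOCH).days


print(bits_needed(MS_PER_JULIAN_YEAR))          # bits for one year of milliseconds
print(modified_julian_date(date(2000, 1, 1)))   # MJD of 1 January 2000
```

    One year of milliseconds already overflows a 32-bit integer, which is why the choice of epoch and unit matters as much as the numeric type.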

  76. Numerical Analysis by tkrotchko · · Score: 1

    Yes, when I studied Computer Science (admittedly about 30-40 years ago), it was a requirement to study numerical analysis which basically laid out the fundamentals of how and when floating point numbers failed in binary representation. So the idea that people in the 70's either didn't know or care about these issues isn't true.

    If you haven't heard of the Euler or Runge-Kutta methods, you probably should before doing any sort of system design that involves floating point numbers.

    I think everybody is too busy teaching programming these days to study computer science ;)

    --
    You were mistaken. Which is odd, since memory shouldn't be a problem for you
    1. Re:Numerical Analysis by Entropius · · Score: 1

      I help teach a class in computational physics for sophomores.

      The VERY FIRST ASSIGNMENT is looking at this. We give them a simple numeric integral to do -- one that they can do analytically -- and tell them to make a log/log plot of the error vs. the number of bins used to do the Riemann sum. Of course they get wrong answers for too few bins (since it's too coarse) and wrong answers for too many bins (floating-point rounding errors). Then we have them do it again with doubles and have them do it again with a better algorithm.
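      A rough pure-Python imitation of that exercise, under stated assumptions: the integrand x**2 on [0, 1], a midpoint Riemann sum, and single precision emulated by rounding every intermediate through struct. The names f32 and midpoint_sum are made up, and the real assignment may differ in all of these details.

```python
import struct


def f32(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def midpoint_sum(n, single=False):
    """Midpoint Riemann sum of x**2 on [0, 1] with n bins.

    With single=True every intermediate result is rounded to float32,
    imitating a program written with `float` instead of `double`.
    """
    rnd = f32 if single else (lambda v: v)
    h = rnd(1.0 / n)
    total = 0.0
    for i in range(n):
        x = rnd((i + 0.5) * h)
        total = rnd(total + rnd(rnd(x * x) * h))
    return total


EXACT = 1.0 / 3.0
for n in (16, 256, 4096, 65536):
    err64 = abs(midpoint_sum(n) - EXACT)
    err32 = abs(midpoint_sum(n, single=True) - EXACT)
    print(f"n={n:>6}  double error={err64:.3e}  float error={err32:.3e}")
```

      In double precision the error keeps shrinking as the bins get finer, while the single-precision result can never get closer to 1/3 than float32 allows, and accumulated rounding can push it further away.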

      After that we do Euler and Runge-Kutta.

      The biggest problem isn't students being blissfully unaware of floating-point precision issues; it's them blaming precision issues when they've really just got broken code.

  77. Attention Span of a Gnat by remy · · Score: 1

    99.9% of /. readers think "hardware" when they see the word "computer".
    99.99% of the general public think "whatever's in that mysterious box" when they see the word computer--and that includes the tubes that hook up to the interwebs.

    It's actually decent science reporting for the 99.99% of people who don't distinguish between hardware and software, and if one weren't a /. reader, one's attention span would exceed that of a gnat (not excluding myself here, btw), and one might have read (all the way on page 3):

    "Surprisingly, perhaps – and with the exception of the Pentium floating point error, which was caused by a hardware glitch – all of the errors we've mentioned here could have been prevented. In that sense, they can all be thought of as software errors."

    It's actually a pretty good article overall, though since the (presumably UK-based) audience of something called "TechRadar" ought to have more in common with /. than with the general public, the title could have been less inflammatory to those in the know...

    1. Re:Attention Span of a Gnat by Anonymous Coward · · Score: 0

      The Pentium FDIV (Floating point divide) bug was not due to a hardware glitch. The error originated in the system-level design of the lookup table (LUT) used in the IC. All the subsequent verification was against the expected results from the LUT, so the CPU passed all the verification tests. The bug was due to a bad system-level design that was never verified - until after the hardware shipped.

      from http://www.trnicely.net/pentbug/pentbug.html:
      "... The difficulty apparently arises from an error in the
      lookup tables used to implement the hardware division algorithm;
      the lookup tables are either incorrect or incomplete. ..."

  78. I call bullsh*t by RogueWarrior65 · · Score: 1

    I don't know any computer that uses a 0.1 second tick period. Even crappy Linux 2.4 has a 100 Hz tick rate. I seriously doubt a system like the Patriot would have less than half a mile resolution.

    1. Re:I call bullsh*t by SpinyNorman · · Score: 1

      Someone posted the actual GAO report and it seems that it does, but this is only used for predicting a "where will it be seen next" space-time window (not a precise position) for the radar to search. The trouble was that the time coordinate was absolute, not relative to the last position, hence accumulated clock inaccuracy caused it to eventually look in the wrong place and lose the target.

      http://www.fas.org/spp/starwars/gao/im92026.htm
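      The 0.3433 second figure from the summary can be reproduced with exact arithmetic, assuming (this layout is not stated in the article) that 0.1 was chopped to 23 binary places inside the 24-bit register. FRACTION_BITS and the other names are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23            # assumed fixed-point layout; not stated in the article
TICKS = 3_600_000             # 100 hours of 0.1 s ticks, from the article

# 0.1 chopped (not rounded) to FRACTION_BITS binary places.
stored_tenth = Fraction((2 ** FRACTION_BITS) // 10, 2 ** FRACTION_BITS)
error_per_tick = Fraction(1, 10) - stored_tenth

total_error = error_per_tick * TICKS
print(f"stored 0.1  = {float(stored_tenth):.10f}")
print(f"total error = {float(total_error):.4f} s after {TICKS} ticks")
```

      Under that assumption each tick is converted about 95 nanoseconds short, and the shortfall scales with absolute uptime because the tick count itself is absolute.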

    2. Re:I call bullsh*t by Jeremy+Erwin · · Score: 1

      And "crappy Linux 2.4" can't be installed on a Patriot Weapons Control Computer.

  79. What a stupid article by Anonymous Coward · · Score: 0

    Has anyone read the case study on this problem? The "solution" for this was simply to reboot the system every day. Now the argument for this solution does not really matter, but the key thing to note is that the original proposal never stated that the system was to be on for weeks on end. This is not a code failure, it is a contractual failure on both sides. The customer should have had this stipulation in the contract, and the supplier should have found out such important information before designing the system.

  80. GiNaC by cpghost · · Score: 1

    How about GiNaC?

    --
    cpghost at Cordula's Web.
  81. Re:The entire purpose is killing. by publiclurker · · Score: 3, Insightful

    Actually the main purpose is a cost plus fixed profits contract for the weapons manufacturer. Even if no one ever dies on either side of the gun, it's still a success to them.

  82. Not a math problem, an algorithm problem... by jedidiah · · Score: 1

    The painfully obvious solution is to keep time by "ticks" rather than some decimal representation of seconds.

    As anyone who has been through school can tell you, floating point numbers come with their own built in error.

    The obvious solution is to use integers or use them as "fixed point" decimal.
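
    A rough Python sketch of the integer / fixed-point idea. The names and the 100 ms tick are illustrative assumptions, not anything taken from the Patriot software: time stays an integer count, and the decimal point only appears when formatting for a human.

```python
MS_PER_TICK = 100  # assumed: one tick = 0.1 s = 100 ms


def ticks_to_ms(ticks: int) -> int:
    """Convert an integer tick count to integer milliseconds (exact)."""
    return ticks * MS_PER_TICK


def format_ms(ms: int) -> str:
    """Render integer milliseconds as seconds with three decimal places."""
    seconds, millis = divmod(ms, 1000)
    return f"{seconds}.{millis:03d}"


# 100 hours of uptime at ten ticks per second.
uptime_ms = ticks_to_ms(3_600_000)
print(format_ms(uptime_ms))
```

    Because every intermediate value is an integer, nothing drifts no matter how long the counter runs (until the integer type itself overflows).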

    --
    A Pirate and a Puritan look the same on a balance sheet.
  83. Re:Patriot success rate was likely extremely infla by Philip+K+Dickhead · · Score: 1, Troll

    Look. When the system is named "Patriot", you already have enough information to understand the framing context - if you care to have the particular insight. This is a propaganda tool, like the rockets launched from Airstrip One.

    It fulfilled its mission when it was designed, manufactured and labeled as "The Patriot Missile System". Ballistic interception is a secondary mission and fulfillment is unnecessary for success.

    --
    "Speaking the Truth in times of universal deceit is a revolutionary act." -- George Orwell
  84. Re:The entire purpose is killing. by RegularFry · · Score: 1

    The corollary to that is that the people making equipment to protect other people must be smarter. Guess which side of the line the Patriot system falls on?

    --
    Reality is the ultimate Rorschach.
  85. Integer Math by originalhack · · Score: 1

    First, that's why most timing belts are really timing chains.

    Second (to the earlier posters), testing is way too late to catch this type of problem.

    This is the sort of thing that should be caught at design review where the "tenths of a second" are actually computed as INTEGER hundred-counts of a millisecond timer that is the fundamental time-base for the system and is "adjtime"d to prevent it from accumulating error relative to a very reliable time reference (like GPS).

    Of course, this is what happens when managers think all programmers are interchangeable and don't value engineers who have a clear view of what they are doing and why.

  86. Flaw in design by Murdoch5 · · Score: 0

    So the design of the system made it fail, and that means computers suck at math? I don't think that works. Someone came up with a broken design, and the computer gets blamed for sucking at the program's math. I really don't know why you'd blame the entire field of computer mathematics.

  87. READ THE GD ARTICLE by ToasterMonkey · · Score: 4, Insightful

    FTFA:
    "So computers might suck at maths, but there's always a solution available to circumvent their inherent weaknesses. And in that case, it's probably more accurate to say that computer programmers suck at maths - or at least some of them do."

    Thank you, come again.

    So in a system that should have clocks synchronized to less than a microsecond nobody bothered to run "ntpdate" even once in a hundred days?

    Yes, obviously they just needed to ssh into their patriot missile air defense system, edit a few lines in /etc/inet/ntp.conf and svcadm restart ntp.

    The obvious problem in the article, if you read it, is computer's finite precision, and how it is dealt with. By 'computer', the author could have easily included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.

    Everyone defending the way 'computers' is used in this article, and conflating it with 'processor' is a complete idiot.

    1. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 3, Insightful

      The obvious problem in the article, if you read it, is computer's finite precision, and how it is dealt with. By 'computer', the author could have easily included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.

      Not at all, since correcting an inappropriate hardware design with software is like fixing an automobile that was designed with square wheels by manually sawing off the corners to make them octagonal instead. You could create a recursive software routine to continue sawing until the wheels were a good approximation of round, but that's an awful lot of sawing to fix something that should have been right in the first place.

      The clock in modern systems is nothing but a hardware register that gets incremented periodically (as correctly described in the article). The ONLY rounding error introduced by software is in converting that number to decimal. But rounding had nothing to do with the problem described. The appropriate solution is a better hardware design, not attempting to patch or correct it in software.

      The problem was error accumulated in the clock register itself due to the imprecision of the clock, and overflows due to the inappropriately small size of the register. Both are hardware issues and represent bad design decisions. The way to fix them is to design the hardware properly in the first place so that it is appropriate for the job at hand.

    2. Re:READ THE GD ARTICLE by jbolden · · Score: 1, Insightful

      By 'computer', the author could have easily included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.

      Everyone defending the way 'computers' is used in this article, and conflating it with 'processor' is a complete idiot.

      This is a programmers' blog. We don't conflate those sorts of things. If a program is using the wrong library that's not a problem with "computers sucking at math" but a problem with "programmers not understanding arithmetic libraries very well". The topic of computer arithmetic and the issues with various representations is covered as standard in undergraduate programming classes. In other words these problems happened because:

      1) They picked the wrong programmers
      2) They didn't do QC
      3) They didn't have test libraries that tested their systems correctly.

      etc...

      Computers don't suck at math. Nurses can do 98% of what a doctor does and for most of it more quickly than the doctor. It is that other 2% which is the difference between the doctor's education and the nurse's.

    3. Re:READ THE GD ARTICLE by Nefarious+Wheel · · Score: 1

      If I can see little further than others, it is because I have been standing on the shoulders of midgets.

      I've seen a lot of developers who plow into a problem using libraries that look like they'll do the job, with no validation or verification that the libraries are themselves right. They just assume they are, and plow forward.

      Sometimes, like exchanging flippers for feet, I wonder if we made the right decision to leave analog computers behind for this sort of thing. Old battleship and Nike missile systems used analog computers, and they (for all their majestic, neo-steampunk size) made a fairly decent ballistic trajectory.

      Who's that on my lawn?

      --
      Do not mock my vision of impractical footwear
    4. Re:READ THE GD ARTICLE by Macman408 · · Score: 0, Flamebait

      The problem was error accumulated in the clock register itself due to the imprecision of the clock, and overflows due to the inappropriately small size of the register. Both are hardware issues and represent bad design decisions. The way to fix them is to design the hardware properly in the first place so that it is appropriate for the job at hand.

      ...so your suggested solution is an infinitely precise register, and an infinitely large register?

      You're obviously not a hardware designer, so I'll spare the clue-by-four. But there are some things that are better handled by software. It would certainly be *possible* to make a register look almost infinitely large; in the event the hardware detects an overflow, it allocates some space in memory, and calls that the upper bits of the register. Then it resets the register itself, and can start counting from zero again. You've extended your register size from, say 64 bits to 64 bits plus whatever amount of memory you have. But it's still not quite infinite. What happens when you run out of memory? (OK, so then the hardware starts saving to the hard drive. And if that fills up? Why, then the hardware signs you up for an online backup service using your credit card number and starts sending the overflow there. And when you get a new credit card?)

      As you can see, you've delayed the problem - perhaps even by enough that almost nobody will ever encounter it. But it's still there, lying in wait for somebody to stumble across.

      The better hardware solution? Provide the tools for the programmer to do what's appropriate for his application. Almost nobody designs hardware for just one application any more - I'd be willing to make a sizable bet that all of the cases cited in TFA used general-purpose hardware, so the general-purpose solution is software. Metaphorically, the hardware should give the programmer a loaded gun and let him decide how to use it. He can use it to shoot a deer and eat for a month, or shoot a bear to protect himself, or shoot fish in a barrel if he needs entertainment. Of course, he might also shoot himself in the foot because he doesn't know what he's doing.

      Now if you want to make an argument over what layer of software takes advantage of the tools that the hardware provides, that's a legitimate discussion. Perhaps the compiler should make the hardware look infinite (this has same problems as the hardware doing it, but you save transistors). Or maybe provide a math library with the capability of making the hardware look infinite, and making sure the programmer tells it what to do if it eventually does overflow (but your programmer still has to recognize that he needs to use the library).

      In any case, basically every solution to these problems comes down to one thing: how many programmers need to know that the hardware really isn't infinite? If the answer is "every damn one of them," that really simplifies picking the idiots out of the crowd.

    5. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 4, Insightful

      I'm obviously not a hardware designer? That's funny. I am not the clueless one here. How about some simple math? Maybe you would learn something.

      A 24-bit register, with clock ticks every 0.1 second, would overflow in less than 20 days. And if the clock ticks were faster, then it would overflow even sooner. No wonder they recommended rebooting the system every few days.
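
      The overflow arithmetic is easy to check. A small Python sketch, assuming an unsigned integer counter that uses all 24 bits (the names are illustrative):

```python
REGISTER_BITS = 24        # register width quoted in the article
TICKS_PER_SECOND = 10     # one tick every 0.1 s
SECONDS_PER_DAY = 86_400

max_ticks = 2 ** REGISTER_BITS                      # counter wraps after this many ticks
seconds_to_overflow = max_ticks / TICKS_PER_SECOND
days_to_overflow = seconds_to_overflow / SECONDS_PER_DAY

print(f"{max_ticks} ticks -> {days_to_overflow:.2f} days until wraparound")
```

      For comparison, the 3,600,000 ticks quoted for the incident are well short of that limit.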

      Of course I do not recommend an infinitely large register. Simply one that is large enough for the job at hand. This one obviously isn't. Further, a 0.1-second resolution clock is obviously not adequate to a job requiring this kind of precision.

      If the hardware clock is off (not overflowed but INACCURATE, which was the real situation here), no amount of software tweaking will properly fix the problem. The article did not state but implied -- incorrectly -- that the clock register was accumulating rounding errors; that is not the case. Nobody makes system clocks that way, nor did they in the 90s or even the 80s. The system clock is nothing but a counter that is incremented every clock tick. The actual problem was that the clock ticks were not sufficiently precise, so over time the count was off. Math libraries and rounding errors played no part whatsoever in that error.

      Finally, I would like to point out that today's standard PC-type system clocks are large enough that they won't overflow for 100 years or so; that is the obvious and proper solution to the overflow problem. The problem of clock ticks that are sufficiently precise for timing of missile navigation, as far as I know, has not been addressed on standard PCs, however, and they do not try to correct for that in software because the adequate precision in the clock simply does not exist. It would amount to tilting at windmills. Keeping a count in software of the number of times the register overflows is also NOT an appropriate solution for a system clock, nor is any software tweak, because software by definition is volatile while the hardware clock is not. In other words, nobody does it that way, dude, because it's just plain the wrong answer.

      As for your final comment, most Unix programmers know what epoch time is, when it started (00:00:00 UTC on 1 January 1970 according to ISO 8601), and when that date will roll over in the counter (approximately 65 YEARS later, so it isn't much of an issue). Nobody is arguing that we should make a missile system that needs to last, unmodified, for over 65 years. But proper hardware design in the first place, which was certainly possible at that time using ASICs if not straight-up custom chips, would have eliminated the problem.

    6. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 1

      Apologies, an edit somehow slipped through there. The 100 year figure is incorrect. Overflow of a Unix (that is, PC-type system clock register) will occur about 65 years after the epoch date of 00:00:00 UTC, January 1, 1970.
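
      For anyone who wants to check the rollover figure, here is a Python sketch assuming the common signed 32-bit count of seconds (the counter width is an assumption; systems with a 64-bit time value are not affected). Under that assumption the counter lasts a little over 68 years, which puts the rollover in January 2038.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SIGNED_32 = 2 ** 31 - 1          # largest value of a signed 32-bit counter

# Last instant representable before the counter wraps.
rollover = EPOCH + timedelta(seconds=MAX_SIGNED_32)
years = MAX_SIGNED_32 / (365.25 * 86_400)

print(rollover.isoformat(), f"(~{years:.1f} years after the epoch)")
```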

    7. Re:READ THE GD ARTICLE by Macman408 · · Score: 1

      Actually, judging by their description of the problem, I'd guess that they were using a floating point format, with no more than 20 bits of mantissa (see my comment elsewhere on this topic). It works out pretty well if you assume that they just continually add 0.1 to a register for 100 hours. And judging from the way the math lines up, it sounds like they might've been taking that 24-bit floating point number, and putting it into a much larger register, otherwise the addition of 0.1 to 360,000 would have given a result of 360,000, and not just accumulated an error of about 1/3 s. Of course, all I have to go on is what the article says. I'd *hope* that nobody (not hardware, not software) would make a clock the way that I can infer from what is described in the article, but that's the best guess I can make.

      There were probably perfectly good reasons for choosing a processor (or designing one, if that's what they did) with only a 24-bit word. For example, cost. Just because you CAN make a 64-bit floating point unit doesn't mean you SHOULD. It's typically far better to take an off-the-shelf part, and use it properly. My PC is capable of 32-bit and 64-bit integer and floating point math natively. However, if I run Mathematica on it, I can work with arbitrary-precision numbers that may have hundreds or thousands of bits of precision, or more. And, in the rare cases where a computation results in possibly-reduced precision, the program is smart enough to warn me that the answer is inexact.

      So why spend money on an ASIC if you can use commercial off-the-shelf hardware? And even if you do use an ASIC, why make it more expensive by adding extra hardware when software could do just as well? (Added bonus: a software error is much easier and less expensive to fix. If hardware had been the culprit for this problem, a fix would have likely taken 6 months or more and cost millions of dollars.)

      As an aside, I've noticed you quite vehemently stating in this comment and others that the error was due to the inaccuracy of the clock (I presume you mean something like, eg, a 25 MHz clock chip that was running at 24.99999 MHz), and not rounding errors as the article stated. Do you have a citation for this?

    8. Re:READ THE GD ARTICLE by smallfries · · Score: 2, Informative

      The problem described is not overflow, it is repeated rounding on the imprecise representation of 0.1. The systems failed after 3.6M ticks. In a 24-bit register overflow is not a problem until at least 16.7M ticks. The lowest bound is because this is not an integer register and the article does not describe the size of the exponent. If you check the figures in the article, when the system failed it was out by about three ticks in 3.6M. Overflow causes the representation to suddenly shift to a completely wrong value. Subtle shifts to nearby values are symptomatic of rounding error.

      Rounding is an issue. This particular example is classic textbook stuff and has been used in many a software engineering course. Using a 24-bit floating point representation, store 0.1. The error will be roughly 1/(2^23). Now repeatedly accumulate this value. The problem occurs because each time the value goes past a power of two boundary in magnitude we need to round the accumulator, causing a slight loss of accuracy. Then for each subsequent addition we are adding a less precise representation of 0.1 until they are more likely to round up rather than down.

      The system has three sources of inaccuracy that anybody with experience in floating point arithmetic could have pointed out easily:
      1. The initial representation of 0.1 is off by a tiny amount (very small impact on the final value)
      2. As the accumulator increases in magnitude its exponent rises, in this case by log_2(3.6M) = 22 places.
      3. The subsequent additions of the small value to the much larger value become increasingly imprecise.

      The problem is not the clock itself, nor some integer accumulation of the time: it is a designer who chose to use a floating point accumulator. Multiplying the representation of 0.1 by the integer number of ticks at each stage would have eliminated the problem. Accumulating 1/8 ticks in floating point would have worked fine. Doing what they did was stupid.
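
      The accumulation scenario described here can be sketched in Python. This is only an illustration of the argument, not a model of the Patriot code: IEEE single precision (24-bit significand) stands in for the unspecified 24-bit format, and the tick count is kept small so the loop runs quickly.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int) -> float:
    """Add `step` to a single-precision accumulator `count` times."""
    step32 = to_float32(step)
    total = 0.0
    for _ in range(count):
        total = to_float32(total + step32)  # round after every addition
    return total


TICKS = 100_000  # illustrative count, far fewer than 100 hours' worth

tenths = accumulate(0.1, TICKS)      # 0.1 has no exact binary representation
eighths = accumulate(0.125, TICKS)   # 1/8 is an exact power of two

print("0.1 steps:  ", tenths, "expected", TICKS / 10)
print("0.125 steps:", eighths, "expected", TICKS / 8)
```

      The 1/8 accumulator lands exactly on the expected value, while the 0.1 accumulator drifts visibly, which is the contrast drawn above.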

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    9. Re:READ THE GD ARTICLE by Anonymous Coward · · Score: 0

      I don't buy it. A programmer with an ounce of sense would have used 1/8th or 1/16th second ticks.

    10. Re:READ THE GD ARTICLE by JLF65 · · Score: 1

      Programmers work with what they're given. You often don't have a choice on what the rate is. For example, the "standard" tick rate of a PC in DOS and Windows is 18.2 ticks per second. Where in the world did this weird rate come from? It came from the fact that the original DOS set the timer to 0xFFFF, and the timer was clocked by one third the color burst subcarrier frequency. This gives the exact tick rate of 3579545 / 3 / 65536 = 18.206507365 ticks per second.
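
      The same arithmetic as a short Python sketch (constant names are illustrative), which also gives the tick period in milliseconds:

```python
COLOR_BURST_HZ = 3_579_545      # NTSC color burst subcarrier frequency
TIMER_DIVISOR = 65_536          # full 16-bit countdown

timer_input_hz = COLOR_BURST_HZ / 3
ticks_per_second = timer_input_hz / TIMER_DIVISOR
ms_per_tick = 1000 / ticks_per_second

print(f"{ticks_per_second:.6f} ticks/s, {ms_per_tick:.3f} ms per tick")
```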

    11. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 1

      Putting it into a larger register (unless it is being done in software) amounts to just having a larger register in the first place. If you were doing that in software, then what you would be doing is counting overflows of the main clock.

      But nobody -- and I mean nobody -- designs a system clock in such a way that rounding errors accumulate, as implied in the article. It just isn't done. The reasons for that are obvious and well-known, and have been for a long time.

      The price difference between an ASIC and a stock part is totally insignificant when you consider that it is going into a MISSILE, and what the rest of that missile costs.

      I think the author of the article simply got it wrong. The article clearly stated that clock ticks were 1/10th of a second, which might be okay for a standard part, but is completely inappropriate for an instrument that must navigate and calculate positions of objects moving at high speed. The fact remains that if the article is even halfway correct, somebody made some outrageously bad hardware design decisions.

    12. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 1

      YOU show ME where anybody -- anywhere -- has ever designed a system clock in such a way that rounding errors accumulate. It just isn't done, man. No system designer would do that. It would be a stupid design, which brings us right back to what I was saying in the first place: somebody made some bad design decisions.

      The article also clearly stated that a clock tick was 1/10th of a second. Simple math will show you that is a totally inappropriate minimum timing interval for a device that must calculate the positions of objects moving at hundreds or thousands of miles an hour.

      Q.E.D. Even if you ignore the clock register issue, either the article is wrong, or somebody made some very bad hardware design decisions.

    13. Re:READ THE GD ARTICLE by Macman408 · · Score: 1

      I'll concede one point - the clock was indeed an integer, and the rounding error apparently came by multiplying that integer by 0.1, where the 0.1 had only limited precision. (I found the GAO report.) I'm relieved that the programmer wasn't so stupid as to add 0.1 many times (I agree no system designer SHOULD do that, but I've seen some pretty stupid designs...), but I still think the designers are idiots for different reasons than you think the designers are idiots. ;-)
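
      As a sanity check, the article's 0.3433 s figure falls out if you assume the stored 0.1 keeps 23 bits after the binary point and is truncated rather than rounded. That layout is an assumption, not something stated in the report. A Python sketch using exact rationals:

```python
import math
from fractions import Fraction

FRACTION_BITS = 23           # assumed number of bits after the binary point
TICKS = 100 * 3600 * 10      # 100 hours of 0.1 s ticks

exact_tenth = Fraction(1, 10)
scale = 2 ** FRACTION_BITS

# Truncate (chop) 0.1 to the assumed fixed-point format.
stored_tenth = Fraction(math.floor(exact_tenth * scale), scale)

error_per_tick = exact_tenth - stored_tenth
total_error = error_per_tick * TICKS   # error of (integer ticks) * (stored 0.1)

print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"error after {TICKS} ticks: {float(total_error):.4f} s")
```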

      I still think that the hardware could have been used just the way it was - and indeed, the software bug was found 14 days before the attack, was fixed 9 days beforehand, and the fix arrived the day after the attack. Different hardware design could've prevented the bug, but in my opinion, the shortfall was that of the programmers not understanding their hardware (or thoroughly testing their software).

      Obviously, you feel differently.

    14. Re:READ THE GD ARTICLE by davidbofinger · · Score: 1

      [...] like fixing an automobile that was designed with square wheels by manually sawing off the corners to make them octagonal instead. You could create a recursive software routine to continue sawing until the wheels were a good approximation of round [...]

      Would you call this a workaround? Or correction of a rounding error?

    15. Re:READ THE GD ARTICLE by Anonymous Coward · · Score: 0

      I think the author of the article simply got it wrong. The article clearly stated that clock ticks were 1/10th of a second

      The article did get it wrong, but not as you surmise. The following is from the GAO report linked by Macman408:

      "The Patriot battery at Dhahran failed to track and intercept the Scud missile because of a software problem in the system's weapons control computer. This problem led to an inaccurate tracking calculation that became worse the longer the system operated."

      That's pretty clear that it was not the hardware system clock which was at fault, but rather the software. It goes into more detail:

      Time is kept continuously by the system's internal clock in tenths of seconds but is expressed as an integer or whole number (e.g., 32, 33, 34...). The longer the system has been running, the larger the number representing time. To predict where the Scud will next appear, both time and velocity must be expressed as real numbers. Because of the way the Patriot computer performs its calculations and the fact that its registers(4) are only 24 bits long, the conversion of time from an integer to a real number cannot be any more precise than 24 bits. This conversion results in a loss of precision causing a less accurate time calculation. The effect of this inaccuracy on the range gate's calculation is directly proportional to the target's velocity and the length of time the system has been running. Consequently, performing the conversion after the Patriot has been running continuously for extended periods causes the range gate to shift away from the center of the target, making it less likely that the target, in this case a Scud, will be successfully intercepted.

      So yes, it was a software problem, and yes, the system clock kept time accurately in tenths of seconds. You based your position on assumptions that were incorrect, which is why you are wrong. Next time try a little humility and do some research before drawing your conclusions.

    16. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 1

      My assertion is that if they were converting times from a clock register to either floating-point or a different time scale in software, then it was poor system design.

      I saw the quote from the GAO report, by the way, and I still believe that it inaccurately describes the situation. To me, it appears to be a (very rough) attempt to put a technical issue into language a politician might understand.

    17. Re:READ THE GD ARTICLE by Jane+Q.+Public · · Score: 1

      "You based your position on assumptions that were incorrect, which is why you are wrong. Next time try a little humility and do some research before drawing your conclusions."

      No, I didn't. Repeat: a 1/10th-second clock tick represents completely inappropriate hardware for the job of calculating the positions of high-speed objects, even if the individual clock ticks are accurate within nanoseconds. The article was completely unambiguous in stating that the clock ticks were 1/10th of a second apart.

    18. Re:READ THE GD ARTICLE by Anonymous Coward · · Score: 0

      > My assertion is that if they were converting times from a clock register to either floating-point or a different time scale in software, then it was poor system design.

      No, your assertion was that this was NOT a software problem, and that the clock itself was inaccurate. You were soooooo sure of that fact, despite all the sources not supporting it.

      You insisted your GUESS was correct and that everyone else doesn't know what they're talking about. Even when admitting you were wrong, you shifted the goalposts by claiming that you were making a different assertion...THAT's why you're a blowhard.

  88. Rhapsody in blue by Anonymous Coward · · Score: 0

    The local Saudi station had just finished a piece and was a few seconds into Rhapsody in Blue when the interruption came. They stopped the music and told all listeners to take shelter (alternating in Arabic then English). The sirens on our base did not go off - at all. I was writing a letter to my sister and commented that one of these times somebody is going to get hurt for not responding to these alarms (it was late in the war at this point and people were starting to get complacent in the alarms as we were all believers in the "infallible" Patriot).

    It may have been that they *knew* it was not destined for us - but we usually got alarms for anything in the area. Jubail (where I was) was right in-line for the path to Dhahran. We *should* have got the alarm. Those people would (should) have lived had they got the alarm.

    I did not find out until the next morning that the Army barracks had been hit.

    Anonymous - because I forgot the login to my account about 8 years ago and haven't created another.

  89. Re:The entire purpose is killing. by jtev · · Score: 1

    Oh, but it is so satisfying when you find out that the system killed the people it was intended to kill, or the building, or the other missile, or whatever. There is such a thing as professional pride.

    --
    That which is done from love exists beyond good and evil
  90. Tenths of a second? Why oh why... by shovas · · Score: 1

    Who stores or processes time in tenths of a second? Milliseconds, microseconds, but tenths of a second? I think somebody got their source information wrong or the designers of the system were really on something special.

    --
    Selah.ca. Pause, and calmly think on that.
    1. Re:Tenths of a second? Why oh why... by wonkavader · · Score: 1

      I agree, but remember this was 1991, and a military project. That means they (probably) chose the processor they'd use years before, because military projects take SO LONG and you need to fill out massive amounts of paperwork and have many huge meetings to change anything. So they might have been using a processor from 1984 or 1985, assuming the battery was PAC-2, not a PAC-3.

      They STILL should have used more accuracy, but while we'd do that now by reflex, they might have been trying to cram something into a vastly smaller box than we would now normally envision.

  91. people who program computers !=programmers by hellfire · · Score: 1

    My intent in saying "people who program computers" was not to single out computer programmers, but to include designers, managers, and everyone else involved too. In the military, this involves politicians, government contractors, and generals too, who all make huge piles of money even if the project is less than successful, like the Patriot missile and F-22.

    A programmer was probably told to use that register size, even though he knew it would be flawed.

    --

    "All great wisdom is contained in .signature files"

  92. Re:God yes by symbolic · · Score: 3, Funny

    We even have a modern analog for this - the shift-lock key.

  93. Now I have another reason to use bc by wonkavader · · Score: 1

    Now I have another reason to use bc instead of Excel.

    Bender:~/docs$ bc
    bc 1.06.94
    Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
    This is free software with ABSOLUTELY NO WARRANTY.
    For details type `warranty'.
    850*77.1
    65535.0
    1.0 - 0.9 - 0.1
    0
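
    The same two checks can be done with Python's decimal module, shown here next to ordinary binary floats for contrast. This is just a sketch; the variable names are made up.

```python
from decimal import Decimal

# Binary floating point: 0.9 and 0.1 are not exactly representable.
float_result = 1.0 - 0.9 - 0.1

# Decimal arithmetic on the same literals is exact.
decimal_result = Decimal("1.0") - Decimal("0.9") - Decimal("0.1")
product = Decimal("850") * Decimal("77.1")

print("float:  ", float_result)
print("decimal:", decimal_result, product)
```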

  94. No by ProteusQ · · Score: 1

    687 m/.3433s = 2001 m/s approx. That's 6566 ft/s, or 1.24 mps, which is 4476 mph.

    That would mean it's traveling at over Mach 5 (at sea level), so perhaps this data is incorrect after all.
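
    The conversions, as a Python sketch. The sea-level speed of sound is an approximate assumed value; the other figures come from the article.

```python
distance_m = 687.0
time_s = 0.3433

M_PER_FT = 0.3048
FT_PER_MILE = 5280
SPEED_OF_SOUND_M_S = 340.3   # assumed: roughly sea level at 15 degrees C

speed_m_s = distance_m / time_s
speed_ft_s = speed_m_s / M_PER_FT
speed_mph = speed_ft_s / FT_PER_MILE * 3600
mach = speed_m_s / SPEED_OF_SOUND_M_S

print(f"{speed_m_s:.0f} m/s = {speed_ft_s:.0f} ft/s = {speed_mph:.0f} mph = Mach {mach:.1f}")
```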

    1. Re:No by Anonymous Coward · · Score: 0

      687 m/.3433s = 2001 m/s approx. That's 6566 ft/s, or 1.24 mps, which is 4476 mph.
      That would mean it's traveling at over Mach 5 (at sea level), so perhaps this data is incorrect after all.

      No, this is a mistake on the part of the author. The missile didn't travel 687m in a third of a second; the Patriot miscalculated the position to look for the target by 687m.

      Just goes to prove that "column writers suck at math." :)

  95. so, the obvious(?) question by Nekomusume · · Score: 1

    Actually, the article describes why programmers suck at math.

    Instead of using seconds as a time unit and introducing what amounts to rounding errors, systems like these should run straight off the clock ticks. There's no reason for a computer to be using seconds internally in the first place - just convert to them when human-readable output is required.

  96. Compromises by wonkavader · · Score: 1

    You're spot on about irrational numbers.

    But I'm not sure the time critical argument works so well, these days. Let's say we designed a numeric package which was capable of ratio storage and irrational storage as well. Perhaps even able to handle unknowns. Let's say it was 1000 times slower than modern floating point calculations. We'd still be able to do something like a million calculations a second, and we'd know this as part of the parameters we'd use to decide where to use it and how.
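
    Ratio storage already exists in some standard libraries. As a sketch, Python's fractions module shows the trade: exact answers, at the price of slower arithmetic than hardware floats.

```python
from fractions import Fraction

tenth = Fraction(1, 10)               # stored as the ratio 1/10, not a binary approximation

exact_total = sum([tenth] * 10)       # exact rational arithmetic
float_total = sum([0.1] * 10)         # binary floating point for comparison

print("ratio:", exact_total, " float:", float_total)
```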

    We'd only use it where necessary, or where speed wasn't a critical issue (better to err on the side of accuracy). And if we made such a library pretty standard and everyone used it, processors would probably begin to include specific instructions to improve the performance of "correct" math.

    The thing is that we don't test enough and basically don't care enough whether the math is right. We generally think it's all good enough. And that really ought to change.

    We made the compromises we now live with, after all, when computers were thousands of times slower than they are today.

  97. Re:Implementing meaningful timestamps is not trivi by russotto · · Score: 1

    Fortunately, when dealing with a tactical, self-contained, antimissile system, none of those needs to be important. Any external time measure is irrelevant. What's relevant is that all parts of the system have the same time base. No need to worry about leap seconds, time zones, years, or local noon.

  98. Defense contractors CS skills often lacking by putaro · · Score: 1

    Back in the early nineties a college friend came to me with a problem she was having at work. She was working on a digital compass to be installed in the F-16. There was a raster screen and a circle with tick marks was part of the compass display. They were having difficulties drawing the circle rapidly enough and her management was considering installing a floating point co-processor to make things run faster. After a bit of discussion, it turned out that the circle drawing algorithm they were using was based on sin/cos. She was fairly new on the project and while her graphics background was not very deep, she did know that there should be better ways to do things.

    We had a bit of a discussion on better circle drawing algorithms as well as the joys of pre-computation, look up tables and not redrawing things that were not really changing. I still shudder to think of what other cruft must have been lurking in that software.

    1. Re:Defense contractors CS skills often lacking by Anonymous Coward · · Score: 0

      Shows how much you know. You don't need stupid, hard-to-maintain look-up tables. The correct solution is to use techniques like Bresenham's, which reduce the iterative math to a series of additions. See "Bresenham's Line Drawing Algorithm." It reduces the multiplies to additions. The same technique can be used (with a second tier) to reduce the squares (in the Cartesian circle drawing algorithm) to additions.
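
      A minimal sketch of the integer-only approach, in Python for readability. This is the common midpoint variant of Bresenham's circle algorithm, not the code from the project described above; the function name is illustrative.

```python
def circle_points(radius):
    """Return integer (x, y) offsets approximating a circle, no sin/cos needed."""
    x, y = radius, 0
    err = 1 - radius                  # integer decision variable
    points = set()
    while x >= y:
        # One octant is computed; the other seven come from symmetry.
        for px, py in ((x, y), (y, x)):
            points.update({(px, py), (-px, py), (px, -py), (-px, -py)})
        y += 1
        if err < 0:
            err += 2 * y + 1          # doubling is a shift or one extra add
        else:
            x -= 1
            err += 2 * (y - x) + 1
    return points


if __name__ == "__main__":
    print(sorted(circle_points(3)))
```

      The loop body uses only integer adds, subtracts and compares, so no floating point unit is involved.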

  99. O'Reilly ? by Anonymous Coward · · Score: 0

    I don't recall too much "transparency" when it came to "accidentally" loading live nukula missiles for transit flights in the US a short time back.
    Wasn't, and still really isn't, a lot of "transparency" with that 9/11 thing you all seem to have in your heads.
    I don't see a lot "transparency" when it comes to timing and details of arm shipments to America's "friends" around the globe. Not much "transparency" in Guantanamo - or hundreds of US spy bases around the globe.
    I don't like the sound of your trumpet, sonny, sounds a little off-key to me. Maybe you only know that one note.

  100. Moderation by Fnord666 · · Score: 1

    So how do I moderate TFA as a troll? If it isn't already an option, it needs to be added. As a plus, this moderation could be used in the performance reviews of slashdot's editors.

    --
    'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  101. Killer app by Anonymous Coward · · Score: 0

    A true killer app, but for the customer.

  102. Stupid by Anonymous Coward · · Score: 0

    Why are you publishing this stupid story? Every computer scientist in the world is already aware of this. Besides: both IBM and NeXT found ways around this years and years ago. Reading the other comments makes it obvious the submitter and the /. mod both are imbeciles.

  103. Combine a few bits and make decimal computers by azav · · Score: 1

    Seriously.

    --
    - Zav - Imagine a Beowulf cluster of insensitive clods...
    1. Re:Combine a few bits and make decimal computers by azav · · Score: 1

      What I mean is since the processors use binary, why not create processors that actually run not on binary, but decimal, maybe a co processor or something like that? Some part of the hardware dedicated to calculating in decimal.

      --
      - Zav - Imagine a Beowulf cluster of insensitive clods...
  104. I don't understand by Anonymous Coward · · Score: 0

    Why is the clock tick 0.1 seconds?

    On the 3 MHz Sinclair Spectrum it was 0.0000003 seconds.

    And why the worry about it anyway?

    Find the hardware tick (it is NEVER a nice easy number, just like the 0.1 seconds they complain of is not even actually 0.1 seconds but could be 0.1000320548732 seconds) and calibrate your software to use the elapsed time since plot start rather than "the next step".

    Was the patriot written by a bunch of VBA cowboys?

  105. Paid with their lives by sochdot · · Score: 5, Insightful

    I'd just like to point out here that the 28 people were not killed by the failure of the intercept system. They were killed by the nice folks who launched the missile in the first place.

    --
    If at first you don't succeed, destroy all evidence that you tried.
  106. A better title for the article by Todd+Knarr · · Score: 1

    A better title for the article would be "Why people suck at computational error analysis". Except for the one about the Pentium FDIV bug, every example it gives is one where humans ignored some rule or another governing error propagation in the computations. Back in college, I ran across a book titled "Scientific Analysis on the Pocket Calculator" that had a full third of its content devoted entirely to error analysis, both of the errors the initial data would introduce (limited precision in the initial data) and ones the calculator itself would cause (limited range, limited number of significant digits, overflow/underflow in intermediate calculations). Computers add the additional fun of dealing with numbers that terminate in base 10 but repeat forever in base 2. And in school they gloss over all of this, don't bother teaching it. They all teach as if computers have infinite range and an infinite number of significant digits in all calculations. Is it any wonder the results are botched so often?

  107. Poor Investigative Reporting by gregh76 · · Score: 1

    The part about the Patriot missile isn't at all clear on explaining how the computational error resulted in the mistake. A quick search led to this article, which is far more plausible: http://www.mc.edu/campus/users/travis/syllabi/381/patriot.htm

    It's programmers who suck at math, not computers. Computers do exactly what you tell them to, which includes how they're designed to interpret what you tell them to do. This was a tragic example of what can happen when reusing legacy software.

  108. A poor workman blames his tools... by shking · · Score: 1

    If the story in the example is true, then it's painfully obvious that the coder(s) in the example didn't understand some core concepts in math and computer science.

    --
    -- "At Microsoft, quality is job 1.1" -- PC Magazine, Nov. 1994
  109. Re:Implementing meaningful timestamps is not trivi by janwedekind · · Score: 1

    You are right. In a self-contained system you can choose your own time base of course and absolute time is not important (only relative time). However I think it does no harm to be aware of those things.

  110. on a related note, Windows 7 calculator by UltimApe · · Score: 1

    can do those maths just fine.

    --
    "Infecting minds with my own memetic virus, one post at a time." Ultimape
    1. Re:on a related note, Windows 7 calculator by narcc · · Score: 1

      Is this a Windows 7 troll?

      XP's calculator gets the answers right as well.

      So does the generic calculator on Ubuntu 8.04, Office XP's version of Excel, and OOo 3's Calc.

  111. Re:Computers don't suck at math, some programmers by Anonymous Coward · · Score: 0

    Yes, too bad these programmers in the 80s didn't just use Python!

  112. In this case all you need to know by SharpFang · · Score: 1

    is when it's much better to use fixed-point arithmetic.
    If you're working with 0.1 s ticks, make your clock an integer counting these ticks and use them universally throughout your software.
    Whenever you face the operation of division in your program, think twice whether it wouldn't be better to replace the basic unit by the one pre-divided and use integer multiplication elsewhere instead. No mess associated with floating point operations.

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  113. To Tell the Truth. by DoninIN · · Score: 1
    I think most of the thinking is still that Patriots didn't really intercept the Scuds, in the sense of killing the warheads, and when they did it was mostly luck anyway. The fuses on the warheads weren't fast enough etc.

    But that does not matter one bit. The Patriot missiles won the first Gulf War. Yes, I meant that: the Scud was a political weapon, and the attacks were intended to trick Israel into getting into the war and creating a political meltdown. These Scud attacks weren't going to do a serious amount of damage in real terms, to anything. The Patriots seemed to intercept them at least some of the time and they removed much of the terror of these terror attacks. As a political weapon it was awesome.

    Now back to the point. The Patriot wasn't originally intended to intercept targets going that fast, it was meant to shoot down airplanes, so it wasn't designed initially to cope with targets going that fast. The whole problem as described isn't that computers suck at math. It's that reality isn't digital and when you represent it as such you need to be very careful. But I still have a problem blaming these guys for this bug, that missile system was built to shoot down aircraft, not missiles. Aircraft that might be travelling at subsonic speeds

  114. Solution Unsatisfactory by LandGator · · Score: 1

    Put a milspec GPS in every major component to get the atomic-clock-accurate time before critical functions.

    --
    There is nothing wrong with yr Internet. Do not attempt to adjust the picture. We are controlling the transmission - NSA
    1. Re:Solution Unsatisfactory by ErikZ · · Score: 1

      You mean you want an anti-missile system to be easily defeated by GPS jamming?

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
    2. Re:Solution Unsatisfactory by LandGator · · Score: 1

      'GPS Jamming' screws with the location determination, not the atomic clock functionality. Sheesh. RTF, folks.

      --
      There is nothing wrong with yr Internet. Do not attempt to adjust the picture. We are controlling the transmission - NSA
  115. Baiting Saddam by michaelmalak · · Score: 1

    The Bush I administration tricked Saddam into thinking the U.S. would not respond to an Iraqi invasion of Kuwait. Accordingly, 100,000 people paid with their lives.

  116. Better still... by ClickOnThis · · Score: 1

    ...store the time as two integers: one for seconds, and the other for milliseconds, microseconds, or whatever you wish. You need to handle the carry, but that's the price you pay when you want that kind of resolution with a 24-bit register. Lots of time-critical software running on satellites uses an approach similar to this.

    To quote Kernighan and Plauger in The Elements of Programming Style: "10.0 times 0.1 is hardly ever 1.0." This Patriot episode exemplifies their words tragically.
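
    A small Python illustration of the quote, using 64-bit IEEE doubles (an assumption; the Patriot used a different format). On this format the single multiplication happens to round to exactly 1.0, but adding 0.1 ten times does not, which is the accumulation pattern that matters here.

```python
import math

ticks = [0.1] * 10

naive_total = sum(ticks)               # rounding error accumulates add by add
compensated_total = math.fsum(ticks)   # keeps track of the lost low-order bits
single_multiply = 10 * 0.1             # one rounding step instead of ten

print("sum of ten 0.1s equals 1.0:", naive_total == 1.0)
print("math.fsum equals 1.0:      ", compensated_total == 1.0)
print("10 * 0.1 equals 1.0:       ", single_multiply == 1.0)
```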

    --
    If it weren't for deadlines, nothing would be late.
  117. umh by AlgorithMan · · Score: 1

    But all these tiny amounts add up

    only if the algorithm is numerically unstable...

    --
    The MAFIAA is a bunch of mindless jerks who will be the first up against the wall when the revolution comes
  118. this is a staple in numerical analysis courses by peter303 · · Score: 1

    Error analysis is an important part of your basic graduate-level numerical analysis course. These errors occur in the floating point approximations used in most computers, and also in large matrix calculations, which can multiply and sum numbers a huge number of times.

  119. 503 service unavailable by cdn-programmer · · Score: 1

    Why do we keep getting this error on this story when trying to reply to the abstract? It's been there for hours.

    Error 503 Service Unavailable

    Service Unavailable
    Guru Meditation:

    XID: 1125948833
    Varnish

  120. 32768 Hz by OrangeTide · · Score: 1

    The crystal you find in an ordinary wrist watch gives you ticks in increments of 1/32768 second. 32768 is a power of 2 (2^15) and encodes extremely well into a binary computer.
    If a cheap watch can encode time in a way that is convenient for a computer, why can't a billion dollar missile system?
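
    A quick Python check of why the power-of-two tick is so convenient. It uses ordinary 64-bit doubles, which is an assumption for illustration only; the constant names are made up.

```python
from fractions import Fraction

WATCH_TICK = 2.0 ** -15     # 1/32768 s, a power of two
DECIMAL_TICK = 0.1          # 1/10 s, not a power of two

# Fraction(float) gives the exact value the float actually stores.
watch_exact = Fraction(WATCH_TICK) == Fraction(1, 32768)
decimal_exact = Fraction(DECIMAL_TICK) == Fraction(1, 10)

# Adding up a full second of watch ticks never rounds.
one_second = sum([WATCH_TICK] * 32768)

print("1/32768 stored exactly:", watch_exact)
print("1/10 stored exactly:   ", decimal_exact)
print("32768 ticks sum to 1.0:", one_second == 1.0)
```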

    --
    “Common sense is not so common.” — Voltaire
  121. Still alive and well by cdn-programmer · · Score: 3, Interesting

    Crap like this was alive and well when I was in uni and it's still alive and well.

    Witness: Limits to Growth written by Meadows et al: http://en.wikipedia.org/wiki/The_Limits_to_Growth

    Consider that book was written in 1972. I was programming computers in 1972. I actually did a course in numerical analysis in 1972 and just re-read the first 10 pages or so. I happen to have read a masters thesis that came out of the Colorado School of Mines where the author stated Meadows' Runge Kutta Numerical Integrations did not converge.

    Yet that book is still often quoted. It's been flawed from the get-go. So consider something else! How fast were the machines that Meadows used? How big? What would be the MOST SOPHISTICATED model he could use at the time? How could _anyone_ take seriously predictions made by a primitive model run on such a machine?

    Witness: The current discussion about Global Warming and Climate Change. The change in CO2 over the last 100 years is about 100 ppm if you can believe the data. This is 100/1,000,000 = 0.0001. Now the thing is this. A 32-bit float holds about 6.9 digits of precision. Let's call it 7 digits. If one were to add a whole number of some kind to the fractional change of the CO2 as measured relative to the total gases in the atmosphere, then one has 7-4 = 3 digits or less to work with.

    Of course one can use a double precision float. That isn't my point. One has to be an EXPERT in order to avoid huge problems with propagating rounding errors.
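
    A rough Python sketch of that digit loss. Python floats are 64-bit, so the helper below (an illustrative name) rounds through a 32-bit float using the standard struct module; adding the change to 1.0 is just the simplest whole number to pick.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

co2_change = 100 / 1_000_000            # 100 ppm expressed as a fraction
stored = to_float32(1.0 + co2_change)   # whole number plus the small change
recovered = stored - 1.0                # what is left of the change

relative_error = abs(recovered - co2_change) / co2_change

print(f"wanted    {co2_change:.10f}")
print(f"recovered {recovered:.10f}")
print(f"relative error {relative_error:.1e}")
```

    The relative error lands between 1e-4 and 1e-3, so only three to four digits of the change survive.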

    It's not just about pretending computers use base 10 when they don't, it's about knowing the actual properties of a number of type float and what the consequences are when we use it.

    In the case of that rocket I suspect the rounding error can be solved by normalizing everything so the time line is not in seconds but is actually in clock ticks... as accurately as they can be determined of course.

    But in my career I have seen so few programmers who can do this that I've never even needed to look at a finger or a toe for something to count on. Nada - never met one.

    I'll give another example. More than one project team that I worked with had no idea how floats even work! To sit there and try to use floats for their Accounts Payable and Accounts Receivable and then say they can't understand why nothing will balance? Arrghh! IMHO it's downright incompetence. They needed to use comp, which COBOL supported and which is base 10, or normalize all their money into pennies and handle the decimal point when the data was read in and printed.
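
    Both fixes can be sketched in a few lines of Python. The helper names are invented for illustration, the parser only handles non-negative amounts, and Python's decimal module stands in for COBOL's decimal fields.

```python
from decimal import Decimal

# Binary floats: ten cents plus twenty cents does not compare equal to thirty.
float_balanced = (0.10 + 0.20 == 0.30)

# Option 1: integer pennies, decimal point handled only at input and output.
def to_pennies(text):
    """Parse a non-negative amount like '12.34' into integer pennies."""
    dollars, _, cents = text.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])

def format_pennies(pennies):
    """Format integer pennies back into a dollars.cents string."""
    return f"{pennies // 100}.{pennies % 100:02d}"

pennies_balanced = to_pennies("0.10") + to_pennies("0.20") == to_pennies("0.30")

# Option 2: base-10 arithmetic throughout.
decimal_balanced = Decimal("0.10") + Decimal("0.20") == Decimal("0.30")

print("floats balance:  ", float_balanced)
print("pennies balance: ", pennies_balanced)
print("decimals balance:", decimal_balanced)
```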

  122. use case considered harmful by epine · · Score: 1

    No one expected a Patriot in air defense mode to stay stationary for 10 hours let alone 100.

    Use case blindness is an incredibly rich source of severe system errors. Proof of correctness is hard enough without a clutter of use case clauses (apparently) lopping off obvious failure modes. Until they don't, because, oh ya, use cases evolve long after the coding is done. [Cue John Mellencamp].

    And the thing is, it wasn't hard to just do it right in the first place. If they were working in a language with C-level abstraction (which isn't much), it's trivial to create a type such as uptime_t which counts in integer ticks (of whatever granularity you require) and has an uptime range which is very nearly impossible to overflow (months or years at a dead minimum). A 64-bit integer incremented at 4GHz won't overflow for 4 billion seconds, more than a century. Few timekeeping systems increment at 4GHz, so a century is your worst case. Hey brother, can you spare me eight bytes?

    For Want of a Nail a lot of COBOL programmers came out of retirement.

    Even a lowly pair of 24-bit integers (if that was their machine architecture) can be used to create a 48-bit integer with increment and difference at almost zero overhead. You can augment this with a saturating 24-bit uptime_diff_t. If the answer comes back as 2^24-1 deci-seconds (about 19 days) your code should interpret this as "blink and it's gone". These types can be implemented in asm with two or three 10-line macros, at a cost of only a handful of extra cycles at run time.
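
    A Python model of that two-word counter, as a sketch only: real code would be a few assembly macros on the actual hardware, and the names below are illustrative. The difference function assumes the later stamp really is later.

```python
WORD_BITS = 24
WORD_MASK = (1 << WORD_BITS) - 1        # 0xFFFFFF, largest 24-bit value

def increment(high, low):
    """Add one tick to a 48-bit counter held as two 24-bit words."""
    low = (low + 1) & WORD_MASK
    if low == 0:                         # carry out of the low word
        high = (high + 1) & WORD_MASK
    return high, low

def saturating_diff(later, earlier):
    """Ticks between two (high, low) stamps, clamped to 24 bits."""
    later_ticks = (later[0] << WORD_BITS) | later[1]
    earlier_ticks = (earlier[0] << WORD_BITS) | earlier[1]
    return min(later_ticks - earlier_ticks, WORD_MASK)


if __name__ == "__main__":
    stamp = (0, WORD_MASK)
    print(increment(*stamp))                    # low word wraps, high word carries
    print(saturating_diff((5, 0), (0, 0)))      # far apart: clamps to WORD_MASK
```

    A clamped result of WORD_MASK is the "blink and it's gone" signal; anything smaller is an exact tick count that can safely be converted to floating point.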

    Floating point conversion of the diff_t result would have been fine (elapsed time of flight for a Scud missile isn't going to overflow anything). Nothing required here but clear thinking and a refusal to accept "that can't happen" use cases lightly.

    BTW, some people are confused about the precision required: they aren't trying to hit the missile with this calculation, but position an acquisition window for a higher-precision targeting system, if I got the drift.

    There's a time and place for use cases, and there is a time and place for a more rigorous foundation.

    There are no atheists in foxholes. Corollary: the only use case is whatever saves their skin. I didn't notice any of the soldiers under the bridge in Apocalypse Now sitting around reading their user manuals by the rocket's red glare.

    1. Re:use case considered harmful by snaz555 · · Score: 1

      Floating point conversion of the diff_t result would have been fine (elapsed time of flight for a Scud missile isn't going to overflow anything). Nothing required here but clear thinking and a refusal to accept "that can't happen" use cases lightly.

      BTW, some people are confused about the precision required: they aren't trying to hit the missile with this calculation, but position an acquisition window for a higher-precision targeting system, if I got the drift.

      You still need to pay close attention to what you're doing. If you communicate a time stamp to the targeting system using the equivalent of (double)timestamp*0.1f you will have the same error as if you added up 0.1f for each tick. It's quite easy to accidentally use this form instead of (double)timestamp/10f, because multiplication is cheaper than division, making it natural to prefer it.
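
      A rough Python sketch of the difference, assuming a 32-bit IEEE float for the 0.1f literal (the Patriot's own format was different, so the size of the error here is not the article's figure). The helper name is illustrative.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

TICKS = 3_600_000                      # 100 hours of 0.1 s ticks

tenth_f32 = to_float32(0.1)            # what a 0.1f literal actually holds
by_multiplying = TICKS * tenth_f32     # like (double)timestamp * 0.1f
by_dividing = TICKS / 10.0             # like (double)timestamp / 10

error_seconds = by_multiplying - by_dividing

print(f"multiply: {by_multiplying!r}")
print(f"divide:   {by_dividing!r}")
print(f"error:    {error_seconds:.6f} s")
```

      The division is exact here because both the tick count and the result are representable; the multiplication inherits the error baked into the constant.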

  123. Re:Still alive and well (part II due to the 503) by cdn-programmer · · Score: 1

    So here is an example from Elementary Numerical Analysis, S.D. Conte and Carl de Boor circa 1965, 1972 ISBN (library of congress card number?) 73-174612:

    Calculate the roots of the following equation:

    x^2 + 111.11x + 1.2121 = 0

    use base-10 5 digit floats for this.

    one can use x = (-b (+/-) SQRT( b^2 - 4ac)) / 2a in order to do this.

    One will get:

      b^2 = 12,345
      b^2-4ac = 12,340
      SQRT(b^2-4ac) = 111.09
      x = (-b + SQRT(b^2-4ac))/2a = -0.010000

    The correct answer is -0.010910

    Note that we have gone from 5 digits to 2 digits of accuracy. This is on page 12.

    One can use this formulation: x = -2c / (b + SQRT(b^2-4ac)) which will give the answer to 5 digits precision.
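
    The five-digit arithmetic above can be replayed with Python's decimal module by setting the context precision to 5. This is a sketch for checking the numbers, not the book's own code; the variable names are illustrative.

```python
from decimal import Decimal, localcontext

a, b, c = Decimal(1), Decimal("111.11"), Decimal("1.2121")

with localcontext() as ctx:
    ctx.prec = 5                          # five significant decimal digits
    root = (b * b - 4 * a * c).sqrt()     # every step rounds to 5 digits
    naive = (-b + root) / (2 * a)         # subtracts two nearly equal numbers
    stable = -(2 * c) / (b + root)        # same root, no cancellation

print("SQRT(b^2-4ac) =", root)
print("naive root    =", naive)
print("stable root   =", stable)
```

    The only change between the two formulas is that the second one adds b and the square root instead of subtracting them, so no leading digits cancel.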

    Here is another example:

    f(x) = 1-cos(x) for very small x. Let's use 6 digit arithmetic and compute near x=1.0e-6. The error can be as large as 0.5e-7.

    yet f(x) = 1-cos(x) = (1-cos(x)^2)/(1+cos(x)) = sin(x)^2/(1+cos(x)) which can be evaluated quite accurately.

    Again = GIGO!
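
    The same cancellation shows up in ordinary 64-bit doubles if x is small enough. A minimal Python sketch, with x = 1.0e-9 chosen as an assumption so that cos(x) rounds to exactly 1.0:

```python
import math

x = 1.0e-9

naive = 1.0 - math.cos(x)                         # cos(x) rounds to exactly 1.0
stable = math.sin(x) ** 2 / (1.0 + math.cos(x))   # same function, no cancellation

print("1 - cos(x)            =", naive)
print("sin(x)^2/(1 + cos(x)) =", stable)
```

    For small x the true value is close to x^2/2, which the second form recovers and the first form loses entirely.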

  124. So, then, to sum up: by Chris+Mattern · · Score: 1

    A designer used a horribly inappropriate data representation, which led to fatal bugs in the program, and this is proof that computers are bad at math. Uh-huh.

  125. let me rephrase the title: by buddyglass · · Score: 1

    "Why Computer Programmers Suck at Math"

    There's a whole discipline called "Numerical Analysis". Whoever programmed the Patriot's tracking software should look into it.

  126. Actually computers don't suck at math. by DJGrahamJ · · Score: 1

    Just throwing that out there...

  127. this isn't about computers sucking at math by Klintus+Fang · · Score: 1

    It's about programmers lacking basic knowledge (or at least failing to take account) of how a computer works internally. well written and well validated software doesn't have problems like this.

    --
    In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
  128. Don't mind me... by benjamindees · · Score: 1

    Sorry, I didn't read your entire post before responding. You're assuming they have some need to convert to floating point, which sounds completely retarded to me (not to mention a missile intercept system with a 0.1 second resolution). At this point, the utter incompetence of 90% of the US and its military, educational system and most of its industry is absolutely no surprise to me. Of course that doesn't make it any less frustrating.

    --
    "I assumed blithely that there were no elves out there in the darkness"
  129. man google sucks at math by Anonymous Coward · · Score: 0

    i even tried 599,999,999,999,999 - 599,999,999,999,997.9 and it still equals zero
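
    For comparison, here is what a plain 64-bit double does with that subtraction in Python, next to exact decimal arithmetic. This says nothing about how Google's calculator works internally; it only shows the behaviour of the two number types.

```python
from decimal import Decimal

# The second operand cannot be stored exactly in a 64-bit double,
# so it is rounded before the subtraction happens.
float_result = 599_999_999_999_999 - 599_999_999_999_997.9

# Decimal keeps every digit that was typed.
decimal_result = Decimal("599999999999999") - Decimal("599999999999997.9")

print("double: ", float_result)
print("decimal:", decimal_result)
```

    The double does not give zero, but it is not the exact 1.1 either, because doubles near 6e14 are spaced 0.125 apart.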

  130. "Why most programming languages suck at maths" by DamnStupidElf · · Score: 2, Informative

    LISP, Scheme, Haskell, Mathematica, Maple, and plenty of other languages support arbitrary precision rational numbers as built-in types. This fixes all rounding errors involving rational numbers (including fractions). If irrational numbers like pi, e, or transcendental functions are necessary, then there will always be inherent error in the representation and the programmer has to know how to deal with that error and calculate the expected error of a sequence of operations. If you want to get fancy, you can use an algebraic language like Mathematica to symbolically solve your equations and maintain perfect accuracy with symbolic representations of irrational and transcendental numbers.
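
    Python is not on that list, but its standard library has the same kind of rational type, so here is a small sketch of the idea using fractions.Fraction. The tick values mirror the article's example; the variable names are illustrative.

```python
from fractions import Fraction

tick = Fraction(1, 10)                  # exactly one tenth, no rounding
ticks_in_100_hours = 100 * 3600 * 10

elapsed = ticks_in_100_hours * tick     # exact rational result
ten_ticks = sum([tick] * 10)            # exactly 1

# Irrational values still need an approximation at some point:
# a fractional power of a Fraction falls back to a float.
approx_sqrt2 = Fraction(2) ** Fraction(1, 2)

print("elapsed seconds:", elapsed)
print("ten ticks:", ten_ticks)
print("sqrt(2) is a", type(approx_sqrt2).__name__)
```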

  131. Is this about the incident in the first Iraq war? by RichiH · · Score: 2, Informative

    While I agree that the design decisions which led to this were poorly made, this error was common knowledge.

    The Patriot system _must_ be restarted every X days, exactly due to this bug. This is documented and everything.

    While the initial error was with the people who created the Patriot system, the soldiers who were assigned to the system were the ones who made sure that a documented bug with a known-good work-around became a loss of life.

  132. Anonymous Coward by Anonymous Coward · · Score: 0

    Two comments:

    1. Any article on this subject that does not discuss the acceptable accuracy required for any particular calculation is hardly worth reading. It seems to assume that all calculations have an absolute exact answer. Should we even be discussing an article that talks about a computer not being able to "do" stuff properly, without differentiating between specification errors, coding errors, hardware errors, usage outside the spec, etc etc.?

    2. The discussions of the Patriot and Ariane cases in the articles are travesties of the actual events. Note that in both cases trying to use old software for new purposes were major factors.

  133. Why wanna-be coders suck at Math by Domini · · Score: 1

    Is what the article should have been called.

    Anyone worth their salt in numerical analysis and scientific computation would not make this error.

  134. Stupidity by Anonymous Coward · · Score: 0

    Computers don't make mistakes. Programmers do.

  135. 1970's QA was fantastic by Anonymous Coward · · Score: 0

    Don't blame QA.

    The system worked fine in my PowerPoint design!!!

  136. Why Slashdot sucks at writing headlines by mbeckman · · Score: 1

    Computers are just dandy at math, thank you. Some programmers aren't so hot, but they can be trained. Slashdot, on the other hand, continues to generate gratuitous inflammatory headlines. Training does not appear to be effective. As others have pointed out, abusing a computer does not make the computer "sucky", anymore than abusing English makes it suck at expressing thoughts concisely. Slashdot consistently abuses its audience with misleading and downright false headlines, such as this one.

  137. Unsurprising results by Kenoli · · Score: 1

    Turns out computers do exactly what they're programmed to do. Who knew?

  138. Everyone knows by dave87656 · · Score: 1

    Everyone (or at least most software people) knows you can't do exact math in hardware. Usually it's good enough, but mission-critical and financial applications have to have their calculations implemented in software (e.g. BigDecimal in Java).

  139. Poor pilot by phorm · · Score: 1

    Something like the two pilots that recently overshot their destination and didn't notice for several hours despite numerous warnings, phone calls, and other notifications.

    There is only so much one can do to compensate for PEBKC, and in the case of modern aviation you expect that the person between the keyboard (or console) and the chair is a trained professional who doesn't make stupid mistakes like that and doesn't need a big red flashing light for every different stupid thing he/she might do...

  140. Re:The entire purpose is killing. by OeLeWaPpErKe · · Score: 1

    Question : do you people honestly believe this ?

    If normal people actually accepted ideas like this Europe would be Nazi today, for one thing. I also seriously doubt America would still be a democracy.

  141. IEEE floating point by jipn4 · · Score: 1

    For starters, IEEE floating point is a lousy design, from its needlessly complex special cases to its atrocious error handling. That kind of poor and overly complex design is symptomatic for a lot of floating point software.

  142. Computer Science, or lack thereof by mlwmohawk · · Score: 1

    Ok, this just pisses me off. Computers do not suck at math because they don't do math. They do add, subtract, multiply, and divide within their limitations. Hell, it wasn't until the 8088 that I used a micro that could multiply or divide. It has *always* been the job of the computer scientist to understand this.

    It is *new* computer science that teaches languages like java or C# that abstract "computers" from the programs. I guess using theoretical computers is easier for moron professors to teach. I learned computer science from one of the old school teachers (ex-navy weather research) who would bitch about the usage of bits and bytes in a program.

    I'm not sorry to say that "Computer Science" has to be more about the science of using COMPUTERS, not about some abstract ideal computer like edifice "virtual machine." I learned computer science as a way to model REAL problems on REAL computers, understanding limits and even using them advantageously. Algorithms are often incomplete or fundamentally wrong when they ignore the simple fact that they are running on a real computer with real limitations.

    For all you java and dot net zealots who say, and I am quoting many of you, "Why do I need to know how that works" when it comes to lists, trees, hash tables, mutex, semaphores, MATH, and so on. This is but one example. If you KNEW how these things worked, you wouldn't be bitten in the arse when they didn't do as you imagined they would.

    One of the first things you should learn in computer science in school or self education is that floating point is an approximation with limited precision. Any math done with it, must be done in the correct order that preserves as much precision as possible, and even then, if your precision requirements exceed the decimal accuracy of 64 bit floating point, then you can't do your math with floating point. You will have to code your own or buy/use a 3rd party precision math package.
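
    A tiny Python illustration of why the order matters, using 64-bit doubles. The values are picked as an assumption to make the effect obvious: at 1e16 adjacent doubles are 2 apart, so adding 1.0 changes nothing.

```python
import math

values = [1.0e16, 1.0, -1.0e16]

left_to_right = sum(values)                  # the 1.0 is absorbed by 1e16
reordered = sum([1.0e16, -1.0e16, 1.0])      # cancel the big terms first
exact = math.fsum(values)                    # correctly rounded sum

print("left to right:", left_to_right)
print("reordered:    ", reordered)
print("math.fsum:    ", exact)
```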

    Floating point is fine for a lot of things, but not everything, and if your computer science teacher didn't beat your head in with the limitations, you missed out. "How" "real" computers work is fascinating, and just knowing how they work affects how you code. Here's the big question.....

    Now that you know floating point is not accurate, how many past projects would you double check to make sure they really do work?

  143. The irony of /. moderation by blamanj · · Score: 1

    Assuming that most of the /. crowd are programmers, I find it rather ironic that most of the comments moderated up to 5 talk about needing better QA and the comments about learning numerical analysis are down at 1 and 2. Blaming someone else may feel good, but learning the intricacies of the profession is what it takes to actually fix the problem.

    That said, as someone who's actually studied NA, I don't apply it often enough, because the tools we use day to day don't help very much.

  144. Round it up by Anonymous Coward · · Score: 0

    Why don't "they" just add in the rounding off amount times "time elapsed"? Is that so hard?

  145. Re:God yes by phoenix321 · · Score: 1

    Caps Lock is CRUISE CONTROL FOR COOL

    And I'd love to have all websites yelling at me when I'm about to enter a capslocked-password a final third time.

  146. Re:God yes by Mr.+Freeman · · Score: 1

    I'd love if you just checked your damn caps lock key when you got it wrong the first time. Even with cruise control, you still have to steer.

    --
    -1 disagree is not a modifier for a reason. -1 troll, flaimbait, redundant, overrated are NOT acceptable substitutes.
  147. Uh... because computers only have a finite number by sherifffruitfly · · Score: 1

    of numbers with which to approximate an infinite number system. That's the root of the reason why the mathematical field of numerical analysis exists. (goes to re-read QR factorization for shits n giggles)

  148. Re:God yes by phoenix321 · · Score: 1

    Not all netbooks have a caps lock LED, only an app that brings up a small tooltip-like message when you switch capslock on or off that disappears after 4 seconds, if you even saw it.

    The capslock-notification is a Windows-only app, of course. Yes, there's probably an apt-get for everything and this one, too, I should've bothered, but I despise capslock anyway. With a passion. I'll probably remap it to right shift, if I ever find some free time.

  149. Re:The entire purpose is killing. by jockeys · · Score: 1

    so, so true. +1

    --

    In Soviet Russia jokes are formulaic and decidedly non-humorous.
  150. Re:Patriot success rate was likely extremely infla by Anonymous Coward · · Score: 0

    Just wanted to note that I love the site linked in your sig, and I'm glad you're promoting it.

  151. don't blame the goalie by bill_mcgonigle · · Score: 1

    what he was really doing who should take the blame for the failure that killed people, not the computer.

    Or, y'know, call me crazy, but maybe we should blame the Iraqi government for launching the missiles with intent to kill civilians? That Patriot did any good at all (if it did) is just added good fortune.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  152. Too little resolution by jtgd · · Score: 1

    You are on to the problem - lack of resolution - but a little off in the numbers.

    At 687 meters in 0.3433 seconds it is traveling 2000 m/s. So with only 0.1 second accuracy you can only get within +/- 200 meters. Even if there were no floating point conversion error, or accumulation of errors, 0.1 second is simply not enough resolution of time to intercept the missile.

    How could they ever think this would work?

    --
    J
  153. Re:The entire purpose is killing. by jtev · · Score: 1

    The Nazis conquered quite a bit of Europe because of professional pride. They were defeated by Russian blood, and US industry. Sometimes it's not enough to be good, you have to be quantitatively better than anyone else. Despite being evil bastards, they produced some damned fine guns, tanks, and planes. It is a mistake to think that just because someone has beliefs you think of as "evil" that they are not competent. Unfortunately history is full of quite competent evil. Now, if my attitude constituted "evil" in your mind, to be honest, I don't care. Because someone who thinks like you is too much of a coward to do anything about it.

    --
    That which is done from love exists beyond good and evil