Slashdot Mirror


Why Computers Suck At Math

antdude writes "This TechRadar article explains why computers suck at math, and how simple calculations can be a matter of life and death, like in the case of a Patriot defense system failing to take down a Scud missile attack: 'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"
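
The arithmetic in the summary can be checked with a minimal sketch. It assumes the 0.1-second tick length was chopped (not rounded) to 23 bits after the binary point; the summary only says "24-bit register", so `FRACTION_BITS` is an assumption, chosen because it reproduces the quoted 0.3433-second figure.

```python
from fractions import Fraction
import math

# Assumption: the tick length 0.1 s is chopped (not rounded) to 23 bits
# after the binary point. The summary only says "24-bit register".
FRACTION_BITS = 23

exact_tick = Fraction(1, 10)
scale = 2 ** FRACTION_BITS
stored_tick = Fraction(math.floor(exact_tick * scale), scale)

# Error introduced every time one tick is converted to seconds.
per_tick_error = exact_tick - stored_tick

# 100 hours of uptime, counted in 0.1-second ticks.
ticks = 100 * 60 * 60 * 10
total_error = per_tick_error * ticks

print(f"ticks after 100 hours: {ticks:,}")
print(f"error per tick: {float(per_tick_error):.3e} s")
print(f"accumulated error: {float(total_error):.4f} s")
```

Exact rational arithmetic is used here so that the check itself adds no floating-point error of its own.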

13 of 626 comments

  1. Re:Poor QA by Anonymous Coward · · Score: 5, Informative

    This particular story took place in 1991, and most of the code for Patriot was written in the 70s - needless to say, software QA was a little more lax back then. The fix for this problem was out a couple days after the incident.

  2. "User error"? by wisebabo · · Score: 4, Informative

    I actually read about this specific incident once; I seem to remember (though honestly I'm not sure) that the design flaw was known and the user manual indicated that the computer needed to be reset every 36 hours. However, in wartime, under attack (there were frequent Scud intercepts), the crew controlling the missile battery opted against shutting it down, even for a short time. Maybe, even though the manual said it SHOULD be rebooted, it did not explain WHY or what the consequences would be.

  3. Re:Poor QA by commodore64_love · · Score: 4, Informative

    >>>It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...

    I sorry.

    j/k.

    We had a similar problem with an Aegis design, and it was a major headache for us hardware engineers to try to convince the systems engineers that counting in binary time was more logical than counting in 0.1-second increments. The SEs kept insisting that their computers at home accurately count in seconds and that we hardware engineers should be able to as well. The HE manager and the SE manager were butting heads for about a month over this issue, until finally an upper-level manager handed down a decision in favor of the HE manager and binary-based counting/requirements documentation.

    I guess in the Patriot situation, the decision went in the opposite direction. Hence the errors that were introduced.

    --
    "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
  4. This problem has been solved since the 1960s by tjstork · · Score: 3, Informative

    I remember this from a numerical methods class in the 1980s. To deal with situations like this, you can do one of three things:

    a) Have a function that you sample as a function of t, so you don't get accumulated error.
    b) Have enough bits so that error won't be an issue. This is actually hard to do because floating point errors do stack up pretty quickly if you are not careful.
    c) Or, you can keep an error term which you use to make adjustments along the way to account for the lack of precision. Bresenham's line algorithm does more or less exactly that when drawing lines. That's why you get "stair stepping" as the algorithm corrects itself along the way.
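
    Option (c) can be sketched as a tick counter that carries its rounding leftover forward, in the spirit of Bresenham's error term. The class name, the 1/2**23-second unit, and the one-hour run length are all hypothetical choices for illustration, not details of the Patriot software.

```python
# Assumption: time is kept in binary units of 1/2**23 second, and one
# hardware tick is 0.1 s, which is not a whole number of those units.
UNITS_PER_SECOND = 2 ** 23
TICKS_PER_SECOND = 10

# Whole units per tick, plus the leftover that a plain integer add would drop.
WHOLE, LEFTOVER = divmod(UNITS_PER_SECOND, TICKS_PER_SECOND)


class ErrorTermClock:
    """Hypothetical tick counter that carries its rounding error forward."""

    def __init__(self):
        self.units = 0   # elapsed time in 1/UNITS_PER_SECOND seconds
        self.error = 0   # accumulated leftover, in tenths of a unit

    def tick(self):
        self.units += WHOLE
        self.error += LEFTOVER
        if self.error >= TICKS_PER_SECOND:
            # The leftovers now add up to a full unit: pay it back.
            self.error -= TICKS_PER_SECOND
            self.units += 1


clock = ErrorTermClock()
naive_units = 0
n_ticks = 36_000  # one hour of 0.1 s ticks
for _ in range(n_ticks):
    clock.tick()
    naive_units += WHOLE  # truncating version, no error term

exact_units = n_ticks * UNITS_PER_SECOND // TICKS_PER_SECOND
print("with error term, units off by:", exact_units - clock.units)
print("without error term, units off by:", exact_units - naive_units)
```

    With the error term the counter never drifts by a full unit, while the truncating version falls further behind on every tick.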

    If the OP was correct, then PATRIOT failed because it did none of them. My bet is that in reality they simply underestimated the actual error term but did everything else correctly. This could be because of discrepancies in flight control instrumentation or some sensor, or because they were simply trying to save money on bits and didn't really calculate how far off the missile could be after an error term's worth of seconds of flight at a particular phase in its flight profile.

    Bottom line is, the engineering discipline exists to solve this problem, and it is really no different than error handling in any guidance system. Putting a man on the moon, launching an ICBM at a target, and shooting down a missile are all essentially the same computer science problem from an error management perspective. The PhDs already nailed this decades ago. There's no fundamental limitation of computing in this case, merely a failure or inability of the engineers on this project to apply the correct, known answer to this problem.

    --
    This is my sig.
  5. Re:Curse of binary floating point by Carewolf · · Score: 5, Informative

    Fixed point never rounds when operating in the range and precision for which it is designed. In this case they needed a precision of .1, using INT/10 would be 100% accurate and never give them any rounding errors for this use case.

    So, in other words: You are wrong, and should probably consider using fixed point more.

  6. Re:Curse of binary floating point by Carewolf · · Score: 3, Informative

    With fixed point you can choose the base of the fractional part. A binary fixed point would not help them, but a decimal fixed point of /10 or /100 would. The algebra of fixed point is the same no matter what base you choose. This means it is the fastest way to get decimal-based fractions instead of binary fractions (decimal floating point is best with hardware support).
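
    A minimal sketch of decimal fixed point with a scale of 10, where time is stored as an integer count of tenths of a second. The helper names `to_fixed` and `format_fixed` are hypothetical, and only non-negative values are handled.

```python
# Assumption: a scale of 10 (one stored unit = 0.1 s); use 100 for 0.01 s.
SCALE = 10


def to_fixed(whole_seconds, tenths=0):
    """Build a non-negative fixed-point time value from seconds and tenths."""
    return whole_seconds * SCALE + tenths


def format_fixed(value):
    """Render a non-negative fixed-point value as text without using floats."""
    seconds, tenths = divmod(value, SCALE)
    return f"{seconds}.{tenths}"


tick = to_fixed(0, 1)          # 0.1 s, stored as the integer 1
uptime = 0
for _ in range(3_600_000):     # 100 hours of ticks
    uptime += tick

print(format_fixed(uptime))
```

    Because every operation is an integer add, the result after 100 hours of ticks is exact; the division by the scale only happens when the value is displayed.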

  7. Re:Curse of binary floating point by PhilHibbs · · Score: 4, Informative

    Well, in this specific instance a decimal system would have been ok, but it isn't a general answer. The general answer is "make sure your increments are divisible into your number base": if they had used 1/8 or 1/16 of a second, or even 3/32 of a second, as their timer increment, then they would not have had this problem. There's no reason why 1/10th of a second has any magic properties.

    In general terms, all number bases have other number bases with which they are incompatible. The inability of binary to represent 1/10 accurately is just the same as the inability of decimal to represent 1/3 accurately. It's only because we use decimal all the time that we overlook decimal's shortcomings (or instinctively compensate for or avoid them) and then blame computers for binary's incompatibility with decimal.
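
    The point about increments can be checked directly. This sketch uses Python's 64-bit floats rather than the Patriot's 24-bit format, but the same fractions are exact or inexact in any binary format with enough bits.

```python
from fractions import Fraction

# Fraction(float) recovers the exact value the binary float really holds.
candidates = {
    "1/10": (0.1, Fraction(1, 10)),
    "1/8": (0.125, Fraction(1, 8)),
    "1/16": (0.0625, Fraction(1, 16)),
    "3/32": (3 / 32, Fraction(3, 32)),
}

for name, (as_float, intended) in candidates.items():
    is_exact = Fraction(as_float) == intended
    print(f"{name}: exactly representable in binary = {is_exact}")

# Repeated addition only stays exact for the binary-friendly increment.
tenths = 0.0
for _ in range(10):
    tenths += 0.1

eighths = 0.0
for _ in range(8):
    eighths += 0.125

print("ten tenths equal 1.0:", tenths == 1.0)
print("eight eighths equal 1.0:", eighths == 1.0)
```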

  8. Re:Poor QA by TheRaven64 · · Score: 4, Informative

    Everybody knows that they exist, fewer people know how to avoid them. Lots of early multimedia frameworks, for example, were written using floating point timestamps and developed this exact problem (add some fraction repeatedly for each audio and each video frame, and after an hour the two tracks are noticeably out of sync). Now, they use a numerator-and-denominator form which is simple to add without rounding errors and so you only get them when you convert to floating point for comparison.
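
    A minimal sketch of the numerator-and-denominator idea, using Python's `fractions.Fraction`. The frame durations are illustrative assumptions and are not taken from any particular multimedia framework.

```python
from fractions import Fraction

# Assumption: illustrative frame durations, not taken from any real framework.
AUDIO_FRAME = Fraction(1024, 44100)   # 1024 samples at 44.1 kHz
VIDEO_FRAME = Fraction(1001, 30000)   # one frame at about 29.97 fps


def accumulate(duration, frames):
    """Add the same rational duration repeatedly, as a player would."""
    timestamp = Fraction(0)
    for _ in range(frames):
        timestamp += duration
    return timestamp


frames = 10_000
audio_ts = accumulate(AUDIO_FRAME, frames)
video_ts = accumulate(VIDEO_FRAME, frames)

# Repeated addition matches a single multiplication exactly: no drift.
print(audio_ts == frames * AUDIO_FRAME)
print(video_ts == frames * VIDEO_FRAME)

# Convert to floating point only at the end, for comparison or display.
print(float(video_ts - audio_ts))
```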

    Even fewer people realise how compiler and hardware dependent they can be. For example, if you do a sequence of floating point operations on x86 then the values will stay in 80-bit registers until they are stored out to a variable. If you compile the same code for a newer machine with SSE or for another architecture then you will get 32-bit operations on your 32-bit floats and so you'll have less precision. A lot of compilers will even generate different precision between debug and release builds.
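
    Python cannot choose between x87 and SSE code generation, so the following is only a rough simulation of the effect: the same 32-bit input is accumulated once with wide intermediates and once with every intermediate rounded back to 32 bits. The `to_float32` helper is a hypothetical name; it relies on the standard `struct` module's `"f"` format.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, narrow):
    """Add `step` `count` times, optionally rounding to 32 bits each time."""
    total = 0.0
    for _ in range(count):
        total += step
        if narrow:
            total = to_float32(total)
    return total


step = to_float32(0.1)  # the same 32-bit input in both runs
wide_total = accumulate(step, 1000, narrow=False)
narrow_total = accumulate(step, 1000, narrow=True)

print("wide intermediates:  ", wide_total)
print("32-bit intermediates:", narrow_total)
```

    The two totals differ even though the source-level computation is identical, which is the kind of build-dependent discrepancy described above.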

    --
    I am TheRaven on Soylent News
  9. Patriot success rate was likely extremely inflated by neapolitan · · Score: 4, Informative

    I know that I'm arguing with a trolling AC, but for the other readers of Slashdot, you should know that the grandparent's post refers to the controversy regarding the analysis of the Patriot system during the first Gulf war. There was a huge propaganda machine behind the Patriot's "successes" which turned out to be very near zero indeed. This was covered in a series of hearings in the early 90's...

    http://www.fas.org/spp/starwars/docops/pl920908.htm

    You can also read up on this from transcripts from the hearings after the war.

    In the interests of fairness, here is a rebuttal / review.

    http://www.fas.org/spp/starwars/docops/zimmerman.htm

    I remain unconvinced -- from reading this (almost 20 years ago) I concluded that at best, the military did not know for sure that these worked well.

    --
    Slashdotter, ID #101. UIDs are in binary, right?
  10. Re:Poor QA by SpinyNorman · · Score: 3, Informative

    Someone posted the actual GAO report on this, which makes a bit more sense than the gibberish TechRadar article.

    http://www.fas.org/spp/starwars/gao/im92026.htm

    The way the system is sure it's tracking the target it was given is by predicting where it should be seen next based on speed and direction, and then only looking for it in a window ("range gate") around that predicted position. The window is a point in space-time and therefore has time coordinates as well as space coordinates, and the problem was that the Patriot system apparently used absolute time since power on to specify the time coordinate, hence the error accumulation. The problem could have been avoided simply by using a time coordinate relative to the last tracked position rather than an absolute one.
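
    A rough sketch of why a relative time coordinate helps, assuming the same chopped 0.1-second constant as in the article's account (23 bits after the binary point). The 5-second interval since the last detection is a hypothetical figure chosen for illustration.

```python
from fractions import Fraction

# Assumption: same chopped 0.1 s constant as in the article's account
# (23 bits after the binary point).
STORED_TICK = Fraction(838860, 2 ** 23)
EXACT_TICK = Fraction(1, 10)


def conversion_error(ticks):
    """Seconds lost when `ticks` is converted with the chopped constant."""
    return ticks * (EXACT_TICK - STORED_TICK)


boot_ticks = 3_600_000        # absolute time: 100 hours since power-on
since_last_track = 50         # hypothetical: 5 s since the last detection

absolute_error = conversion_error(boot_ticks)
relative_error = conversion_error(since_last_track)

print(f"absolute: {float(absolute_error):.4f} s")
print(f"relative: {float(relative_error):.2e} s")
```

    The per-tick error is the same in both cases; only the number of ticks it gets multiplied by changes.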

    The GAO report also blames the 24-bit registers of the 1970s-era hardware as limiting accuracy, which is just garbage. A good excuse to a politician perhaps, but there was nothing stopping them from using a 64-bit, or whatever, math library if that would have helped.

    Of course the Patriot was being used outside of its original requirements spec when being used to target SCUDs, so it seems someone really screwed up in not reviewing the design beforehand and determining its limitations (and fixing them) rather than finding out after the fact when 28 people are dead as a result.

  11. Re:Poor QA by Tacvek · · Score: 4, Informative

    Yes. The issue here sounds like they had a system clock counter that was an integer, that counted the number of 0.1 second clock ticks. Then they wanted to convert this to a floating point number in 24-bit IEEE format. They simply multiplied 0.1 by the integer in the register. Of course, that still sounds like too large an error to have occurred from just that, but let's pretend it did.

    There are several issues here. For missiles travelling at such speeds, using a system clock counter based on 0.1 second ticks sounds terribly coarse to me. Second, since 0.1 seconds are the baseline resolution of the system, the system should have been using floating point numbers where '1' corresponds to a decisecond rather than a second. Then the time counter would be exactly expressible in the floating point format.

    Lastly, if the floating point format really needed to be in units of seconds, rather than deciseconds, the time counter should have been loaded in, having an exact representation, then it should be divided by 10, which has an exact representation. This is all pretty basic to anybody who has even a limited understanding of floating point. If you understand the inherent precision of every operation even better than I do, even more improvements would be possible.
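
    The difference between multiplying by 0.1 and dividing by 10 can be sketched as follows. This uses Python's 64-bit floats as a stand-in for the 24-bit format discussed above, and the function names are hypothetical.

```python
from fractions import Fraction


def seconds_by_multiply(ticks):
    return ticks * 0.1      # 0.1 is already rounded before the multiply


def seconds_by_divide(ticks):
    return ticks / 10       # both operands exact, one rounding at the end


def off_by(value, ticks):
    """Exact distance between a float result and the true ticks/10."""
    return abs(Fraction(value) - Fraction(ticks, 10))


# Tick counts where multiplying lands further from the truth than dividing.
worse = [t for t in range(1, 1001)
         if off_by(seconds_by_multiply(t), t) > off_by(seconds_by_divide(t), t)]

# Dividing is correctly rounded, so it should never be the worse choice.
never_better = all(
    off_by(seconds_by_divide(t), t) <= off_by(seconds_by_multiply(t), t)
    for t in range(1, 1001))

print("multiply is worse for", len(worse), "of 1000 tick counts")
print("divide is never worse:", never_better)
```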

    But to be honest, I'm not sure why floating point was used at all here. It sounds to me like fixed point may have worked just fine for most of these problems. (Of course, fixed point has its own set of rules ensuring maximal accuracy.)

    --
    Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
  12. Re:Poor QA by Neoprofin · · Score: 3, Informative

    One of the other results (the first one that comes up for me, actually) claims that, in testimony presented to Congress, Postol's methodology was called out as flawed: three or eight Patriots were launched at every incoming missile, and his video analysis was done per interceptor fired, completely ignoring the massive odds against more than one interceptor making a hit. The Israelis' independent analysis puts the success rate at 50%.