Why Computers Suck At Math
antdude writes "This TechRadar article explains why computers suck at math, and how simple calculations can be a matter of life and death, like in the case of a Patriot defense system failing to take down a Scud missile attack: 'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"
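The 0.3433-second figure in the summary can be reproduced with exact rational arithmetic. This is only a sketch: it assumes the 24-bit register keeps 23 bits after the binary point and that the conversion chops rather than rounds, neither of which the summary states. The name FRACTION_BITS is made up for illustration.

```python
from fractions import Fraction

# Assumption (not stated in the summary): 23 bits after the binary point,
# and the stored value is chopped (truncated), not rounded.
FRACTION_BITS = 23

exact_tick = Fraction(1, 10)
# Largest multiple of 2**-23 that does not exceed 0.1
stored_tick = Fraction(int(exact_tick * 2**FRACTION_BITS), 2**FRACTION_BITS)

error_per_tick = exact_tick - stored_tick   # about 9.5e-8 seconds
ticks = 3600000                             # tick count quoted in the summary
total_error = error_per_tick * ticks        # about 0.3433 seconds

# Speed implied by the summary's own figures (687 m covered in 0.3433 s)
implied_speed = 687 / 0.3433                # metres per second

print(float(error_per_tick))
print(float(total_error))
print(round(implied_speed))
```

Under those assumptions the per-tick error is tiny, but multiplying it by the tick count gives the drift quoted in the article.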
It's pretty pathetic and negligent that software that controls explosive missiles was not tested for over 100 hours of operation. That's a standard Quality Assurance procedure for even the simplest low-budget hardware...
It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...
I wonder how much time and money was spent in research and development for this thing
It doesn't seem like we're getting a quality product for the likely huge sum that was paid for it...
Use decimal floating point or simply switch to fixed point. Fixed point is not used as often as it should be, and many developers don't know how difficult ordinary floating point really is.
Mathematica. 'nuff said.
The IEEE specification stuff is literally designed knowing there will be calculation errors. Don't use this; create your own number system like Mathematica does for 100% accuracy always.
So, in other words, the programmers for this piece of "mission-critical" software were not aware of floating point arithmetic and error propagation? What does that have to do with "computers" in general?
Use fixed point numbers? You know, in financial apps, you never store things as floating points, use cents or 1/1000th dollars instead!
Computers don't suck at math, those programmers do. You can get arbitrary-precision mathematics even on 8-bit processors, and most of the time compilers will figure out everything for you just fine. If you really have to use 24-bit counters with 0.1s precision, you *know* that your timer will wrap around every 466 hours, so just issue a warning to reboot every 10 days or auto-reboot when it overflows.
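The 466-hour wraparound figure is easy to check. A minimal sketch, assuming an unsigned 24-bit counter that counts 0.1-second ticks from zero (the constant names are illustrative):

```python
TICK_SECONDS = 0.1     # one tick = 0.1 s
COUNTER_BITS = 24      # assumed unsigned counter width

max_ticks = 2 ** COUNTER_BITS            # number of ticks before wrapping
wrap_seconds = max_ticks * TICK_SECONDS
wrap_hours = wrap_seconds / 3600
wrap_days = wrap_hours / 24

print(max_ticks, int(wrap_hours), int(wrap_days))
```

That comes out to a little over 466 hours, or a bit more than 19 days.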
1) This problem was covered in Risks Digest years ago.
2) Design and production phase was completed in 1980.
http://catless.ncl.ac.uk/Risks/10.82.html#subj1
is a good start for "Why the hell are we using this weapons system the way we are?"
As memory serves the fix is to restart the system periodically.
As memory also serves that's been part of the operating procedure for a very long time.
Shouldn't we focus on the fact that without computers, even MORE people would die? This article seems to make the conjecture that somehow these instruments are worthless, but it appears the writer of it sucks at math as well.
# ppl who would die without computers -MINUS- # ppl who die with computers = # of lives SAVED by computers.
That second # isn't bad, it was already there before computers came along!
Translation: computers are only as smart as the people programming them... and there's plenty of stupid people out there.
We knew this. This is no great revelation. So why is this news?
"All great wisdom is contained in .signature files"
All they had to do is use integers, where a value of 1 represents 0.1 s.
You know it makes sense, a little reminder from jointm1k.
'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register -- as used in the Patriot system -- it's out by a tiny amount.
But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m'
Nonsense, it's perfectly possible to design a computer that can accurately tell the time. What caused Patriot to fail was that, over an extended period, the clocks of the various dispersed sub-systems went out of sync, as Patriot wasn't designed to be switched on for so long.
Regardless, what isn't possible is to design a system that can accurately track and shoot down missiles in flight, as the Patriot defence system so patently demonstrated. As I recall, it succeeded less than 50% of the time, which calls into question the viability of the Star Wars SDI project. Just another excuse to spend billions on the defence budget.
Found this article matching the criteria only dated February 28, 1991.
It just didn't seem plausible did it... How this correlates to modern computer FP calculations is beyond me.
The problem seems to be right out of the textbook for "Practical Analysis" (not sure if this is the correct translation for the German "Praktische Analysis"). This was a mandatory course for every computer science degree during my university time (20 years ago). Don't know if this is still the case. It was an eye opener to see how correct formulas and a perfectly working computer could yield absurd results. Several times I was asked for help by people claiming their Excel was broken due to such mistakes.
CU, Martin
The computer is not at fault here. The problem is the moron who thought floating point representation is a good choice for a fixed point value.
These problems are just too common. Like some game company discovered weird display in their game. Found out that floating point numbers are not very precise when far away from 0, like in a huge seamless world.
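The "far away from 0" effect shows up quickly in single precision. The sketch below assumes the game stored positions as 32-bit floats, which the comment does not say; to_float32 is a helper invented here that rounds a Python double to single precision using the standard struct module.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Near the origin, a step of 0.001 survives (up to a tiny rounding error).
near_origin = to_float32(1.0 + 0.001)

# At 2**24 = 16777216, even a whole step of 1.0 is lost: the nearest
# representable single-precision neighbours are 2 units apart.
far_away = to_float32(16777216.0 + 1.0)

print(near_origin)
print(far_away)
```

The same relative precision that gives sub-millimetre steps near zero gives multi-unit steps far out, which is why huge seamless worlds often keep coordinates relative to a nearby origin.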
Any first year compsci student should know that this happens, and should know to choose data types that can represent the data to the needed degree of accuracy.
A simple struct { int integral_part; int decimal_part; }; would do the job for this. Or since you care exactly about .1 second increments, you could even use integral values in the first place. With 24 bits, you can cover 19 days before it overflows, and almost half a day on top of that to provide a buffer if bad guys show up right as the scheduled reset comes up.
100 hours = 3,600,000 ticks? Wait, summary math is wrong. One hour = 60 minutes. Each of those 60 minutes is 60 seconds. 60 sets of 60 seconds is 60 * 60 = 3,600 seconds per hour. 100 hours means 100*3,600 = 360,000. Either they missed a digit and the system was online for 1,000 hours straight or they added one to the final result.
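Both figures can be right at once if the units are kept apart: 360,000 is a count of seconds, while the summary counts 0.1-second ticks. A quick check, taking the tick length from the summary:

```python
HOURS = 100
SECONDS_PER_HOUR = 60 * 60
TICKS_PER_SECOND = 10      # one tick = 0.1 s, per the summary

seconds = HOURS * SECONDS_PER_HOUR      # 360,000 seconds
ticks = seconds * TICKS_PER_SECOND      # 3,600,000 ticks

print(seconds, ticks)
```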
I actually read about this specific incident once; I seem to remember (though honestly not sure) that the design flaw was known and the user manual indicated that the computer needed to be reset every 36 hours. However, in wartime, under attack (there were frequent Scud intercepts), the crew controlling the missile battery opted against shutting it down, even for a short time. Maybe even though the manual said it SHOULD be rebooted it did not explain WHY or what the consequences would be.
The problem is the programmer, they should simply have maintained a count of the ticks in an integer and then multiplied it by 0.1 when necessary. Even better, use a proper data type, not a suckish 24-bit float in a freaking weapon, unless they understand very well what are they doing.
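A rough sketch of the difference between accumulating a fractional tick and keeping an integer count, in Python 3 doubles rather than whatever the Patriot actually used. The function names are made up, and the conversion divides by 10 instead of multiplying by 0.1 so that only one rounding step is involved.

```python
def elapsed_by_accumulation(ticks, tick_seconds=0.1):
    """Add the (inexact) tick length once per tick; rounding errors pile up."""
    total = 0.0
    for _ in range(ticks):
        total += tick_seconds
    return total

def elapsed_from_count(ticks, ticks_per_second=10):
    """Keep an exact integer count and convert to seconds only when needed."""
    return ticks / ticks_per_second

print(elapsed_by_accumulation(10))   # not exactly 1.0
print(elapsed_from_count(10))        # 1.0
print(elapsed_from_count(3600000))   # 360000.0
```

With the integer count, the error stays at a single rounding no matter how long the system has been running.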
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal, getcontext
>>> n = 0
>>> tick = Decimal('0.1')
>>> for i in range(3600000): n += tick
...
>>> n
Decimal('360000.0')
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')
>>> getcontext().prec = 50
>>> Decimal(1) / Decimal(7)
Decimal('0.14285714285714285714285714285714285714285714285714')
And, yes, I know that Decimal in Python 2.6/3.1 is slow. Will be faster in 2.7/3.2. And there are similar libraries in Java and other languages.
There's a hidden treasure in Python 3.x: __prepare__()
Talk about misleading headline. All was at the programmers' fault. The computer did no "bad math".
Stupid humanist journalists should not be writing technical articles.
The author seems to imply that computers can't do simple base 10 math without errors. That's not entirely true if you have a fixed precision. You use an integer and shift it so there is no decimal portion, in this case you would make your base a 1/10th of second instead of 1 second. Addition, subtraction and multiplication will be error free. You'll still have a problem with division and other operations but in this case that doesn't sound like their primary issue. It wasn't the computer's fault that the designers did not account for the fact that 2.0/2.0 != 1 on almost all FPU's today. It usually just equals a really good approximation of 1 that's "close enough".
There certainly are cases of bad math in computers, particularly Intel computers. But this isn't such an example. This is just a lazy and stupid programmer who didn't understand what he was really doing who should take the blame for the failure that killed people, not the computer.
I'm an American. I love this country and the freedoms that we used to have.
I remember this from a numerical methods class in the 1980s. To deal with situations like this, you can do one of three things :
a) Have a function that you sample as a function of t, so you don't get accumulated error.
b) Have enough bits so that error won't be an issue. This is actually hard to do because floating point errors do stack up pretty quick if you are not careful.
c) Or, you can have an error term which you can use to make adjustments along the way to account for a lack of precision. Bresenham's line does that more or less exactly when he does his lines. That's why you had "stair stepping" as the algorithm corrected itself along the way.
If the OP was correct, then PATRIOT failed because it did none of them. My bet is in reality, they simply underestimated the actual error term, but did everything else correct. This could be because of discrepancies in flight control instrumentation or some sensor, or, they were simply trying to save money on bits and didn't really do the calculation as to how far the missile could be off in an error term length seconds of flight at a particular phase in its flight profile.
Bottom line is, the engineering discipline exists to solve this problem and is really no different than error handling in any guidance system. Putting a man on the moon, launching an ICBM at target, shooting down a missile, are all essentially the same computer science problem from an error management perspective. The Phd's already nailed this decades ago. There's not a fundamental limitation to computing, in this case, merely, a failure or inability of engineers on this project to apply the correct known answer to this problem.
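Option (c) above, carrying an explicit error term and correcting as you go, is what Bresenham's line algorithm does with pure integer arithmetic. A minimal sketch of the common all-octant variant (not anything from the Patriot code):

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer grid points on a line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # running error term, always an integer
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:              # error says: step in x
            err += dy
            x0 += sx
        if e2 <= dx:              # error says: step in y (the "stair step")
            err += dx
            y0 += sy
    return points

print(bresenham(0, 0, 3, 1))      # [(0, 0), (1, 0), (2, 1), (3, 1)]
```

Because the error term is tracked exactly and fed back at every step, the line never drifts away from its endpoint, however long it is.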
This is my sig.
That is just an example of a terrible programmer(s)...if you ever programmed in assembly before floating point processors, especially on 8 bit machines, you'd be very comfortable extending your number of bits using fixed point math. It's work, but not hard...terrible, people died because of a lazy or uneducated programming team.
Why on earth didn't they have a clock source other than the standard one? There are numerous sources of correct time like GPS, radio, NTP, clock servers, atomic clocks or add-in cards. The worst possible clock source is a standard PC. This system was probably faulty by design since the simple clock hardware in a normal server isn't made for keeping exact time.
I'm not a serious developer and certainly not one that works on mission critical systems but I have a question:
Are there any symbolic math libraries that allow a program to compute and store its interim values symbolically until the final result is needed? (Like, as an AC mentioned earlier, Mathematica?). Of course there would be a memory overhead (but surely the entire Mathematica kernel wouldn't be needed) and performance might be much MUCH slower than current "binary math" libraries but surely in a day of gigabyte RAM chips and gigaflop CPUs (and teraflop GPUs) the added precision would be worth it?
So does anything like this exist? Would it be hard to develop (that's a challenge for you out there!)
This is not an example of computers sucking at math.
This is an example of engineers and developers failing to draw up valid requirements, failing to develop to specification, and failing to test against real-world use cases.
Management undoubtedly shares an equal if not greater portion of the blame here. This is typical military-industrial complex, lowest-bidder contractor mentality at work, just another form of corporate welfare if the government doesn't turn around and punish shortfalls like this.
Sounds like the computer did the math just fine, but with a flawed clock.... That's classic GIGO!
...that sucks at maths, but some programmers?
OK, that was an easy shot, but really, don't you agree that today's academic courses in science at large are becoming so specialized so soon that good sense stemming from scientific culture cannot be expected any more?
And this is why it is a good idea to take a Numerical Analysis course or an Assembly course that lets you play with floating-point arithmetic as part of your CS electives. As much as I'd like to blame today's Java/.NET-oriented CS curricula (which seem to be fashionable now in many universities), it's been quite a while that many universities barely pay any attention (if any) to the details of floating point arithmetic.
The article contains some interesting examples but all of which have been in programming texts and courses for years. I'm not really sure why it's on /.
So if this is the future...where's my jet pack?
Let's take the double precision floating point representation as an example. It uses 64 bits to store each number and permits values from about -10^308 to 10^308 (minus and plus 1 followed by 308 zeros, respectively) to be stored.
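Those limits can be inspected from Python 3, whose float is an IEEE 754 double on essentially every current platform (that is an assumption about the platform, not a language guarantee):

```python
import math
import sys

largest = sys.float_info.max             # largest finite double, about 1.8e308
mantissa_bits = sys.float_info.mant_dig  # significand precision in bits

overflowed = largest * 10                # past the range: becomes infinity

# The range is huge, but the precision is not: above 2**53, not every
# integer is representable, so adding 1 can be silently lost.
lost_increment = (2.0 ** 53 + 1.0 == 2.0 ** 53)

print(largest, mantissa_bits, overflowed, lost_increment)
```

The point is that a wide range says nothing about how finely values are spaced within it.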
The whole story and most of the posts are one giant Flamebait.
Nice how all the Slashdot geniuses seem to think they could have done a better job had they *only* been there 20-30 years ago, before most of these would-be heroes were even born.
Then there are morons who get on their high horse about corporate welfare bullshit. Sure, no one at Raytheon gave a shit about our soldiers, they just wanted to make a buck.
What a disgusting way to start a Saturday.
Look, you guys can talk trash all you want, but when you say this:
>>Patriot defense system failing to take down a Scud missile attack
You're just lying to yourself. The Patriots defense is awesome this year. I mean, was there really ANY point for the Titans offense to show up a couple of weeks back?
And the Scuds? C'mon man. They let go their best man two seasons ago. The QB can't hit the broadside of a barn and their entire wide-receiver corp has Jello hands anyway. The missile attack is a gadget play, pure and simple. Belichick sees right through that and you know it.
Haters need to stop all the hatin' and get on the Pats bus!!!!! GO PATRIOTS!
If Nalgene water bottles are outlawed, only outlaws will have Nalgene water bottles.
Screw this. During Gulf-war I, or whatever it's called, Patriot did not 'miss' any target that they fired at - period. The system was never designed to destroy missiles or get direct hits, however. The original missiles were to destroy planes by going off in close proximity to the target - which they do, very successfully. Missiles like Scuds, however, are not always destroyed by this. They tend to just break up, sending the intact warhead off track, slightly. From what I know, this happened in the case mentioned.
I happen to have worked with this system in the mid-nineties and this was a hot topic, back then. Why the total uptime of the system would mess up tracking is beyond me. The system will track what it either sees or is told to look for. This has nothing to do with rounding errors in time. Our system back then has been online for many days without impaired ability to track anything.
Quote: total error of 0.3433 seconds, during which time the Scud missile would cover 687m.............. This would mean the SCUD would be traveling at almost 71 million miles per hour! I don't think so............
The mpz module in the LGPL library GMP (not to be confused with a bitmap image editor) does arithmetic on large integers, and its mpq module represents rational numbers exactly as ratios of mpz integers. For example, 3.14 would become "157/50".
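For a quick feel of exact rationals without installing GMP, Python 3's standard fractions module behaves similarly. It is used here purely as a stand-in for GMP's rational type, not as GMP itself:

```python
from fractions import Fraction

pi_ish = Fraction("3.14")     # parsed exactly from the decimal string
third = Fraction(1, 3)
tick = Fraction(1, 10)        # 0.1 s with no binary rounding at all

print(pi_ish)                 # 157/50
print(third * 3 == 1)         # True
print(tick * 3600000)         # 360000
```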
I don't disagree with what you wrote. One thing is, though, that requirements are very fluid and you have to ask if perhaps the problem is that 10 hours and reboot is a ridiculous requirement from the get go. Soldiers aren't going to sit in a middle of a war zone and turn off the shields.
Arguably, when speccing out systems like this, the solution is probably not to build them because they are really too complex to test for battlefield conditions. But that's crazy. So.. what was the outcome? You put a system out there, make it as good as you can, and the outcome, in this case, was that the system did intercept some missiles, did save some lives and did pioneer missile interception in a war.
28 people died because the system isn't perfect, to be true, but how many people lived because the system worked at all?
This is my sig.
Each battery has overlapping coverage with its nearest neighbors. A proper deployment has overlapping fields of fire in both depth and breadth. Surface-to-Air missile defense involves multiple layers of different systems, each specializing in different ranges: Short Range - things like stingers, Medium Range - things like HAWK, Long-Range - things like Patriot. A proper tactical deployment never relies upon a single battery to provide the sole coverage. The problem here was primarily one of tactical deployment. The technical issues can be argued, but, the real failure was a failure to deploy in tactically correct fashion. They sent a battery or two as a "Show of Force", probably overriding the tactical expertise of the officers involved for political expediency. You have jack-asses like Rumsfeld and Cheney (and their ilk) making military tactical decisions when they are not qualified to do so. The REAL failure here is one of politics.
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
It was written up as a tech failure (and not a people failure) because newsmen who call their sources stupid lose their sources. As others have pointed out, the answer to your question of why this is news is because of the system failure resulting in death.
The authors of the article? So it would seem.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Programmers' errors/naivete aside, if an error of 0.3433 seconds can mean the target aperture is 687m off, then a resolution of 0.1 seconds - even when working properly - could still be 200m off.
And I see other comments about using fixed-point. I wonder why couldn't they just use an integer and use deciseconds as their base time?
Slashdot: Why Programmers Suck at Math
Okay, that'd be misleading too--I suppose it'd be more accurate to write "How a few incompetent programmers who built a weapon got people killed because they suck at math". Not very headliney? Okay, how about "Military Moron Makes Murderous Machine, But Beginner's Bug Betrays Billions"? I rounded up from 28 to billions, so it should still be inaccurate enough for Slashdot. As a bonus, you still can't tell what the hell the article is about from the headline :-)
There's no way a real-time missile tracking system is going to be dealing with time at an accuracy of 0.1 sec.
A Patriot missile travels at about Mach 3 (~1000 m/sec) so a rounding error of 0.05, even without any error accumulation, means you'd be off by 50m in position.
Who knows what the real story is vs the garbage that was reported, but even if there was a cumulative error that's the fault of the programmer rather than a lack of a computers ability to do math. You do your error analysis and use whatever accuracy needed to keep the errors in a tolerable range.
The part about the system running for 100 hours was pure gibberish. Yes, we can all divide that by 0.1 sec, but what on earth does that have to do with a real-time tracking system tracking a target it acquired a few minutes ago?!
A better title for the story rather than "computers can't do math" would be "we can't do tech reporting".
This TechRadar article also explains why cars suck at math, too.
The timing belt was manufactured to be a few mm too short. But over the course of several thousand revolutions, those mm add up to a massive error, which causes the pistons to strike metal. Thus the car was a write-off.
It's no fairer to blame the computer than it is the car - some ABSOLUTE PILLOCK didn't design, implement or test their system properly. And *they* caused the 28 deaths, not the computer (and it can't be overstated just how elementary a mistake this is, especially in a military system, and should have been caught by basic code review and testing at every stage).
I hate stories like this because then you get deep mistrust of computerised systems where they *can* be incredibly useful, and without an adequate substitute. Every time a car won't start because the electronic ignition wasn't designed properly, every time a home computer crashes because someone didn't bother to isolate the apps from the OS well enough, every time something like this happens, people distrust "computers" more and more when what they should be distrusting is damn crappy programming.
A computer is as close as you can practically get to being perfect. Short of hardware failure (Intel FDIV bugs, bad RAM, corrupt drives etc.), computers do not make mistakes. If they crash, it's because they've been *told* to crash (the fact that you even *see* a blue screen or kernel panic means that the computer is still just blindly following orders).
There's no excuse for this - it's basic, elementary mathematics and binary manipulation. Some pillock threw a cheap CPU clock and a standard library at a time-critical, life-dependent military problem without even thinking. The programmers should be sacked, the testing teams should be sacked and ANYTHING they've ever created or reviewed should be overhauled to make sure they haven't made even worse mistakes.
The Patriot case was simply unsound from a numerical point of view because it used an approximation which accumulated errors to the point where they seriously compromised the end result, which is a whole other thing altogether (and mathematically speaking much simpler and more fundamental).
Numerical analysis is basically about "How can we make sure that a computer algorithm on such-and-such hardware will always produce an answer to this-and-this mathematical problem with such-and-such error bounds.". This really isn't something like "coding well", but it can require complicated and careful mathematics to get right, which is something programmers usually haven't a clue about. Instead, and provided the effort is warranted by the application, one needs to have a competent Numerical Analyst (a fancy title for a Mathematician specialized in this particular field) check (if not actually design) the software. Coders can then do the rest, provided there is sufficient communication between the architect (the numerical analyst) and the builders (the coders) about all the quirks of the hardware and how they are accounted for and dealt with.
Every CS graduate is supposed to know that advanced numerical work with computers (like those in the Patriot system, where the 0.3 second error is a fine example of negligence) falls under the domain of Numerical Analysis and requires specialist attention. This is why some jobs should be undertaken by software engineers, not coders.
As Paul Lockhart said: Math is about creativity!
There. I saved you a hell of a lot of time! ^^
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Fixed, but a day late. A two-week turnaround from when this was fault-isolated to when a fix was fielded in SW Asia is fast for government work. Sadly, 28 American soldiers died. The computer found a possible ABT. When it verified the track, it wasn't where its programming told it to be. Track was dropped.
The real problem is all the math teachers around the world that teach students that all decimals after tenths or hundredths of a digit are, and I quote, "insignificant". In this case, what you basically have is a bunch of programmers who grew up learning that anything after the tenths digit is "insignificant". It's as simple as that.
As long as the hardware has basic floating point support it's possible to design software that will get the right answer, and usually fast enough. It's all down to the software.
As a programmer, I can confirm that these programmers screwed up, but I would bet money on the fact that the management forced them to. There's no way programmers working on physics software would choose a processor limited to 24-bit registers unless that was the only choice they were given, so that decision must have been forced upon them by their bosses. I'm also certain that the decision that it was "good enough" to ship was not made by the programmers. Here's an interesting quote:
From: http://www.corpwatch.org/article.php?id=11110
"As usual with the Pentagon, cost is no object. But the Patriot is very expensive system and it's getting costlier all the time. Raytheon and Lockheed originally promised to deliver the new Patriot system for $3.7 billion dollars. Now the cost has soared to $7.8 billion. Each Patriot missile unit costs about $170 million. In the first Gulf War, an average of four missiles were launched against a single incoming Scud."
Even if that's grossly inaccurate, they saved a few bucks per multi-million dollar unit. That's like being penny wise but several million pounds foolish. While I agree it's not that hard to work around the 24-bit limitation, the decision to use such a limited processor was probably a major contributing factor to the schedule slips and cost overruns. Any time a project slips that badly, management will step in and force them to rush it out the door before it's ready. My bet is that the developers knew the problem was there, but they didn't have time to even look at it because they had bigger fish to fry when they were trying to get it out the door.
Oh, I suppose I should say something more proximate:
The failure to unify optimizing JIT compilers with memoized (encached, tabled, etc.) demand driven (lazy) computations so that we can express our maths independent of precision without performance penalties. This, of course, is directly related to the failure to maintain dependency graphs so that when under continuous demand (observation) demand driven computation unifies with data driven (data flow) computation -- and when no longer under demand (observation), memoizations (encachments, tabled entries, etc.) can be voided until the next demand requires recomputation.
I'll get around to working on it one of these days. It's just that, like many other things, I thought it was obvious enough 25 years ago that someone who had some serious money would have backed that kind of programming environment. I told Ray in 1985 Microsoft would do lots of damage, but he didn't believe me and even I didn't think it would be this bad...
Seastead this.
We had a similar problem with an Aegis design, and it was a major headache for us Hardware engineers to try to convince the Systems Engineers that counting in Binary time was more logical than counting in 0.1 second increments. The SEs kept insisting that their computers at home accurately count in seconds and we hardware engineers should be able to.
And the software engineers would have been right. The error was not about counting in 0.1 second increments versus 1 second increments or whatever, but it was in using floating point representation where fixed point (basically, scaled integer) would have been more appropriate.
And come to think of it, that is more or less what most desktop and server OSes do: they count number of milli, micro, or nanoseconds, and store that as an integer.
Similar issue arises in finance: you don't encode dollar amounts as floating point. Instead you store number of cents (or mils) as integer. Every programmer of financial software knows about this (... or should know about this...)
Floating point is really only appropriate to represent values which are not known precisely anyways (measurement results), where the little additional rounding error wouldn't matter. For all else, use fixed-point.
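The money case makes a compact illustration of the scaled-integer idea. A minimal Python 3 sketch; format_cents is a hypothetical helper and assumes non-negative amounts:

```python
# Binary floating point: ten cents plus twenty cents is not quite thirty.
float_ok = (0.1 + 0.2 == 0.3)            # False

# Scaled integers: store cents, so the same sum is exact.
cents_ok = (10 + 20 == 30)               # True

def format_cents(cents):
    """Render a non-negative integer number of cents as dollars for display."""
    dollars, remainder = divmod(cents, 100)
    return "$%d.%02d" % (dollars, remainder)

total = 3 * 1999                         # three items at $19.99 each
print(float_ok, cents_ok, format_cents(total))
```

The decimal point only appears at the display step; all arithmetic stays in integers, which is the same trick as counting ticks or nanoseconds.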
[GMP] does not, for example, let you do square roots or trigonometric calculations
I know that. I recommended GMP because the article is about improper handling of rationals, not square roots or trig (or even Trig Van Palin for that matter), and Maxima would have been overkill.
Disclosure: I am a programmer.
I had a conjecture that the Patriot missile was a Raytheon project. Not a particularly well-based speculation, but I did the RFA for confirmation. For some reason, they did not mention the manufacturer. They felt free to mention Intel and Google, but I guess the manufacturer was an advertiser.
As it happens, 40 years ago I was an end user of a Raytheon air-defense missile system called the Hawk. There was a common derogatory phrase about Raytheon. I guess we might call it a meme now.
It was cute and perhaps relevant to this article, but it has been so long, I can not reproduce it.
Anyway, whoever manufactured the Patriot, I sort of doubt that the first cause was a bad programmer.
A war story. This is not all Raytheon's fault, but it makes a nice slander.
At Kassel, there was a NSA antenna farm with a hawk battery next to it. It was noteworthy, but not unusual, for Migs to buzz the antenna farm. I guess it happened every few months. Go figure.
How does restarting the operating system kernel, reinitializing drivers for all the hardware and restarting every running program help?
I know that's not what you meant, and the operating system in use is probably not windows (I hope, at least). Still, is it that hard to just deal with the problem, instead of starting from nothing and doing a whole lot of unrelated stuff? Reboots should generally not be required.
Sure, don't actually blame the _scud_ for the deaths.
"Draco dormiens nunquam titillandus."
As opposed to a modern system susceptible to EMP.
Really what you want is to store numbers "rationally" as a numerator and a denominator.
You get all the advantages of fixed point, and you can also represent fractional numbers exactly so that (1/3)*3 == 1
If you use a proper language like Scheme for your calculations, it's just built in.
... blame the programmer who tried to stuff 0.1 into binary, and then used the resulting erroneous binary number as if it were correct.
.
In this case, the computer didn't suck at math; the programmer sucked at programming, and should have been kept far away from computers controlling armament.
Many years ago I was asked to look at a waterjet robot that was behaving abnormally. The robot's task was to cut plastic sheets into square tiles as they went through it.
The problem is that after 30 minutes of activity the square tiles weren't so square any more, and it kept getting worse. The software engineers from the manufacturer came and went a number of times, and failed to solve the problem.
It was obvious to me that it was a compounding rounding error, so I looked at the robot's program. It said (simplified):
1- start at the set 0.0 coords
2- turn on jets
3- go forward 30cm
4- stop
5- go left 30cm
6- turn off jets
7- go right 30cm
8- goto 2
Essentially it never went back to the 0.0 coords and kept adding the errors of going left and right 30cm. It took about 30 minutes to get to the code, find the problem and solve it.
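A toy model of why that program drifted, assuming (purely for illustration) that every move lands with the same small positioning error. The error size and function names are hypothetical, not taken from the real robot.

```python
STEP_ERROR_CM = 0.001  # hypothetical positioning error added by each move


def drift_relative_only(cycles):
    """Error after looping on relative moves without ever re-homing."""
    error = 0.0
    for _ in range(cycles):
        error += STEP_ERROR_CM  # 'go left 30cm'
        error += STEP_ERROR_CM  # 'go right 30cm' (does not cancel in this model)
    return error


def drift_with_rehoming(cycles):
    """Error when each cycle first returns to the absolute 0.0 coords."""
    error = 0.0
    for _ in range(cycles):
        error = 0.0             # re-home: discard whatever error has built up
        error += STEP_ERROR_CM
        error += STEP_ERROR_CM
    return error


print(drift_relative_only(1000))   # grows with the number of cycles
print(drift_with_rehoming(1000))   # bounded by a single cycle's error
```

In the first version the error is proportional to how long the machine has been running; in the second it never exceeds what one cycle can introduce.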
Slow up. While the overall concept is reasonable, mil-spec computers are _tiny_ in resources. They have to be: getting them mil-spec approved is a lengthy process, and radiation hardening CPUs and microprocessors is very difficult. The bigger the chip in resources, or the smaller the traces, the more vulnerable it is to radiation. And for an interception missile, the available payload to carry shielding for the electronics is minuscule if it exists at all. So competent military programmers learn to be very, very parsimonious indeed in their code.
They also tend to write in C or even assembly, for optimization to their very limited hardware. There have been attempts to use all sorts of other languages for such processors, but they keep coming back to C.
You have to look at the typical accuracy of time references. Military crystal oscillators are usually accurate down to 0.5 or 1 ppm, but that's about the best you can usually get. For instance, look at Q-Tech offerings, which is standard technology for avionics. So let's check... 100 hours * 3600 seconds * 1 ppm = 0.36 seconds. Even without rounding errors, the error would have been the same if the system was running from a 1 ppm crystal oscillator. It seems to me that the real problem is that the time references of the different radars were not synchronized more often.
While we're on the topic of computers performing calculations and misguided notions, how about we rid ourselves of these unnecessary anthropomorphisms which lead to the idea that a computer is even "doing math" with our number system, or that the radar is "looking" anywhere at all.
1. Patriot was originally an anti-aircraft system, and it was operating outside its design parameters.
2. No system should depend on a non-synchronized free-running clock, anyway. That's not an arithmetic problem, that's a design problem. If an absolute time base is needed, you have to actually create one, not assume that everyone's clocks are going to magically stay in sync.
This is a case in a textbook I had to get for university a few years ago. According to it, this was in 1991.
The book is Computer Architecture and Organization [an Integrated Approach] by Miles Murdocca and Vincent Heuring (you can find the case on page 51).
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
"Except that people tend to rely on computers, and take risks they would not have otherwise taken."
So in other words, computers are the anti-lock brakes of the electronics world? See, I knew I could get a car analogy in there.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
my computer sucks at math because it doesn't apply itself. period. end of discussion.
I know that I'm arguing with a trolling AC, but for the other readers of slashdot, you should know that the grandparent's post refers to the controversy regarding the analysis of the Patriot system during the first Gulf war. There was a huge propaganda machine behind the Patriot's "successes" which turned out to be very near zero indeed. This was covered in a series of hearings in the early 90's...
http://www.fas.org/spp/starwars/docops/pl920908.htm
You can also read up on this from transcripts from the hearings after the war.
In the interests of fairness, here is a rebuttal / review.
http://www.fas.org/spp/starwars/docops/zimmerman.htm
I remain unconvinced -- from reading this (almost 20 years ago) I concluded that at best, the military did not know for sure that these worked well.
Slashdotter, ID #101. UIDs are in binary, right?
Last I heard the US military wildly inflated the Patriot Missile success rate to 95% from "possibly 0%", and tried to cover up scores of civilian deaths directly caused by them. And Raytheon couldn't get even one hit under controlled conditions. Presumably these missiles work now if they're being bought and sold, but I still haven't seen any proof. Has any non-US affiliated party released test results?
Subject changed to reflect Parent's misinformation. If you look a little more carefully, the Patriot did stop Scuds later on, I assume after the bugfix mentioned went in. While not 100%, it did eventually earn its salt, with its score a lot higher than "zero".
As to the Israeli system... I guess they haven't worked the bugs out yet.
They should of course have used Swiss build computers instead.
Welllllll, they snuck one past me as a piece of "test" equipment.
The project was placed on the shuttle, and when it came time to fly, it was lifted from the payload bay by the arm, and activated. It could not lock guidance. It would only wobble. Long story short, after several attempts, re-uploading flight software, and a second flight on the shuttle, the project was scrapped.
Several weeks after the second flight, I made sure that the articles about the Intel 586 floating point error were in everyone's inbox. About two years later, the company went under. Doubt it was because of the problem with the 586... it was more because the company was a threat to the traditional way NASA does business.
Wherever you go, there you are.
Computers are quite good at math, and do it with amazing accuracy.
The problem is the people applying the math computations to the
computer system. The problem existed long before computers and
produced an area of mathematics called numerical analysis.
Slide rules produced greater errors, and hand calculations even more
so, yet this can be accounted for.
Non-math people should not be doing this on their own but should
be requesting the help of mathematicians when they don't understand
how to compensate for such errors or know when the errors have
reached a level that compromises the integrity of the calculations
and thus requires a new starting point.
Programmers should not pretend to be mathematicians or engineers
or anything but what they are.
This headline, while captivating, is inaccurate - computers excel at math, and can do complex calculations faster and better than any other device I know of. BUT, the issue here was the fundamental design which pitted the software design against the hardware limitations. The author of the post above (with the benefit of hindsight) was able to describe the problem in just a handful of words, begging the obvious solution - the sampling should have been done in increments that suited the 24 bit registers the values would reside in for calculations - they should never have left the system to "round" any values. Design flaw, plain and simple.
Ken
Most people know neither how integer math is supposed to be done nor how floating-point math works. Read Goldberg's paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
People that *think* they know about that subject are probably the worst offenders: do you know what it takes to correctly write the following method (camel case, long method name, just to get to the point):
areEqualsOrAlmostEquals(float a, float b, float maxAbsOrRelError) {...}
You give a float which specifies the maximum absolute or relative error and your method simply returns true or false.
Do you *really* know what it takes to write such a method? Hint: you probably don't and I can write test cases making your solution fail.
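For what it's worth, here is one possible attempt, sketched in Python rather than Java. It is not claimed to survive every test case the parent has in mind; the function name and the choice to treat the single tolerance as both an absolute and a relative bound are assumptions of this sketch. Python's own `math.isclose` takes separate `rel_tol` and `abs_tol` arguments instead.

```python
import math


def are_equal_or_almost_equal(a, b, max_abs_or_rel_error):
    """Return True if a and b differ by at most the tolerance,
    measured either absolutely or relative to the larger magnitude."""
    if math.isnan(a) or math.isnan(b):
        return False                 # NaN is never close to anything
    if a == b:
        return True                  # covers equal infinities and +0.0 == -0.0
    if math.isinf(a) or math.isinf(b):
        return False                 # an infinity is only close to itself
    diff = abs(a - b)
    if diff <= max_abs_or_rel_error:
        return True                  # absolute check, needed near zero
    # Relative check, scaled by the larger of the two magnitudes.
    return diff <= max_abs_or_rel_error * max(abs(a), abs(b))


print(are_equal_or_almost_equal(0.1 + 0.2, 0.3, 1e-9))
print(are_equal_or_almost_equal(1e-12, 0.0, 1e-9))
print(are_equal_or_almost_equal(float("nan"), float("nan"), 1e-9))
```

Even this short version has to make policy decisions (NaN, infinities, values near zero) that a naive `abs(a - b) < eps` silently gets wrong.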
Goldberg's paper is pure gold ;)
I've had it for years, it's 80 pages long and it's a bible on the subject.
Most people understand neither integer math nor floating-point math, it's a fact.
Anyone who thinks that measuring time is simply a matter of using accurate floating point values should have a closer look at the definition of time
And depending on the problem you are working on you may have to take into account relativistic effects caused by the gravitational potential of the earth and the sun.
Yes, when I studied Computer Science (admittedly about 30-40 years ago), it was a requirement to study numerical analysis which basically laid out the fundamentals of how and when floating point numbers failed in binary representation. So the idea that people in the 70's either didn't know or care about these issues isn't true.
If you haven't heard of the Euler or Runge-Kutta methods, you probably should before doing any sort of system design that involves floating point numbers.
I think everybody is too busy teaching programming these days to study computer science ;)
You were mistaken. Which is odd, since memory shouldn't be a problem for you
99.9% of /. readers think "hardware" when they see the word "computer".
99.99% of the general public think "whatever's in that mysterious box" when they see the word computer--and that includes the tubes that hook up to the interwebs.
It's actually decent science reporting for the 99.99% of people who don't distinguish between hardware and software, and if one weren't a /. reader, one's attention span would exceed that of a gnat (not excluding myself here, btw), and one might have read (all the way on page 3):
"Surprisingly, perhaps – and with the exception of the Pentium floating point error, which was caused by a hardware glitch – all of the errors we've mentioned here could have been prevented. In that sense, they can all be thought of as software errors."
It's actually a pretty good article overall, though since the (presumably UK-based) audience of something called "TechRadar" ought to have more in common with /. than with the general public, the title could have been less inflammatory to those in the know...
I don't know any computer that uses a 0.1 second tick period. Even crappy Linux 2.4 has a 100 Hz tick rate. I seriously doubt a system like the Patriot would have less than half a mile resolution.
Has anyone read the case study on this problem? The "solution" for this was simply to reboot the system every day. Now the argument for this solution does not really matter but the key thing to note is that the original proposal never stated that the system was to be on for weeks on end. This is not a code failure, it is a contractual failure on both sides. The customer should have had this stipulation in the contract, and the supplier should have found out such important information before the designing of the system.
How about GiNaC?
cpghost at Cordula's Web.
Actually the main purpose is a cost plus fixed profits contract for the weapons manufacturer. Even if no one ever dies on either side of the gun, it's still a success to them.
The painfully obvious solution is to keep time by "ticks" rather than some decimal representation of seconds.
As anyone who has been through school can tell you, floating point numbers come with their own built in error.
The obvious solution is to use integers or use them as "fixed point" decimal.
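To put a number on that built-in error, here is a back-of-the-envelope sketch of the figure quoted in the summary. It assumes the 24-bit register kept 23 bits after the binary point and chopped the rest; that layout is an assumption chosen because it reproduces the summary's 0.3433 seconds, not something stated in the article excerpt.

```python
import math
from fractions import Fraction

FRACTION_BITS = 23            # assumption: 23 bits after the binary point
TICKS = 100 * 3600 * 10       # 100 hours of 0.1-second ticks = 3,600,000

exact_tenth = Fraction(1, 10)
scale = 2 ** FRACTION_BITS
# Chop (truncate) 0.1 to the nearest representable value below it.
stored_tenth = Fraction(math.floor(exact_tenth * scale), scale)

per_tick_error = exact_tenth - stored_tenth
drift_seconds = per_tick_error * TICKS

print(float(per_tick_error))          # roughly 9.5e-08 seconds per tick
print(f"{float(drift_seconds):.4f}")  # accumulated drift after 100 hours
```

The integer tick count itself is exact the whole time; the error only appears when it is multiplied by the chopped value of 0.1.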
A Pirate and a Puritan look the same on a balance sheet.
Look. When the system is named "Patriot", you already have enough information to understand the framing context - if you care to have the particular insight. This is a propaganda tool, like the rockets launched from Airstrip One.
It fulfilled its mission when it was designed, manufactured and labeled as "The Patriot Missile System". Ballistic interception is a secondary mission and fulfillment is unnecessary for success.
"Speaking the Truth in times of universal deceit is a revolutionary act." -- George Orwell
The corollary to that is that the people making equipment to protect other people must be smarter. Guess which side of the line the Patriot system falls on?
Reality is the ultimate Rorschach.
First, that's why most timing belts are really timing chains.
Second (to the earlier posters), testing is way too late to catch this type of problem.
This is the sort of thing that should be caught at design review where the "tenths of a second" are actually computed as INTEGER hundred-counts of a millisecond timer that is the fundamental time-base for the system and is "adjtime"d to prevent it from accumulating error relative to a very reliable time reference (like GPS).
Of course, this is what happens when managers think all programmers are interchangeable and don't value engineers who have a clear view of what they are doing and why.
So the design of the system caused it to fail, and that means computers suck at math? I don't think that works. Really, someone came up with a broken design, so the computer "sucks" at doing the program's math. I really don't know why you'd blame the entire field of computer mathematics.
FTFA:
"So computers might suck at maths, but there's always a solution available to circumvent their inherent weaknesses. And in that case, it's probably more accurate to say that computer programmers suck at maths - or at least some of them do."
Thank you, come again.
So in a system that should have clocks synchronized to less than a microsecond, nobody bothered to run "ntpdate" even once in a hundred hours?
Yes, obviously they just needed to ssh into their patriot missile air defense system, edit a few lines in /etc/inet/ntp.conf and svcadm restart ntp.
The obvious problem in the article, if you read it, is the computer's finite precision, and how it is dealt with. By 'computer', the author could easily have included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.
Everyone defending the way 'computers' is used in this article, and conflating it with 'processor' is a complete idiot.
The local Saudi station had just finished a piece and was a few seconds into Rhapsody in Blue when the interruption came. They stopped the music and told all listeners to take shelter (alternating in Arabic then English). The sirens on our base did not go off - at all. I was writing a letter to my sister and commented that one of these times somebody is going to get hurt for not responding to these alarms (it was late in the war at this point and people were starting to get complacent about the alarms as we were all believers in the "infallible" Patriot).
It may have been that they *knew* it was not destined for us - but we usually got alarms for anything in the area. Jubail (where I was) was right in-line for the path to Dhahran. We *should* have got the alarm. Those people would (should) have lived had they got the alarm.
I did not find out until the next morning that the Army barracks had been hit.
Anonymous - because I forgot the login to my account about 8 years ago and haven't created another.
Oh, but it is so satisfying when you find out that the system killed the people it was intended to kill, or the building, or the other missile, or whatever. There is such a thing as professional pride.
That which is done from love exists beyond good and evil
Who stores or processes time in tenths of a second? Milliseconds, microseconds, but tenths of a second? I think somebody got their source information wrong or the designers of the system were really on something special.
Selah.ca. Pause, and calmly think on that.
My intent by saying "people who program computers" was not to single out computer programmers, but designers, managers, and everyone else involved too. In the military, this involves politicians, government contractors, and generals too, who all make huge piles of money even if the project is less than successful, like the Patriot missile and F-22.
A programmer was probably told to use that register size, even though he knew it would be flawed.
"All great wisdom is contained in .signature files"
We even have a modern analog for this - the shift-lock key.
Now I have another reason to use bc instead of Excel.
Bender:~/docs$ bc
bc 1.06.94
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
850*77.1
65535.0
1.0 - 0.9 - 0.1
0
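For comparison, a small Python sketch of the same two calculations, once in the default binary floating point and once with the standard `decimal` module. bc gets the answers above because it computes in decimal; the variable names here are made up for the sketch.

```python
from decimal import Decimal

# Binary floating point: 0.9 and 0.1 are not exactly representable.
binary_result = 1.0 - 0.9 - 0.1

# Decimal arithmetic: the same literals are stored exactly.
decimal_result = Decimal("1.0") - Decimal("0.9") - Decimal("0.1")
decimal_product = Decimal(850) * Decimal("77.1")

print(binary_result)     # a tiny non-zero residue
print(decimal_result)    # exactly zero
print(decimal_product)   # exactly 65535.0
```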
687 m / 0.3433 s = 2001 m/s approx. That's 6566 ft/s, or 1.24 miles per second, which is 4476 mph.
That would mean it's traveling at over Mach 5 (at sea level), so perhaps this data is incorrect after all.
Actually, the article describes why programmers suck at math.
Instead of using seconds as a time unit and introducing what amounts to rounding errors, systems like these should run straight off the clock ticks. There's no reason for a computer to be using seconds internally in the first place - just convert to them when human-readable output is required.
You're spot on about irrational numbers.
But I'm not sure the time critical argument works so well, these days. Let's say we designed a numeric package which was capable of ratio storage and irrational storage as well. Perhaps even able to handle unknowns. Let's say it was 1000 times slower than modern floating point calculations. We'd still be able to do something like a million calculations a second, and we'd know this as part of the parameters we'd use to decide where to use it and how.
We'd only use it where necessary, or where speed wasn't a critical issue (better to err on the side of accuracy). And if we made such a library pretty standard and everyone used it, processors would probably begin to include specific instructions to improve the performance of "correct" math.
The thing is that we don't test enough and basically don't care enough whether the math is right. We generally think it's all good enough. And that really ought to change.
We made the compromises we now live with, after all, when computers were thousands of times slower than they are today.
Fortunately, when dealing with a tactical, self-contained, antimissile system, none of those needs to be important. Any external time measure is irrelevant. What's relevant is that all parts of the system have the same time base. No need to worry about leap seconds, time zones, years, or local noon.
Back in the early nineties a college friend came to me with a problem she was having at work. She was working on a digital compass to be installed in the F-16. There was a raster screen and a circle with tick marks was part of the compass display. They were having difficulties drawing the circle rapidly enough and her management was considering installing a floating point co-processor to make things run faster. After a bit of discussion, it turned out that the circle drawing algorithm they were using was based on sin/cos. She was fairly new on the project and while her graphics background was not very deep, she did know that there should be better ways to do things.
We had a bit of a discussion on better circle drawing algorithms as well as the joys of pre-computation, look up tables and not redrawing things that were not really changing. I still shudder to think of what other cruft must have been lurking in that software.
I don't recall too much "transparency" when it came to "accidentally" loading live nukula missiles for transit flights in the US a short time back.
Wasn't, and still really isn't, a lot of "transparency" with that 9/11 thing you all seem to have in your heads.
I don't see a lot of "transparency" when it comes to timing and details of arms shipments to America's "friends" around the globe. Not much "transparency" in Guantanamo - or hundreds of US spy bases around the globe.
I don't like the sound of your trumpet, sonny, sounds a little off-key to me. Maybe you only know that one note.
So how do I moderate TFA as a troll? If it isn't already an option, it needs to be added. As a plus, this moderation could be used in the performance reviews of slashdot's editors.
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
A true killer app, but for the customer.
Why are you publishing this stupid story? Every computer scientist in the world is already aware of this. Besides: both IBM and NeXT found ways around this years and years ago. Reading the other comments makes it obvious the submitter and the /. mod both are imbeciles.
Seriously.
- Zav - Imagine a Beowulf cluster of insensitive clods...
Why is the clock tick 0.1 seconds?
On the 3 MHz Sinclair Spectrum it was 0.0000003 seconds.
And why the worry about it anyway?
Find the hardware tick (it is NEVER a nice easy number, just like the 0.1 seconds they complain of is not even actually 0.1 seconds but could be 0.1000320548732 seconds) and calibrate your software to use the elapsed time since plot start rather than "the next step".
Was the patriot written by a bunch of VBA cowboys?
I'd just like to point out here that the 28 people were not killed by the failure of the intercept system. They were killed by the nice folks who launched the missile in the first place.
If at first you don't succeed, destroy all evidence that you tried.
A better title for the article would be "Why people suck at computational error analysis." Except for the one about the Pentium FDIV bug, every example it gives is one where humans ignored some rule or another governing error propagation in the computations. Back in college, I ran across a book titled "Scientific Analysis on the Pocket Calculator" that had a full third of its content devoted entirely to error analysis, both of the errors the initial data would introduce (limited precision in the initial data) and ones the calculator itself would cause (limited range, limited number of significant digits, overflow/underflow in intermediate calculations). Computers add the additional fun of dealing with numbers that terminate in base 10 but repeat forever in base 2. And in school they gloss over all of this and don't bother teaching it. They all teach as if computers have infinite range and an infinite number of significant digits in all calculations. Is it any wonder the results are botched so often?
The part about the Patriot missile isn't at all clear on explaining how the computational error resulted in the mistake. A quick search led to this article, which is far more plausible: http://www.mc.edu/campus/users/travis/syllabi/381/patriot.htm
It's programmers who suck at math, not computers. Computers do exactly what you tell them to, which includes how they're designed to interpret what you tell them to do. This was a tragic example of what can happen when reusing legacy software.
If the story in the example is true, then it's painfully obvious that the coder(s) in the example didn't understand some core concepts in math and computer science.
-- "At Microsoft, quality is job 1.1" -- PC Magazine, Nov. 1994
You are right. In a self-contained system you can choose your own time base of course and absolute time is not important (only relative time). However I think it does no harm to be aware of those things.
can do those maths just fine.
"Infecting minds with my own memetic virus, one post at a time." Ultimape
Yes, too bad these programmers in the 80s didn't just use Python!
is when it's much better to use fixed-point arithmetic,
If you're working with 0.1 s ticks, make your clock an integer counting these ticks and use them universally throughout your software.
Whenever you face the operation of division in your program, think twice whether it wouldn't be better to replace the basic unit by the one pre-divided and use integer multiplication elsewhere instead. No mess associated with floating point operations.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
But that does not matter one bit. The Patriot missiles won the first Gulf War. Yes, I meant that: the Scud was a political weapon, and the attacks were intended to trick Israel into getting into the war and creating a political meltdown. These Scud attacks weren't going to do a serious amount of damage in real terms, to anything. The Patriots seemed to intercept them at least some of the time and they removed much of the terror of these terror attacks. As a political weapon it was awesome.
Now back to the point. The Patriot wasn't originally intended to intercept targets going that fast; it was meant to shoot down airplanes. The whole problem as described isn't that computers suck at math. It's that reality isn't digital, and when you represent it as such you need to be very careful. But I still have a problem blaming these guys for this bug: that missile system was built to shoot down aircraft, not missiles. Aircraft that might be travelling at subsonic speeds.
Put a milspec GPS in every major component to get the atomic-clock-accurate time before critical functions.
There is nothing wrong with yr Internet. Do not attempt to adjust the picture. We are controlling the transmission - NSA
The Bush I administration tricked Saddam into thinking the U.S. would not respond to an Iraqi invasion of Kuwait. Accordingly, 100,000 people paid with their lives.
...store the time as two integers: one for seconds, and the other for milliseconds, microseconds, or whatever you wish. You need to handle the carry, but that's the price you pay when you want that kind of resolution with a 24-bit register. Lots of time-critical software running on satellites uses an approach similar to this.
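A minimal sketch of that two-integer scheme, written in Python for readability rather than for a 24-bit target. The choice of microseconds for the fractional field and the function name are assumptions of this example.

```python
MICROS_PER_SECOND = 1_000_000


def add_micros(seconds, micros, delta_micros):
    """Advance a (seconds, microseconds) pair by delta_micros, handling the carry."""
    micros += delta_micros
    seconds += micros // MICROS_PER_SECOND   # carry whole seconds upward
    micros %= MICROS_PER_SECOND              # keep the fraction in range
    return seconds, micros


# Ten ticks of exactly 0.1 s (100,000 microseconds) add up to exactly 1 s.
clock = (0, 0)
for _ in range(10):
    clock = add_micros(*clock, 100_000)
print(clock)
```

Because both fields are integers, a 0.1-second tick is stored exactly and nothing accumulates except the carry.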
To quote Kernighan and Plauger in The Elements of Programming Style: "10.0 times 0.1 is hardly ever 1.0." This Patriot episode exemplifies their words tragically.
If it weren't for deadlines, nothing would be late.
only if the algorithm is numerically unstable...
The MAFIAA is a bunch of mindless jerks who will be the first up against the wall when the revolution comes
Error analysis is an important part of your basic graduate-level numerical analysis course. Errors arise in the floating-point approximations used in most computers, and also in large matrix calculations, which can multiply and sum numbers a huge number of times.
Why do we keep getting this error on this story when trying to reply to the abstract? It's been there for hours.
Error 503 Service Unavailable
Service Unavailable
Guru Meditation:
XID: 1125948833
Varnish
The crystal you find in an ordinary wrist watch gives you ticks in increments of 1/32768 second. 32768 is a power of 2 (2^15) and encodes extremely well into a binary computer.
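A quick way to see the point about powers of two, sketched in Python with IEEE double-precision floats. The variable names are made up for the example.

```python
from fractions import Fraction

watch_tick = 1 / 32768          # 2**-15, exactly representable in binary

# Accumulate one second's worth of watch ticks.
elapsed = 0.0
for _ in range(32768):
    elapsed += watch_tick

print(elapsed == 1.0)                             # no error accumulates
print(Fraction(watch_tick) == Fraction(1, 32768)) # stored value is exact
print(Fraction(0.1) == Fraction(1, 10))           # 0.1 is only approximated
```

Every partial sum k/32768 is itself exactly representable, which is why the loop lands on exactly 1.0.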
If a cheap watch can encode time in a way that is convenient for a computer, why can't a billion dollar missile system?
“Common sense is not so common.” — Voltaire
Crap like this was alive and well when I was in uni and it's still alive and well.
Witness: Limits to Growth written by Meadows et al: http://en.wikipedia.org/wiki/The_Limits_to_Growth
Consider that book was written in 1972. I was programming computers in 1972. I actually did a course in numerical analysis in 1972 and just re-read the first 10 pages or so. I happen to have read a masters thesis that came out of the Colorado School of Mines where the author stated Meadows' Runge Kutta Numerical Integrations did not converge.
Yet that book is still often quoted. It's been flawed from the get go. So consider something else! How fast were the machines that Meadows used? How big? What would be the MOST SOPHISTICATED model he could use at the time? How could _anyone_ take seriously predictions made by a primitive model run on such a machine?
Witness: The current discussion about Global Warming and Climate Change. The change in CO2 over the last 100 years is about 100 ppm if you can believe the data. This is 100/1,000,000 = 0.0001. Now the thing is this. A 32 bit float holds about 6.9 digits of precision. Let's call it 7 digits. If one were to add a whole number of some kind to the fractional change of the CO2 as measured relative to the total gases in the atmosphere then one has 7-4 = 3 digits or less to work with.
Of course one can use a double precision float. That isn't my point. One has to be an EXPERT in order to avoid huge problems with propagating rounding errors.
It's not just about pretending computers use base 10 when they don't, it's about knowing the actual properties of a number of type float and what the consequences are when we use it.
In the case of that rocket I suspect the rounding error can be solved by normalizing everything so the time line is not in seconds but is actually in clock ticks... as accurately as they can be determined of course.
But in my career I have seen so few programmers who can do this that I've never even needed to look at a finger or a toe for something to count on. Nada - never met one.
I'll give another example. More than one project team that I worked with had no idea how floats even work! To sit there and try to use floats for their Accounts Payable and Accounts Receivable and then say they can't understand why nothing will balance? Arrghh! IMHO it's downright incompetence. They needed to use comp, which COBOL supported and which is base 10, or normalize all their money into pennies and handle the decimal when the data was read in and printed.
No one expected a Patriot in air defense mode to stay stationary for 10 hours let alone 100.
Use case blindness is an incredibly rich source of severe system errors. Proof of correctness is hard enough without a clutter of use case clauses (apparently) lopping off obvious failure modes. Until they don't, because, oh ya, use cases evolve long after the coding is done. [Cue John Mellencamp].
And the thing is, it wasn't hard to just do it right in the first place. If they were working in a language with C-level abstraction (which isn't much), it's trivial to create a type such as uptime_t which counts in integer ticks (of whatever granularity you require) and has an uptime range which is very nearly impossible to overflow (months or years at a dead minimum). A 64-bit integer incremented at 4GHz won't overflow for 4 billion seconds, more than a century. Few timekeeping systems increment at 4GHz, so a century is your worst case. Hey brother, can you spare me eight bytes?
For Want of a Nail a lot of COBOL programmers came out of retirement.
Even a lowly pair of 24 bit integers (if that was their machine architecture) can be used to create a 48 bit integer with increment and difference at almost zero overhead. You can augment this with a saturating 24 bit uptime_diff_t. If the answer comes back as 2^24-1 deci-seconds (about 19 days) your code should interpret this as "blink and it's gone". These types can be implemented in asm with two or three 10-line macros, at a cost of only a handful of extra cycles at run time.
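A rough model of that pair-of-words counter, in Python instead of asm so the logic is easy to follow. The function names and the (high, low) tuple layout are inventions of this sketch; only the 24-bit word size and the saturating difference come from the description above.

```python
WORD_BITS = 24
WORD_MASK = (1 << WORD_BITS) - 1          # 2**24 - 1
WIDE_MASK = (1 << (2 * WORD_BITS)) - 1    # 2**48 - 1


def increment(high, low):
    """Add one tick to a 48-bit count held as two 24-bit words."""
    low = (low + 1) & WORD_MASK
    if low == 0:                          # low word wrapped: carry into high
        high = (high + 1) & WORD_MASK
    return high, low


def saturating_diff(later, earlier):
    """Ticks elapsed between two counts, clamped to fit one 24-bit word."""
    later_ticks = (later[0] << WORD_BITS) | later[1]
    earlier_ticks = (earlier[0] << WORD_BITS) | earlier[1]
    diff = (later_ticks - earlier_ticks) & WIDE_MASK
    return min(diff, WORD_MASK)           # saturate instead of wrapping


print(increment(0, WORD_MASK))                  # carry propagates to the high word
print(saturating_diff((1, 5), (0, WORD_MASK)))  # small difference across the carry
print(saturating_diff((2, 0), (0, 0)))          # too large: pinned at 2**24 - 1
```

A caller that sees the saturated value knows only that "a long time" has passed, which is exactly the signal described above.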
Floating point conversion of the diff_t result would have been fine (elapsed time of flight for a Scud missile isn't going to overflow anything). Nothing required here but clear thinking and a refusal to accept "that can't happen" use cases lightly.
BTW, some people are confused about the precision required: they aren't trying to hit the missile with this calculation, but position an acquisition window for a higher-precision targeting system, if I got the drift.
There's a time and place for use cases, and there is a time and place for a more rigorous foundation.
There are no atheists in foxholes. Corollary: the only use case is whatever saves their skin. I didn't notice any of the soldiers under the bridge in Apocalypse Now sitting around reading their user manuals by the rocket's red glare.
So here is an example from Elementary Numerical Analysis, S. D. Conte and Carl de Boor, circa 1965, 1972, ISBN (Library of Congress card number?) 73-174612:
Calculate the roots of the following equation:
x^2 + 111.11x + 1.2121 = 0
use base-10 5 digit floats for this.
one can use x = (-b (+/-) SQRT( b^2 - 4ac)) / 2a in order to do this.
One will get:
b^2 = 12,345
b^2-4ac = 12,340
SQRT(b^2-4ac) = 111.09
x = (-b + SQRT(b^2-4ac))/2a = -0.010000
The correct answer is -0.010910
Note that we have gone from 5 digits to 2 digits of accuracy. This is on page 12.
One can use this formulation: x = -2c / (b + SQRT(b^2-4ac)), which will give the answer to 5 digits of precision.
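The two formulas can be tried with Python's decimal module set to 5 significant digits. This is only a sketch: it assumes round-half-even on every operation (the module's default), which may not be exactly the rounding rule the textbook uses, and the function name is made up.

```python
from decimal import Decimal, localcontext

def roots_5_digit(a, b, c):
    """Smaller-magnitude root of ax^2 + bx + c, computed two ways."""
    with localcontext() as ctx:
        ctx.prec = 5                          # 5 significant decimal digits
        disc = (b * b - 4 * a * c).sqrt()     # SQRT(b^2 - 4ac), rounded
        naive = (-b + disc) / (2 * a)         # subtracts two nearly equal numbers
        stable = (-2 * c) / (b + disc)        # same root, no cancellation
    return naive, stable

naive, stable = roots_5_digit(Decimal("1"), Decimal("111.11"), Decimal("1.2121"))
print("textbook formula:", naive)
print("rearranged formula:", stable)
```

The first formula loses its digits in the subtraction -b + SQRT(...); the second only adds and divides, so nothing cancels.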
Here is another example:
f(x) = 1-cos(x) for very small x. Let's use 6 digit arithmetic and compute near x=1.0e-6. The error can be as large as 0.5e-7
yet f(x) = 1-cos(x) = (1-cos(x)^2)/(1+cos(x)) = sin(x)^2/(1+cos(x)) which can be evaluated quite accurately.
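The same cancellation shows up in ordinary 64-bit doubles if x is small enough. A quick sketch, assuming IEEE doubles and x = 1e-9 rather than the 6-digit arithmetic of the example (the function names are just for illustration):

```python
import math

def one_minus_cos_naive(x):
    # cos(x) rounds to exactly 1.0 for tiny x, so the subtraction returns 0
    return 1.0 - math.cos(x)

def one_minus_cos_stable(x):
    # sin(x)^2 / (1 + cos(x)) is algebraically equal and has no cancellation
    return math.sin(x) ** 2 / (1.0 + math.cos(x))

x = 1e-9
naive = one_minus_cos_naive(x)
stable = one_minus_cos_stable(x)
print(naive, stable)
```

For small x the true value is close to x^2/2, which the rearranged form reproduces and the naive form loses completely.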
Again = GIGO!
A designer used a horribly inappropriate data representation, which led to fatal bugs in the program, and this is proof that computers are bad at math. Uh-huh.
"Why Computer Programmers Suck at Math
There's a whole discipline called "Numerical Analysis". Whoever programmed the Patriot's tracking software should look into it.
Just throwing that out there...
It's about programmers lacking basic knowledge (or at least failing to take account) of how a computer works internally. Well-written and well-validated software doesn't have problems like this.
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
Sorry, I didn't read your entire post before responding. You're assuming they have some need to convert to floating point, which sounds completely retarded to me (not to mention a missile intercept system with a 0.1 second resolution). At this point, the utter incompetence of 90% of the US and its military, educational system and most of its industry is absolutely no surprise to me. Of course that doesn't make it any less frustrating.
"I assumed blithely that there were no elves out there in the darkness"
I even tried 599,999,999,999,999 - 599,999,999,999,997.9 and it still equals zero.
LISP, Scheme, Haskell, Mathematica, Maple, and plenty of other languages support arbitrary precision rational numbers as built-in types. This fixes all rounding errors involving rational numbers (including fractions). If irrational numbers like pi, e, or transcendental functions are necessary, then there will always be inherent error in the representation and the programmer has to know how to deal with that error and calculate the expected error of a sequence of operations. If you want to get fancy, you can use an algebraic language like Mathematica to symbolically solve your equations and maintain perfect accuracy with symbolic representations of irrational and transcendental numbers.
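As a sketch of what exact rationals buy you, here is the tick arithmetic from the summary done with Python's fractions module. Python is not on the list above, and the storage model is an assumption: 0.1 chopped to 23 bits after the binary point, which the summary does not spell out. The 3,600,000-tick count is the one quoted in the summary.

```python
from fractions import Fraction
import math

TICK = Fraction(1, 10)     # exact 0.1-second tick
FRACTION_BITS = 23         # assumption: bits kept after the binary point
TICKS = 3_600_000          # about 100 hours of uptime, per the summary

scale = 2 ** FRACTION_BITS
stored_tick = Fraction(math.floor(TICK * scale), scale)   # 0.1 chopped to fit
error_per_tick = TICK - stored_tick                       # exact, no rounding
total_error = error_per_tick * TICKS

exact_uptime = TICK * TICKS   # rational arithmetic: exactly 360000 seconds
print(float(total_error), exact_uptime)
```

Under that assumption the accumulated drift comes out at about a third of a second, in line with the figure in the summary, while the rational tick count carries no error at all.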
While I agree that the design decisions which led to this were poorly made, this error was common knowledge.
The Patriot system _must_ be restarted every X days, exactly due to this bug. This is documented and everything.
While the initial error was with the people who created the Patriot system, the soldiers who were assigned to the system were the ones who made sure that a documented bug with a known-good work-around became a loss of life.
Two comments:
1. Any article on this subject that does not discuss the acceptable accuracy required for any particular calculation is hardly worth reading. It seems to assume that all calculations have an absolute exact answer. Should we even be discussing an article that talks about a computer not being able to "do" stuff properly, without differentiating between specification errors, coding errors, hardware errors, usage outside the spec, etc etc.?
2. The discussions of the Patriot and Ariane cases in the articles are travesties of the actual events. Note that in both cases trying to use old software for new purposes was a major factor.
Is what the article should have been called.
Anyone worth their salt in numerical analysis and scientific computation would not make this error.
Computers don't make mistakes. Programmers do.
Don't blame QA.
The system worked fine in my PowerPoint design!!!
Computers are just dandy at math, thank you. Some programmers aren't so hot, but they can be trained. Slashdot, on the other hand, continues to generate gratuitous inflammatory headlines. Training does not appear to be effective. As others have pointed out, abusing a computer does not make the computer "sucky", any more than abusing English makes it suck at expressing thoughts concisely. Slashdot consistently abuses its audience with misleading and downright false headlines, such as this one.
Turns out computers do exactly what they're programmed to do. Who knew?
Everyone (or at least most software people) knows you can't do exact math in hardware. Usually it's good enough, but mission-critical and financial applications have to have their calculations implemented in software (e.g. BigDecimal in Java).
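A small illustration of the difference, using Python's decimal module as a stand-in for Java's BigDecimal (a swapped-in library, not the one named above). The amounts are arbitrary.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact binary representation
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# Software decimal arithmetic: the same sum is exact
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = (decimal_sum == Decimal("0.3"))

print(float_matches, decimal_matches)
```

Note that the decimal values are built from strings; constructing them from floats would carry the binary error along.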
Something like the two pilots that recently overshot their destination and didn't notice for several hours despite numerous warnings, phone calls, and other notifications.
There is only so much one can do to compensate for PEBKC, and in the case of modern aviation you expect that the person between the keyboard (or console) and the chair is a trained professional who doesn't make stupid mistakes like that and doesn't need a big red flashing light for every different stupid thing he/she might do...
Question : do you people honestly believe this ?
If normal people actually accepted ideas like this Europe would be Nazi today, for one thing. I also seriously doubt America would still be a democracy.
For starters, IEEE floating point is a lousy design, from its needlessly complex special cases to its atrocious error handling. That kind of poor and overly complex design is symptomatic of a lot of floating point software.
Ok, this just pisses me off. Computers do not suck at math because they don't do math. They do add, subtract, multiply, and divide within their limitations. Hell, it wasn't until the 8088 that I used a micro that could multiply or divide. It has *always* been the job of the computer scientist to understand this.
It is *new* computer science that teaches languages like java or C# that abstract "computers" from the programs. I guess using theoretical computers is easier for moron professors to teach. I learned computer science from one of the old school teachers (ex-navy weather research) who would bitch about the usage of bits and bytes in a program.
I'm not sorry to say that "Computer Science" has to be more about the science of using COMPUTERS, not about some abstract, ideal, computer-like edifice such as a "virtual machine." I learned computer science as a way to model REAL problems on REAL computers, understanding limits and even using them advantageously. Algorithms are often incomplete or fundamentally wrong when they ignore the simple fact that they are running on a real computer with real limitations.
For all you java and dot net zealots who say, and I am quoting many of you, "Why do I need to know how that works" when it comes to lists, trees, hash tables, mutex, semaphores, MATH, and so on. This is but one example. If you KNEW how these things worked, you wouldn't be bitten in the arse when they didn't do as you imagined they would.
One of the first things you should learn in computer science, in school or in self-education, is that floating point is an approximation with limited precision. Any math done with it must be done in the order that preserves as much precision as possible, and even then, if your precision requirements exceed the decimal accuracy of 64-bit floating point, then you can't do your math with floating point. You will have to code your own or buy/use a 3rd-party precision math package.
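A tiny demonstration of why order matters, assuming standard 64-bit doubles. The values are picked so the effect is obvious; math.fsum is Python's standard-library summation that tracks the low-order bits an ordinary running sum drops.

```python
import math

big, small = 1e16, 1.0

# Adding the small term to the big one first: it is absorbed and lost
lost = (big + small) - big

# Cancelling the large terms first: the small term survives
kept = (big - big) + small

# Same three numbers, summed with error compensation
compensated = math.fsum([big, small, -big])

print(lost, kept, compensated)
```

All three expressions are the same sum on paper; only the order (or the summation method) decides whether the 1.0 survives.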
Floating point is fine for a lot of things, but not everything, and if your computer science teacher didn't beat your head in with the limitations, you missed out. "How" "real" computers work is fascinating, and just knowing how they work affects how you code. Here's the big question.....
Now that you know floating point is not accurate, how many past projects would you double check to make sure they really do work?
Assuming that most of the /. crowd are programmers, I find it rather ironic that most of the comments moderated up to 5 talk about needing better QA and the comments about learning numerical analysis are down at 1 and 2. Blaming someone else may feel good, but learning the intricacies of the profession is what it takes to actually fix the problem.
That said, as someone who's actually studied NA, I don't apply it often enough, because the tools we use day to day don't help very much.
Why don't "they" just add in the rounding off amount times "time elapsed"? Is that so hard?
Caps Lock is CRUISE CONTROL FOR COOL
And I'd love to have all websites yelling at me when I'm about to enter a capslocked-password a final third time.
I'd love if you just checked your damn caps lock key when you got it wrong the first time. Even with cruise control, you still have to steer.
-1 disagree is not a modifier for a reason. -1 troll, flamebait, redundant, overrated are NOT acceptable substitutes.
of numbers with which to approximate an infinite number system. That's the root of the reason why the mathematical field of numerical analysis exists. (goes to re-read QR factorization for shits n giggles)
Not all netbooks have a caps lock LED, only an app that brings up a small tooltip-like message when you switch capslock on or off that disappears after 4 seconds, if you even saw it.
The capslock-notification is a Windows-only app, of course. Yes, there's probably an apt-get for everything and this one, too, I should've bothered, but I despise capslock anyway. With a passion. I'll probably remap it to right shift, if I ever find some free time.
so, so true. +1
In Soviet Russia jokes are formulaic and decidedly non-humorous.
Just wanted to note that I love the site linked in your sig, and I'm glad you're promoting it.
what he was really doing who should take the blame for the failure that killed people, not the computer.
Or, y'know, call me crazy, but maybe we should blame the Iraqi government for launching the missiles with intent to kill civilians? That Patriot did any good at all (if it did) is just added good fortune.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
You are on to the problem - lack of resolution - but a little off in the numbers.
At 687 meters in 0.3433 seconds it is traveling 2000 m/s. So with only 0.1 second accuracy you can only get within +/- 200 meters. Even if there were no floating point conversion error, or accumulation of errors, 0.1 second is simply not enough resolution of time to intercept the missile.
How could they ever think this would work?
J
The Nazis conquered quite a bit of Europe because of professional pride. They were defeated by Russian blood, and US industry. Sometimes it's not enough to be good, you have to be quantitatively better than anyone else. Despite being evil bastards, they produced some damned fine guns, tanks, and planes. It is a mistake to think that just because someone has beliefs you think of as "evil" that they are not competent. Unfortunately history is full of quite competent evil. Now, if my attitude constituted "evil" in your mind, to be honest, I don't care. Because someone who thinks like you is too much of a coward to do anything about it.
That which is done from love exists beyond good and evil