Floating Point Programming, Today?
An anonymous reader asks: "I'm rather new with programming and stumbled across these two articles: The Perils of Floating Point from 1996 and What Every Computer Scientist Should Know About Floating-Point Arithmetic from 1991. I tried some of the examples in these articles with Intel's Fortran Compiler and g77 and noted that some of the issues reported no longer seem valid, whereas quite a few are still very much around. Could someone, please, give me a pointer to some newer thoughts and/or new facts surrounding floating point programming. What has been improved since those articles were written? What is still the same? How is the future, especially with the new platforms IA64 and AMD64? I am most interested in the x86 and x86-64 architectures. Thank you for your kind help."
Both articles are still valid today, mostly because current processors use the same IEEE floating point format as the ones available in 96 (or 91).
This is the place where you write something that will make you seem like a complete idiot.
...those articles are only 99.99999891 percent true
It's 10 PM. Do you know if you're un-American?
Floating point stuff hasn't really changed much since then. Basic rule of thumb, if you want it to be accurate don't use floating point.
Much the same problem as you have with decimals. Many fractions cannot be represented exactly in certain bases (1/3 in decimal, 1/10 in binary). It will always cause you headaches if you don't realize this.
Try writing a bunch of numbers in hex but then do all of your calculations in decimal. You'll have the same problem.
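To make the base problem concrete, here is a small Python sketch (Python is my choice here, nobody in the thread used it). It uses the standard `fractions` module to look at the exact value a double stores for 0.1; the variable names are just for illustration.

```python
from fractions import Fraction

# 1/10 has no finite expansion in base 2, so the double written as 0.1
# is only the nearest representable neighbour of one tenth.
stored = Fraction(0.1)        # exact rational value of the stored double
wanted = Fraction(1, 10)      # the value we meant
error = stored - wanted       # tiny, but not zero

print("stored - wanted =", float(error))
print("0.1 + 0.2 == 0.3 ?", 0.1 + 0.2 == 0.3)   # False

# 1/2 does have a finite binary expansion, so it is stored exactly.
half_is_exact = Fraction(0.5) == Fraction(1, 2)
print("0.5 stored exactly ?", half_is_exact)     # True
```

The same thing happens with 1/3 on paper in decimal: you stop writing threes at some point and live with the remainder.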
It all depends on what platform you program on and so on. The x87 FPU in x86 processors does its floating point in an 80-bit format by default and only rounds when copying back to your original 32- or 64-bit floats. That saves you some precision but not that much. As others have said, there are probably situations where almost all of the material in those articles is still valid.
Tomorrow will be cancelled due to lack of interest
Don't count money as floating point. You'll just have rounding errors. Using long doubles instead of floats won't help you at all.
The solution is to count pennies instead, or if you need values bigger than about 21 million dollars (the limit of a signed 32-bit count of pennies), use a BCD library. BCD is Binary Coded Decimal.
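A rough sketch of the "count pennies" advice, in Python. Two assumptions to flag: Python integers do not overflow, so the 32-bit limit does not show up here, and the standard `decimal` module is standing in for a BCD library (it does decimal arithmetic, but it is not BCD as such). The price and tax rate are made-up numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

# Ten dimes as binary floats: the rounding errors accumulate.
float_total = sum([0.10] * 10)

# Ten dimes as integer pennies: exact.
cents_total = sum([10] * 10)

# Decimal arithmetic rounds in base 10, the way it is done on paper.
price = Decimal("19.99")                      # hypothetical price
rate = Decimal("0.0825")                      # hypothetical tax rate
tax = (price * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print("float total equals 1.0 ?", float_total == 1.0)   # False
print("pennies:", cents_total)                           # 100
print("tax:", tax)                                       # 1.65
```

Note that the decimal values are built from strings; `Decimal(0.10)` would faithfully copy the binary error in.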
If tits were wings it'd be flying around.
Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications, also available here if you're one of the unwashed masses. It has algorithms to see if your app is facing floating point trouble.
Hardware floating point is only so accurate - if you need more floating point (or integer) precision, use GNU MP - a library for C with bindings for many other languages too. It came in quite handy when I wrote some cryptography code with very large numbers.
Extend the mantissa so there is enough overlap - this usually involves some kind of multiple precision library, such as GNU MP (mentioned in another post) and many others. I've implemented one for my own use, too. Generally this means lots of overhead, since fewer than 5% of operations will actually benefit from the greater precision.
Postpone such operations until there is overlap - store such numbers together and do operations on them together, too. Sometimes additions in loops will add up the small parts until there is overlap with the big part, and the additions can then be done with enough precision.
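A small Python sketch of the "postpone until there is overlap" idea. The numbers are picked by me so the effect is easy to see: at 1e16 the spacing between doubles is 2, so adding 1.0 on its own changes nothing. `math.fsum` is the standard library's error-compensated sum and is shown only for comparison.

```python
import math

big = 1e16
small = [1.0] * 1000

# Naive: add each small value to the big one. Every single addition is
# rounded straight back to 1e16, so the small values vanish.
naive = big
for s in small:
    naive += s

# Postponed: add the small values to each other first, then add the
# partial sum (1000.0), which now overlaps with the big value's mantissa.
postponed = big + sum(small)

# Reference: fsum tracks the lost low-order bits internally.
reference = math.fsum([big] + small)

print("naive     - big =", naive - big)        # 0.0
print("postponed - big =", postponed - big)    # 1000.0
print("fsum      - big =", reference - big)    # 1000.0
```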
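As an aside, an interesting thing is that on computers multiplications and divisions are better (that is, more accurate) than additions and subtractions because of the logarithmic format: they keep the relative error small, while adding or subtracting numbers of very different or nearly equal size can throw away most of the significant bits.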
I know that Sun was working on a variable precision floating-point CPU. I'm not sure how that project is going and what the end effect is, but I remember it being an interesting idea.
Multiple precision libraries are usually decent with only one problem, they are always slower by a couple orders of magnitude than regular CPU operations, so using them is just such a pain.
iThink iHate iMod
Floating Point Arithmetic: Issues and Limitations.
On the other hand it is clear that a finite representation of real numbers has tradeoffs. But only a few seem to care about the accumulated errors.
My experience in engineering (simulation of cast turbine blades) was that people know that bad things can occur during complex floating point calculations but the matter was too complicated to be investigated.
Example: if during a finite element simulation a timestep did not end up with a valid solution (the iterative/approximate solver of the large linear systems did not converge, or even crashed), some control parameters were simply varied (time step, perhaps material curves) until the calculation seemed to produce some valid-looking result. Needless to say, only obvious errors can be spotted that way.
The strange thing about all that is that in recent years the mathematical discipline of interval analysis has been developed. Here every number is represented by an interval of known error bounds. These error intervals are kept and updated during calculations. Thus at the end of a large complex calculation, you know the error. That is a very valuable property.
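To give a feel for how the interval idea works, here is a toy sketch in Python. It is not how Sun's compilers or any real interval library do it: the `Interval` class and its names are made up for illustration, and instead of switching the hardware rounding mode it simply pushes each bound outward by one representable step with `math.nextafter` (Python 3.9 or later), which is cruder but still safe.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Round-to-nearest is off by at most half a step, so moving one
        # full step outward keeps the true sum inside the bounds.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def width(self):
        return self.hi - self.lo

# One tenth is not representable, so enclose it between the neighbours
# of the double written as 0.1.
tenth = Interval(math.nextafter(0.1, 0.0), math.nextafter(0.1, 1.0))

total = Interval(0.0, 0.0)
for _ in range(10):
    total = total + tenth

print("enclosure of 10 * (1/10):", total.lo, total.hi)
print("width:", total.width())
```

The final interval is guaranteed to contain the true answer 1, and its width tells you how much the calculation may have drifted.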
Moreover, in many cases what one does this way is not only a standard calculation but rather a machine proof of error bounds.
This offers some unique properties, e.g. for rigorous global searches.
So we have far better technology available. Why is this stuff not used more widely?
As far as I know, only Sun puts interval-analysis-enabled data types in its FORTRAN and C/C++ compilers. But I have not seen that stuff in gcc, where it would have a big impact.
Very strange.
For anyone who is interested, here is a homepage of the intervals community.
Regards,
Marc
PDA level mobile FPUs are very rare indeed. In practice, devices using the ARM family processors have no hardware float support. It's thus very important for developers to understand floating point intimately, so that they won't be left at the mercy of awful compiler-emulated floating point code. Of course, in those cases most code tends to orient itself for fixed point arithmetic. Fixed point calculations are much better suited for the integer crunching power of, say, the Intel XScale.
There are also good tradeoffs developers can make between floats and fixed point, for example by using block floating point (BFP) formats, where a whole block of values shares the same common exponent.
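For readers who have not met block floating point, a minimal sketch of the idea in Python. The function names, the 16-bit mantissa width and the sample block are all assumptions of mine; real DSP code would do this in integer registers, not with Python's `math` module.

```python
import math

def bfp_encode(values, mantissa_bits=16):
    """Return (integer mantissas, shared exponent) for a block of floats."""
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return [0] * len(values), 0
    # frexp gives largest = m * 2**exponent with 0.5 <= m < 1.
    _, exponent = math.frexp(largest)
    # Scale so the largest value just fits in a signed mantissa.
    shared_exponent = exponent - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = []
    for v in values:
        m = round(math.ldexp(v, -shared_exponent))
        mantissas.append(max(-limit, min(limit, m)))   # clamp on overflow
    return mantissas, shared_exponent

def bfp_decode(mantissas, shared_exponent):
    """Turn the integer mantissas back into floats."""
    return [math.ldexp(m, shared_exponent) for m in mantissas]

block = [1.0, 0.5, -0.25, 0.001]
mantissas, shared_exponent = bfp_encode(block)
decoded = bfp_decode(mantissas, shared_exponent)

print(mantissas, shared_exponent)
print(decoded)
```

The tradeoff is visible in the last element: small values in a block dominated by a large one keep only a few significant bits.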
Now that 3D is really coming to mobile devices, plenty of people will get first-hand experience of emulating floating point for the first time since the 80's. :-)
Jouni
Jouni Mannonen | Game Designer, Consultant
Any college level engineering numerical methods course will teach you all the pitfalls involved with using floating point calculations on modern processors and how to minimize the impact of rounding errors (cumulative and otherwise) on your calculations.
Hell, any decent numerical methods book should cover stuff like that as well.
The inexactness portion of his argument is quite wrong. His example claims single precision floating point only allows for 8K values between 1023.0 and 1024.0. Consider that under IEEE-754 the numbers would be represented respectively as 1.998046875 * 2^9 and 1.00000000 * 2^10, _with a full 24 bits of precision in the mantissa_, thus ensuring the number of possible values between 1023.0 and 1024.0 actually reaches 2^24.
Francois.
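Counts like these are easy to check directly, because consecutive positive IEEE-754 floats have consecutive bit patterns. A short Python sketch using the standard `struct` module (the helper name is mine):

```python
import struct

def float32_bits(x):
    """Bit pattern of x as an IEEE-754 single, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# For positive floats, the difference of the bit patterns is the number
# of representable steps from one value to the other.
steps = float32_bits(1024.0) - float32_bits(1023.0)

print(hex(float32_bits(1023.0)), hex(float32_bits(1024.0)))
print("steps from 1023.0 to 1024.0:", steps)   # 16384
```

On this count the answer is 2^14, which matches neither figure quoted above: the 24 bits of precision cover the whole range from 512 to 1024, so a slice of width 1 gets 2^23 / 512 of them.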
Would you mind telling us what those "issues" were? Because the articles hardly deal with "issues" at all. What they deal with is the theoretical limitations that must exist in floating point, due to the fact that we have finite hardware, while real analysis assumes infinite precision. This should not have changed between 1991 and now (especially since we have all standardized on IEEE floating point formats, but even if the article was from 1960, you should easily be able to "translate" it to your favourite floating point format (which is probably IEEE)).
Could someone, please, give me a pointer to some newer thoughts and/or new facts surrounding floating point programming.
There are very few new thoughts with regards to floating point programming, just as there are very few new thoughts on the use of "if-then-else"-branches or "while"-loops. Floating point programming is basically a solved problem. The only problem with it is that it sometimes flies in the face of intuition, and most programmers are ignorant about it. This has not changed since 1991 either.
The articles you mentioned are very good articles for understanding issues surrounding floating point. Just make sure you read them with your brain, instead of just feeding your favourite compiler with any examples you see.
What has been improved since those articles were written?
Speed. Computers have become faster. (It's possible that there also have been some minor software improvements such as an ISO C addendum clarifying tricky areas with rounding modes, or something like that.)
What is still the same?
Essentially, nothing has changed.
How is the future, especially with the new platforms IA64 and AMD64?
Very predictable. Nothing will change there either. Non-IEEE floating point vector instructions, or "multimedia" instruction sets will probably continue to be unstandardized and platform-dependent.
I am most interested in the x86 and x86-64 architectures
There is nothing special about those architectures with respect to floating point (well, the x86 reuses its floating point registers for MMX instructions, but you shouldn't need to know that unless you use assembler).
It's past 1am and some **** is throwing inexact representations and fuzzy logic at me.... This must be a nightmare... Must... wake... up... Aaaaaargh!
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
I've worked on a couple of projects where this is very important. One was writing control software for metrology equipment, industrial strength QA kit that measured manufactured parts down to fractions of a micron or even nanometres to make sure they were in spec. Another was a geometric modelling tool used in CAD applications and the like.
In neither case am I aware of any physical real world failure caused by a problem with the floating point calculations. You do have to be really careful with manipulating the numbers, though.
For example, the loss of significance when you subtract can be horrible if you've got two position vectors close together, and you're trying to calculate a translation vector from one to the other. The error in that translation vector can be enormous if the points you started with were very close: you might get only one or two significant figures, when the rest of your values have 15 or more. If you're interested in the direction of the vector, that can give you errors of +/- several degrees!
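Here is a tiny Python sketch of that loss of significance. The coordinates are invented so the effect is extreme: two points that differ only around the 15th significant digit. Strictly speaking the subtraction itself is exact here; what it does is expose the rounding that already happened when the coordinates were stored.

```python
import math

# Two nearby points. The offsets 1e-15 and 2e-15 are only a handful of
# representable steps at magnitude 1.0, so they are stored very coarsely.
p = (1.0 + 1e-15, 1.0 + 2e-15)
q = (1.0, 1.0)

# Translation vector from q to p.
dx = p[0] - q[0]
dy = p[1] - q[1]

relative_error_dx = abs(dx - 1e-15) / 1e-15

true_angle = math.degrees(math.atan2(2e-15, 1e-15))
computed_angle = math.degrees(math.atan2(dy, dx))
angle_error = abs(computed_angle - true_angle)

print("dx =", dx, " dy =", dy)
print("relative error in dx:", relative_error_dx)
print("direction error in degrees:", angle_error)
```

With these numbers dx comes out about 11% too large and the direction is off by a couple of degrees, even though every input was good to about 16 digits.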
Inevitably, there are always going to be bugs in complex mathematical software, and I've seen plenty of wrong answers from programs like the above. Fortunately, it's normally possible to have checks and balances that at least identify and highlight inconsistencies so, in the worst case, at least nobody relies on them. You can also use ruthless automated testing procedures, which run zillions of calculations every night and flag the smallest changes in the results, so no-one accidentally breaks a verified algorithm with a change later. The combination makes it reasonably unlikely that any algorithm would fail catastrophically with the sort of consequences you're talking about.
The possibility is always there, of course, because programming is subject to human error. However, FWIW, I've worked on software that's used to design cars, and software that controls the QA machinery to make sure they're put together right, and I still drive one. :-)
One interesting approach I have seen is the use of strings to store almost arbitrary decimal positions. You can set a maximum length, at which point it rounds. But the nice thing is that the rounding is done in the decimal number system instead of binary, so it is closer to how business managers expect it to be rounded (like you would do it on paper). This approach obviously is not ideal for scientific computing, but is geared toward business uses where rounding accuracy is more important than speed. PHP used to include the "BC" library that did this kind of thing. I don't know what happened to it.
Table-ized A.I.
Slashdot is jumping the shark. I'm just driving the boat.
Great post, though. You're absolutely correct.
--
Old programmers don't die, they're just cast into a void
BCD (binary coded decimal) is floating point too and perfectly accurate.
thank God the internet isn't a human right.
For 64-bit machines: a 64-bit float and a 128-bit double.
For 128-bit doubles, it is better to spend more bits on the precision of the mantissa and fewer bits on the exponent.
What is the accuracy of tan(tan(tan(tan(tan(tan(tan(tan(tan(tan(sqrt(PI))))))))))) with a 128-bit double?
More precision implies slower computation of "cosine"!!!
open4free
Overloading only makes sense in statically typed languages. When you have dynamic typing, the compiler can't statically determine the types of the arguments; this needs to be done at runtime (at least in the worst case, see below). The "+" function in Lisp is just that, a function (although the compiler will typically inline parts of it). And you can override it (well, many implementations would disallow that for reasons of speed, but you can always use a new name, such as "add", "sum", "plus", "my+", or something similar). Since Lisp uses the same syntax whether or not the symbol looks like an operator, this isn't as bad as it sounds: writing (plus a b) or (add a b) is not much worse than simply writing (+ a b).
The reasoning behind not letting you override the built-in "+" is simple. "+" needs to be at least partially inlined to get good speed. Furthermore, good Lisp implementations (such as cmucl, sbcl, Franz Lisp, etc.) all have insanely advanced compilers that can do type inference to analyze your code and try to find guarantees for e.g. numbers used as arguments to "+" always being fixed-size integers, thus removing any kind of run-time testing, and allowing the same speed as statically typed languages. Common Lisp has a number of primitives for adding type declarations to expressions to help the compiler in this process.
If you must insist on using the name "+", you would typically save your old "+" function under a new name, and use it to define a new function called "+" that does the appropriate thing for all kinds of arguments, including your newly defined data-type. Needless to say, this is not particularly efficient, but it's the most obvious way to do it in a dynamically typed language. If you use namespaces, you can ensure that your new enhanced "+" function will only be used in those modules that need it.
If Common Lisp was designed today, it would probably have a much more integrated object system, turning built-in functions such as "+" into methods (some Scheme implementations do this). Then, instead of overriding the fast built-in "+" to call a slow user-defined function, you would simply define new methods for "+" and your new datatype.
This means that fast things remain fast, even when you extend the meaning of "+" to new datatypes. Under the hood, this will be implemented as a fast "+" for builtin types, such as fixed-size integers, and anything else, such as complex numbers, interval arithmetic, strings, etc, going through the object system.
Adding new "slow" methods would not disturb the fast stuff, but since method lookup would typically go through a hash-table (vtables are only useful for "static" classes where you can't add new methods at runtime), it could potentially slow down other "slow" methods, but not by much, hash-tables are pretty fast regardless of size.
Unfortunately, according to the Common Lisp standard, "+" is a function and not a method. And typically, most implementations will fight hard against letting you override it (I am not entirely sure what the standard says about it). Your best bet is to simply use a different name than "+", together with the CLOS object system. This is also more in line with the classic Lisp philosophy, which often provides a "fast" function for simpler arguments, and a "slow" one for more generic arguments, with different names. But ideally, we want a new funky fully object-oriented Lisp.
I am not sure what Python, Ruby, JavaScript, or other more modern object-oriented scripting languages do here, but I would guess most of them allow you to do what you want (Python and Ruby let a class define its own "+"; JavaScript, as far as I know, does not offer user-defined operator overloading).
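For comparison, this is roughly what "define a new method for + on your new datatype" looks like in Python, where "+" dispatches to the `__add__` method of the left operand. The `Vec2` class is a made-up example type, not anything from the thread.

```python
class Vec2:
    """A made-up 2D vector type that takes part in "+"."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for `self + other`. Returning NotImplemented lets Python
        # try the other operand, and raise TypeError if nothing fits.
        if isinstance(other, Vec2):
            return Vec2(self.x + other.x, self.y + other.y)
        return NotImplemented

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"

# The new type gets its own "+", and the built-in integer "+" is untouched.
total = Vec2(1, 2) + Vec2(3, 4)
plain = 1 + 2

print(total)   # Vec2(4, 6)
print(plain)   # 3
```

This is the "fast things remain fast" arrangement described above: built-in numbers never go through the user-defined method.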
Hmm, you had a good course. I took my undergraduate numerical methods course from the math department at my school, and the instructor pounded home that on everything we did, we needed to be able to put a bound on the error. I thought it was a great course.
Later, in graduate school (this time in physics), I had a similar course, but now the instructor spent very little time talking about error bounds. In fact, nobody I talked to in physics talked about that sort of stuff, it wasn't just something funny with my instructor. I don't know a single physicist [at least in my field...] who thinks about this stuff. It strikes me as weird.
I don't blame it particularly on my instructor, because he's done amazing number crunching in his time, and he taught us some fabulous things. I lament that I've lost some of those error-bounding skills I had developed, and when I go to conferences and see people present results, I occasionally ponder if any of these new results are the result of subtracting one giant number from another giant number, and then doing the wrong thing with that result.
Floating point is a well-defined and easy to understand representation. Of course, that doesn't mean it's easy to use--mathematically, it can be pretty complicated to deal with at times. Perhaps the biggest sin is to think of floating point numbers as "real numbers"--they aren't.
Unfortunately, IEEE 754, the most widely used floating point standard, fixes none of the complexities of using floating point but creates many completely unnecessary complexities of its own. Many CPUs just give up and throw any kind of specialized IEEE features into software, making them nominally compliant but unusable. And many programming languages refuse to implement the inane and broken semantics specified for IEEE comparison operators.
The only good thing that can be said about IEEE 754 is that even a lousy standard is better than nothing at all. And, on the bright side, you can usually put CPUs and compilers into modes where they behave somewhat sanely (no denormalized numbers, sane comparisons, no NaNs).
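For anyone wondering which comparison semantics are meant, the usual complaint is about NaN, which IEEE 754 defines as unordered with respect to everything, itself included. A short Python illustration (Python follows IEEE here for its floats):

```python
import math

nan = float("nan")

equal_to_itself = (nan == nan)      # False: NaN is not equal to anything
differs_from_itself = (nan != nan)  # True
less_than_one = (nan < 1.0)         # False
at_least_one = (nan >= 1.0)         # False as well, so "not <" is not ">="

# The dependable test is an explicit one.
detected = math.isnan(nan)          # True

print(equal_to_itself, differs_from_itself, less_than_one, at_least_one, detected)
```

Whether that is "broken" or a sensible way to keep invalid results from passing silently through comparisons is exactly what people argue about.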
You can usually count my money using a 1-bit integral type with 1-bit for error checking.
"Interger" is not a word.
HTH.
Maybe I'm just not experienced enough with certain areas of programming (quite possible), but who cares if the 30th decimal place is rounded? As you add each decimal place to the right, you increase the precision by a factor of 10.
After all, 1.0000000000 is 10 billion times more accurate than 1. Why would you need any more precision?
Those articles are still quite valid - and will remain so.
So long as a float is still 32 bits and a double 64, you'll get about that degree of precision. It's not that the hardware is inaccurate - they all do pretty much the best they can with the information provided.
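To put rough numbers on "that degree of precision", a small Python sketch. Python's own float is a 64-bit double, so the 32-bit case is imitated by packing and unpacking through the standard `struct` module; the helper name is mine.

```python
import struct
import sys

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit single."""
    return struct.unpack("f", struct.pack("f", x))[0]

double_bits = sys.float_info.mant_dig     # significand bits in a double: 53
double_digits = sys.float_info.dig        # decimal digits always preserved: 15

as_single = to_float32(0.1)
single_error = abs(as_single - 0.1)       # on the order of 1e-9

print("double significand bits:", double_bits)
print("double safe decimal digits:", double_digits)
print("0.1 as a single:", as_single)
print("difference from the double 0.1:", single_error)
```

A single carries 24 significand bits, roughly 7 decimal digits, against roughly 15 to 16 for a double, and that has been the same since these formats were standardized.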
Roundoff errors and other evils of floating point representations are here to stay.
However, you can't just automatically decide to punt and use fixed point arithmetic. There is a 'tension' between dynamic range and precision. If you want reliable precision, you can't have large dynamic ranges for your numbers and vice-versa.
The biggest and best improvement we've seen since the early '90s is that doing your work in double precision is much less of a penalty than it used to be (when compared to working in single precision or integers).
With 64 bit machines, we should expect that penalty to become yet smaller.
So if speed is an issue, modern machines can be more precise - but if speed was not an issue, machines of the early '90s were every bit as precise as the latest whiz-bang 64-bit CPU. IEEE math hasn't changed much (at all?) in that time.
www.sjbaker.org
More importantly, because some processors do their FP at 80 bits and others at 64 bits, you can't depend on answers to even simple FP math coming out bit-identical on two machines (not even between certain Intel parts), which really sucks if you are developing anything peer-to-peer with the expectation of synchronicity.
The IEEE "standard" is an insult to the word.
It allows for uniform FP programming, but they fell short of drawing any useful lines in the sand w.r.t. bit patterns and other implementation details.