The Trouble With Rounding Floats
lukfil writes "We all know of floating point numbers, so much so that we reach for them each time we write code that does math. But do we ever stop to think what goes on inside that floating point unit and whether we can really trust it?"
This is why I use the decNumber library from IBM.
l The decNumber library implements the General Decimal Arithmetic Specification[1] in ANSI C. This specification defines a decimal arithmetic which meets the requirements of commercial, financial, and human-oriented applications.
http://www2.hursley.ibm.com/decimal/decnumber.htm
The library fully implements the specification, and hence supports integer, fixed-point, and floating-point decimal numbers directly, including infinite, NaN (Not a Number), and subnormal values.
The code is optimized and tunable for common values (tens of digits) but can be used without alteration for up to a billion digits of precision and 9-digit exponents. It also provides functions for conversions between concrete representations of decimal numbers, including Packed Decimal (4-bit Binary Coded Decimal) and three compressed formats of decimal floating-point (4-, 8-, and 16-byte).
If you are actually concerned about rounding and precision, use decimal instead.
Since these are computers, and they deal primarily with binary internally, why not store the numerical value and the 2nd power instead? Oh, and since we generally need more bits of accuracy in the numerical value than the exponent (do you often deal with numbers 2**(2**32)?), why not allocate a "reasonable" number of bits to the exponent and leave more for the numerical value.
Uh oh, we just re-invented floating point. Oh well, nice try.
If you were just trying to get better accuracy by using base 10 rather than base 2, you're just hiding the problem (and making the hardware quite a bit more complex). If you want true accuracy, abandon floating point and use a bignum system.
--
perl -e'$_=shift;die eval' '"$^X $0\047\$_=shift;die eval\047 \047$_\047"' at -e line 1.
Sources for the above:
Pensioners shortchanged of 'float'
Ariane casting problem (float -> 16 bit int)
You beat me to it. I was just thinking about how much better TFA would be if they explained the specifics of how the Office Space team ripped off the banking system.
Information wants a fueled airplane waiting at the hangar and no one gets hurt.
The use of transforms for handling numerical calculations is an old trick. It is probably best-known in its use as a very quick way to multiply or divide using logarithms and a slide-rule, prior to the advent of widely-available scientific calculators and computers. Nonetheless, devices based on logarithmic calculations (such as the mechanical CURTA calculator) can wipe the floor with most floating-point maths units - this despite the fact that the CURTA dates back to the mid 1940s.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
A much better article is linked from this one near the bottom: what every computer scientist should know about floating-point arithmetic
Is there any fundamental reason why decimal arithmetic in a computer should be more accurate than binary arithmetic in a computer?
No, no, the problem is not with the precision! The problem is that when input and output is decimal, but the calculation is binary, then you get additional errors from the conversion that badly educated programmers do not expect.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Binary Coded Decimal would be one.
I looked for a page that described the advantages
of BCD, but I could not find one. So I'll have a
stab at it myself. Basically, while slower, BCD
can maintain arbitrary precision. If you have
monitary items and you have a good handle on the
range of values, you can store and operate on these
values without any rounding losses at all.
emt 377 emt 4
Welcome to a very poor article on what's been taught in early Comp Sci for many many years.
Any serious developer of business software knows all about this and avoids floating point at all cost for financial calculations. Scientists however do use them carefully since the math they do is usually much more performance (speed) sensitive and the calculations are a little more complex than what tends to be done on the business side (ie _most_ business calcs are relatively simple).
These posts express my own personal views, not those of my employer
No, C will automatically recast a number as needed in cases like the above.
The issue is actually a pretty commonly understood situation when going from decimal floating point numbers to binary IEEE floats (I have another comment on here describing how they're stored), and it basically comes down to this:
Floats of any sort are stored as an int with an int shift (a.aa x b^c). As such, there will be aliasing problems based on the prime components of b. A known percentage of divisors will produce repeating numbers. For example, any division of 3,5,7,11.... in base 2 will be repeating. Any division of 3,7,11,13... in base 10 will be repeating.
No, there's nothing you can do about it. Use higher precision if needed, and otherwise get over it.
110100 1101000 1101000 1100110 0 1101111 1101000 1100011 1
Actually the problem was that they used a float to store the system time (time since power on) in the ground radar unit. It allowed the clock to be used in calculations without a conversion. A float will store an integer just fine (and accurately) until the number gets too large and then the units part drops off the bottom of the precision and the increment operator no longer makes any sense. This was a design decision that made sense for the role for which the missle platform was originally designed. The patriot was originally designed to be used in the European Theater (if the cold war ever turned hot) and as such would never remain in one location for more than a very few days.The clock is reset everytime they move the battery (they power off the ground tracking radar when they move). The use in the gulf war was in a strategic role (not tactical) which kept them continuously operating in a single location for long periods of time, and the shortcut they used came back to haunt them (as usual). If they had reset the system every few days, the problem would not have occured.
Atlas stands on the earth and carries the celestial sphere on his shoulders.
And I'm not sure if it can be solved altogether. When you spend a little time meditating over the IEEE 754, you notice a few flaws. The first and most obvious is, of course, that, no matter how precise you want to make it, somewhere there's a cutoff. And, especially when you multiply with floats, that error grows as well. But there's another problem. Two actually.
The first one is the one mentioned in the article, and something everyone who didn't sleep through his IT classes should know: Computers calculate binary, and converting floats from binary to decimal isn't possible without error. There is no way to represent 0.37 in binary, in IEEE754. No matter how many bits you spend on the mantissa. Now, you can argue that, if you make it "big enough", it doesn't matter anymore since it's well within the error margin and when you round it to, say, 5 after decimal, the error vanishes. True. But when you start calculating, when you multiply or, worse, exponentiate, the error grows in big leaps.
Another, less obvious, problem is hidden underneath the way the IEEE754 works: Your error grows as your numbers grow. This might seem obvious, but it is interesting how many people overlook this flaw and problem in everyday life. Since according to the IEEE754 standard, real numbers are stored as exponent and mantissa, if you're dealing with BIG numbers, a fair deal of your mantissa is spent on the "pre-comma" part of your number, so you're losing precision. You can't reliably say that "a double is good for 5 behind dot, no matter what", you have to take into account how many of those precious mantissa bits are spent before you even get to ponder what's left for your precision.
This isn't so much a problem of processors. It's a problem of people understanding how their processors work.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Unfortunately, most of the obvious alternatives are either somewhat restrictive, or have relatively poor performance. For example, on a 64-bit machine, a 64-bit integer works quite nicely -- but of course, most people aren't (yet) using 64-bit machines. There are quite a few arbitrary precision integer packages available, but most of them are substantially slower than a float or a double for most calculations.
Unfortunately, quite a bit of calculation with money really will run into problems with being stored in a 32-bit integer. If you're dealing strictly with US currency, you're right: the problem only arises with quite large amounts of money. OTOH, if you have to deal with international currency, the problem can (and will) arise far sooner. Just for a couple obvious examples, there are around 27 Russian Kopecks or about 39 Zambian Kwacha to one US penny. Even inside of the US, some things have prices that include fractions of a cent (e.g. Gasoline often has .9 cents tacked onto the end of its price, and some stock/bond/commodities markets use prices in eights of a cent).
In a lot of cases, the best alternative to using floating point is to use floating point. No, that's not a typo. What you want to do is store an integer number of pennies -- but store it in a double (or on ocassion, a long double). A typical implementation of double can also be used as a 53-bit integer type. A 32-bit number is right on the edge where it's often usable, but can easily run into problems. A 53-bit integer makes those problems much more remote -- to the point that by the time you're dealing with such a large amount of money, you probably don't want to know about pennies anymore. A long double will store a 20 digit integer -- so it should be usable even for something like figuring the stock dividend of a large company.
As long as you're careful about order of calculation and when you do rounding, this gives the added bonus of working nicely for things like converting one currency to another, that are relatively difficult to do with pure integers.
The universe is a figment of its own imagination.
For the uneducated, the reason that this is stupid is that IEEE-754 floating point numbers cannot REPRESENT all values, they APPROXIMATE them. There is no way to properly represent the value 0.01 as a float (0.01 is best approximated by 3C23D70A, or 9.9999998e-3). So, for instance, if you were to add up 100 pennies, you would have 99.999998 cents, not 100. Repetitive additions (like credits and debits from an account) or multiplications (interest calculations, amortizations, etc.) simply make the problem worse, which is why floats should NEVER be used to track money. A fixed decimal system should always be used for financial systems.
--That's the point of being root, you can do anything you want, even if it's stupid.
pardon my ignorance but why does the problem exist today? can't it be fixed?
The problem exists today, and cannot be fixed, because it is a fundamental problem, not one that is a bug or anything like that. If you are going to store a number by any variation on the technique of writing down the digits, you won't be able to represent certain numbers precisely.
when i use my calculator, it doesn't give rounded off numbers.
Oh, but it does. When you ask your calculator what 1 / 3 is, it tells you it's .33333, for some particular, fixed, and finite number of threes. But the real answer is an infinite number of threes. So it is in fact rounding.
is it that insignificant?
Well, kinda, yeah. Can you think of any applications for which 10 digits of precision isn't enough? If you can measure something to ten digits of precision, and you want to make something a mile long, you have enough precision so that the variation won't be visible to the naked eye. An 80 bit double gives something like 18 digits of precision.
If you're interested in rounding (and who isn't?) you might want to read An introduction to different rounding algorithms.
I believe that we solved that problem in my first Computer Science course by using Integers and long integers. Essentially we took the monetary value, turned it from dollars into cents for calculation, then back into dollars for output. The only error would be in our method of using division for conversion, which could have been remedied by some other methods I can think of which are probably solved by much better people than I.
SRSLY.
no, although a double or larger increases the precision, all floating point-style numbers suffer from this problem. You might get away with it for a larger number, but you are wasting bits, and stil cannot guarantee that the number is always going to be accurate. In addition, as the number gets larger, more precision is lost. For instance, 782533.37 as a float is 493F0C55, or 782533.31, and as a rounded float is 493F0C56, or 782533.38. Neither would be acceptable, and even a double will lose precision at about 16 significant digits, which could be a problem for daily interest calculations or another high-precision calculation. What should be used is a packed decimal (BCD) or fixed decimal (int or long and an associated scale factor). The whole point is that for a financial institution, there is likely a fixed precision level that is acceptable, but floats, doubles, etc., cannot guarantee that any given precision will be maintained, as it depends upon all of the factors involved (value and size of each number in the calculation). For the same reasons, floating point numbers should never be used in program flow control, either, as doing a for(float x = 0.0; x 1.0; x = x + 0.01) will iterate 101 times, since the first time x is not less than 1.0 is when it reaches 1.00999..., and if you are looking for a specific value, it may never reach it (like for(x = 0.0; x != 1.0; x = x + 0.01)). If you instead assume that you will always deal with 2 decimal places (or 3, or whatever), you can guarantee that your addition or multiplication will be accurate, and can scale the answer later if necessary.
--That's the point of being root, you can do anything you want, even if it's stupid.
For example, if your input consist of one large number, and tons of small ones, then rounding-errors mean that starting with the large number gives a much smaller result than starting with the small ones.
If I scale it down to smaller numbers, you see why:
1.0*10^5 + 1.0*10^1 = 1.0*10^5
So, adding a "small" number to a "large" number gives you simply the large number.
If you repeat this, a million times, your result is still simply the large number.
So you could end up concluding that 1.0*10^5 + (1.0*10^1 + 1.0*10^1 ..[1000000 times]...) = 1.0*10^5
That is an order of magnitude wrong. The correct result is 1.1*10^6
Practical result ? You need to think about your input. If it *may* look like this, you need to add up by repeatedly adding the two smallest numbers. Easy to do with a priority-tree. pseudocode like this:
MS-Excel, by the way, does *NOT* do this in it's SUM() function, if you feed it a "large" number and *many* "small" numbers, you get horrendously wrong results. Because of the relatively high precision of floats and doubles though, you need to use larger numbers than in my example here.
It's Transcendental number.
Not trolling, just helping since you asked.
So you can laugh all you want to...
P.
On solution as you put, is to priority sort your operations.
Another solution, is the Kahan summation algorithm.
Wich, grosso-modo, keeps track of the error at each step, and injects it back at the next.
In your example, in each iteration, the algorithme notice that tha 1.0e1 is missing from the sum and carries it to the next addition. A few iterations later, the carry is big enough to be added to the result.
The advantages are : you don't need to first load all components in a tree, then itteratively sort them and process them all until you're done. In fact you can even use this algorithme in a streaming fashion, were you don't enven need to know how much value will come.
The disadvantages are : some compilers are able to guess that the carry "should mathematically be 0" (actually true in a perfect world with infinite precision numbers) and could "optimise" the code back to a plain normal sum function bypassing the algorithm (and won't subsequently use any other sum-correction algorithm).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
I'm happy to comment on it without being anonymous. I designed and oversaw the implementation of the LSE feeds (to and from) for the stockbroking part of a large UK high street bank which shall be NatW^H^Hmeless. If you tried to implement the internals using floating point arithmetic it would be pretty much impossible to get it to pass the LSE's conformance tests, which all assume you will use integer arithmetic and explicit rounding according to their rules.
It's worse than that. In some kinds of calculations, the error can be so large that the result is entirely meaningless. That is, for example when doing a subtraction between two nearly equal floating point values, both of which were approximated, the result may have more noise than signal. This isn't the usual case, but when writing general code that cares about precise values, you can code yourself a beartrap that only springs in rare circumstances.
h tml
There is an excellent article about all of this detail, linked from TFA at sun: http://docs.sun.com/source/806-3568/ncg_goldberg.
Granted, I have never written any code where this matters, but I had never realized really just how bad some of the implications are in some cases.
-josh
Basically, I agree with you, but as Hamming pointed out in the 1950s you can get yourself into trouble with some thing like:
A = small number, e.g. size of smallest feature in a Celeron-M CPU in microns
B = big number, e.g. distance to Andromeda galaxy in microns
C = (A+B) ... (some set of clever operations) ... - B
The math is fine, but the implementation won't yield the correct answer. because the value of interest got scaled off. That isn't rocket science, but it is suprisingly hard to catch.
That said, Floating Point works suprisingly well for most things. Back in the 1960s, things like 2+2=3.99999 were a fairly common phenomena. Hardly ever happens today. I spent about three decades in the defense industry writing and testing systems that got things from point X to point Y. I think I saw a reasonable selection of the possible errors that folks can make. Other than subtracting big numbers from other big numbers and expecting the result to be precise, I can't recall a single problem with floating point.
If we're looking for something that is basically broken to write an article about, forget floating point. Try event driven software architectures. Or the notion that it is possible to write unambiguous specifications that are also comprehensible.
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
This Wikipedia page has good background information.
This sig all sigs devours
The first paper recommended for learning more about floating point arithmetic is usually Goldberg's famous What Every Computer Scientist Should Know About Floating-Point Arithmetic .
I can't remember whether the paper specifically discusses the failure of floating point arithmetic to obey the mathematical laws of arithmetic, but even if not, the background it provides is probably enough for you to understand the reasoning yourself.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Everyone knows about rounding problems with floats. What people don't seem to realize (and this goes for many real "Rocket Scientists" I've known) is that a float only gives you a fixed amount of digits - that's all. You can have a highly precise small number or a really big number. You cannot ever have a precise, big number. It just so happens that these same "Rocket Scientists" like to represent time as a floating point based on some 1970 epoch. Guess how accurate that is going to be in a couple of years. How do they solve this problem? The "double double" type of course!