Where Intel Processors Fail At Math (Again)
rastos1 writes: In a recent blog, software developer Bruce Dawson pointed out some issues with the way the FSIN instruction is described in the "Intel® 64 and IA-32 Architectures Software Developer's Manual," noting that the result of FSIN can be very inaccurate in some cases, if compared to the exact mathematical value of the sine function.
Dawson says, "I was shocked when I discovered this. Both the fsin instruction and Intel's documentation are hugely inaccurate, and the inaccurate documentation has led to poor decisions being made. ... Intel has known for years that these instructions are not as accurate as promised. They are now making updates to their documentation. Updating the instruction is not a realistic option."
Intel processors have had a problem with math in the past, too.
with new maths
I should get an AMD CPU and put the extra money towards a graphic card since GPUs do math extremely well in parallel.
The main goal for Floating Point coprocessor sine calculations is to get a good enough result in a set number of cycles.
Given that fully approximating sine takes about as many concrete operations as bits in the value, getting it exactly right isn't usually a trade off people want to make.
There's a reason the C standard specifies that mathematical trig functions are platform dependent. If you want it precise, do it yourself to the level of precision you need.
http://www.shsmedia.com/pentiu...
Remember... "Don't Divide, Intel Inside"
We already know it's a sin to eat pi.
Table-ized A.I.
The documentation says that the result will be correct until the last decimal place. So if the CPU says the answer is:
0.123 456 789 123 456 789
You have a close approximation, accurate to the 17th decimal place, according to the documentation.
The problem is, the correct answer may be:
0.123 444 555 666 777 888
The documentation says it's fairly precise. In truth, it's only good to the fourth decimal place in some cases, whereas Intel documented the function to be accurate to 66 bits or so.
The headline is quite inaccurate. The processors are doing what they're designed to do: approximate the results of certain operations to a "good enough" value to achieve an optimal result:work ratio. Sort of like how the NFL measures first-downs with a stick, a chain, and some eyeballs rather than bringing in a research team armed with scanning electron microscopes to tell us how many Planck lengths short of the first down they were.
This is a documentation failure. They're fixing the documentation. For anyone who would actually care about perfect accuracy in these kinds of operations, there are any number of different solutions to achieve the desired, more accurate result. The headline and the summary make it seem as though there's a problem with the processor which is simply incorrect.
-- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
They do:
https://docs.python.org/2/library/decimal.html
http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
Just because your world is limited to .NET it does not mean there aren't other things out there... Did you even bother looking up?
We are pentium of borg. Division is futile. You will be approximated.
From what I remember it was the first revision of the Pentium 1 aka the Pentium 0.999998163849
SJW n. One who posts facts.
Here's an example from TFA:
tan(1.5707963267948966193)
actual -39867976298117107068
x87 fpu -36893488147419103232
error 743622037674500958.81 ulp
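The ulp figure in that example can be re-derived from the two integers alone. A rough sketch, assuming the x87 extended format with a 64-bit significand and ignoring the fractional part of the actual value (the variable names are just for illustration):

```python
# Values quoted in the example above (signs dropped, fractional part ignored).
actual = 39867976298117107068
x87 = 36893488147419103232      # this is exactly 2**65

# Spacing between adjacent extended-precision values near `actual`,
# assuming a 64-bit significand: 2**(bit_length - 64).
ulp = 2 ** (actual.bit_length() - 64)

diff = actual - x87
error_in_ulps = diff // ulp
relative_error = diff / actual

print("ulp size:      ", ulp)
print("error in ulps: ", error_in_ulps)
print("relative error: %.4f" % relative_error)
```

So the huge ulp count corresponds to a relative error of roughly 7.5 percent in the returned tangent.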
The positive or the negative root?
I just use the average of the two, for predictable output.
Socialism: a lie told by totalitarians and believed by fools.
There's nothing I find particularly alarming here and the behaviour is in fact pretty much what I would expect for computing sin(x). Sure, maybe the doc needs updating, but nobody would really expect fsin to do much better than what it does. And in fact, if you wanted to maintain good accuracy even for large values (up to the double-precision range), then you would need a 2048-bit subtraction just for the range reduction! As far as I can tell, the accuracy up to pi/2 is pretty good. If you want good accuracy beyond that, you better do the range reduction yourself. In general, I would also argue that if you have accuracy issues with fsin, then your code is probably broken to begin with.
Opus: the Swiss army knife of audio codec
Here's what the complaint is about: The Intel FSIN instruction performs an argument reduction to calculate the values of the sine function, but not with the exact value of pi, but using a 66 bit approximation of pi. If your argument is close to a multiple of pi, then the argument reduction doesn't give the correct result.
HOWEVER, if your argument is an extended precision number close to pi = 3.14..., then the last bit in the mantissa of that number has a value of 2^-62. So if you calculate an argument close to pi, the unavoidable bounds for the rounding error are 2^-63. This error in the argument is about 10 times larger than the error caused by using an approximation for pi in the argument reduction. If you use double precision, the error in the argument is about 20,000 times larger than the error caused by the argument reduction.
All this has been known for years; posting it today and claiming there is any problem is just ridiculous.
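The argument-reduction effect described in this comment can be modelled with exact rational arithmetic. This is only a sketch of the idea, not of what the hardware does internally: it assumes the 66-bit constant quoted from Intel's manual elsewhere in this thread, takes x to be the double nearest pi, and uses sin(x) ≈ pi - x for x that close to pi.

```python
import math
from fractions import Fraction

# pi to 50 decimal places, as an exact rational (far more than needed here).
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# Intel's documented internal value: pi = 0.f * 2^2, f = C90FDAA2 2168C234 C.
PI_66 = 4 * Fraction(0xC90FDAA22168C234C, 16 ** 17)

x = Fraction(math.pi)        # exact value of the double nearest pi

exact = PI - x               # what sin(x) should be, to first order
reduced = PI_66 - x          # what you get if you reduce with the 66-bit pi

rel_err = float((exact - reduced) / exact)
good_bits = -math.log2(rel_err)

print("expected sin(x) ~ %.17g" % float(exact))
print("66-bit reduction  %.17g" % float(reduced))
print("relative error    %.3g (about %.1f good bits)" % (rel_err, good_bits))
```

Both views in the thread show up here: the absolute error is tiny, but because the true result is itself tiny, only about 15 of its bits survive.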
It seems that the blogger didn't actually read the documentation that he claimed to read. The exact behaviour is documented in "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture" of March 2012 on page 8-31. I don't have an older copy of that manual anymore, but I have written code according to that exact documentation sometime around 2001, so I am quite confident that it was in the 2001 version of the document.
This is what the documentation says: "The internal value of π that the x87 FPU uses for argument reduction and other computations is as follows: π = 0.f * 2^2 where: f = C90FDAA2 2168C234 C". A more precise approximation according to Wikipedia would have been f = C90FDAA2 2168C234 C4C6 6...; the difference between pi and the approximation used by Intel is about 0.0746 * 2^-64.
If you let x = pi, then people would ordinarily expect that sin (x) = 0. That, however, would be wrong. Storing pi into a floating-point number produces a rounding error. Rounding to extended precision (64 bit mantissa) instead of the usual double precision (53 bit mantissa) produces a result of 4 * 0.C90FDAA2 2168C235 instead of 4 * 0.C90FDAA2 2168C234 C4C6 6...; this is too large by 4 * (1 - 0.C4C66...) * 2^-64. The sine of that number would also have a magnitude of 4 * (1 - 0.C4C66...) * 2^-64.
But FSIN doesn't subtract pi from that number x; instead it subtracts 4 * 0.C90FDAA2 2168C234 C. So we get a result of 4 * (1 - 0.C) * 2^-64 instead of 4 * (1 - 0.C4C66...) * 2^-64. That's what he complains about. The reality is that the correct result would have been zero, but we couldn't get that because trying to assign pi even to an extended precision number gives some rounding error.
Now in practice, if you calculate an argument for the sine function, and that argument is close to pi, even if you manage to get a correctly rounded extended precision result, you must expect a rounding error up to 2^-63, and therefore an error in the result up to 2^-63, even if the calculation of the result is perfect. FSIN gives a result that is about 0.0746 * 2^-64 away from that, so the inevitable error caused by rounding the argument is increased by a massive 3.73 percent. Doing the calculation in double precision, as almost everyone does, makes the rounding error 2048 times larger and FSIN is now 0.00182 percent worse than optimal.
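The arithmetic in this comment can be checked with exact rationals. A small sketch, assuming the rounding-error bounds stated above (2^-63 for extended precision, 2048 times that for double) and a 50-digit decimal value of pi as the reference; the names are made up for the example.

```python
from fractions import Fraction

# Reference value of pi (50 decimal places) and Intel's documented constant.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")
PI_66 = 4 * Fraction(0xC90FDAA22168C234C, 16 ** 17)

# How far the 66-bit constant is from pi, in units of 2^-64 (about 0.0746).
gap = PI - PI_66
gap_units = float(gap * 2 ** 64)

# Argument rounding-error bounds as stated in the comment above.
arg_err_extended = Fraction(1, 2 ** 63)
arg_err_double = arg_err_extended * 2048

pct_extended = float(100 * gap / arg_err_extended)
pct_double = float(100 * gap / arg_err_double)

print("gap = %.4f * 2^-64" % gap_units)
print("extra error vs. extended-precision rounding: %.2f%%" % pct_extended)
print("extra error vs. double-precision rounding:   %.5f%%" % pct_double)
```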
Dawson points to an 'optimisation' in gcc 4.3: constant folding is done using the higher-precision MPFR library. At least the gcc developers seem to think it's an optimisation, but unless it's disabled by default, it is actually a bug. In the absence of undefined behaviour, optimisations must not change observable behaviour. And, as Dawson demonstrates, this one does.
If you need MPFR precision, you should use MPFR explicitly.
Come on, guys, you'll only ever use FPU instructions when you need speed, not precision.
Anyone remember 0x5f375a86?
The precision used in Quake's source code wasn't even nearly comparable to the FPU, but was fast enough.
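For anyone who doesn't remember: that constant belongs to the "fast inverse square root" bit trick. As far as I know, 0x5f375a86 is the refined constant proposed later by Chris Lomont, while the Quake III source used 0x5f3759df. Below is a rough Python sketch of the technique (the function name and default are mine, and Python doubles stand in for the float arithmetic of the original C), just to show how little precision it buys:

```python
import struct

def fast_inv_sqrt(x, magic=0x5F375A86):
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The "magic" step: a crude initial guess from integer arithmetic.
    i = (magic - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer's bits as a float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)

for value in (0.25, 1.0, 2.0, 100.0):
    approx = fast_inv_sqrt(value)
    exact = value ** -0.5
    print("%8.2f  approx=%.6f  exact=%.6f  rel.err=%.2e"
          % (value, approx, exact, abs(approx - exact) / exact))
```

The relative error stays under about 0.2 percent, which is fine for lighting and normals and useless for anything needing real precision.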
In other words, you'll never calculate shopping cart totals minus discounts and other stuff this way (or, at least, you shouldn't!)
There's BigDecimal in Ruby/Java, decimal.Decimal in Python, GMP in C/C++, etc...
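A quick illustration with Python's decimal module, using made-up prices and a made-up 15% discount; the rounding rule (half up, to the cent) is an assumption and would depend on the business rules in practice.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)          # False

# Hypothetical cart: prices are entered as strings so they stay exact.
prices = [Decimal("19.99"), Decimal("5.01"), Decimal("0.10")]
discount_rate = Decimal("0.15")

subtotal = sum(prices, Decimal("0"))
# Round the discount to whole cents with an explicit rounding rule.
discount = (subtotal * discount_rate).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
total = subtotal - discount

print("subtotal:", subtotal)
print("discount:", discount)
print("total:   ", total)
```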