Where Intel Processors Fail At Math (Again)
rastos1 writes: In a recent blog, software developer Bruce Dawson pointed out some issues with the way the FSIN instruction is described in the "Intel® 64 and IA-32 Architectures Software Developer's Manual," noting that the result of FSIN can be very inaccurate in some cases, if compared to the exact mathematical value of the sine function.
Dawson says, "I was shocked when I discovered this. Both the fsin instruction and Intel's documentation are hugely inaccurate, and the inaccurate documentation has led to poor decisions being made. ... Intel has known for years that these instructions are not as accurate as promised. They are now making updates to their documentation. Updating the instruction is not a realistic option."
Intel processors have had a problem with math in the past, too.
with new maths
I should get an AMD CPU and put the extra money towards a graphic card since GPUs do math extremely well in parallel.
O RLY?
Get free satoshi (Bitcoin) and Dogecoins
The main goal for Floating Point coprocessor sine calculations is to get a good enough result in a set number of cycles.
Given that fully approximating sine takes about as many concrete operations as there are bits in the value, getting it exactly right isn't usually a trade-off people want to make.
There's a reason the C standard specifies that mathematical trig functions are platform dependent. If you want it precise, do it yourself to the level of precision you need.
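If you want more digits than the FPU gives you, here is a minimal sketch of the "do it yourself" route using Python's standard decimal module. The helper names dec_sin and _arctan_inv are my own. Pi comes from Machin's formula, and the guard-digit count is a rough assumption rather than a proven error bound.

```python
from decimal import Decimal, getcontext, localcontext


def _arctan_inv(n: int) -> Decimal:
    """arctan(1/n) from its Taylor series, at the active decimal precision."""
    power = Decimal(1) / n                 # 1 / n**(2k+1), starting at k = 0
    eps = Decimal(10) ** -(getcontext().prec + 2)
    total, k = Decimal(0), 0
    while power > eps:
        term = power / (2 * k + 1)
        total = total + term if k % 2 == 0 else total - term
        power /= n * n
        k += 1
    return total


def dec_sin(x, digits: int = 40) -> Decimal:
    """sin(x) to about `digits` significant digits (a sketch, not a libm)."""
    x = Decimal(x)                         # exact conversion, even from a float
    with localcontext() as ctx:
        # Guard digits: cover the integer part of x, plus some slack.
        ctx.prec = digits + max(x.adjusted(), 0) + 20
        pi = 4 * (4 * _arctan_inv(5) - _arctan_inv(239))   # Machin's formula
        two_pi = 2 * pi
        # Argument reduction with a pi that is far more precise than x.
        r = x - two_pi * (x / two_pi).to_integral_value()
        # Taylor series: sin r = r - r**3/3! + r**5/5! - ...
        term, total, i = r, r, 1
        eps = Decimal(10) ** -ctx.prec
        while abs(term) > eps:
            term = -term * r * r / ((i + 1) * (i + 2))
            total += term
            i += 2
        ctx.prec = digits
        return +total                      # round to the requested precision


print(dec_sin(1))
print(dec_sin(3141592653589793.0))
```

The precision knob is explicit. Cost grows with the number of digits, which is exactly the trade-off described above.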
No surprise here since they keep adding new instructions and often nobody bothered to use them except during game/multimedia optimization, where precision is the least concern.
We already know it's a sin to eat pi.
Table-ized A.I.
A quick peek at Mr. Dawson's Twitter shows a passing attempt at supporting Game Journalists and their corruptions.
Given that this happens, he's making a mountain out of a mole hill for political reasons.
https://software.intel.com/en-us/forums/topic/289702
Discusses the discrepancy in the FSIN function back in 2010.
Regards -
Now I finally know what the saying "Love the sinner, hate FSIN" means.
At least it was first according to my Intel processor.
Just use SQRT(1-FCOS^2)
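Taken literally, and presumably tongue in cheek, that identity throws information away. The square root is never negative, and near zero cos(x) rounds to exactly 1.0, so the result collapses to 0. Here is a small Python sketch (sin_via_cos is a hypothetical name):

```python
import math


def sin_via_cos(x: float) -> float:
    # The identity sin(x)**2 + cos(x)**2 = 1, solved for sin(x) and taken literally.
    return math.sqrt(1.0 - math.cos(x) ** 2)


# Near zero, cos(x) rounds to exactly 1.0, so every bit of sin(x) is lost.
print(sin_via_cos(1e-9), math.sin(1e-9))    # 0.0 versus roughly 1e-09
# The square root is never negative, so the sign of sin(x) is lost too.
print(sin_via_cos(-1.0), math.sin(-1.0))
```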
The documentation says that the result will be correct until the last decimal place. So if the CPU says the answer is:
0.123 456 789 123 456 789
You have a close approximation, accurate to the 17th decimal place, according to the documentation.
The problem is, the correct answer may be:
0.123 444 555 666 777 888
The documentation says it's fairly precise. In truth, it's only good to the fourth decimal place in some cases, whereas Intel documented the function to be accurate to 66 bits or so.
The headline is quite inaccurate. The processors are doing what they're designed to do; approximate the results of certain operations to a "good enough" value to achieve an optimal result:work ratio. Sort of like how the NFL measures first-downs with a stick, a chain, and some eyeballs rather than bringing in a research team armed with scanning electron microscopes to tell us how many Planck lengths short of the first down they were.
This is a documentation failure. They're fixing the documentation. For anyone who would actually care about perfect accuracy in these kinds of operations, there are any number of different solutions to achieve the desired, more accurate result. The headline and the summary make it seem as though there's a problem with the processor which is simply incorrect.
-- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
Google Docs is full of rounding errors, leading to fun situations where calculated dollar amounts change by a penny depending on where the original was calculated.
999.99$ can become 1000.00$.
They do:
https://docs.python.org/2/library/decimal.html
http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
Just because your world is limited to .NET it does not mean there aren't other things out there... Did you even bother looking up?
Thanks for posting a link from the summary.
In the future, you may wish to consider actually reading the summary. Might add a little life to your keyboard and avoid unnecessary use of limited internets. Thanks.
Are the FSIN results more or less accurate than the trig tables inside the covers of math textbooks?
The Intel engineers watched Superman III, and they have a plan.
The only time a mathematician ever has an orgasm.
Division is futile. You will be approximated.
If you want it precise, do it yourself to the level of precision you need.
People just don't realize that FPUs are **inherently** approximations, anyone's FPU; it's not Intel-specific. There are inaccuracies converting to and from binary, there are inaccuracies depending on the relative magnitude of operands, there are inaccuracies due to rounding, etc ...
Do you know one way to tell if a calculator app is implemented using the FPU? Try 0.5 - 0.4 - 0.1; you may not get zero if an FPU is used. That is why handheld calculators often implement calculations using decimal math rather than an FPU. The better apps do so as well. This includes an iOS scientific/stats/business/hex calculator app that I wrote: decimal math for operations, Taylor series based trig calculations, etc.
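The 0.5 - 0.4 - 0.1 test is easy to reproduce in Python. Its float type is IEEE 754 binary double precision. On current x86-64 this is computed with SSE rather than the x87 FPU, but the representation problem is the same. The decimal module stands in for a decimal-math calculator.

```python
from decimal import Decimal

# Binary floating point: 0.4 and 0.1 have no exact binary representation.
binary_result = 0.5 - 0.4 - 0.1
print(binary_result)      # -2.7755575615628914e-17, i.e. -2**-55

# Decimal arithmetic, as a decimal-math calculator would do it.
decimal_result = Decimal("0.5") - Decimal("0.4") - Decimal("0.1")
print(decimal_result)     # 0.0
```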
Isn't fsin used in some (pseudo)random number generators?
Sinbad reads Slashdot now? That be crazy.
Well, no.
From TFA, the absolute error closely approximates 0.000000000000000000004.
So you'll only see a relative error as large as you're showing (off in the fifth decimal place), if the correct answer is something like 0.000000000000000012345, which might show up as 0.000000000000000012344.
"I do not agree with what you say, but I will defend to the death your right to say it"
As most of the effort for better math is in the newer instruction sets, using 8087 instructions makes no sense whatsoever. A non-issue, unless the poster, the blogger and his ego all want a stroke today.
Waste of time, new headline: Idiot uses 8087 code, gets shunned by /.
First, it's an x87 instruction. This unit has been deprecated for a decade.
Second, it was never meant to produce correct rounding for extended precision, and never claimed to. The documentation has always clearly stated 1 ULP of precision.
Apparently, some newbie doesn't understand what this means, and somehow it's a thing.
Here's an example from TFA:
tan(1.5707963267948966193)
actual -39867976298117107068
x87 fpu -36893488147419103232
error 743622037674500958.81 ulp
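The ulp figure in that table can be checked with exact integer arithmetic. This sketch assumes the x87 80-bit format with its 64-bit significand, so values with 2^e <= |v| < 2^(e+1) are spaced 2^(e-63) apart. The helper x87_ulp is hypothetical. The "actual" value is used here as the rounded integer quoted above, so the result is 743622037674500959 rather than ...958.81.

```python
from fractions import Fraction


def x87_ulp(v: int) -> Fraction:
    # 80-bit extended precision has a 64-bit significand, so values with
    # 2**e <= |v| < 2**(e + 1) are spaced 2**(e - 63) apart.
    e = abs(v).bit_length() - 1
    return Fraction(2) ** (e - 63)


actual = -39867976298117107068   # tan(1.5707963267948966193), "actual" row from TFA
x87 = -36893488147419103232      # "x87 fpu" row from TFA; exactly -2**65

err = Fraction(x87 - actual) / x87_ulp(actual)
print(err)                        # 743622037674500959
```

Note that the x87 answer is exactly -2^65, a suspiciously round number for a tangent.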
I dabble around in computer graphics, and use Cinema 4D's team-render on a number of older computers. They are all different makes, but all have Core i7 processors in them. 2 of the older machines run Mac OS, 1 runs Windows 8. For most computations done by the rendering engine, all machines are in agreement and no visible differences appear in the different image tiles returned by each of the machines. However, some of the Monte-Carlo rendering options (subsurface scattering, light-mapping) return wildly different results from different 'versions' of the same processor... Would this math issue be the root cause of these inconsistencies? Or would it more likely be down to OS differences between the libraries used by the software?
Even C/C++ has decimals in the main language if you use a gcc (or derivative):
https://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html
TFS basically shows that the author has essentially zero experience with x86, and indeed programming in general. FCOS is part of the original 387 instruction set that has been deprecated since the Pentium IV. The "article" (read: blog) is about as informative as any statement beginning with "in my day...".
and the chip aint one.
Round off the usual suspects...
Java, even when running on Intel, has precisely defined behavior for floating point. You aren't going to run into this CPU-specific nonsense there. Obviously there is a performance penalty to pay, and the developers who write JVMs must run compliance tests.
There's nothing I find particularly alarming here and the behaviour is in fact pretty much what I would expect for computing sin(x). Sure, maybe the doc needs updating, but nobody would really expect fsin to do much better than what it does. And in fact, if you wanted to maintain good accuracy even for large values (up to the double-precision range), then you would need a 2048-bit subtraction just for the range reduction! As far as I can tell, the accuracy up to pi/2 is pretty good. If you want good accuracy beyond that, you better do the range reduction yourself. In general, I would also argue that if you have accuracy issues with fsin, then your code is probably broken to begin with.
Opus: the Swiss army knife of audio codec
Here's what the complaint is about: The Intel FSIN instruction performs an argument reduction to calculate the values of the sine function, but not with the exact value of pi, but using a 66 bit approximation of pi. If your argument is close to a multiple of pi, then the argument reduction doesn't give the correct result.
HOWEVER, if your argument is an extended precision number close to pi = 3.14..., then the last bit in the mantissa of that number has a value of 2^-62. So if you calculate an argument close to pi, the unavoidable bounds for the rounding error are 2^-63. This error in the argument is about 10 times larger than the error caused by using an approximation for pi in the argument reduction. If you use double precision, the error in the argument is about 20,000 times larger than the error caused by the argument reduction.
All this has been known for years; posting it today and claiming there is any problem is just ridiculous.
Java also has a decimal type, though without operator overloading it isn't as pleasant to work with.
http://docs.oracle.com/javase/...
I don't know. If the processor was supposed to be designed from the documentation, then I would say the processor is at fault. I don't see how you can say the documentation is wrong if you don't know what the design was supposed to be. I code from documentation, so if my code doesn't match the documentation then my code is wrong. I usually don't get to pin the blame on the docs. I might try that next time, though. Seems to be working for Intel....
Wow. Really sensational article with pointless useful content. This is a long known issue. Never been a secret. For example see: http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf
ok, so intel fixed their documentation now. move along, nothing to see here.
It seems that the blogger didn't actually read the documentation that he claimed to read. The exact behaviour is documented in "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture" of March 2012 on page 8-31. I don't have an older copy of that manual anymore, but I wrote code according to that exact documentation sometime around 2001, so I am quite confident that it was in the 2001 version of the document.
This is what the documentation says: "The internal value of π that the x87 FPU uses for argument reduction and other computations is as follows: π = 0.f × 2^2 where: f = C90FDAA2 2168C234 C". A more precise approximation according to Wikipedia would have been f = C90FDAA2 2168C234 C4C6628B...; the difference between pi and the approximation used by Intel is about 0.0746 * 2^-64.
If you let x = pi, then people would ordinarily expect that sin(x) = 0. That, however, would be wrong. Storing pi into a floating-point number produces a rounding error. Rounding pi to extended precision (64-bit mantissa) instead of the usual double precision (53-bit mantissa) produces 4 * 0.C90FDAA2 2168C235 instead of 4 * 0.C90FDAA2 2168C234 C4C66...; this is too large by 4 * (1 - 0.C4C66...) * 2^-64. The sine of that number would also be 4 * (1 - 0.C4C66...) * 2^-64 in magnitude (with a negative sign).
But FSIN doesn't subtract pi from that number x; instead it subtracts 4 * 0.C90FDAA2 2168C234 C. So we get a result of magnitude 4 * (1 - 0.C) * 2^-64 instead of 4 * (1 - 0.C4C66...) * 2^-64. That's what he complains about. The reality is that the correct result would have been zero, but we couldn't get that because trying to assign pi even to an extended precision number gives some rounding error.
Now in practice, if you calculate an argument for the sine function, and that argument is close to pi, even if you manage to get a correctly rounded extended precision result, you must expect a rounding error of up to 2^-63, and therefore an error in the result of up to 2^-63, even if the calculation of the result is perfect. FSIN gives a result that is about 0.0746 * 2^-64 away from that, so the inevitable error caused by rounding the argument is increased by a massive 3.73 percent. Doing the calculation in double precision, as almost everyone does, makes the rounding error 2048 times larger, and FSIN is then only about 0.0018 percent worse than optimal.
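The arithmetic in this comment can be reproduced with Python's fractions module. The sketch assumes the first 128 bits of pi, taken from the well-known hex expansion pi/4 = 0.C90FDAA2 2168C234 C4C6628B 80DC1CD1.... X87_PI is the constant quoted from Intel's manual. The sketch computes the shortfall of the x87 pi in units of 2^-64. It also computes the relative error produced by reducing the double closest to pi against that constant.

```python
import math
from fractions import Fraction

# First 128 bits of pi: pi/4 = 0.C90FDAA2 2168C234 C4C6628B 80DC1CD1... (hex).
PI_128 = Fraction(0xC90FDAA22168C234C4C6628B80DC1CD1, 2**126)
# x87 internal pi from Intel's manual: 0.f * 2**2 with f = C90FDAA2 2168C234 C.
X87_PI = Fraction(0xC90FDAA22168C234C, 2**66)

# How far the x87 constant falls short of pi, in units of 2**-64.
shortfall = (PI_128 - X87_PI) * 2**64
print(float(shortfall))            # about 0.0746

# x is the double closest to pi; sin(x) = sin(pi - x), and since pi - x is
# about 1.2e-16 the cubic Taylor term is negligible at this precision.
x = Fraction(math.pi)
true_sin = PI_128 - x
x87_sin = X87_PI - x               # reduction against the x87 constant instead
rel_err = (x87_sin - true_sin) / true_sin
print(float(true_sin), float(x87_sin), float(rel_err))   # rel_err is about -3.3e-5
```

A relative error of about 3.3e-5 matches the "only four or five correct digits" complaint. In absolute terms, the error is a few times 10^-21, far below one double-precision ulp of the argument.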
for anything implemented in an APU, last I checked.
Double precision for the most part only pops up in 200 dollar plus last gen, and 300+ current gen cards. It's one of my big gripes with the collusion in the industry. Nvidia stopped including DP in their cheap parts and wow, next stepping so did AMD!
Take the fp numbers, divide by cores and mhz, compare between chips/archs.
A number of other CPUs get better clock-for-clock FPU ratings than Intel, but Intel has been dramatically better on speed since the early '00s, and nowadays better on IPC numbers. That often does not result in better FPU numbers, however, due to both the number of registers and the processing capacity of the FPUs in non-x86 arches.
Coming soon: Intel Core i7 School Dropout Edition
/fp:strict
Failing that, use a pen and paper.
Dawson points to an 'optimisation' in gcc 4.3: constant folding is done using the higher-precision MPFR library. At least the gcc developers seem to think it's an optimisation, but unless it's disabled by default, it is actually a bug. In the absence of undefined behaviour, optimisations must not change observable behaviour. And, as Dawson demonstrates, this one does.
If you need MPFR precision, you should use MPFR explicitly.
you get numbers that *look* right, as opposed to numbers that *are* right.
I have an old war story about a customer that tried to squeeze an IP over ISDN bridge through a (lossy) voice compression. I tried to explain to him that the IP packets at the other end *sounded* right -- ah, old times.
Come on, guys, you'll only ever use FPU instructions when you need speed, not precision.
Anyone remember 0x5f375a86?
The precision used in Quake's source code wasn't even nearly comparable to the FPU, but was fast enough.
In other words, you'll never calculate shopping cart totals minus discounts and other stuff this way (or, at least, you shouldn't!)
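For reference, 0x5f375a86 is the refined constant Chris Lomont proposed for the fast inverse square root trick; the Quake III source used 0x5f3759df. Below is a Python sketch of the trick, using struct to reinterpret float32 bits. The name fast_inv_sqrt is hypothetical, and the Newton step runs in double precision here rather than in float32 as it would in C.

```python
import math
import struct


def fast_inv_sqrt(x: float, magic: int = 0x5F375A86) -> float:
    """Approximate 1/sqrt(x) for positive, normal float32-range x."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Shifting right halves the approximate log2 of x; subtracting from the
    # magic constant negates it and restores the exponent bias.
    i = magic - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step for f(y) = 1/y**2 - x sharpens the first guess.
    return y * (1.5 - 0.5 * x * y * y)


for x in (0.25, 2.0, 10.0):
    print(x, fast_inv_sqrt(x), 1 / math.sqrt(x))
```

After one Newton step, the relative error stays below roughly 0.2 percent. That was good enough for lighting normals, and nowhere near FPU precision.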
There's BigDecimal in Ruby/Java, decimal.Decimal in Python, GMP in C/C++, etc...
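Here is a minimal sketch of the shopping-cart case with Python's decimal module. The cart_total helper and the 15 percent discount are hypothetical. ROUND_HALF_UP is only one of several rounding policies, so check what your accounting rules require.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point: ten dimes do not quite make a dollar.
print(sum([0.1] * 10))                 # 0.9999999999999999

# Decimal arithmetic on the same amounts stays exact.
print(sum([Decimal("0.10")] * 10))     # 1.00


def cart_total(prices, discount: Decimal) -> Decimal:
    """Hypothetical helper: subtotal minus a percentage discount, rounded to cents."""
    subtotal = sum(prices, Decimal("0"))
    total = subtotal * (1 - discount)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(cart_total([Decimal("19.99")] * 3, Decimal("0.15")))   # 50.97
```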
Nice to know my integers aren't affected. :-)
Xcode using a recent version of Clang with the standard library supplied by Clang does the following:
1. The long double function sinl uses the FSIN instruction, except it checks when the argument is too large (> 2^63) and does an argument reduction using a 64 bit approximation of pi.
2. The double function sin uses SSE registers and floating-point arithmetic, with argument reduction using a value of pi with more than 100 bits.
If you are using FSIN, it means you are doing 32-bit compiles, and are a true Luddite - go back to using your Pentium 1 on Windows 95. 64-bit systems have been generally available since 1992 (Alpha). No, I'm not an Intel FanBoi - I worked for AMD at one point.
This is why you should not take the sine of a large number in the first place:
In abstract terms, the sine function has a large relative condition number (infinity for multiples of pi) for large input arguments.
More specifically, double numbers are normally rounded to the equivalent of about 16 decimal places; so pi is represented as 3.141592653589793. (Real machines work in base-2, of course, but I will assume base-10 because it's more readable and the problem is the same.) As the sine of any multiple of pi should be zero, let's take sin(pi*1e15), which the machine represents as sin(3141592653589793). Obviously, 3141592653589793 is not a multiple of pi, so the result of this computation is not 0 but -0.2362.
The "bug" only matters if the intended argument of the sine is a number that can be represented without any round-off error, which is unlikely in any real-world application. Otherwise, the error of the computation is dominated by the inaccurate representation of the argument, and you cannot blame the implementation of the sine function for that.
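The commenter's example can be checked directly in Python. This assumes the C library behind math.sin performs full-precision argument reduction, as glibc and most current C runtimes do. The sketch shows that the result is about -0.2362 because the argument itself lies about 0.2385 away from the intended multiple of pi. It also computes the relative condition number |x cot x|.

```python
import math
from decimal import Decimal

x = 3141592653589793.0           # "pi * 1e15" rounded to 16 digits; exact as a double (< 2**53)
print(math.sin(x))               # about -0.2362, not 0

# How far x lies from the true multiple of pi (pi written out to 32 decimals here).
PI_32 = Decimal("3.14159265358979323846264338327950")
gap = Decimal(3141592653589793) - PI_32 * 10**15
print(gap)                       # about -0.2385; sin(x) = sin(gap) because 10**15 is even
print(math.sin(float(gap)))

# Relative condition number |x cot x|: how much a relative change in x is amplified in sin(x).
kappa = abs(x * math.cos(x) / math.sin(x))
print(kappa)                     # about 1.3e16
```

The sine routine is doing its job here. The whole discrepancy comes from the argument, which matches the point of the comment.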
All this has been known for years; posting it today and claiming there is any problem is just ridiculous.
So, Mr Guru: given that you "knew" about this years ago, I guess you contacted Intel to update the docs. Oh no; you just "knew" it "seekrehtly". Well then, I guess there is a story here, because the guy with the article actually bothered to publish and get the docs updated. You might have got the publicity if you really had known and had done something about it. Give up on the sour grapes.
So, if your angle gets to low values (choose where that is for yourself), then you switch to using that approximation and cut out a large chunk of calculation. Saves a lot of hassle. It's as basic as checking for "divide by zero" situations.
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
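The comment above does not say which approximation it means. Assuming it is the small-angle approximation sin(x) ≈ x, a sketch follows. The 1e-8 cutoff is my own choice: below it, the x^3/6 term is smaller than half an ulp of x in double precision, so a correctly rounded sin(x) equals x anyway.

```python
import math

# Hypothetical cutoff: for |x| < 1e-8, x**3/6 is below half an ulp of x in
# double precision, so the Taylor series sin(x) = x - x**3/6 + ... rounds to x.
SMALL_ANGLE_CUTOFF = 1e-8


def sin_small_angle(x: float) -> float:
    if abs(x) < SMALL_ANGLE_CUTOFF:
        return x              # skip the library call entirely
    return math.sin(x)


print(sin_small_angle(5e-9), math.sin(5e-9))
```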