Origin of Quake3's Fast InvSqrt()
geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power." From the article: ""
From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.
[Insert rant about software patents]
> int i = (int)x;
converts a float to an int, potentially cutting off the digits past the decimal point and/or leading digits that don't fit into the 32 bit integer range.
> int i = *(int*);
reinterpretes the bit pattern of the float number as an integer, so the exponent and the sign bit become part of the number.
int i = (int)x would convert the floating point number to its rounded integer equivalent.
int i = *(int*)x assigns i the 32-bit value that represents how x is stored in memory as a float.
The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.
Seriously, try looking away from the genius who obviously wrote it.
- There is no single comment which would make reading and understanding what happens here much easier!
- Introduction of a magic number with no explanation whatsoever
- Magic pointer arithmetics without demystification
- Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
I know it is good code. But it is simply bad code!First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.
The real genius here is Isaac Newton. 'course, that's not news to anyone.
You mean that Newton thought about taking advantage of the IEEE float format to initialize the algorithm using "i = 0x5f3759df - (i>>1);"? Wow, now that's a clever guy!
Opus: the Swiss army knife of audio codec
I think the point of the parent post was to point out that the summary should have enough information that you can decide if you want to RTFA or not. You should not have to RTFA to decide if you want to RTFA article or not.
As far as the name of the function - true, but you never know, it could be something different. A small explaination would have helped.
RonB
It is human nature to take shortcuts in thinking.
The result of taking the square root of a number and then squaring it will be the absolute value of the original number.
Really? What if the number is negative?
I think you mean to say "the original number", not "the absolute value of the original number". When given a negative argument, this composition will either return an error (because there is no support for complex numbers) or a negative result equal to the input.
RTFA much?
You live in America, don't you?
I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.
I've written enough 3D graphics code -- including a textured polygon rasterizer that would probably cry and try to delete itself if it saw something like Quake 3 -- to know that they didn't have to run a profiler to know that they'd be spending too much time doing 1/sqrt(x) if they didn't have a highly optimized routine for it. It's an inherent operation in so many 3D calculations it isn't funny, and while by the time Quake 3 came out hardware floating point units were pretty fast, FP divides and FP square roots were very lengthy operations that more importantly couldn't be pipelined.
But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.
Yeah, pretty much. Back when I wrote my code (pentium days) you had an FP unit but it wasn't very good, so I used fixed point math (using integer instructions) which had sufficient precision for a 320x200 display. Getting enough performance out of the core algorithms was still hard enough that I had to take a lot of shortcuts, like instead of doing the right thing by using a divide every pixel to calculate which texel to use, I used a divide every 8 pixels and linearly interpolated in between. I'm sure that Quake (the contemporary 3D engine of the day which would also make my code cry) contained many more clever optimizations and approximations, because it wouldn't have been possible on the hardware of that day without them.
In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. The non-approximation instruction uses a lookup table to get an initial guess, then uses a couple iterations of Newton's Method to refine the result. The short cut instructions simply return the value in the lookup table.
So yeah, basically the AC OP has no idea what he is talking about, and from that basis of ignorance is denigrating what is in reality a very clever and extremely useful hack.
The enemies of Democracy are
Attribution.
That is why most, if not all, software patents are bogus. Just because you reinvent something published by a PhD working in a committee that disbanded 10 years before you knew 'C' came after 'B' in the alphabet does not make you reinvention patent worthy. The history of invsqrt() crosses disciplines of hardware and software design, spec development, graphics and math theory. With such a fascinating function having a hard to track history it is no wonder that things like James W. Hunt's 1976 diff concept can be patented in 2006 by Microsoft. (As mentioned by QuantumG on slashdot.)
The trouble had in tracking the history of InvSqrt() is really sad. Computing is an industry that hypes how digital storage of information and perfect copies eliminates the isse of decay. While the ever expanding secondary storage (just now getting to really usefull) means oblivating the scaling issues with retaining the meta-information that denotes this very history. I guess the moral of the story is: sign your contributions. It won't take up that much space in the comments.
--Alan Perlis
And will the real Fast InvSqrt() author please stand up?
"You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
And badly named! It's not the inverse square root (which would simply be the square); it's the reciprocal of the square root.
I don't think that's quite what he meant...