Origin of Quake3's Fast InvSqrt()
geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power." From the article: ""
"English motherfucker, do you speak it?" Anyone care to explain what that function does?
Only the State obtains its revenue by coercion. - Murray Rothbard
From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.
[Insert rant about software patents]
I have a truly marvelous proof of who wrote this code which this comment box is too narrow to contain.
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Anyone who has ever played a computer game should pay up.
The linked site seems to be down (gee, you think it might be slashdotted?), but this paper seems to be covering the same topic.
See what I've been reading.
Can someone enlighten me?
-dave
http://millionnumbers.com/ - own the number of your dreams
If you RTFA, the first person the author asks about it is J.C., and he says "nope, not me, maybe this other guy". Specifically NOT taking credit for this snip of code.
1984 was supposed to be a warning, not an instruction manual.
Comment removed based on user account deletion
The inverse square root is 1/(sqrt(x)), not x^2 (which is, I admit, the first thing I thought of, wondering why anyone would be so excited about a faster way of getting it).
Naah, just kidding. They both deserve a spot in the Clever Hacks Hall of Fame
Does InvSqurt stand for Inverted Squirt? Is this when I bring my Zune back to the store after I found out what a piece of shit it is?
Do not trust this signature.
great read, enjoyed it. brought back many good memories.
that just goes to show you tweeds in VM land that you need to get down and dirty sometimes.
Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.
The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.
http://www.icarusindie.com/DoItYourSelf/rtsr/index .php?section=ffi&page=sqrt
That page compares the time it takes to calculate the sqrt various ways including Carmacks. Short version is that modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have the sqrt function available.
His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.
Work Safe Porn
OK, my subject is flippant, but it is also serious. The alternative you are proposing simply extracts an integer approximation to the float x. The code as written does something different: it extracts the binary representation of the floating-point number. That is, it is extracting the raw bits involved.
Here's a more interesting question: "How does this function compute 1/x^(1/2)?" ;)
I'm asking, because I have no idea how it works.
This was a great article. Especially for the script kiddies and wanna be HaXoRz. It's great to enlighten others about the art of true technology. Artcle included the following: * Mystery (who wrote this) * A Puzzle to solve (how did they come up with some a clevery solution) * Games (Quake, Doom, 3D graphics) * Math (Inverse Square-Root) * History (Who's who of 3D programming, VooDoo->3dFx->nVidia) * Lessons (Skill, Assembly Language, Research, OpenSource is good) Kinda brought back feelings of the "Revenge of the Geeks" series by PBS, when you get to place faces, names, ideas, feelings and thoughts to technical acheivements.
Most /.ers are familiar with Fermat's last theorem. If nothing else, there was a lot of media hype when Andrew Wiles announced his solution. Beyond that, it's been in a couple episodes of Star Trek. Explaining it here is like explaining the existence of the aurora to eskimos.
Introduction
Note!
This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.
Origin of Quake3's Fast InvSqrt()
To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.
InvSqrt()
Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist.
How the code works
The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.
Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU.
But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.
So who did write it? John Carmack?
While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.
Asking John whether it was him or Michael returned a "not quite".
-----Original Message-----
From: John Carmack
Sent: 26 April 2004 19:51
Subject: Re: Origin of fast approximated inverse square root
At 06:38 PM 4/26/2004 +0100, you wrote:
>Hi John,
>
>There's a discussion on Beyond3D.com's forums about who the author of
>the following is:
>
>float InvSqrt (float x){
> float xhalf = 0.5f*x;
> int i = *(int*)
> i = 0x5f3759df - (i>>1);
> x = *(float*)
> x = x*(1.5f - xhalf*x*x);
> return x;
>}
>
>Is that something we can attribute to you? Analysis shows it to be
>extremely clever in its method and supposedly from the Q3 source.
>Most people say it's your work, a few say it's Michael Abrash's. Do
>you know who's responsible, possibly with a history of sorts?
Not me,
Nope. It's when it goes in instead of out.
...the work of John Romero. Apparently he was going to call it the MakeYouMyBitch() function. ;P
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
I haven't seen code like this since I stopped coding Mandelbrot sets in 68k assembler.
As a young newt I looked at the code inside fractint with awe and discovered some similar marvellous optimisations.
Building on those and converting to 68k I made what was the fastest mandelbrot calculation I could.
Another blast from the past.
liqbase
Specifically, the compilation _Notation, Notation, Notation_, page 130.
Seriously, try looking away from the genius who obviously wrote it.
- There is no single comment which would make reading and understanding what happens here much easier!
- Introduction of a magic number with no explanation whatsoever
- Magic pointer arithmetics without demystification
- Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
I know it is good code. But it is simply bad code!float InvSqrt(float x) { float xhalf = 0.5f*x; int i = *(int*) i = 0x5f3759df - (i >> 1); x = *(float*) x = x*(1.5f - xhalf*x*x); return x; } Asking John whether it was him or Michael returned a "not quite". But it's supposed to return a float!
I will forever be a student.
All floating point operations are approximations, writing good floating point code is all about understanding the approximations and tradeoffs you are making.
and floading point operations can be one of the slowest parts of code
i have to agree on the lack of comments in the source for games, it can make them very hard to penatrate (particularlly when combined with heavy use of macros)
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
They're computing the inverse square root, usually written as something like x^(-2) (which is the same as 1/sqrt(x) of course) via, umm, err, magic :-)
... that's one of the crazy parts the PDF analyzes). Then it changes the int-float back into a plain float and does one step of Newton's method using the guess derived from the magic part.
:-) Read the PDF linked in the article if you want someone who still knows what they're talking about to explain it; I'm just trying my best to remember anything after all these years. I swear that analysis class gave me some Pavlovian instinct to shudder in dread whenever I see them pulling out the strange substitutions and Taylor expansions so they can calculate the error...
Seriously, they're using some evil numerical techniques that approximate the actual value of that function to withing a few percent very quickly. You could do more steps of Newton's root finding method or other complex things, but that would be slow and this is only meant to find things that are "good enough" for you to draw on someone's screen.
This function is used in graphics programming, specifically in things like the Quake 3 source code (as well as other places). First, it calculates x/2, then it gets the bits of the float x as an integer (without modifying them), then it does the "magic" step with the magic number they calculated via evil mathematics to give the best starting point they can for Newton's method using the x >> 1 to divide the float-as-integer in two (and make it positive, since your sign bit falls into the exponent of the float!
Don't mind me. I may have a degree in mathematics, and I even (somehow!) made it through at least one class on numerical analysis, but all I can tell you is that this stuff is evil
Introduction Note! This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy. Origin of Quake3's Fast InvSqrt() To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense. float InvSqrt (float x){ float xhalf = 0.5f*x; int i = *(int*) i = 0x5f3759df - (i>>1); x = *(float*) x = x*(1.5f - xhalf*x*x); return x; } Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist. How the code works The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it. Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU. But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation. So who did write it? John Carmack? While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time. Asking John whether it was him or Michael returned a "not quite". -----Original Message----- From: John Carmack Sent: 26 April 2004 19:51 Subject: Re: Origin of fast approximated inverse square root At 06:38 PM 4/26/2004 +0100, you wrote: >Hi John, > >There's a discussion on Beyond3D.com's forums about who the author of >the following is: > >float InvSqrt (float x){ > float xhalf = 0.5f*x; > int i = *(int*) > i = 0x5f3759df - (i>>1); > x = *(float*) > x = x*(1.5f - xhalf*x*x); > return x; >} > >Is that something we can attribute to you? Analysis shows it to be >extremely clever in its method and supposedly from the Q3 source. >Most people say it's your work, a few say it's Michael Abrash's. Do >you know who's responsible, possibly with a history of sorts? No
HAKMEM is classic bathroom reading for hackers. If you want to do it up old-school, print a copy from original scans, double-sided.
As the article says, this snippet has been around since the original Quake came about. Back then (a decade?) the hardware was so much slower than it is today that this approx function was absolutely needed to write the game. It's not new, nor mysterious. In fact I first encountered this code on a forum where somebody linked to the top10 most interesting code snippets (or something to that end)... I've even used it my own code here and there. The crazy hex number exploits IEEE floating point.
:)
A not that is not mentioned here: you can repeat the IEEE magic number line again for a more accurate result
There are 10 types of people in the world. Those who understand binary and those who do not.
Look son, crazy people...
Back around 1990, I wrote a game called Alpha Waves (Preview). Back then, the problem was not using integers to approximate floating point math, it was avoiding integer multiplications, because those were expensive compared to additions. For those of you with a curious mind, you can take a look at http://cc3d.free.fr/cube.s for the original source code.
So here is a quizz, the first one answering wins the admiration of the crowd:
- How did that engine avoid multiplications?
- How did it compute sine and cosines?
- What low-level hardware trick did it use in two-player mode, and how could that be used for performance tuning?
I believe Alpha Waves was the first video game on personal computers with full-screen "real" 3D (i.e. not scaled sprites), and I also believe it was the first 3D game with two simultaneous players sharing a screen. Would anybody have data to confirm or infirm that?
--
-- Physics for Dummies
-- Did you try Tao3D? http://tao3d.sourceforge.net
Note that the final magic number isn't the one implicated in the paper on lamont.org though the paper does test it with two other possible candidates. Gary Tarolli mentions that he may have changed which particular magic number was used in the final code but that he didn't write the code itself. Likely, if this particular fast square root function is embedded as a circuit and the circuit patented, the legal documents will attribute the code to Gary since he made the most significant contribution of actually choosing which number was used in the code which was released.
The question still remains, though: who wrote the routine?
the NPG electrode was replaced with carbon blac
I wish there was an option to mod some of these posts "+1 Whoa".
Perhaps this is from the old days of the Future Crew demo scene? They did some amazing things with limited CPU cycles back then (1988-1993).
I know my Dad used this technique in the 60's or 70's for vector normalization. It was quite valuable on systems that had no hardware floating point divide (and nothing in that era had a hardware square-root).
Being non-iterative it also has a deterministic speed, which is valuable in
real-time applications.
Integer adds/subtracts on floating-point representations were also sometimes used to approximate multiplication and division (by adding/subtracting the
exponent)
I believe you meant to say x^(-1/2)
Too bad the people modding you up don't have math degrees. =P
:(){
We called it Numerical Analysis at Penn State. I didn't have to take it, I was only getting a minor in Comp Sci, and a major in Comp Eng.
...in high school.
But...I learned how to approximate using Newton's method in Calc 1.
Actually, I think the genius is the guy who came up with the magic bit sequence.
:(){
Here's an old version of one of my webpages:
c ities.com/SiliconValley/9498/sqroot.html
http://web.archive.org/web/19990210111728/www.geo
And here's an updated version of the same page:
http://www.azillionmonkeys.com/qed/sqroot.html
It isn't an exact rendering of the code in question, but it explains enough for any skilled hacker to 1) understand what's going on and 2) modify the code to create the resulting code that's in the Quake 3 source. Furthermore this web page has existed since about 1997 (archive.org doesn't go back that far for some reason.)
Now *IF* in fact the code origin comes from someone who took ideas from my site, I should point out that *I* am not the originator of the idea either (though I did write the relevant code). Bruce Holloway (who I credit on the page) was the first person to point out this technique to me at around the 1997 timeframe (prior to this, I created my own method which is similar, but not really as fast). (Vesa Karvonen informed by of the technique (through a code snippet with no explanation) at roughly the same time as well.) It was a technique well known to hard core 3D accelerator and CPU implementors, and follows an intentional design idea from the IEEE-754 specification.
Prof. William Kahan, one of the key people who specified the IEEE-754 standard (the standard for floating point the many CPUs use, starting with Intel's 8087 coprocessor) apparently presented this idea, and is the source for where Bruce Holloway got the idea. The IEEE-754 standard came out around the 1982 time frame. Though, its very likely that these ideas originate from even earlier in computing history.
On intel machines, Carmack is often, but not always, faster then the FPU. In fact, for some it can be faster to use Carmack to compute the non-reciprocal square root. And, no. I don't quite know why G5 sucks so bad in this test. The same source code is used for all of them.
Support SETI@home
This is John Carmack we're talking about. I imagine he knows how to use a code profiler. The approximations it involve are likely to cause bugs. Floating point always involves approximations. If you don't like them, use a better approximation (double precision). But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.
I mean, it's not like they're running scientific simulations with this. It's Quake3. Furthermore, converting floating-points to integers and then doing calculations on them is a trick that has been known for years, long before Quake3. They weren't converting floats to ints. They manipulated the bitwise representation of the float. I'm not sure if that's what you meant, though.
:(){
I think the original constant might have been derived with simple Monte Carlo methods, and the method itself is probably from some book from long time ago, from the 50's when vacuum tube computers first came onto the scene and such speed gains by optimizing binary code to the limit by phd mathematicians was at the forefront of research. To quote the pdf article you cite, "However the analysis should be thoroughly tested on a real machine, since hardware often does not play well with theory." That's nonsense - the machine is predictable, therefore plays well with theory, it's you who get overly engulfed and lost in the derivations by following too many threads of thought and losing track of your assumptions and approximations, it's the theory that's got the problem, and the machine is just an easy and lazy way to double check your thoughts, but you could technically do the same calculations the machine does with a paper and pencil, it would just take longer. I think this article is a bait, to see who will point out subtle issues in the report that might have been directly planted, and such people who dare figure out wat's wrong will instantly win a prize of a dozen shitflies settling down onto them from the NSA/FBI/CIA/MSFT/Dept. of Homeland security, and instantly win a "slavejob-dreamjob" at a fancy place or more like a "fancy prison" of none of their choice, to continue developing their talent and put it to good work for Da Man.
It starts by taking a guess at the right answer, and then improving the guess until it's accurate enough to use.
The first step depends heavily on the fact that a floating point number on a computer is represented as a significand (aka mantissa) and an exponent (a power of two). For the moment, consider taking just the square root of X instead of its inverse. You could separate out the exponent part of the floating point number, divide it by two, and then put the result back together with the original significand, and have a reasonable starting point.
From there, you could improve your guesses to get a better approximation. The simplest version of that would be like a high-low game -- you split the difference between the current guess and the previous guess, and then add or subtract that depending on whether your previous guess was high or low. Eventually, you'll get arbitrarily close to the correct answer.
This can take quite a few iterations to get to the right answer though. To improve that, Newton-Raphson looks at the curve of the function you're working with, and projects a line tangent to the curve at the point of the current guess. Where that line crosses the origin gives you the next guess. That's probably a lot easier to understand from picture.
In this case, we're looking for the inverse square root, which changes the curve, but not the basic idea. As a general rule, the closer your first guess, the fewer iterations you need to get some particular level of accuracy. That's the point of the:
While the originator of this constant is unknown, and some of it is rather obscure, the basic idea of most of it is fairly simple: we start by shifting the original number right a bit. This divides both the mantissa and the exponent part by two, with the possibility that IF the exponent was odd, it shifts a bit from the exponent into the mantissa. The subtraction from the magic number then does a couple of things. For one thing, if a bit from the exponent was shifted into the mantissa, it removes it. The rest of the subtraction is trickier. If memory serves, it's based on the harmonic mean of the difference between sqrt(x) and (x/2) for every possible floating point number of the size you're using.
This is where the fact that it's 1/sqrt(x) instead of sqrt(x) means a lot: 1/sqrt(x) is a curve, but it's a fairly flat curve -- much flatter than sqrt(x). The result is that we can approximate a point on the curve fairly accurately with a line. In this case, it's really two lines, which gets it a bit closer still.
From there, the number has had a bit of extra tweaking done -- it doesn't actually give the most accurate first guess, but its errors are often enough in the opposite direction from those you get in the Newton-Raphson iteration steps that it gives slightly more accurate final results.
The universe is a figment of its own imagination.
I saw the method used in the SGI OpenGL sample implementation. It was quite a while ago.
http://oss.sgi.com/projects/ogl-sample/
Anyone have access to the version control for this at SGI?
It is quite possible several people came up with similar ideas independently. Particularly if they were exposed to similar IEEE-754 hacking tricks, such as the fast log/exp.
I may twist orthodoxy to partly justify a tyrant. But I can easily make up a German philosophy to justify him entirely.
(I'm (sorry))
You don't (happen to (program (lisp))) do you?
Thank you very much! Nothing beats an example.
"In our tactical decisions, we are operating contrary to our strategic interest."
The algorithm is simple Newton-Raphson -- make a good initial guess, then iterate making the guess better. I think Newton-Raphson on 1/sqrt picks up 5-6 bits each try in the line "x = x*(1.5f - xhalf*x*x)". It can be repeated to get a more accurate result each time it's repeated.
The problem with Newton-Raphson is making a good first guess--otherwise, you need an extra iteration or two. And that's what the magic number is doing, making a good first guess.
So let's work out what a good first guess would look like for 1/sqrt(f), to see where this code came from.
Floating Point numbers are stored with a mantissa and an exponent: f = mantissa * (2 ^ exponent), where exponent is 8-bits wide and the mantissa is 23-bits wide.
Let's take an example: 1/sqrt(16) would have f = 1.0 * (2 ^ 4). We want the result 0.25 which is f = 1.0 * ( 2 ^ -2).
So our first guess should take our exponent, negate it, and cut it in half. (Try more examples to see that this works--it's basically the definition of 1/sqrt(f)). We'll ignore the mantissa--if we can just get within a factor of 2 of the answer in one step, we're doing pretty well.
Unfortunately, the exponent is stored in FP numbers in an offset format. In memory, The mantissa is in the low 23 bits, and the most-significant bit is the sign (which will be 0 if we're taking roots). For now, let's just assume we have our exponents as 8-bit values, to work out what we need to do with the +127 offset.
We want new_actual_exp = -(actual_exp)/2. But in memory, exp = (actual_exp + 127). Or, actual_exp = exp - 127.
Substituting gives (new_exp - 127) = -(exp - 127)/2. Simplify this to: new_exp = 127 - (exp - 127)/2 => new_exp = 3*127/2 - (exp / 2).
Now the exponent is shifted 23 places in memory, so let's write out our code (and ignore the mantissa completely for now...): rewriting as hex: Well, first, it's arguable whether it should be 0x5f000000 or 0x5f400000 (The "4" is actually in the mantissa). I'm guessing resolving that dilemma led to the original author discovering that choosing a particular pattern of bits in the mantissa can help make the initial guess even more accurate, leading to the 0x5f3759df constant.
I haven't worked it out, but Chris Lomont http://www.lomont.org/Math/Papers/2003/InvSqrt.pd
You're correct, 1/sqrt(x) == x^(-1/2). I'm not very alert today, sorry.
:P
The rest is correct, though
I compiled the function and tested it with few numbers...and compared the output with 1/sqrt(x)
InvSart(4) = 0.499154
1/sqrt(4) = 0.5
InvSart(10) = 0.315686
1/sqrt(10) = 0.316228
InvSart(2) = 0.70693
1/sqrt(2) = 0.707107
InvSart(100) = 0.0998449
1/sqrt(100) = 0.1
so it's pretty accurate to around three figures...
> To quote the pdf article you cite, "However the analysis should be thoroughly tested on a real machine, since hardware often does not play well with theory." That's nonsense - the machine is predictable,
Err, the reason they said that is because the method makes assumptions about the way a float is represented in binary. If you change how it works (or want to use doubles instead of floats, etc.), you'll have to rethink that method or your errors won't be nice and small. Any particular machine may be predictable, but the way we work with floating point numbers on a computer might *not* be if we discover some new & better way to model rational binary numbers.
I think I have found a similar solution for the reciprocal of the FOURTH root. The code looks almost the same, but it requires a new constant and you shift the bits 2 places instead of just 1. float Inv4thrt(float x) { float x0 = x; int i = *(int*) // get bits for floating value
i = 0x4F541EEF - (i>>2); // gives initial guess y0
x = *(float*) // convert bits back to float
x = (x/4.0f)*(5 - x0 * x*x*x*x); // Newton step, repeating increases accuracy
return x;
}
Note that the constant I have here is not very carefully tuned so the error may be slightly larger than with the sqrt.
Pretty cool, I think.
The code may or may not be a good way of solving the problem, but it's lousy code nonetheless:
* it's badly named: "reciprocal square root" would be better (however, that's a problem with graphics lingo)
* it lacks a comment as to where it came from
* it lacks a comment as to how it works or how the constant was determined
* it lacks a comment as to how well it works
* it lacks a comment as to the range of arguments it's valid for
* it lacks (optional) error checking
* it lacks a comment as to what environment it works in
* it lacks a straightforward base implementation
* there don't seem to be any test cases either
The code posted in the article is actually not exactly the Q3 source code. The Q3 source code actually contains two slightly different versions of the function, one called "Q_rsqrt" commented (with "// what the fuck") and a single iteration, and second one that's uncommented and named incorrectly "SquareRootFloat" and uses two iterations. Neither of them say what the function actually does, and the fact that there are two of them is also not so great.
Yeah, that kind of code causes me "to go wobbly in the knees"--by annoying the hell out of me--because the programmer who wrote it is wasting my time. Pretty predictably, sooner or later, this code will break something, and it will first take time to figure out that this code is responsible, then it will take some time to figure out what it is supposed to do, and finally, it will be a pain to figure out how to fix it. Or, I'd just rewrite it.
The format choices for Slashdot comments are:
For this topic it seems we need TeX.
I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.
I've written enough 3D graphics code -- including a textured polygon rasterizer that would probably cry and try to delete itself if it saw something like Quake 3 -- to know that they didn't have to run a profiler to know that they'd be spending too much time doing 1/sqrt(x) if they didn't have a highly optimized routine for it. It's an inherent operation in so many 3D calculations it isn't funny, and while by the time Quake 3 came out hardware floating point units were pretty fast, FP divides and FP square roots were very lengthy operations that more importantly couldn't be pipelined.
But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.
Yeah, pretty much. Back when I wrote my code (pentium days) you had an FP unit but it wasn't very good, so I used fixed point math (using integer instructions) which had sufficient precision for a 320x200 display. Getting enough performance out of the core algorithms was still hard enough that I had to take a lot of shortcuts, like instead of doing the right thing by using a divide every pixel to calculate which texel to use, I used a divide every 8 pixels and linearly interpolated in between. I'm sure that Quake (the contemporary 3D engine of the day which would also make my code cry) contained many more clever optimizations and approximations, because it wouldn't have been possible on the hardware of that day without them.
In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. The non-approximation instruction uses a lookup table to get an initial guess, then uses a couple iterations of Newton's Method to refine the result. The short cut instructions simply return the value in the lookup table.
So yeah, basically the AC OP has no idea what he is talking about, and from that basis of ignorance is denigrating what is in reality a very clever and extremely useful hack.
The enemies of Democracy are
Attribution.
That is why most, if not all, software patents are bogus. Just because you reinvent something published by a PhD working in a committee that disbanded 10 years before you knew 'C' came after 'B' in the alphabet does not make you reinvention patent worthy. The history of invsqrt() crosses disciplines of hardware and software design, spec development, graphics and math theory. With such a fascinating function having a hard to track history it is no wonder that things like James W. Hunt's 1976 diff concept can be patented in 2006 by Microsoft. (As mentioned by QuantumG on slashdot.)
The trouble had in tracking the history of InvSqrt() is really sad. Computing is an industry that hypes how digital storage of information and perfect copies eliminates the isse of decay. While the ever expanding secondary storage (just now getting to really usefull) means oblivating the scaling issues with retaining the meta-information that denotes this very history. I guess the moral of the story is: sign your contributions. It won't take up that much space in the comments.
--Alan Perlis
And will the real Fast InvSqrt() author please stand up?
"You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
Dude, ever think that full docs might me in a .txt or .doc file?
If we could do html documents inside source comment blocks that would be awesome instead of crappy ascii comments with no
ability to do images fonts or diagrams
Liberty freedom are no1, not dicks in suits.
Apart from the unnecessary obscurity of the code, the code actually simply computes the wrong values with current versions of gcc and optimization turned on. See here:
t ml
t ml
http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg02943.h
http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg02957.h
It also makes unstated assumptions about the values it gets called with; call it with something else and you get bad results.
Java sucks, right?
But it IS just an approximation. It doesn't give the right answers.
Ex: InvSqrt(25) returns 0.19969 instead of the expected result of 0.2. (ie. sqrt(25) = 5, 1/5 = 0.2, for those wondering about the math).
I am the maverick of Slashdot
The code is extremely clever. Net: It finds the inverse square root [1/sqrt(n)] using a great initial guess and one iteration of Newton's approximation method. It avoids excessive division, the square root operation, and multiplication, which are computationally expensive.
.4, .8) to see how it converges. With me so far? Newton's method finds roots, and finds them fast if given a good guess.
I'm not an expert, but heres how I understand it:
1. Background: Newton's method finds roots of any function
------
What does factoring an equation have to do with finding 1/sqrt(n)? A lot. Give me a number n. I now make the function
f(x) = 1/sqrt(x) - n
Notice that when you find an x where f(x) = 0, it means x is the inverse square root of n:
f(x) = 0
1/sqrt(x) - n = 0
1/sqrt(x) = n
x = 1/sqrt(n)
In other words, I need to find the root of that equation. Newton's method lets you do this by picking a starting value, seeing how far off you are, and getting closer and closer with each iteration. There's more info online. With Newton's method, call your initial guess "g". An better approximation for the root is
guess_new = g - f(g)/f'(g)
In our case, f(x) = 1/x^2 - i (where i is the initial value, as seen in the code). We use the power rule to see that f'(x) = -2x^-3, and plug it into the guess_new equation above:
guess_new = x - (1/x^2 - i)/-2x^-3
guess_new = x(1.5 - ix^2)
which is exactly what the code above has. If you keep plugging "guess_new" back in the equation, you can get closer and closer to the answer.
Here is a demo of multiple iterations to find inverse square: http://tinyurl.com/vh7hg/ Try plugging in different initial guesses (.2,
2. Now our problem becomes: How can we make a good guess?
------
If we had a lot of time, we could just pick a random number and keep iterating using the method above. But that would be slow - we want a *good* guess.
Well, our best guess for the inverse square root is the inverse square root itself! What's a good way to get 1/sqrt(n)?
This is the first level of magic. Assume you have a number in exponent form, like this:
10^6 = 1 million
If you want to find the regular square root of 1 million, just divide the exponent by 2.
sqrt(10^6) = 10^(6/2) = 10^3 = 1 thousand.
If you want the *inverse* square root, divide the exponent by -2 to flip the sign.
invsqrt(10^6) = 10^(6/-2) = 10^-3 = 1/thousand
Ok so far? Our goal is to divide the exponent of i (our number) by -2 to get a really awesome guess for Newton's approximation method.
3. Floats are stored in mantissa-exponent form
------
This is the key. Floating-point numbers have an explicit exponent and mantissa component. Theoretically, we could mask out the bits for the exponent and do division.
But division is expensive; the code uses another clever hack. Shifting bits is the same as dividing by 2 (or 4, 16, or any other power; the remainder is truncated, which is OK for an approximation).
So we can divide by 2 easily. And if we want a negative number, instead of multiplying by -1 (expensive), we can just subtract the number from "0" (cheap).
The program converts the floating point into an integer (using the pointer tricks), shifts the bits by 1 to halve the exponent, and subtracts from "0" (the magic number - hold on) to negate it.
4. Why the magic number 0x5f37...?
------
We can't just subtract from zero, there's too much going on. First, by shifting the bits we mave move some of the exponent bits into the mantissa. Also, there are different cases of odd/even exponents. The paper goes into lots of special cases, I didn't really understand them all first time around. But the magic number tries to minimize errors, and there can be several magic numbers used.
5. What's the result?
------
The result is that you get a great initial value to
I see. So you say the unpredictable part is the assumption of floating point representation in sign, mantissa and exponent terms, which can vary from architecture to architecture.
This function does not aproximate UNTIL you get good answer. The thing is that this initial guess is so good that just one approximation gives you almost two decimal points for any float. That is cute and you do not need more precision in games. No branching, you nail it every time.
I'm gonna float (pun) an idea here: How about using a look-up table, and then extrapolate linearly between the 2 nearest matches in the table?
-Tablizer
Table-ized A.I.
So, if you do
fortune -m 'bits in'
on any modern linux with fortune installed you will get the following,
(along with usually at least one other tidbit)
#define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
#define BX_(x) ((x) - (((x)>>1)&0x77777777) \
- (((x)>>2)&0x33333333) \
- (((x)>>3)&0x11111111))
It works, and I have used it to good advantage in circumstances where
nothing else would _quite_ do...
What, you ask, does it do? It gives you the number of bits which are set in a
given byte/word/dword...
So BITCOUNT(0x81) == 2
and so on.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
1)I read a lot of comments explaining the algorithm andeverytime they say it uses an iteration (like the Newton-Raphson method) to approximate 1/sqrt(x). But the problem is I don't see where the iteration is! I have already done some C and C++ programming, but I don't understand how this code snippet can iterate without any loop. I don't see any "for(...)" in it. Can you do iterations simply with this line?: i = 0x5f3759df - (i>>1); As far as I know ">>1" simply shifts the bits to the right and 0x5f3759df is simply a number in its hexadecimal form. So this line looks like a simple substraction to me. So my question is: How does it iterate? 2)There is some C-code I didn't understand: float xhalf = 0.5f*x; x = x*(1.5f - xhalf*x*x); What does the "f" after the number do?
Sorry, first time I was posting.
I didn't pay attention to the fact that it was in HTML by default.
Here it is again but in an easier to read form:
1)I read a lot of comments explaining the algorithm and everytime they say it uses an iteration (like the Newton-Raphson method) to approximate 1/sqrt(x).
But the problem is I don't see where the iteration is! I have already done some C and C++ programming, but I don't understand how this code snippet can iterate without any loop.
I don't see any "for(...)" in it.
Can you do iterations simply with this line?:
i = 0x5f3759df - (i>>1); As far as I know ">>1" simply shifts the bits to the right and 0x5f3759df is simply a number in its hexadecimal form. So this line looks like a simple substraction to me.
So my question is: How does it iterate?
2)There is some C-code I didn't understand:
float xhalf = 0.5f*x;
x = x*(1.5f - xhalf*x*x);
What does the "f" after the number do?
This was explained in Dr. Dobb's Journal magazine's feature called Algorithm Alley, can't remember the year or month. Tried googling in Yahoo and Google but didn't help.
But effectively that calculated the number of bits by parallelizing over the whole number - calculate number of bits in whole, calculate number of bits in two halves, calculate number of bits in four quads, etc. up to "calculate number of bits in 1 bit" (which is trivial). Then you sum them together.
Answer to 2)
The f after 0.5 and the 1.5 simply means "use the floating point version of this number". Multiplying two numbers which have different types (e.g. multiplying an int and a float) or doing assignment between different types means that one of the numbers will have to be automatically converted to the other's type and which conversion happens depends on the types of both numbers. By default, I think type of the literal 0.5 (without any letters after it) is a double. By adding the f (assuming x is a float) the types of the operands are the same so there is no implicit type conversion thus eliminating the chance of the wrong conversion happening automatically. You care about what type of multiply is done because certain types can lead to unwanted loss of precision.
As for 1)
Well if you iterate by one step you don't need a loop. Presumably most looping iteration happens because your first guess wasn't accurate enough. If your first guess WAS accurate enough why would you needlessly calculate more accuracy? I'm only speculating though - I don't know if this is really the case.
The actual approach in the algorithm is to iterate, with each iteration getting closer to the final step. However, the magic constant sets the starting point for this iteration so close that iterating is not actually required unless you want a lot more (unnecessary) accuracy.
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
-- Did you try Tao3D? http://tao3d.sourceforge.net
Or is it fast already? Does the hardware just decrease the exponent with one when it sees that the value to multiply x with is 0.5? Otherwise, it sounds like that's something you would want to do manually, right?
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
The 'f' in 0.5f keeps the compiler away from casting 0.5 to the default 64bit double. So we get to stay in the cheaper 32 bit domain for our calculation.
There is only one iteration, and as many other have said, this is because the initial guess is so clever so one iteration will be sufficient for the job in question. If you wanted to do a second iteration you would repeat the last line rather than set up a loop. So there would not be any 'for' or 'while' statements in that case either.
send + more == money?
Really, by the comments on this page, it's amazing how people use functions they don't understand! What's this? Faith?
Programming should not be based on faith but on reasoning.
The article even fails to mention that the function is just a rough approximation at 1/sqrt(), this function would be better labeled RoughInvSqrt(). And yes bitwise manipulation of float point values is cool, yes, that's programming.
What's in a sig?
I think he just ran simulations with constants and he really didn't use any math to come up with 0x5f3759df. Someone mentioned that 0x5f375a86 appears to perform slightly better. Can't it be that this person simply had some code that executed InvSqrt() all the time until the he found a good constant?
Now, IF they make a better floating point standard (I doubt it; this one is perfect in many ways), the only thing you really need is a new magic number.
And IF they wanted to use double precision numbers...they probably care about precision and wouldn't be using a routine that is only good for approximation.
:(){
The original article that came out more than a year and a half ago, right after Quake's source code has been revealed, has been published at Code Maestro: http://www.codemaestro.com/reviews/review00000105. html
If someone doesn't have any good ideas of his own for writing articles, perhaps he should have picked another profession...
This routine is good for precision. In the article he mentions some swede who had to get 48 significant digits for a thermo calculation for a doctorate student. Note that the 2nd iteration in the pdf is orders of magnitude more precise then the first, so all you need is enough iterations. You can always do a (do while current-previous(or a few previous moving average) epsilon), and build a loop, but that has to test a lot of conditionals ready for a branch(deep cpu pipeline predictors don't like branches) as opposed to just lining up 1 or 2 iterations in the code that makes it fly through the pipeline. True, when you do a lot of iterations, the speed gain from getting the first iteration so accurate doesn't count that much. It would save say 2 cycles out of a 20 cycle step, as opposed to making all the difference when there is only one iteration carried out, without a loop, as it is in the sample code given.
I can shed some more light on the 1/sqrt(x) topic. I coded a vectorized version of this function for the Ardent Titan "graphics supercomputer" in 1987. I did not originate the idea but got it from Cleve Moler, original founder of MathWorks and author of MatLab. I did some work with Gary Tarolli when he was at Stellar, a competitor of Ardent which eventually merged with it to form Stardent computer. He may have picked it up there. The function was widely used in the Ardent 3D code base. I also had a 1/cuberoot(x) version which worked in a similar but slightly more complex manner. It was not used in 3D graphics but did find application in some mechanical engineering analysis programs. - Greg Walsh
The point is that pointer-based type punning isn't really allowed by the language definition. The optimizer can take advantage of that to emit simpler code that, in this case, produces the wrong answer. Using a union short-circuits that kind of optimization.
This isn't to say any particular Gcc produces wrong answers with the code as written, but if you complained about one that did, they would just laugh at you and (if they liked you enough) tell you to rewrite it like the above instead.
Funny, this was on an extremely hard interview I had recently.
I couldn't answer the question...
Ben
They already spotted you about four letters more in that function name than most C programmers would be comfortable with. Count yourself lucky -- most would have made it something like isr(x). Of course, I'm a crankly Java programmer and it would have been public static double InverseSquareRoot(double number) had I had my druthers. And the code would have been "return 1.0 / Math.sqrt(x);" because I am a Java programmer and if I wanted speed I would be using a language which thinks 7 characters is too long for an identifier.
Help poke pirates in the eyepatch, arr.
That last part didn't render right.
(1 = mantissa 2; the non-fractional portion is also implied, so the bits of the mantissa are actually just the fractional portion)
Should be
(1 <= mantissa < 2; the non-fractional portion is also implied, so the bits of the mantissa are actually just the fractional portion)
:(){