Slashdot Mirror


Origin of Quake3's Fast InvSqrt()

geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power." From the article: ""

95 of 402 comments (clear)

  1. A famous quote by ArchieBunker · · Score: 5, Funny

    "English motherfucker, do you speak it?" Anyone care to explain what that function does?

    --
    Only the State obtains its revenue by coercion. - Murray Rothbard
    1. Re:A famous quote by GigsVT · · Score: 5, Informative

      1/sqrt(x)

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:A famous quote by Trigun · · Score: 5, Funny

      But faster!

    3. Re:A famous quote by Raul654 · · Score: 3, Informative

      Given a number, that function calculates the inverse of the square root - which, according to TFA, is very common in graphics applications.

      --


      To make laws that man cannot, and will not obey, serves to bring all law into contempt.
      --E.C. Stanton
    4. Re:A famous quote by Red+Flayer · · Score: 4, Informative
      It's just John Carmack's way of telling everyone he knows mathematics and please worship the ground he walks on.
      RTFA.

      Carmack quite graciously denied the code was his and helped direct the author closer to the true source.
      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    5. Re:A famous quote by abradsn · · Score: 3, Informative

      used whenever there is light reflecting off an object.

    6. Re:A famous quote by kill-1 · · Score: 4, Informative

      And less accurate!!!

    7. Re:A famous quote by 91degrees · · Score: 5, Informative

      Okay - At times you want a normalised vector. A lot of the time you will have vectors of arbitrary length For example, the light is at the origin, the point is at (12, 4, 3). So the vector from the point to the light is (-13, -4, -3). The length of this vector can be calculated easily using pythagoras's theorem (sqrt (12^2+4^2+3^2). It's 13 units in length. We want a unit vector (i.e. a vector 1 unit in length) So we divide by the length to get (12/13, 4/13, 3/13).

      This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide. Both of thse are at least an order of magnitude slower than multiplications and additions.

      So what this function does is provide a value you can multiply each component by to get a unit vector.

      Well, there's the what and why parts. As for the , I have no idea. I think it uses magic.

    8. Re:A famous quote by mogrify · · Score: 5, Funny

      (x^(1/2))^2 != 1

      Interesting smiley... is that a dead man with a fraction in his mouth and a prominent Adam's Apple, wearing a bow tie and a dress and standing on a toy race car?

      What's your point, man?

      --
      perl -e 'foreach(values %SIG){$_="IGNORE";}while(){}'
    9. Re:A famous quote by Anonymous Coward · · Score: 3, Informative
      Um,
      (x^(1/2))^2 != 1
      is pretty obvious and I doubt anyone would need to be reminded of it.. Did you mean
      (x^(1/2))^2 != x for x < 0
      ?
    10. Re:A famous quote by Fred_A · · Score: 2, Funny

      I think this is a hoax. I've been playing Quake 3 extensively and I haven't once seen anything reflected off any object. What's more it was all very slow. Granted this might have been because of my Tseng ET4000 (which I'm planning on upgrading eventually). Although I did set it to CGA to make things snappier.

      Anyway I'm not convinced.

      --

      May contain traces of nut.
      Made from the freshest electrons.
    11. Re:A famous quote by lagfest · · Score: 3, Informative

      The accuracy only depends on how many times you repeat the last line. So no, not really.

    12. Re:A famous quote by the_greywolf · · Score: 2, Informative

      The inverse square root is used to calculate a normal vector. Normal vectors, are, in turn, used in lighting calculations (to determine, for example, the light intensity for a given vertex), among other things.

      AMD's 3D Now! instruction set includes a set of instructions (TFSQRTS/TFSQRTD) for approximating the inverse square root.

      --
      grey wolf
      LET FORTRAN DIE!
    13. Re:A famous quote by mike260 · · Score: 5, Informative

      Nope, the reason you want a fast 1/sqrt(x) in graphics is so you can normalise 3D vectors quickly (something that most 3D apps have to do a LOT).

      Slow:
          const float length = sqrt( v.x*v.x + v.y*v.y + v.z*v.z );
          v.x /= length;
          v.y /= length;
          v.z /= length;

      Fast:
          const float recip_length = InvSqrt( v.x*v.x + v.y*v.y + v.z*v.z );
          v.x *= recip_length;
          v.y *= recip_length;
          v.z *= recip_length;

      The 2nd version has no divides, and no call to sqrt, which makes it *loads* faster.

    14. Re:A famous quote by arth1 · · Score: 2, Informative
      Thanks. I kept thinking that the inverse of taking the square root would be to square something but obviously that isn't what that code did.

      You are correct, the inverse to x^n is x^(1/n), from which it follows that the inverse to a square root is the inverse inverse square, or just square.
      What TFA calls "inverse square root" is really "inverse of the square root", a small but significant difference.
      Not that it matters much, cause those who use the function tend to know what is meant.

      Regards,
      --
      *Art
    15. Re:A famous quote by Overly+Critical+Guy · · Score: 2, Funny

      Are you sure you're not actually running Doom 3? Not only does no light reflect off of anything, but no light is emitted in the first place.

      --
      "Sufferin' succotash."
    16. Re:A famous quote by jmv · · Score: 4, Insightful

      The real genius here is Isaac Newton. 'course, that's not news to anyone.

      You mean that Newton thought about taking advantage of the IEEE float format to initialize the algorithm using "i = 0x5f3759df - (i>>1);"? Wow, now that's a clever guy!

    17. Re:A famous quote by rblancarte · · Score: 2, Insightful

      I think the point of the parent post was to point out that the summary should have enough information that you can decide if you want to RTFA or not. You should not have to RTFA to decide if you want to RTFA article or not.

      As far as the name of the function - true, but you never know, it could be something different. A small explaination would have helped.

      RonB

      --
      It is human nature to take shortcuts in thinking.
    18. Re:A famous quote by ebyrob · · Score: 3, Funny

      If this doesn't make your head swim:

      int i = *(int*)&x;
      i = 0x5f3759df - (i >> 1);


      Then I'm afraid the whole article is going to be lost on you...

      We've got a floating point being operated on as an integer.
      We've got a mysterious constant.
      We've got a two's complement sign-flip combined with a bit-shift.

      The only thing missing from this party is hookers and beer.

    19. Re:A famous quote by Bullet-Dodger · · Score: 2, Funny

      But it comes with a free Frogurt!

    20. Re:A famous quote by saforrest · · Score: 3, Insightful

      The result of taking the square root of a number and then squaring it will be the absolute value of the original number.

      Really? What if the number is negative?

      I think you mean to say "the original number", not "the absolute value of the original number". When given a negative argument, this composition will either return an error (because there is no support for complex numbers) or a negative result equal to the input.

    21. Re:A famous quote by Tim+Browse · · Score: 2, Insightful

      Carmack left school after two semesters to work as a freelance programmer. I don't think he made it to Numerical Methods or the equivalent. That makes this code all that much more impressive.

      RTFA much?

    22. Re:A famous quote by Directrix1 · · Score: 2, Funny

      The Frogurt is also cursed.

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    23. Re:A famous quote by KZigurs · · Score: 3, Interesting

      Not so much _faster_ than _guaranteed_ execution time for any given precision required (and if you have 480 pixels on axis on the screen you can play with that for quite a lot).

      This is why console games (ps, ps2, xbx, 360, bla bla bla huj) actually stays competitive to PC (more powerful, of course) - since developers has a good idea about actual CPU/GPU available at any given moment, they can safely close to the border way more confidently than on pc. And on PC they usually resort to generic 'will give you the best that I can' routines anyway.

      (at least that what I can say after observing xbox360 devel team for 6 months. scary stuff, they do, scary stuff.)

    24. Re:A famous quote by johnw · · Score: 2, Insightful

      And badly named! It's not the inverse square root (which would simply be the square); it's the reciprocal of the square root.

    25. Re:A famous quote by Mr+Chund+Man · · Score: 2, Insightful

      I don't think that's quite what he meant...

    26. Re:A famous quote by Puff+Daddy · · Score: 2, Informative

      You're right, I was thinking of the operations backwards. The absolute value function is essentially equivalent to squaring a number and then taking the square root of the result, not the other way around.

  2. And so why do we care? by Codename46 · · Score: 2, Insightful

    From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.

    [Insert rant about software patents]

    1. Re:And so why do we care? by From+A+Far+Away+Land · · Score: 4, Funny

      " Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees "

      I was a little worried when Slashdot posted the Britney Spears beaver pictures, but they now have their credibility back as the home of "News for Nerds".

    2. Re:And so why do we care? by Anonymous Coward · · Score: 5, Funny

      Britney Spears beaver pictures Link please.

    3. Re:And so why do we care? by gstoddart · · Score: 2, Funny

      I was a little worried when Slashdot posted the Britney Spears beaver pictures

      Hmmmm ... why can't we dupe stories like that more often? :-P
      --
      Lost at C:>. Found at C.
    4. Re:And so why do we care? by Tx · · Score: 5, Informative

      Crap, lets try that again, with the link this time ;).Here you go.

      --
      Oh no... it's the future.
    5. Re:And so why do we care? by Pollardito · · Score: 4, Funny
      The pic links are all over - check at foobies.com or similar.

      However, if it really did get posted to /., I'd like a link so I can read the comments...
      i imagine it'd be all sorts of variations of "what's that?", "it might be a tribble", and "it's like a wookie, but smaller"
    6. Re:And so why do we care? by JabberWokky · · Score: 4, Funny
      Clearly you haven't seen the pictures. It looks much more like a Deltan.

      --
      Evan

      --
      "$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien
    7. Re:And so why do we care? by sootman · · Score: 5, Interesting

      Holy crap--you forgot the link but the mods followed your instructions anyway.

      Mods: I want +5, Funny for this. No, no, wait: +5, Informative. No, wait, anyone can google something and be "informative." I want a +5, Interesting.

      Thanks.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  3. I know who wrote it by circletimessquare · · Score: 4, Funny

    I have a truly marvelous proof of who wrote this code which this comment box is too narrow to contain.

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
  4. Obviously SCO's intellectual property! by gukin · · Score: 4, Funny

    Anyone who has ever played a computer game should pay up.

  5. This paper seems to have the info by nels_tomlinson · · Score: 5, Informative

    The linked site seems to be down (gee, you think it might be slashdotted?), but this paper seems to be covering the same topic.

    1. Re:This paper seems to have the info by SkankinMonkey · · Score: 3, Informative

      There's also the mirrordot link

    2. Re:This paper seems to have the info by Toasty16 · · Score: 4, Funny

      Ohhhhhh, now I get it... [Looks furtively around to see if anyone knows he doesn't get it]

  6. What's with use of Pointers? by MankyD · · Score: 2, Interesting
    Why does the coder use
    int i = *(int*)
    instead of just
    int i = (int)x;


    Can someone enlighten me?
    --
    -dave
    http://millionnumbers.com/ - own the number of your dreams
    1. Re:What's with use of Pointers? by MeanMF · · Score: 2, Insightful

      int i = (int)x would convert the floating point number to its rounded integer equivalent.

      int i = *(int*)x assigns i the 32-bit value that represents how x is stored in memory as a float.

    2. Re:What's with use of Pointers? by ewhac · · Score: 5, Informative
      C "understands" ints and floats. If you do the simple cast:

      int i = (int)x;

      Then C will simply convert the float value into an integer value (throwing away fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

      So first, we lie to the compiler by telling it we have a pointer to an int:

      (int *) &f

      And then we deference the pointer to get it into an operable int:

      i = *(int *) &f

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Schwab

    3. Re:What's with use of Pointers? by eis271828 · · Score: 5, Informative

      (int) x would convert the floating point value to an integer (truncation, basically).
      *(int*) &x treats the bits as an integer, with no behind the scenes conversion to an actual int value.

    4. Re:What's with use of Pointers? by radarsat1 · · Score: 4, Informative

      Because in C's own weird way, that's the only way of refering to a float as an int without changing the bits.

      If you do this:
      int i = (int)3.0f;

      You get i=3, like what you'd get from the floor() function.

      If you do this:
      float f = 3.0f;
      int i = *(int*)


      Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

      It's because C knows how to cast a float to an int by applying the floor function. However, if you do it the second way, you aren't casting a float to an int, you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

      By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.

    5. Re:What's with use of Pointers? by ToxikFetus · · Score: 5, Informative
      Why does the coder use

      (1) int i = *(int*)

      instead of just

      (2) int i = (int)x;
      (some of my points added for emphasis)

      I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

      If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise opeartions (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

      x = *(float*)

      x now contains the value of the integer operation:

      i = 0x5f3759df - (i >> 1);

      which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.
    6. Re:What's with use of Pointers? by Geoffreyerffoeg · · Score: 2, Informative

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Moreover, when compiled, the optimizer does the right thing - since access to the variable is actually a memory dereference (or a register access, but we'll ignore it for now), *(int*)& means "tell the compiler this is an int, but don't do anything in the code", whereas (int) means "create a new temporary that's the int part of the variable".

    7. Re:What's with use of Pointers? by dsci · · Score: 4, Informative

      Newton-Raphson is a general algorithm for finding root of an equation f(x)=0.

      You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

      X(n+1) = X(n) - f(X(n)) / f'(X(n))

      where
      X(n+1) is the NEXT guess after the value you 'know',
      X(n) is that most recent value you know,
      f(X(n)) is the function evaluated at X(n) and
      f'(X(n)) is the first derivative of f(x) evaluated at X(n).

      It's not foolproof and a BOTH whether it converges at al AND how FAST it converges depends on the initial guess, X(0)

      The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

      Obligatory wikipediea followup: Newton's Method

      --
      Computational Chemistry products and services.
    8. Re:What's with use of Pointers? by The+boojum · · Score: 2, Informative

      [Forgive the LaTeX notation -- I'm not sure how else to best express the math without sub and sup tags.]

      Newton-Raphson is a root finding method. Given a starting point it finds successively more accurate approximations to the root. The formula for it looks like: x_{n+1} = x_n - f(x_n)/f'(x_n), where f(x) is the function to find the root of, f'(x_n) is the derivative of that function and x_n are the successive approximations. Essentially, given a guess it looks at the slope of the function at that point follows a tangent line until it crosses the zero line. That point becomes the next guess.

      Finding the reciprocal square root of a is equivalent to solving x^{-2} - a = 0 which means that you use f(x) = x^{-2} - a. The derivative of that is f'(x) = -2x^{-3}. Putting those into the Newton-Raphson formula gives you x_{n+1} = x_n - (x_n^{-2} - a) / -2x_n^{-3}, which simplifies down to x_{n+1} = x_n(3 - a*x_n^2)/2. This is equivalent to the last line before the return in that InvSqrt() function.

      Convergence of Newton-Raphson is quadratic which means that each time you repeat that line with a previous guess you double the number of digits of accuracy (up to the limit of the hardware of course).

    9. Re:What's with use of Pointers? by mark-t · · Score: 2, Interesting

      If the compiler is willing to hold the union in a register, then yes, using a union should almost certainly be faster... and would potentially save two 32-bit memory copy instructions. In fact, the biggest problem with the approach that the article mentions is that it uses the address-of operator, which generally implies to the compiler that the argument cannot be stored in a register, which would almost certainly prevent the compiler from using assorted possible optimizations.

    10. Re:What's with use of Pointers? by greengearbox · · Score: 2, Informative

      That method can't be used, but see Float.floatToRawIntBits.

    11. Re:What's with use of Pointers? by dsci · · Score: 2, Informative

      How on earth does subtracting the result from a magic value then give you the inverse square root?

      Subtracting the right shifted value from the magic value does NOT give the 'inverse square root.' This line of code only gives the initial guess for the Newton-Raphson algorithm.

      While the initial guess is a huge part of the real beauty of the algorithm (ie, why it can converge to a sufficiently accurate value after ONLY ONE ITERATION), the 'real work' is done in the

      x = x*( 1.5f - xhalf*x*x )

      line. That's that application of one iteration of Newton-Raphson (rewritten a bit from what I and others posted earlier, but for this problem algebraically the same). To get more accuracy in the calculation, apply this line repeatedly.

      The explanation of the why the magic value - (right shifted int represenation of the float) is a good initial guess is given in the Paper by Chris Lomont linked to in article. Specifically, look on Section 4 beginning on Page 3 of Lomont's article; that's where the fun starts.

      In a nutshell, from Lomont, the initial guess is computed "by multiplying the exponent by -1/2, and then picking bits to minimize error." I have not worked through all the details on why this is a good initial guess, but will reemphasize that getting a good result from Newton-Raphson DEPENDS on the initial guess. So, it is not surprising that someone hunted for a general algorithm to find a good initial guess.

      --
      Computational Chemistry products and services.
    12. Re:What's with use of Pointers? by arniebuteft · · Score: 5, Informative
      The article at http://www.math.purdue.edu/~clomont/Math/Papers/20 03/InvSqrt.pdf really explains it better, but the point is to trick the code into performing integer operations on a float. You start the function with a float, which is 32 bits, arranged in a very specific sequence: first bit is sign (0 for positive, 1 for neg), next 8 bits are exponent (offset by 127 to allow positive and negative exponents), and the last 23 bits are the mantissa (normalized to assume the decimal point has a binary 1 to its left). So, the simple good old number 21 in float is the sequence 0 10000011 01010000000000000000000, for the sign, exponent, and mantissa (omit the spaces of course). That same number 21, stored as a 32-bit integer is 00000000 00000000 00000000 00010101 (again, omit the spaces).

      The trick of this function is to take the 32 bits of data that are really a float, but process it as if it's an integer. So you take that cumbersome number 21 as a float, then BAM! presto, turn it directly to an integer not through type conversion, but by simply treating those same 32 bits as if they were representing an integer all along.

      Let's use the number 21 as an example in the function call.

      The binary representation of 21 as a float is 01000001 10101000 00000000 00000000 (broken into 8-bit words for clarity). The function then goes to create a new integer i, whose value is also 01000001 10101000 00000000 00000000 (which happens to be 1101529088 in decimal). The magical line of the code, i = 0x5f3759df - (i>>1), takes that integer i, shifts its bits one to the right (turning our 01000001 10101000 00000000 00000000 into 00100000 11010100 00000000 00000000, or 550764544 in decimal), then subtracts it (still doing integer math here) from 0x5f3759df (which is 01011111 00110111 01011001 11011111 or 1597463007 in decimal), and winds up with 00111110 01100011 01011001 11011111 (or 1046698463 in decimal).

      Now, for its next trick, it takes that wonky integer 1046698463, and turns it back into a floating point number, by the same trick used above, i.e. simply by looking at those same 32 bits, and pretending they're a float, not an int. The binary representation of 1046698463, 00111110 01100011 01011001 11011111, is the same as 0.22202251851558685 in float.

      From here on out, it's all floating math. Apply the Newton-Rhapson method (thats the next line), we get x = 0.22202251851558685 * (1.5 - ( (21*.5) * 0.22202251851558685^2 )) = 0.218117811. We return this value at the closing of the function. As it turns out, the inverse square root of 21 is 0.21821789... (thanks Google calc). So, I have no idea WHY the Float to Int to Float trick works, but it works very well.

      Whew!

    13. Re:What's with use of Pointers? by HTH+NE1 · · Score: 2, Interesting

      Because in C's own weird way, that's the only way of referring to a float as an int without changing the bits.

      <yoda>No. There is another.</yoda>

      You can also lie to va_arg() about the type of the argument to achieve the same thing, but it's not as efficient.

      --
      Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
  7. Old and busted: Duff's device by Stavr0 · · Score: 3, Funny
    New hotness: Fast InvSqrt()

    Naah, just kidding. They both deserve a spot in the Clever Hacks Hall of Fame

  8. Hmm... by porkmusket · · Score: 2, Funny

    Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.

    1. Re:Hmm... by gstoddart · · Score: 2, Funny
      Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.

      Ummmm .... you mean, squirt his kids back to him? :-P
      --
      Lost at C:>. Found at C.
  9. Poor function name by bloobloo · · Score: 4, Insightful

    The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.

    1. Re:Poor function name by truedfx · · Score: 3, Funny
      x^(-1/2) != x^(2).
      Er... no. It's entirely possible for x^(-1/2) and x^(2) to be equal. This happens when x=1. (I know what you meant, but what you posted is very different.)
    2. Re:Poor function name by rg3 · · Score: 2, Insightful

      The word "inverse" has several uses in mathematics. One of them is the one you mentioned, which is to say "inverse function". In that sense, working with positive numbers, the inverse function of the square root is the square, yes. However, there is also the concept of "inverse element" for sets and operations. It's a very generic concept that relates to an operation and the identity element for that operation. For the sum, the identify element is 0, given that x + 0 = x. The inverse element of another element regarding a given operation (and it's usually written x^(-1) for the element x), is one such that x op. x^(-1) = i, being i the identity element.

      Again, going back to the addition operation, the identity element is 0, x + x^(-1) = 0. In this case, x^(-1) is -x. For the multiplication operation, the identify element is 1 and the inverse element x^(-1) of any x is 1/x. In my math classes in high school, the teacher always called -x "opposite x" and 1/x "inverse x". So, basically, if you read "square root" not as a function, but as a number, the "inverse square root" is 1/sqrt(x), and the "inverse function for the square root function", x^2.

      All of this to say what? That it's not that easy and it can be discussed.

  10. It was fast by KalvinB · · Score: 3, Informative

    http://www.icarusindie.com/DoItYourSelf/rtsr/index .php?section=ffi&page=sqrt

    That page compares the time it takes to calculate the sqrt various ways including Carmacks. Short version is that modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have the sqrt function available.

    His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.

    1. Re:It was fast by systemeng · · Score: 5, Insightful

      First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.

    2. Re:It was fast by adam31 · · Score: 4, Informative
      Okay, let's try x86...

      rsqrtss xmm1, xmm0
      about 5 cycles. And it can pipeline.

      Not a fan of x86? Maybe altivec...
      vrsqrtefp V2, V1
      depends, but 12 cycles probably and pipelined.

      On PS3's SPU it's rsqrte (6 cycles), on 3dNow it's pfrsqrt (8 cycles) both pipelined. Even PS2 had rsqrt (13 cycles). There's just no reason for software reciprocal square root. It's a cool trick, but it's not even useful anymore.

    3. Re:It was fast by Hamilton+Lovecraft · · Score: 2, Informative

      RSQRTSS is an approximation using a lookup table, accurate to about 12 bits. It neatly replaces the integer trick in the q3 version, but you may still want a NR step to tighten it up afterwards. Also, a few of us are still worried about non-SSE platforms.

      --
      step 3: god dammit, it doesn't work
    4. Re:It was fast by systemeng · · Score: 3, Informative

      If you have need of a low precision result and you have a vector processor, absolutely. You're right and thanks for the hard numbers. My work is in scientific computing converting digital maps where my final result must be acctuate to better than 1.0e-13 and intermediates are multiplied by numbers on the order of the square of the earth's radius in meters. I'm stuck with the FPU where things aren't so rosy. We usually combine the newton-raphson iteration of 1/sqrt(f(x)) with the newton raphson solution for f(x) where f(x) is something that can only be solved numerically. Once you're stuck with newton-raphson solution, you might as well get the most out of it by using the method to solve 1/(sqrt(f(x)) and get the inverse and the root for free.

  11. Re: A better question: by Anonymous Coward · · Score: 2, Interesting

    Here's a more interesting question: "How does this function compute 1/x^(1/2)?"
    I'm asking, because I have no idea how it works. ;)

  12. Mirrordot the airticle cut-and-paste by zoftie · · Score: 5, Informative

    Introduction
    Note!

    This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.
    Origin of Quake3's Fast InvSqrt()

    To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.

    InvSqrt()

    Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist.
    How the code works

    The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.

    Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU.

    But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.
    So who did write it? John Carmack?

    While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.

    Asking John whether it was him or Michael returned a "not quite".

    -----Original Message-----
    From: John Carmack
    Sent: 26 April 2004 19:51
    Subject: Re: Origin of fast approximated inverse square root

    At 06:38 PM 4/26/2004 +0100, you wrote:

    >Hi John,
    >
    >There's a discussion on Beyond3D.com's forums about who the author of
    >the following is:
    >
    >float InvSqrt (float x){
    > float xhalf = 0.5f*x;
    > int i = *(int*)
    > i = 0x5f3759df - (i>>1);
    > x = *(float*)
    > x = x*(1.5f - xhalf*x*x);
    > return x;
    >}
    >
    >Is that something we can attribute to you? Analysis shows it to be
    >extremely clever in its method and supposedly from the Q3 source.
    >Most people say it's your work, a few say it's Michael Abrash's. Do
    >you know who's responsible, possibly with a history of sorts?

    Not me,

  13. It Was Obviously... by eno2001 · · Score: 3, Funny

    ...the work of John Romero. Apparently he was going to call it the MakeYouMyBitch() function. ;P

    --
    -"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
    1. Re:It Was Obviously... by Chris+Burke · · Score: 4, Funny

      He calls every function that. Have you seen his code? Freaking unreadable.

      int MakeYouMyBitch7 () {
          int my_bitch = MakeYouMyBitch() * MakeYouMyBitch2();
          return MakeYouMyBitch36(my_bitch);
      }

      Just terrible.

      --

      The enemies of Democracy are
  14. Re:x * x, right? by Eudial · · Score: 2, Informative
    The inverse square root is 1/(sqrt(x)), not x^2 (which is, I admit, the first thing I thought of, wondering why anyone would be so excited about a faster way of getting it).


    Well, you're sort of right.

    f^-1(x) = x^2 is the inverse of f(x) = sqrt(x) where x > 0

    --
    GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
  15. It might be damn smart.. by kan0r · · Score: 5, Insightful
    But the first thing I thought when I saw this was: "Damn, that code is a mess!"
    Seriously, try looking away from the genius who obviously wrote it.
    • There is no single comment which would make reading and understanding what happens here much easier!
    • Introduction of a magic number with no explanation whatsoever
    • Magic pointer arithmetics without demystification
    • Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
    I know it is good code. But it is simply bad code!
    1. Re:It might be damn smart.. by Anonymous Coward · · Score: 2, Insightful

      There is no single comment which would make reading and understanding what happens here much easier

      Exactly. It's very simple code, and if you can't tell how it works by looking at it you need to first brush up on your applied math and IEEE 754. Then it will be obvious.

    2. Re:It might be damn smart.. by Tyler+Durden · · Score: 2, Insightful

      Ummm... how is endianness going to affect the calculation at all? A 64 bit architecture should royally screw up the calculation, but given when the code was written a 32 bit architecture would be a perfectly reasonable assumption.

      If I wanted to be picky, I'd harp on the magic number or at least add a comment at the top saying it used Newton-Raphson iteration. As far as the bit fiddling goes, anyone who does 3D programming in C will instantly recognize what it does.

      --
      Happy people make bad consumers.
    3. Re:It might be damn smart.. by Goaway · · Score: 2, Informative

      Anybody who does 3D programming in C at that level will instantly recognize Newton-Raphson.

    4. Re:It might be damn smart.. by exp(pi*sqrt(163)) · · Score: 2, Funny

      Yeah, the computing industry is full of people who resent smart programmers who don't program by the book. That's fine. Every programmer eventually finds their own level and there's no shortage of startup businesses who need web page designers.

      --
      Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
    5. Re:It might be damn smart.. by Anonymous Coward · · Score: 2, Informative

      The actual code in Quake 3 is commented quite well:

      long i;
      float x2, y;
      const float threehalfs = 1.5F;

      x2 = number * 0.5F;
      y = number;
      i = * ( long * ) // evil floating point bit level hacking
      i = 0x5f3759df - ( i >> 1 ); // what the fuck?
      y = * ( float * )
      y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration

    6. Re:It might be damn smart.. by CryoPenguin · · Score: 2, Informative

      A 64 bit architecture wouldn't necessarily harm anything.
      The only non-portable assumptions made by the code are: int is 32 bits, and float is stored in IEEE-754 format.
      The first assumption is still true in 64 bit gcc, and even if it isn't you can just write int32_t instead.

  16. Error! by jrmiller84 · · Score: 3, Funny

    float InvSqrt(float x) { float xhalf = 0.5f*x; int i = *(int*) i = 0x5f3759df - (i >> 1); x = *(float*) x = x*(1.5f - xhalf*x*x); return x; } Asking John whether it was him or Michael returned a "not quite". But it's supposed to return a float!

    --
    I will forever be a student.
  17. Evil math--it approximates the inverse square root by Anonymous Coward · · Score: 2, Informative

    They're computing the inverse square root, usually written as something like x^(-2) (which is the same as 1/sqrt(x) of course) via, umm, err, magic :-)

    Seriously, they're using some evil numerical techniques that approximate the actual value of that function to withing a few percent very quickly. You could do more steps of Newton's root finding method or other complex things, but that would be slow and this is only meant to find things that are "good enough" for you to draw on someone's screen.

    This function is used in graphics programming, specifically in things like the Quake 3 source code (as well as other places). First, it calculates x/2, then it gets the bits of the float x as an integer (without modifying them), then it does the "magic" step with the magic number they calculated via evil mathematics to give the best starting point they can for Newton's method using the x >> 1 to divide the float-as-integer in two (and make it positive, since your sign bit falls into the exponent of the float! ... that's one of the crazy parts the PDF analyzes). Then it changes the int-float back into a plain float and does one step of Newton's method using the guess derived from the magic part.

    Don't mind me. I may have a degree in mathematics, and I even (somehow!) made it through at least one class on numerical analysis, but all I can tell you is that this stuff is evil :-) Read the PDF linked in the article if you want someone who still knows what they're talking about to explain it; I'm just trying my best to remember anything after all these years. I swear that analysis class gave me some Pavlovian instinct to shudder in dread whenever I see them pulling out the strange substitutions and Taylor expansions so they can calculate the error...

  18. hakmem by trb · · Score: 2, Interesting
    The article barely mentions HAKMEM, but the invsqrt hack is reminiscent of the HAKMEM programming hacks, which were published in 1972. Several of these hacks use bit fiddling with magic constants to perform tasks in straight-line code, that you would ordinarily think of doing with iteration.

    HAKMEM is classic bathroom reading for hackers. If you want to do it up old-school, print a copy from original scans, double-sided.

  19. Alternate mod type by JBHarris · · Score: 2, Funny

    I wish there was an option to mod some of these posts "+1 Whoa".

  20. Correction by DeadCatX2 · · Score: 5, Informative

    x^(-2) (which is the same as 1/sqrt(x) of course) Actually x^(-2) == 1/(x^2).

    I believe you meant to say x^(-1/2)

    Too bad the people modding you up don't have math degrees. =P
    --
    :(){ :|:& };:
    1. Re:Correction by lkeagle · · Score: 3, Insightful

      You live in America, don't you?

  21. It may in part be related to something I did ... by Ninja+Programmer · · Score: 5, Informative

    Here's an old version of one of my webpages:

            http://web.archive.org/web/19990210111728/www.geoc ities.com/SiliconValley/9498/sqroot.html

    And here's an updated version of the same page:

            http://www.azillionmonkeys.com/qed/sqroot.html

    It isn't an exact rendering of the code in question, but it explains enough for any skilled hacker to 1) understand what's going on and 2) modify the code to create the resulting code that's in the Quake 3 source. Furthermore this web page has existed since about 1997 (archive.org doesn't go back that far for some reason.)

    Now *IF* in fact the code origin comes from someone who took ideas from my site, I should point out that *I* am not the originator of the idea either (though I did write the relevant code). Bruce Holloway (who I credit on the page) was the first person to point out this technique to me at around the 1997 timeframe (prior to this, I created my own method which is similar, but not really as fast). (Vesa Karvonen informed by of the technique (through a code snippet with no explanation) at roughly the same time as well.) It was a technique well known to hard core 3D accelerator and CPU implementors, and follows an intentional design idea from the IEEE-754 specification.

    Prof. William Kahan, one of the key people who specified the IEEE-754 standard (the standard for floating point the many CPUs use, starting with Intel's 8087 coprocessor) apparently presented this idea, and is the source for where Bruce Holloway got the idea. The IEEE-754 standard came out around the 1982 time frame. Though, its very likely that these ideas originate from even earlier in computing history.

  22. Re:Bad programmer, no cookie by DeadCatX2 · · Score: 2, Interesting

    It is probably a premature optimization that will slow computations down. I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

    This is John Carmack we're talking about. I imagine he knows how to use a code profiler.

    The approximations it involve are likely to cause bugs. Floating point always involves approximations. If you don't like them, use a better approximation (double precision). But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

    I mean, it's not like they're running scientific simulations with this. It's Quake3.

    Furthermore, converting floating-points to integers and then doing calculations on them is a trick that has been known for years, long before Quake3. They weren't converting floats to ints. They manipulated the bitwise representation of the float. I'm not sure if that's what you meant, though.
    --
    :(){ :|:& };:
  23. Re: A better question: by Jerry+Coffin · · Score: 5, Informative
    How does this function compute 1/x^(1/2)?


    It starts by taking a guess at the right answer, and then improving the guess until it's accurate enough to use.

    The first step depends heavily on the fact that a floating point number on a computer is represented as a significand (aka mantissa) and an exponent (a power of two). For the moment, consider taking just the square root of X instead of its inverse. You could separate out the exponent part of the floating point number, divide it by two, and then put the result back together with the original significand, and have a reasonable starting point.

    From there, you could improve your guesses to get a better approximation. The simplest version of that would be like a high-low game -- you split the difference between the current guess and the previous guess, and then add or subtract that depending on whether your previous guess was high or low. Eventually, you'll get arbitrarily close to the correct answer.

    This can take quite a few iterations to get to the right answer though. To improve that, Newton-Raphson looks at the curve of the function you're working with, and projects a line tangent to the curve at the point of the current guess. Where that line crosses the origin gives you the next guess. That's probably a lot easier to understand from picture.

    In this case, we're looking for the inverse square root, which changes the curve, but not the basic idea. As a general rule, the closer your first guess, the fewer iterations you need to get some particular level of accuracy. That's the point of the:

    i = 0x5f3759df - (i>>1);

    While the originator of this constant is unknown, and some of it is rather obscure, the basic idea of most of it is fairly simple: we start by shifting the original number right a bit. This divides both the mantissa and the exponent part by two, with the possibility that IF the exponent was odd, it shifts a bit from the exponent into the mantissa. The subtraction from the magic number then does a couple of things. For one thing, if a bit from the exponent was shifted into the mantissa, it removes it. The rest of the subtraction is trickier. If memory serves, it's based on the harmonic mean of the difference between sqrt(x) and (x/2) for every possible floating point number of the size you're using.

    This is where the fact that it's 1/sqrt(x) instead of sqrt(x) means a lot: 1/sqrt(x) is a curve, but it's a fairly flat curve -- much flatter than sqrt(x). The result is that we can approximate a point on the curve fairly accurately with a line. In this case, it's really two lines, which gets it a bit closer still.

    From there, the number has had a bit of extra tweaking done -- it doesn't actually give the most accurate first guess, but its errors are often enough in the opposite direction from those you get in the Newton-Raphson iteration steps that it gives slightly more accurate final results.
    --
    The universe is a figment of its own imagination.
  24. Re:It may in part be related to something I did .. by tknd · · Score: 3, Funny

    (I'm (sorry))

    You don't (happen to (program (lisp))) do you?

  25. Clever trick! by kent.dickey · · Score: 5, Informative
    To summarize, the article is about a piece of code to approximate 1/sqrtf(f):

    float InvSqrt (float x){
      float xhalf = 0.5f*x;
      int i = *(int*)&x;
      i = 0x5f3759df - (i>>1);
      x = *(float*)&i;
      x = x*(1.5f - xhalf*x*x);
      return x;
    }
    The trick is the "i = 0x5f3759df" line. It's certainly a magic number.

    The algorithm is simple Newton-Raphson -- make a good initial guess, then iterate making the guess better. I think Newton-Raphson on 1/sqrt picks up 5-6 bits each try in the line "x = x*(1.5f - xhalf*x*x)". It can be repeated to get a more accurate result each time it's repeated.

    The problem with Newton-Raphson is making a good first guess--otherwise, you need an extra iteration or two. And that's what the magic number is doing, making a good first guess.

    So let's work out what a good first guess would look like for 1/sqrt(f), to see where this code came from.

    Floating Point numbers are stored with a mantissa and an exponent: f = mantissa * (2 ^ exponent), where exponent is 8-bits wide and the mantissa is 23-bits wide.

    Let's take an example: 1/sqrt(16) would have f = 1.0 * (2 ^ 4). We want the result 0.25 which is f = 1.0 * ( 2 ^ -2).

    So our first guess should take our exponent, negate it, and cut it in half. (Try more examples to see that this works--it's basically the definition of 1/sqrt(f)). We'll ignore the mantissa--if we can just get within a factor of 2 of the answer in one step, we're doing pretty well.

    Unfortunately, the exponent is stored in FP numbers in an offset format. In memory,

    exp_field = (actual_exp + 127) << 23
    The mantissa is in the low 23 bits, and the most-significant bit is the sign (which will be 0 if we're taking roots). For now, let's just assume we have our exponents as 8-bit values, to work out what we need to do with the +127 offset.

    We want new_actual_exp = -(actual_exp)/2. But in memory, exp = (actual_exp + 127). Or, actual_exp = exp - 127.

    Substituting gives (new_exp - 127) = -(exp - 127)/2. Simplify this to: new_exp = 127 - (exp - 127)/2 => new_exp = 3*127/2 - (exp / 2).

    Now the exponent is shifted 23 places in memory, so let's write out our code (and ignore the mantissa completely for now...):

    i = ((3*127)/2) << 23) - (i >> 1);
    rewriting as hex:

    i = 0x5f400000 - (i >> 1);
    Well, first, it's arguable whether it should be 0x5f000000 or 0x5f400000 (The "4" is actually in the mantissa). I'm guessing resolving that dilemma led to the original author discovering that choosing a particular pattern of bits in the mantissa can help make the initial guess even more accurate, leading to the 0x5f3759df constant.

    I haven't worked it out, but Chris Lomont http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf shows this first guess is accurate to about 4-5 bits of significance for all floating point values. That's a good result, considering that mucking with the exponents was just hoping to get us within 1-2 bits of significance.
  26. Similar solution for x^(-1/4) by jothaxe · · Score: 2, Interesting

    I think I have found a similar solution for the reciprocal of the FOURTH root. The code looks almost the same, but it requires a new constant and you shift the bits 2 places instead of just 1. float Inv4thrt(float x) { float x0 = x; int i = *(int*) // get bits for floating value i = 0x4F541EEF - (i>>2); // gives initial guess y0 x = *(float*) // convert bits back to float x = (x/4.0f)*(5 - x0 * x*x*x*x); // Newton step, repeating increases accuracy return x; } Note that the constant I have here is not very carefully tuned so the error may be slightly larger than with the sqrt. Pretty cool, I think.

  27. Re:Bad programmer, no cookie by Chris+Burke · · Score: 4, Insightful

    I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

    I've written enough 3D graphics code -- including a textured polygon rasterizer that would probably cry and try to delete itself if it saw something like Quake 3 -- to know that they didn't have to run a profiler to know that they'd be spending too much time doing 1/sqrt(x) if they didn't have a highly optimized routine for it. It's an inherent operation in so many 3D calculations it isn't funny, and while by the time Quake 3 came out hardware floating point units were pretty fast, FP divides and FP square roots were very lengthy operations that more importantly couldn't be pipelined.

    But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

    Yeah, pretty much. Back when I wrote my code (pentium days) you had an FP unit but it wasn't very good, so I used fixed point math (using integer instructions) which had sufficient precision for a 320x200 display. Getting enough performance out of the core algorithms was still hard enough that I had to take a lot of shortcuts, like instead of doing the right thing by using a divide every pixel to calculate which texel to use, I used a divide every 8 pixels and linearly interpolated in between. I'm sure that Quake (the contemporary 3D engine of the day which would also make my code cry) contained many more clever optimizations and approximations, because it wouldn't have been possible on the hardware of that day without them.

    In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. The non-approximation instruction uses a lookup table to get an initial guess, then uses a couple iterations of Newton's Method to refine the result. The short cut instructions simply return the value in the lookup table.

    So yeah, basically the AC OP has no idea what he is talking about, and from that basis of ignorance is denigrating what is in reality a very clever and extremely useful hack.

    --

    The enemies of Democracy are
  28. Re:It may in part be related to something I did .. by waveclaw · · Score: 2, Insightful
    Though, its very likely that these ideas originate from even earlier in computing history.

    Attribution.

    That is why most, if not all, software patents are bogus. Just because you reinvent something published by a PhD working in a committee that disbanded 10 years before you knew 'C' came after 'B' in the alphabet does not make you reinvention patent worthy. The history of invsqrt() crosses disciplines of hardware and software design, spec development, graphics and math theory. With such a fascinating function having a hard to track history it is no wonder that things like James W. Hunt's 1976 diff concept can be patented in 2006 by Microsoft. (As mentioned by QuantumG on slashdot.)

    The trouble had in tracking the history of InvSqrt() is really sad. Computing is an industry that hypes how digital storage of information and perfect copies eliminates the isse of decay. While the ever expanding secondary storage (just now getting to really usefull) means oblivating the scaling issues with retaining the meta-information that denotes this very history. I guess the moral of the story is: sign your contributions. It won't take up that much space in the comments.

    "Don't have good ideas if you aren't willing to be responsible for them."

    --Alan Perlis

    And will the real Fast InvSqrt() author please stand up?
    --

    "You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
  29. The f after 0.5 says 0.5 is a float not a double by Sits · · Score: 2, Informative

    Answer to 2)
    The f after 0.5 and the 1.5 simply means "use the floating point version of this number". Multiplying two numbers which have different types (e.g. multiplying an int and a float) or doing assignment between different types means that one of the numbers will have to be automatically converted to the other's type and which conversion happens depends on the types of both numbers. By default, I think type of the literal 0.5 (without any letters after it) is a double. By adding the f (assuming x is a float) the types of the operands are the same so there is no implicit type conversion thus eliminating the chance of the wrong conversion happening automatically. You care about what type of multiply is done because certain types can lead to unwanted loss of precision.

    As for 1)
    Well if you iterate by one step you don't need a loop. Presumably most looping iteration happens because your first guess wasn't accurate enough. If your first guess WAS accurate enough why would you needlessly calculate more accuracy? I'm only speculating though - I don't know if this is really the case.

  30. Re:Bad programmer, no cookie by descubes · · Score: 2, Informative

    In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. That's also the only way to compute it on Itanium: you have to iterate using the floating-point reciprocal square root approximation instruction. There is no instruction giving you the final result.
    --
    -- Did you try Tao3D? http://tao3d.sourceforge.net
  31. Re:Lookup Table? by mikera · · Score: 2, Informative

    In modern processors, lookups from memory are far too slow. This is especially true if you are doing lots of random lookups that might go outside the cache.

    Performance-wise, you are much better doing several calculations than a single lookup.

  32. Doesn't work any more by Markus+Registrada · · Score: 2, Interesting
    Nowadays (e.g. since 3.x Gcc) that trick doesn't work any more, as written. To get reliable results you would need to say something more like

    float inv_sqrt(float x) { /* death to StudlyCaps! */
    float xhalf = 0.5f*x;
    union { float x; int i; } ix;
    ix.x = x;
    ix.i = 0x5f3759df - (ix.i >> 1);
    x = ix.x;
    x *= 1.5f - xhalf * x * x;
    return x;
    }

    The point is that pointer-based type punning isn't really allowed by the language definition. The optimizer can take advantage of that to emit simpler code that, in this case, produces the wrong answer. Using a union short-circuits that kind of optimization.

    This isn't to say any particular Gcc produces wrong answers with the code as written, but if you complained about one that did, they would just laugh at you and (if they liked you enough) tell you to rewrite it like the above instead.