Slashdot Mirror


Origin of Quake3's Fast InvSqrt()

geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power." From the article: ""

402 comments

  1. A famous quote by ArchieBunker · · Score: 5, Funny

    "English motherfucker, do you speak it?" Anyone care to explain what that function does?

    --
    Only the State obtains its revenue by coercion. - Murray Rothbard
    1. Re:A famous quote by GigsVT · · Score: 5, Informative

      1/sqrt(x)

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:A famous quote by Trigun · · Score: 5, Funny

      But faster!

    3. Re:A famous quote by zzyzx · · Score: 1

      Thanks. I kept thinking that the inverse of taking the square root would be to square something but obviously that isn't what that code did.

    4. Re:A famous quote by theelectron · · Score: 1

      The name of the function explains it pretty well: it is the inverse square root function.

      Just to make sure, I did RTFA and it says it in the first paragraph or so.

    5. Re:A famous quote by Anonymous Coward · · Score: 1, Informative

      As I understand it, it finds the inverse square root, which is important to find the square root of a number.

      To find the square root of say, N, you use InvSqrt(N) and then multiply the result by N to get the square root of N. Supposedly, it's faster than using the built-in sqrt() function.

    6. Re:A famous quote by Raul654 · · Score: 3, Informative

      Given a number, that function calculates the inverse of the square root - which, according to TFA, is very common in graphics applications.

      --


      To make laws that man cannot, and will not obey, serves to bring all law into contempt.
      --E.C. Stanton
    7. Re:A famous quote by Codename46 · · Score: 1, Redundant

      Right because when he denied credit in that article it totally showed that he's a pompous asshole. Next time read the article before you spew bullshit out of your cum receptacle ok?

    8. Re:A famous quote by Red+Flayer · · Score: 4, Informative
      It's just John Carmack's way of telling everyone he knows mathematics and please worship the ground he walks on.
      RTFA.

      Carmack quite graciously denied the code was his and helped direct the author closer to the true source.
      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    9. Re:A famous quote by nicklott · · Score: 1

      Something to do with lighting I'd imagine (unless they've also modeled gravitational attraction)

    10. Re:A famous quote by Anonymous Coward · · Score: 0

      Important reminder: (x^(1/2))^2 != 1 (well.. usually ;p )

    11. Re:A famous quote by abradsn · · Score: 3, Informative

      used whenever there is light reflecting off an object.

    12. Re:A famous quote by Anonymous Coward · · Score: 0

      Geek motherfucker! Do you speak it?

    13. Re:A famous quote by kill-1 · · Score: 4, Informative

      And less accurate!!!

    14. Re:A famous quote by 91degrees · · Score: 5, Informative

      Okay - At times you want a normalised vector. A lot of the time you will have vectors of arbitrary length For example, the light is at the origin, the point is at (12, 4, 3). So the vector from the point to the light is (-13, -4, -3). The length of this vector can be calculated easily using pythagoras's theorem (sqrt (12^2+4^2+3^2). It's 13 units in length. We want a unit vector (i.e. a vector 1 unit in length) So we divide by the length to get (12/13, 4/13, 3/13).

      This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide. Both of thse are at least an order of magnitude slower than multiplications and additions.

      So what this function does is provide a value you can multiply each component by to get a unit vector.

      Well, there's the what and why parts. As for the , I have no idea. I think it uses magic.

    15. Re:A famous quote by mogrify · · Score: 5, Funny

      (x^(1/2))^2 != 1

      Interesting smiley... is that a dead man with a fraction in his mouth and a prominent Adam's Apple, wearing a bow tie and a dress and standing on a toy race car?

      What's your point, man?

      --
      perl -e 'foreach(values %SIG){$_="IGNORE";}while(){}'
    16. Re:A famous quote by misleb · · Score: 0

      And more greenhouse gas emissions!

      --
      "THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
    17. Re:A famous quote by Lord+Ender · · Score: 1, Troll

      Don't they teach Numerical Methods in Computer Science curricula anymore? That's a required course at Ohio State--for undergrads.

      Of course, most students have a hard time with that class, and can't actually remember how to do anything after the course other than Monte-Carlo Integration. But you should at LEAST remember being exposed to Newtonian integration in your undergrad numerical methods class!

      The only thing amazing about this code is that someone paid good-enough attention in class to be able to apply what he learned to his job.

      The real genius here is Isaac Newton. 'course, that's not news to anyone.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    18. Re:A famous quote by Anonymous Coward · · Score: 0

      as in "multiplicative inverse [of]"

    19. Re:A famous quote by Anonymous Coward · · Score: 3, Informative
      Um,
      (x^(1/2))^2 != 1
      is pretty obvious and I doubt anyone would need to be reminded of it.. Did you mean
      (x^(1/2))^2 != x for x < 0
      ?
    20. Re:A famous quote by oskard · · Score: 1

      Actually he's standing on a sled1

      --
      Sigs are for Terrorists.
    21. Re:A famous quote by GigsVT · · Score: 0, Offtopic

      I bet you don't buy into other "religions" like the bill of rights?

      You have rights, stand up for them. You own your computer, not them.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    22. Re:A famous quote by Fred_A · · Score: 2, Funny

      I think this is a hoax. I've been playing Quake 3 extensively and I haven't once seen anything reflected off any object. What's more it was all very slow. Granted this might have been because of my Tseng ET4000 (which I'm planning on upgrading eventually). Although I did set it to CGA to make things snappier.

      Anyway I'm not convinced.

      --

      May contain traces of nut.
      Made from the freshest electrons.
    23. Re:A famous quote by lagfest · · Score: 3, Informative

      The accuracy only depends on how many times you repeat the last line. So no, not really.

    24. Re:A famous quote by Hinhule · · Score: 1

      Actually, everything you see, light reflects off it, if it doesn't you wouldn't see it. So all the walls, weapons and monsters (except maybe the lagbeast of DOOM) are all reflections.

    25. Re:A famous quote by Anonymous Coward · · Score: 0

      No. It's the Ann Coulter emoticon.

    26. Re:A famous quote by the_greywolf · · Score: 2, Informative

      The inverse square root is used to calculate a normal vector. Normal vectors, are, in turn, used in lighting calculations (to determine, for example, the light intensity for a given vertex), among other things.

      AMD's 3D Now! instruction set includes a set of instructions (TFSQRTS/TFSQRTD) for approximating the inverse square root.

      --
      grey wolf
      LET FORTRAN DIE!
    27. Re:A famous quote by Thalagyrt · · Score: 1

      3D accelerators don't use raytracing for their rendering. What you said applies when doing rendering of a scene in 3ds max or Maya, but not for a video game. That wouldn't be nearly efficient enough to even play. Have fun with your 10 frames per hour. ;-)

      And yea, for the record, I've done 3d development.

      --
      Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo!
    28. Re:A famous quote by Anonymous Coward · · Score: 0

      Addendum: http://games.slashdot.org/comments.pl?sid=209372&c id=17072652

      This guy explained it in more detail. Right now he's the post right below me but I don't know how long that will last. Posted anonymous so as to avoid any form of karma whoring.

    29. Re:A famous quote by Gilmoure · · Score: 1
      --
      I drank what? -- Socrates
    30. Re:A famous quote by Alistar · · Score: 1

      --------------

      is pretty obvious and I doubt anyone would need to be reminded of it.. Did you mean

              (x^(1/2))^2 != x for x is less than 0

      ?

      --------------

      Actually it holds true for x is less than 0 as well

      (X^(1/2))^2

        Lets examine this

      Say x = -9

      x^(1/2) = -9^(1/2) = 3i

      3i^2 = 3^2 * i^2 = 9 * (-1) = -9

    31. Re:A famous quote by mike260 · · Score: 5, Informative

      Nope, the reason you want a fast 1/sqrt(x) in graphics is so you can normalise 3D vectors quickly (something that most 3D apps have to do a LOT).

      Slow:
          const float length = sqrt( v.x*v.x + v.y*v.y + v.z*v.z );
          v.x /= length;
          v.y /= length;
          v.z /= length;

      Fast:
          const float recip_length = InvSqrt( v.x*v.x + v.y*v.y + v.z*v.z );
          v.x *= recip_length;
          v.y *= recip_length;
          v.z *= recip_length;

      The 2nd version has no divides, and no call to sqrt, which makes it *loads* faster.

    32. Re:A famous quote by arth1 · · Score: 2, Informative
      Thanks. I kept thinking that the inverse of taking the square root would be to square something but obviously that isn't what that code did.

      You are correct, the inverse to x^n is x^(1/n), from which it follows that the inverse to a square root is the inverse inverse square, or just square.
      What TFA calls "inverse square root" is really "inverse of the square root", a small but significant difference.
      Not that it matters much, cause those who use the function tend to know what is meant.

      Regards,
      --
      *Art
    33. Re:A famous quote by Anonymous Coward · · Score: 0

      (-9)^(1/2) = +- 3i

    34. Re:A famous quote by foxtrot · · Score: 1

      Granted this might have been because of my Tseng ET4000

      Well, _duh_.

      It's an ET4000.

      It's OBVIOUSLY optimized for Enemy Territory, not Quake...

      -F

    35. Re:A famous quote by shobadobs · · Score: 0

      Um, no, it's 3i.

    36. Re:A famous quote by Overly+Critical+Guy · · Score: 2, Funny

      Are you sure you're not actually running Doom 3? Not only does no light reflect off of anything, but no light is emitted in the first place.

      --
      "Sufferin' succotash."
    37. Re:A famous quote by Anonymous Coward · · Score: 0

      > by Overly Critical Guy (663429) on 01:57 PM -- Friday December 01 2006 (#17073302)

      Living up to your name, I see!

    38. Re:A famous quote by Anonymous Coward · · Score: 0

      Mr Gates? Is that you?

    39. Re:A famous quote by Anonymous Coward · · Score: 0

      -9^(1/2) = 3i

      Bzzzzt! I'm sorry, but thanks for playing. -9^(1/2) = -3. It's all about the order of operations.

    40. Re:A famous quote by beej · · Score: 1

      That, and in a huge variety of other circumstances where a normalized vector is required.

      You can do a lot of vector math to calculate the relative positions of things, or how far a sphere is penetrating into a plane, say, but at the end of the day, you often want a vector that's one unit long. The process of normalizing a vector is dividing (this is where the "inverse" comes in) each element in turn by the length of the vector. The length of the vector is calculated with basically the pythagorean theorem (this is where the "square root" comes in.)

      Calculating a square root and doing a divide, while not horribly slow, is slower than many people like, especially when you are doing a helluva lot of 'em. (PS2 does a square root in 13 cycles or something like that.) So this guy is four floating point mults (not too bad), an integer subtract (fast), and a shift (very fast).

    41. Re:A famous quote by jmv · · Score: 4, Insightful

      The real genius here is Isaac Newton. 'course, that's not news to anyone.

      You mean that Newton thought about taking advantage of the IEEE float format to initialize the algorithm using "i = 0x5f3759df - (i>>1);"? Wow, now that's a clever guy!

    42. Re:A famous quote by Anonymous Coward · · Score: 0

      Not only did he deny it, as the other posters said, but this shows more than just an understanding of mathematics, but it is also a great understanding of programming advanced calculations in extremely efficient matters.

      One of the biggest tricks here is the i = 0x5f3759df - (i >> 1); line because it is doing a floating point (decimal number) calculation using an integer (uh, integer number... no decimal to the non-math inclined).

      Plus, he is getting the value of x^(-1/2) without any division in order to use it for multiplication, which is faster on CPUs than division. Once again, a nice feat and quite GOOD for the game engine rather than just showing off.

      Ironically, the CAPTCHA image text was "arrogant." I thought that was sort of funny, given the post I am responding too.

    43. Re:A famous quote by Anonymous Coward · · Score: 0

      cry more.

    44. Re:A famous quote by Puff+Daddy · · Score: 1

      I would guess that the OP's intention was to point out that in order for squaring a number to be the inverse of taking it's square root, the result of doing both would be the number 1. For example, 1/2 is the inverse of 2 because 1/2 * 2 = 1. The result of taking the square root of a number and then squaring it will be the absolute value of the original number.

    45. Re:A famous quote by Anonymous Coward · · Score: 0
      A man with a fraction in his mouth...

      This is what I read /. for. Thanks !

    46. Re:A famous quote by Keebler71 · · Score: 1
      Well, there's the what and why parts. As for the , I have no idea. I think it uses magic.

      Speaking of magic... I think your "how" magically vanished!

      --
      "It takes considerable knowledge just to realize the extent of your own ignorance." - Thomas Sowell
    47. Re:A famous quote by Xistic · · Score: 1
      The only thing amazing about this code is that someone paid good-enough attention in class to be able to apply what he learned to his job.

      Carmack left school after two semesters to work as a freelance programmer. I don't think he made it to Numerical Methods or the equivalent. That makes this code all that much more impressive.

    48. Re:A famous quote by Anonymous Coward · · Score: 0

      Hahaha, nice post. :-p

    49. Re:A famous quote by Pusene · · Score: 1

      The length of this vector can be calculated easily using pythagoras's theorem (sqrt (12^2+4^2+3^2). It's 13 units in length. We want a unit vector (i.e. a vector 1 unit in length) So we divide by the length to get (12/13, 4/13, 3/13).

      Acutally, using pythagoras is not efficient in the digital world. The length of a discreet vector [x,y,z]is always the absolute of the largest axis scalar. FOr the vector (-13,-4,-3) -13 is the largest scalar axis, and the absoulte of this is, of course, 13. Ergo the length of the vector is 13.

      --
      Error #13: No coffee. Operator halted. Please place boot device at bottom.
    50. Re:A famous quote by rblancarte · · Score: 2, Insightful

      I think the point of the parent post was to point out that the summary should have enough information that you can decide if you want to RTFA or not. You should not have to RTFA to decide if you want to RTFA article or not.

      As far as the name of the function - true, but you never know, it could be something different. A small explaination would have helped.

      RonB

      --
      It is human nature to take shortcuts in thinking.
    51. Re:A famous quote by ebyrob · · Score: 3, Funny

      If this doesn't make your head swim:

      int i = *(int*)&x;
      i = 0x5f3759df - (i >> 1);


      Then I'm afraid the whole article is going to be lost on you...

      We've got a floating point being operated on as an integer.
      We've got a mysterious constant.
      We've got a two's complement sign-flip combined with a bit-shift.

      The only thing missing from this party is hookers and beer.

    52. Re:A famous quote by Bullet-Dodger · · Score: 2, Funny

      But it comes with a free Frogurt!

    53. Re:A famous quote by CrazyClimber · · Score: 1

      I shed a tear when I read that. My ET6000 was my first "good" video card and I miss it like I miss my childhood pets.

    54. Re:A famous quote by Anonymous Coward · · Score: 0

      Nice homophobia. It'll be interesting to see the mods for this one. On one hand, you're defending Carmack. On the other hand, you're making a totally over-the-top anti-homosexual remark like a drunken high school frat boy.
      Hey, hey, let's not jump to conclusions. It could just as easily be misogynist.
    55. Re:A famous quote by It'sYerMam · · Score: 1

      That is quite clearly not true if you do things sensibly. Power laws say that (x^(1/2))^2 = x^((1/2)*2) = x^1 = x

      --
      im in ur .sig, writin ur memes.
    56. Re:A famous quote by abradsn · · Score: 1

      basically light reflects off of every object that you see. It would all be black or very dark otherwise.

    57. Re:A famous quote by abradsn · · Score: 1

      I know this. I was just trying to keep it simple for the majority of the slashdot readers who don't know what normalizing a vector means.

    58. Re:A famous quote by beej · · Score: 1

      No offense intended; I assumed you knew this. I was posting for people who didn't, though. :)

    59. Re:A famous quote by Anonymous Coward · · Score: 0

      Wrong. Length of that vector is 13.928...

      You're thinking that since the length of a vector is often designated ||v|| that it is the absolute value. This couldn't be farther from the truth.

    60. Re:A famous quote by saforrest · · Score: 3, Insightful

      The result of taking the square root of a number and then squaring it will be the absolute value of the original number.

      Really? What if the number is negative?

      I think you mean to say "the original number", not "the absolute value of the original number". When given a negative argument, this composition will either return an error (because there is no support for complex numbers) or a negative result equal to the input.

    61. Re:A famous quote by Tim+Browse · · Score: 2, Insightful

      Carmack left school after two semesters to work as a freelance programmer. I don't think he made it to Numerical Methods or the equivalent. That makes this code all that much more impressive.

      RTFA much?

    62. Re:A famous quote by Directrix1 · · Score: 2, Funny

      The Frogurt is also cursed.

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    63. Re:A famous quote by msloan · · Score: 1

      Yes, mathematics is full of ambiguities like this. Inverse has two meanings - rational inverse, as it is here f(x) = 1/sqrt(x), or, as you were thinking, functional inverse - f(x) = x^2.

    64. Re:A famous quote by Xistic · · Score: 1

      Ack! Obviously not. I was remembering the comments to the Q3 source code release story here on /. where it was falsely attributed to Carmack. My bad.

    65. Re:A famous quote by Anonymous Coward · · Score: 0

      Dude, Carmack is so rich he flies around in spaceships. For real.

    66. Re:A famous quote by pablodiazgutierrez · · Score: 1

      x^(-.5)

    67. Re:A famous quote by Anonymous Coward · · Score: 0

      And AIDS!

    68. Re:A famous quote by KZigurs · · Score: 3, Interesting

      Not so much _faster_ than _guaranteed_ execution time for any given precision required (and if you have 480 pixels on axis on the screen you can play with that for quite a lot).

      This is why console games (ps, ps2, xbx, 360, bla bla bla huj) actually stays competitive to PC (more powerful, of course) - since developers has a good idea about actual CPU/GPU available at any given moment, they can safely close to the border way more confidently than on pc. And on PC they usually resort to generic 'will give you the best that I can' routines anyway.

      (at least that what I can say after observing xbox360 devel team for 6 months. scary stuff, they do, scary stuff.)

    69. Re:A famous quote by Sterling+Christensen · · Score: 1

      But you do get the absolute value if you square something, then take the square root. I think that's what he was confusing it with.

    70. Re:A famous quote by Red+Flayer · · Score: 1

      Well, overly critical guy, your moniker fits.

      I've never pirated anything in my life.

      So, the next time you toss out ridiculous accusations and/or stereotype all slashdot users, keep in mind that you have no fricking clue of what you're talking about.

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    71. Re:A famous quote by zanderredux · · Score: 1

      meh. Newton was just ahead of his time. By a lot. :)

    72. Re:A famous quote by TheAlmightyQ · · Score: 1

      That's good.

      --
      I hope you're not pretending to be evil while secretly being good. That would be dishonest.
    73. Re:A famous quote by buswolley · · Score: 1

      and Iraq

      --

      A Good Troll is better than a Bad Human.

    74. Re:A famous quote by Anonymous Coward · · Score: 0

      Thanks for the best laugh in days!

      You are my hero.

    75. Re:A famous quote by Anonymous Coward · · Score: 0

      and his equation ;p doesn't make any sense at all.

    76. Re:A famous quote by gripen40k · · Score: 1

      This is why console games (ps, ps2, xbx, 360, bla bla bla huj) actually stays competitive to PC (more powerful, of course)
      Exactly. I remember playing Halo 1 on a high def TV and seeing lots and lots of small glitches and missing textures. I'm assuming that's because on the lower rez TV's you wouldn't see these things and so you don't really have to worry about them. It's a pretty clever way to go about things :)
      --
      Har?
    77. Re:A famous quote by Anonymous Coward · · Score: 0

      You can still get rid of divides in the original:

      const float recip_length = 1.0f / sqrt( v.x*v.x + v.y*v.y + v.z*v.z );
      v.x *= recip_length;
      v.y *= recip_length;
      v.z *= recip_length;

    78. Re:A famous quote by theelectron · · Score: 1

      I guess my point is: if you don't understand what the article is about from the given summary, then you are likely not going to be interested in the article.

    79. Re:A famous quote by Tablizer · · Score: 1

      used whenever there is light reflecting off an object.

      Unless it makes boobs jiggle more realisticly, nobody will care.

    80. Re:A famous quote by dextration · · Score: 1

      And though it seems to have lost me my place in the comments-list, I thank you for that. :D

      --
      http://www.mushoo.net/
    81. Re:A famous quote by elgatozorbas · · Score: 1

      IIRC from a numerical math course, 1/sqrt(x) is also used to calculate sqrt(x), by multiplying with x or taking the inverse, because its calculation is easier than that of sqrt(x) itself.

    82. Re:A famous quote by johnw · · Score: 2, Insightful

      And badly named! It's not the inverse square root (which would simply be the square); it's the reciprocal of the square root.

    83. Re:A famous quote by Anonymous Coward · · Score: 0

      The accuracy only depends on how many times you repeat the last line. So no, not really.
      Yeah, return x more times for better accuracy!

      float InvSqrt (float x){
      float xhalf = 0.5f*x;
      int i = *(int*)&x;
      i = 0x5f3759df - (i>>1);
      x = *(float*)&i;
      x = x*(1.5f - xhalf*x*x);
      return x;
      }
    84. Re:A famous quote by Anonymous Coward · · Score: 0

      The minus in -9 is not an operation, you complete and utter fucknut.

      If you're trying to sound authoritative about something could you at least, you know, KNOW SOMETHING ABOUT IT.

    85. Re:A famous quote by gordo3000 · · Score: 1

      depends, took me a while to figure out why they said inverse, and it may have been that they were working with vector math. and the inverse in vector math is exactly what its doing. since it was graphics code, I'd bet it came from there. but who knows. anyways, inverse sounds cooler.

    86. Re:A famous quote by Mr+Chund+Man · · Score: 2, Insightful

      I don't think that's quite what he meant...

    87. Re:A famous quote by hattmoward · · Score: 1

      He didn't imply raytracing, merely dynamic lighting. Raytracing does take into account the reflection path of light, but so does common point-based lighting, but it uses a simplified model of light's behavior. Given Light Source A that emits X light, and Polygon B that is Y distance away from A, the light reaching B is X*1/sqrt(Y).

    88. Re:A famous quote by Anonymous Coward · · Score: 0

      The negative preceding 9 is called a unary OPERATOR, and therefore it's action on its operand is indeed an operation.

      Kudos to you for making yourself look like an idiot.

    89. Re:A famous quote by Anonymous Coward · · Score: 0

      I was really pissed off when I read this. Unary negation is definately an operator---and one that binds weaker than the power operator at that---so one should really write (-3)^(1/2) to indicate sqrt(-3).

    90. Re:A famous quote by Dutch_Cap · · Score: 1

      Try pressing the 'f' button.

    91. Re:A famous quote by lagfest · · Score: 1, Troll

      yeah, yeah, i saw it you nagging coward. Right after i pressed submit. Obviously it's the last line before return.

    92. Re:A famous quote by GigsVT · · Score: 1

      Only for computers. -9 is an integer, it's a symbolic constant in and of itself, the - sign isn't an operator.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    93. Re:A famous quote by LunarCrisis · · Score: 1

      The inverse of a function f(x) is f^-1(x) and is defined such that f^-1(f(x)) == x, not one.

      1/x is the inverse of a number, as the division function is the inverse of the multiplication function. The thing about invsqrt is that it is ambiguous whether they mean inv(sqrt) or sqrt^-1.

      --
      Mr. Period: Nine is the one that's right by ten!
      Nine: One day I will kill him. Then, I will be Ten.
    94. Re:A famous quote by GigsVT · · Score: 1

      Yeah, but no game has enough cycles to do fully dynamic lighting. They might render a few light sources per scene, but generally there's a "base brightness" of objects that is independant of the lighting.

      A game like Thief, in the indoor scenes, might use mostly dynamic lighting, but keep in mind even if you shoot out the lights it isn't pitch black.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    95. Re:A famous quote by Anonymous Coward · · Score: 0

      -9 is an integer, it's a symbolic constant in and of itself, the - sign isn't an operator.

      Yes it is. We need to use unary operators in order to represent certain numbers. The square root of 2 is also a constant. Would you claim that the radical is not an operator?

    96. Re:A famous quote by mdwh2 · · Score: 1

      They might render a few light sources per scene, but generally there's a "base brightness" of objects that is independant of the lighting.

      Yes but it's those "few light sources" which need the square roots - even with only one source, it has to be calculated at every vertex (or pixel).

      Having said that, I'm not sure how this applies to Quake 3 - in that (a) I thought that used lightmaps not dynamic lighting (or it was an option?), and (b) didn't it use OpenGL (and hence would be done by the driver or the hardware)?

    97. Re:A famous quote by Puff+Daddy · · Score: 2, Informative

      You're right, I was thinking of the operations backwards. The absolute value function is essentially equivalent to squaring a number and then taking the square root of the result, not the other way around.

    98. Re:A famous quote by zacronos · · Score: 1

      No, GP was right. (-9)^(1/2) == +-3i. In other words, both 3i and -3i are solutions. If you don't believe me, it's easy to check:

      (3i)^2 == 3*3*i*i == -9
      (-3i)^2 == (-3)*(-3)*i*i == -9


      I'm not sure if it was the +- notation you didn't understand, or whether you just didn't know that -3i is also a solution, but either way GP *was* correct.

    99. Re:A famous quote by GigsVT · · Score: 1

      It used both, lightmaps for map objects, and dynamic lighting for non-map items.

      As for the driver/API issue, I don't know. I think id software has always used OpenGL.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    100. Re:A famous quote by Anonymous Coward · · Score: 0

      either way GP *was* correct

      No he was not.

      sqrt(-9)= 3i, and does not equal -3i becasue of the way the principal square root is defined.

      It is certainly true that (3i)^2 = -9 and (-3i)^2=9, but that only means that the equation x^2=-9 has two solution. Note that this equation is not equivalent to the equation x=sqrt(-9), which has only one solution.

    101. Re:A famous quote by WilliamSChips · · Score: 1

      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF But it's still a damn good argument against those ridiculous 5-blade razors.
      --
      Please, for the good of Humanity, vote Obama.
    102. Re:A famous quote by WilliamSChips · · Score: 1

      What's "1.0f / sqrt( v.x*v.x + v.y*v.y + v.z*v.z )"?

      --
      Please, for the good of Humanity, vote Obama.
    103. Re:A famous quote by WilliamSChips · · Score: 1

      I have pirated music exactly once and it was a CD from a band whose other two albums I already bought but this CD I didn't see on the stores(and I don't have a credit card so I can't use iTunes Store). In addition, I got it from a friend and if I ever got the chance I would pay them(I can't use the mail because it'll just get sent back). I have also never pirated movies and the one time I tried to use a keygen it was for a game which I had already bought and played and wanted to replay but lost the original key for.

      And yet your type claims that just because I am an opponent of the MAFIAA's legal tactics and hate the patently untrue rhetoric about how nobody would make music anymore I obviously have to be a rampant pirate.

      Wait, how dare I bring facts into mindless Slashdot community bashing! TEH EVIL PIRATE SLASHDTO IS EIVIL!!!!!!!!!!!!

      --
      Please, for the good of Humanity, vote Obama.
    104. Re:A famous quote by Anonymous Coward · · Score: 0

      I didn't know free entertainment was so important.

    105. Re:A famous quote by Anonymous Coward · · Score: 0

      Overly Critical Guy is obviously of the opinion that he is the only "real" person on slashdot. Everyone else is part of an amorphous blob, holding a single uniform opinion.

    106. Re:A famous quote by Broken+scope · · Score: 1

      Oh yes, a denial from a relatively humble guy who enjoys what he does. I bet you hate your job.

      --
      You mad
    107. Re:A famous quote by Guignol · · Score: 1

      well it's not inverse square root, it's inverse of square root of (x).
      the square would be the reciprocal and it would be the unambiguous way to say so except there is no need to give that exotic name to the square function when it already has a very well known one (square).
      Inverse is very much the name of the function f(x)=1/x, at least it is in french and spanish. When we talk about the inverse of a function, we mean its reciprocal function, but when we talk about inverting a number, 'taking the inverse of a number', or apply the inverse function on a number (that is the function whose little name is 'inverse') we really mean finding the inverse of the number for the multiplication
      This is also an interesting remark because at no point did I even think about this function being the reciprocal of anything (plus, it is very clearly not a 'square' function, but I 'knew' that before I looked at the code). Looking at this thread, many people had the same reflex as you, I think it is interesting for the guy who wrote the article, because if you read the article, (maybe you did, I don't imply you did not) you will find that we didn't really get to find who actually wrote the piece of code, and maybe factoring in that the guy is not an english speaker (or maybe students do call the 1/x function inverse in other english speaking countries, well you get the idea) could help him refine his search...

    108. Re:A famous quote by Anonymous Coward · · Score: 0

      Did I miss something in my kindergarten calculus 101 class? Negative numbers cannot have a square root because by delimitation any number squared will be positive: a negative number times a negative number is a positive number.

      To find the absolute value of a number, take the square root of its square: sqrt(n^2)

    109. Re:A famous quote by Anonymous Coward · · Score: 0

      Please, please.

      Mod the parent funny!

    110. Re:A famous quote by saforrest · · Score: 1

      Did I miss something in my kindergarten calculus 101 class? Negative numbers cannot have a square root because by delimitation any number squared will be positive: a negative number times a negative number is a positive number.

      I take it then you've never heard of imaginary numbers, then?

    111. Re:A famous quote by xsonofagunx · · Score: 1

      Parent wasn't talking about free entertainment, he was talking about buying DRM'd music. One should [theoretically] be able to buy music free from DRM. Buying DRM'd music is bad because you're locking yourself into a format which may cease to be supported. That's where the PlaysForSure crowd are right now. They bought music DRM'd with PlaysForSure, and now many want to upgrade to a new player, maybe a Zune - but they can't. All the music they paid for is worthless on a Zune. Apple won't necessarily do this with ITunes, but do you really see them still releasing devices supporting legacy [circa 2006, perhaps] ITunes DRM'd music in 10 years? [Maybe?] 20? [Doubtful.] 50? [Not bloody likely]. The point is, you buy a CD [and at least in the good old days before Sony dropping rootkits and such], and you can do what you want with it. Rip it, copy it, record it onto a cassette tape, use it as a decorative coaster - whatever you want. You buy DRM'd music, and you do only what the publisher wants you to do with it, nothing more.

      And sure, you could say 'well, when it's not supported anymore, just crack the DRM and do whatever with it' but if that's the case you may as well not have bought it in the first place, since you're still breaking laws in the long run.

    112. Re:A famous quote by xsonofagunx · · Score: 1
      high school frat boy
      I'm sorry, you've chosen conflicting terms.
      Please try again.
    113. Re:A famous quote by Puff+Daddy · · Score: 1

      Thank you for the clarification. Sometimes I find that the biggest hindrance to my study of math is that I paid attention in high school. There, 1/2 was the inverse of 2, no need to point out that 1/2 is the multiplicative inverse of 2 since it was assumed that I was sleeping instead of learning something. I will not make the same mistake again.

    114. Re:A famous quote by abradsn · · Score: 1

      It also helps do that.

  2. And so why do we care? by Codename46 · · Score: 2, Insightful

    From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.

    [Insert rant about software patents]

    1. Re:And so why do we care? by From+A+Far+Away+Land · · Score: 4, Funny

      " Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees "

      I was a little worried when Slashdot posted the Britney Spears beaver pictures, but they now have their credibility back as the home of "News for Nerds".

    2. Re:And so why do we care? by Anonymous Coward · · Score: 5, Funny

      Britney Spears beaver pictures Link please.

    3. Re:And so why do we care? by i.r.id10t · · Score: 1

      The pic links are all over - check at foobies.com or similar.

      However, if it really did get posted to /., I'd like a link so I can read the comments...

      --
      Don't blame me, I voted for Kodos
    4. Re:And so why do we care? by gstoddart · · Score: 2, Funny

      I was a little worried when Slashdot posted the Britney Spears beaver pictures

      Hmmmm ... why can't we dupe stories like that more often? :-P
      --
      Lost at C:>. Found at C.
    5. Re:And so why do we care? by Tx · · Score: 0, Offtopic

      Here you go. And mods, I want +5 informative for this ;)

      --
      Oh no... it's the future.
    6. Re:And so why do we care? by Tx · · Score: 5, Informative

      Crap, lets try that again, with the link this time ;).Here you go.

      --
      Oh no... it's the future.
    7. Re:And so why do we care? by Pollardito · · Score: 4, Funny
      The pic links are all over - check at foobies.com or similar.

      However, if it really did get posted to /., I'd like a link so I can read the comments...
      i imagine it'd be all sorts of variations of "what's that?", "it might be a tribble", and "it's like a wookie, but smaller"
    8. Re:And so why do we care? by Anonymous Coward · · Score: 0

      Holy Shit!

      What's going on here?

    9. Re:And so why do we care? by zdzichu · · Score: 1

      You really missed this link? (NSFW and offtopic).

      --
      :wq
    10. Re:And so why do we care? by JabberWokky · · Score: 4, Funny
      Clearly you haven't seen the pictures. It looks much more like a Deltan.

      --
      Evan

      --
      "$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien
    11. Re:And so why do we care? by Anonymous Coward · · Score: 0

      C section scar

    12. Re:And so why do we care? by Anonymous Coward · · Score: 0

      Well, in the picture she's getting out of a limo with Paris Hilton, so there was probably recently a pineapple in there.

      TaAAaa DAaAAAAaAaa!!!

    13. Re:And so why do we care? by Anonymous Coward · · Score: 0

      right, but what about the discharge?

    14. Re:And so why do we care? by sootman · · Score: 5, Interesting

      Holy crap--you forgot the link but the mods followed your instructions anyway.

      Mods: I want +5, Funny for this. No, no, wait: +5, Informative. No, wait, anyone can google something and be "informative." I want a +5, Interesting.

      Thanks.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    15. Re:And so why do we care? by Anonymous Coward · · Score: 0

      Semper ubi sub ubi! 8-)

      (google it if your latin is a bit rusty)

    16. Re:And so why do we care? by Mex · · Score: 1

      It's not as great as you think: (NSFW)

      http://leforo.com/showthread.php?t=8794

    17. Re:And so why do we care? by monkeyboythom · · Score: 0
      I was a little worried when Slashdot posted the Britney Spears beaver pictures, but they now have their credibility back as the home of "News for Nerds".
      But that is probably for most of the nerds here as close to seeing, much less having, a cooter of their very own.
    18. Re:And so why do we care? by sbillard · · Score: 1

      i imagine it'd be all sorts of variations of "what's that?", "it might be a tribble", and "it's like a wookie, but smaller" It's a trap!

    19. Re:And so why do we care? by Anonymous Coward · · Score: 0

      I would have suggested something more unusual, like +5 Underrated Troll.

      Except anonymously so it wouldn't harm my karma.

    20. Re:And so why do we care? by Anonymous Coward · · Score: 0

      Eww... you're right, didn't notice that at first, but looks like she ummm... finished her menstrual cycle. How unpleasant.

      I can't decide whether it's good or bad when you see that these girls who are made up to look so great on TV are basically less attractive than a reasonable percentage of the girls I've bagged in my life. And I'm just a Slashdot poster who knows how to bathe and talk to women, not some superstud.

    21. Re:And so why do we care? by maxume · · Score: 1

      There may have been others, this one was my fault:

      http://slashdot.org/comments.pl?sid=208822&cid=170 28016

      --
      Nerd rage is the funniest rage.
    22. Re:And so why do we care? by From+A+Far+Away+Land · · Score: 1

      You got your wish. You wasted it. Why didn't you ask the mods for world peace?

    23. Re:And so why do we care? by Anonymous Coward · · Score: 0

      How unpleasant. Women really are the most disgusting creatures on the planet. Too bad they have all the vaginas.
    24. Re:And so why do we care? by bogd · · Score: 1

      Hey, it worked! Who would've known it would be that easy? :)) I can just see the sigs popping up all over the place: "I want +5, Funny!"; "Gimme +5, Interesting!"...

    25. Re:And so why do we care? by Wonko+the+Sane · · Score: 1

      You got your wish. You wasted it. Why didn't you ask the mods for world peace? Or more importantly, how about an open source nvidia driver?
    26. Re:And so why do we care? by un1xl0ser · · Score: 1

      It's like a shaved tribble with a cesarean scar above it.

      --
      v4sw6PU$hw6ln6pr4F$ck 4/6$ma3+6u7LNS$w2m4l7U$i2e4+7en6a2X h
    27. Re:And so why do we care? by phil.bachman · · Score: 1

      Not being a Star Trek type myself, I wanted to check out what a Deltan was... Beyond the mere surface similarities, your comparison's continued accuracy at all levels is astounding... I bet Britney's Deltan does emit some extremely powerful pheromones, in addition to being able to stop pain. In the fictional Star Trek universe, the Deltans are a closely humanoid species, probably Federation members. Only one confirmed Deltan has been seen on screen, Lieutenant Ilia in Star Trek: The Motion Picture, although the novelization of Star Trek II: The Wrath of Khan identifies at least one of the scientists on the Regula 1 space station as Deltan. Presuming she is typical of her species, Deltans (or at least female Deltans) are most notable for having bald heads. They have a limited telepathic ability, allowing them to stop another person from feeling pain by touching them. Deltans emit extremely powerful pheromones, provoking a strong sexual reaction in many other species. Although this article previously reported that this detail of Deltan pheromones existed only in non-canonical sources, the issue is addressed in Star Trek: The Motion Picture. Ilia reported that her "Oath of Celibacy" was on record when she reported to Kirk. While the original theatrical release does not expand on this, deleted scenes that were later incorporated into the film includes dialogue establishing that she is "as safe as any Human," and that she would never take advantage of a "sexually immature species". Deltan culture is also said to heavily emphasise sexuality - Roddenberry once said that to a Deltan, sex is like a handshake. Deltans in Starfleet were thus forbidden from engaging in sexual activity so as to limit their disruptive influence on other species. Other sources contend that sexual activity with a Deltan is such an intense experience that a non-Deltan who engages in such activity with a Deltan risks insanity.

    28. Re:And so why do we care? by phil.bachman · · Score: 1

      Eh, the Deltan stuff was taken from wikipedia.

      I guess that in comment formatting only works if it is explicitly put in HTML

  3. I know who wrote it by circletimessquare · · Score: 4, Funny

    I have a truly marvelous proof of who wrote this code which this comment box is too narrow to contain.

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
    1. Re:I know who wrote it by WilliamSChips · · Score: 1

      And then 300 years later somebody uses advanced elliptic curves to figure it out.

      --
      Please, for the good of Humanity, vote Obama.
  4. Obviously SCO's intellectual property! by gukin · · Score: 4, Funny

    Anyone who has ever played a computer game should pay up.

  5. This paper seems to have the info by nels_tomlinson · · Score: 5, Informative

    The linked site seems to be down (gee, you think it might be slashdotted?), but this paper seems to be covering the same topic.

    1. Re:This paper seems to have the info by SkankinMonkey · · Score: 3, Informative

      There's also the mirrordot link

    2. Re:This paper seems to have the info by Toasty16 · · Score: 4, Funny

      Ohhhhhh, now I get it... [Looks furtively around to see if anyone knows he doesn't get it]

    3. Re:This paper seems to have the info by tobiasly · · Score: 1

      It's articles like this that clearly demonstrate one thing: the only single useful tag to come out of the whole Slashdot tagging beta is "slashdotted".

      Now, when I see that tag, I don't even bother clicking until it's off the front page (or use mirrordot of course).

    4. Re:This paper seems to have the info by ConceptJunkie · · Score: 1

      Don't worry, I got it too.

      wink wink

      --
      You are in a maze of twisty little passages, all alike.
  6. What's with use of Pointers? by MankyD · · Score: 2, Interesting
    Why does the coder use
    int i = *(int*)
    instead of just
    int i = (int)x;


    Can someone enlighten me?
    --
    -dave
    http://millionnumbers.com/ - own the number of your dreams
    1. Re:What's with use of Pointers? by MankyD · · Score: 1
      That first one didn't come out right:
      int i = *(int*) &x;
      --
      -dave
      http://millionnumbers.com/ - own the number of your dreams
    2. Re:What's with use of Pointers? by Anonymous Coward · · Score: 1, Insightful

      > int i = (int)x;

      converts a float to an int, potentially cutting off the digits past the decimal point and/or leading digits that don't fit into the 32 bit integer range.

      > int i = *(int*);

      reinterpretes the bit pattern of the float number as an integer, so the exponent and the sign bit become part of the number.

    3. Re:What's with use of Pointers? by MeanMF · · Score: 2, Insightful

      int i = (int)x would convert the floating point number to its rounded integer equivalent.

      int i = *(int*)x assigns i the 32-bit value that represents how x is stored in memory as a float.

    4. Re:What's with use of Pointers? by Second_Derivative · · Score: 1

      A straight int cast would round off the float into an int. The intention of the code here is to manipulate the binary representation of the float instead.

      I think it's a quick hack to halve the exponent and linearly extrapolate a new value for the mantissa.

    5. Re:What's with use of Pointers? by ewhac · · Score: 5, Informative
      C "understands" ints and floats. If you do the simple cast:

      int i = (int)x;

      Then C will simply convert the float value into an integer value (throwing away fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

      So first, we lie to the compiler by telling it we have a pointer to an int:

      (int *) &f

      And then we deference the pointer to get it into an operable int:

      i = *(int *) &f

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Schwab

    6. Re:What's with use of Pointers? by chrisbtoo · · Score: 1

      Because he's trying to get at the binary representation of the IEEE754 floating point number, which he then manipulates a bit, and puts back where it was.

      With your suggested code, you'd be operating on the actual number, casting the float to an int, which would just be throwing away a bunch of information.

      --
      Registering accounts later than some other chrisb since 1997
    7. Re:What's with use of Pointers? by eis271828 · · Score: 5, Informative

      (int) x would convert the floating point value to an integer (truncation, basically).
      *(int*) &x treats the bits as an integer, with no behind the scenes conversion to an actual int value.

    8. Re:What's with use of Pointers? by radarsat1 · · Score: 4, Informative

      Because in C's own weird way, that's the only way of refering to a float as an int without changing the bits.

      If you do this:
      int i = (int)3.0f;

      You get i=3, like what you'd get from the floor() function.

      If you do this:
      float f = 3.0f;
      int i = *(int*)


      Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

      It's because C knows how to cast a float to an int by applying the floor function. However, if you do it the second way, you aren't casting a float to an int, you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

      By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.

    9. Re:What's with use of Pointers? by uhoreg · · Score: 1

      Try them out. You'll see that they give different numbers. "int i = (int) x" converts x into an integer with approximately the same value. "int i = *(int*)&x" gives you an integer with the same binary representation as the float (and which will give different results depending on endianness and size of int).

      --

      To get something done, a committee should consist of no more than three persons, two of them absent.

    10. Re:What's with use of Pointers? by ars · · Score: 1

      I think, but am not sure, that it's copying the floating point value bit for bit into an integer. It's not converting the float into an int.

      --
      -Ariel
    11. Re:What's with use of Pointers? by The+boojum · · Score: 1

      Because it's not actually a float-to-int cast as you'd normally think of it. Rather, it's mucking with the bit representation of the float. It's roughly equivalent to "union { float x; int i; };"

    12. Re:What's with use of Pointers? by Anonymous Coward · · Score: 0

      The (int)i version does data conversion from float to integer. The detour through the address prevents that, because while there is a conversion from float to int, there is none from float* to int*. So *(int*)&x delivers the binary IEEE754 float representation of x. (int)i delivers the rounded integer value of x.

    13. Re:What's with use of Pointers? by doshell · · Score: 1

      Since x is a float, int i = (int)x; would find the (best) integer approximation of x and store it in i. What the author meant to do was take the binary representation of x and have the compiler treat it as the binary representation of an int -- thus the "taking the address", followed by a cast to int *, followed by a dereference to store the actual value in i.

      If this gets you confused, I'd definitely suggest trying out both methods in your favorite compiler :)

      --
      Score: i, Imaginary
    14. Re:What's with use of Pointers? by eneville · · Score: 0

      excuse me. i did not RTFA, it was offline. i assumed x was also an int please excuse this. mod parent down if you have points please.

    15. Re:What's with use of Pointers? by Anonymous Coward · · Score: 0

      He is doing floating point computations in an integer. He is interested in the values in the bytes, so simply casting the float to an int would not work.

    16. Re:What's with use of Pointers? by systemeng · · Score: 1

      The reason is that they want to use the bit values of the floating point number in the integer unit of the processor, not get the integer representation of the number.

    17. Re:What's with use of Pointers? by ToxikFetus · · Score: 5, Informative
      Why does the coder use

      (1) int i = *(int*)

      instead of just

      (2) int i = (int)x;
      (some of my points added for emphasis)

      I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

      If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise opeartions (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

      x = *(float*)

      x now contains the value of the integer operation:

      i = 0x5f3759df - (i >> 1);

      which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.
    18. Re:What's with use of Pointers? by uhoreg · · Score: 1

      correction: endianness may not have an effect on the result (unless it's some very odd processor that has different endianness for floats and ints).

      --

      To get something done, a committee should consist of no more than three persons, two of them absent.

    19. Re:What's with use of Pointers? by mooingyak · · Score: 1

      One is 'take this value and convert it to an integer' (your version).

      One is 'take this set of bytes, assume it's integer data, and tell me what value you get when you dereference (their version).

      They won't necessarily render the same result.

      I haven't looked at the code, but depending on the type of X, it could provide some wildly different results.

      --
      William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
    20. Re:What's with use of Pointers? by Geoffreyerffoeg · · Score: 2, Informative

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Moreover, when compiled, the optimizer does the right thing - since access to the variable is actually a memory dereference (or a register access, but we'll ignore it for now), *(int*)& means "tell the compiler this is an int, but don't do anything in the code", whereas (int) means "create a new temporary that's the int part of the variable".

    21. Re:What's with use of Pointers? by dsci · · Score: 4, Informative

      Newton-Raphson is a general algorithm for finding root of an equation f(x)=0.

      You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

      X(n+1) = X(n) - f(X(n)) / f'(X(n))

      where
      X(n+1) is the NEXT guess after the value you 'know',
      X(n) is that most recent value you know,
      f(X(n)) is the function evaluated at X(n) and
      f'(X(n)) is the first derivative of f(x) evaluated at X(n).

      It's not foolproof and a BOTH whether it converges at al AND how FAST it converges depends on the initial guess, X(0)

      The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

      Obligatory wikipediea followup: Newton's Method

      --
      Computational Chemistry products and services.
    22. Re:What's with use of Pointers? by 91degrees · · Score: 1

      Other people have answered this, so I'll ask another question.

      Isn't it a little clearer, and potentially faster (at least if your optimiser doesn't understand) to use
      union
      {
          float f;
          int i;
      }value;
      value.f = x;
      i = value.i;

    23. Re:What's with use of Pointers? by The+boojum · · Score: 2, Informative

      [Forgive the LaTeX notation -- I'm not sure how else to best express the math without sub and sup tags.]

      Newton-Raphson is a root finding method. Given a starting point it finds successively more accurate approximations to the root. The formula for it looks like: x_{n+1} = x_n - f(x_n)/f'(x_n), where f(x) is the function to find the root of, f'(x_n) is the derivative of that function and x_n are the successive approximations. Essentially, given a guess it looks at the slope of the function at that point follows a tangent line until it crosses the zero line. That point becomes the next guess.

      Finding the reciprocal square root of a is equivalent to solving x^{-2} - a = 0 which means that you use f(x) = x^{-2} - a. The derivative of that is f'(x) = -2x^{-3}. Putting those into the Newton-Raphson formula gives you x_{n+1} = x_n - (x_n^{-2} - a) / -2x_n^{-3}, which simplifies down to x_{n+1} = x_n(3 - a*x_n^2)/2. This is equivalent to the last line before the return in that InvSqrt() function.

      Convergence of Newton-Raphson is quadratic which means that each time you repeat that line with a previous guess you double the number of digits of accuracy (up to the limit of the hardware of course).

    24. Re:What's with use of Pointers? by mrogers · · Score: 1
      Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.
      OK, so here's what's confusing me: when you right-shift i, what effect does that have on the floating point value? I don't think it's a simple divide by two, because a float contains a sign bit, an 8-bit exponent and a 23-bit mantissa. Right-shifting would seem to divide the exponent by two (i.e. take the square root), but it would also shift the least significant bit of the exponent into the most significant bit of the mantissa (making the mantissa negative if the exponent was odd, or positive if the exponent was even?). How on earth does subtracting the result from a magic value then give you the inverse square root?
    25. Re:What's with use of Pointers? by fractalus · · Score: 1

      Newton-Raphson is a method of repeatedly estimating the solution to an equation by refining guesses, using both the equation and its first derivative. This function does just a couple of steps. More information on NR can be found at Mathworld.

      --
      People are never as simple as their stereotypes. This applies equally to Christians, Muslims, and Emacs-lovers.
    26. Re:What's with use of Pointers? by Abcd1234 · · Score: 1

      I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.

      As an aside, is an excellent illustration of the value of a real computing science degree (assuming you take it at an institution where they actually expect you to learn math, numerical methods, etc).

    27. Re:What's with use of Pointers? by Cyberax · · Score: 1

      Nope, it's actually quite close to metal. We take the address of variable and reinterpret the variable at this address as integer.

    28. Re:What's with use of Pointers? by mark-t · · Score: 2, Interesting

      If the compiler is willing to hold the union in a register, then yes, using a union should almost certainly be faster... and would potentially save two 32-bit memory copy instructions. In fact, the biggest problem with the approach that the article mentions is that it uses the address-of operator, which generally implies to the compiler that the argument cannot be stored in a register, which would almost certainly prevent the compiler from using assorted possible optimizations.

    29. Re:What's with use of Pointers? by Reality+Master+101 · · Score: 1

      Isn't it a little clearer, and potentially faster (at least if your optimiser doesn't understand) to use...

      Definitely not clearer to someone well-versed in C, but also is not considered portable (I think -- it's been a few years since I was up on the C Standard). I believe C doesn't guarantee that all the union elements will be on the same alignment. The compiler will (probably) choose to do it that way, but it doesn't have to.

      --
      Sometimes it's best to just let stupid people be stupid.
    30. Re:What's with use of Pointers? by smcdow · · Score: 1
      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Just curious, can this be done in Java?
      If not, why not?

      --
      In the course of every project, it will become necessary to shoot the scientists and begin production.
    31. Re:What's with use of Pointers? by greengearbox · · Score: 2, Informative

      That method can't be used, but see Float.floatToRawIntBits.

    32. Re:What's with use of Pointers? by gnasher719 · · Score: 1

      '' Why does the coder use
      int i = *(int*)
      instead of just
      int i = (int)x; ''

      Get your copy of the C Standard out, or any book about C, and figure out for yourself what each of these statements does.

      Maybe you can try a few different values for x, like 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 and print * (int *) &x, preferably in hexadecimal format; that should give you some hint.

    33. Re:What's with use of Pointers? by DeadCatX2 · · Score: 1

      Good observation on dividing the exponent by two to get the square root.

      Right-shifting i also affects the sign bit if you use a logical right shift, so that takes care of the "can't square root a negative" problem.

      Recall that the mantissa is normalized (1 = mantissa 2; the non-fractional portion is also implied, so the bits of the mantissa are actually just the fractional portion). I would theorize that subtracting the result from the appropriate magic number corrects the exponent bit that got shifted into the mantissa, as well as probably handling the reciprocal operation.

      --
      :(){ :|:& };:
    34. Re:What's with use of Pointers? by Hamilton+Lovecraft · · Score: 1

      Right-shifting would seem to divide the exponent by two (i.e. take the square root), but it would also shift the least significant bit of the exponent into the most significant bit of the mantissa (making the mantissa negative if the exponent was odd, or positive if the exponent was even?). How on earth does subtracting the result from a magic value then give you the inverse square root? The key is that it doesn't give you the inverse square root; it gives you a not-ridiculously-bad approximation which is then refined via Newton-Raphson. The sign bit is separate from the mantissa, and the mantissa has a presumed leading "1." before the bits in the float. I've seen a less clever version of the same trick (and adapted it to cube roots) without the magic number that, step 1, separated the sign bit, step 2, unbiased the exponent, step 3, shifted the exponent and mantissa parts one to the right. The result changes 1.bbbbbb... x 2^x to 1.0bbbbbb... x 2^(x/2) (if the exponent is even) or 1.1bbbbbb... x 2^((x-1)/2) (if the exponent is odd), both of which are pretty good approximations. I haven't analyzed the magic number part of this version yet but I imagine it takes care of (a) protecting the sign bit and (b) unbiasing the exponent along the way to making a slightly better approximation.
      --
      step 3: god dammit, it doesn't work
    35. Re:What's with use of Pointers? by dsci · · Score: 2, Informative

      How on earth does subtracting the result from a magic value then give you the inverse square root?

      Subtracting the right shifted value from the magic value does NOT give the 'inverse square root.' This line of code only gives the initial guess for the Newton-Raphson algorithm.

      While the initial guess is a huge part of the real beauty of the algorithm (ie, why it can converge to a sufficiently accurate value after ONLY ONE ITERATION), the 'real work' is done in the

      x = x*( 1.5f - xhalf*x*x )

      line. That's that application of one iteration of Newton-Raphson (rewritten a bit from what I and others posted earlier, but for this problem algebraically the same). To get more accuracy in the calculation, apply this line repeatedly.

      The explanation of the why the magic value - (right shifted int represenation of the float) is a good initial guess is given in the Paper by Chris Lomont linked to in article. Specifically, look on Section 4 beginning on Page 3 of Lomont's article; that's where the fun starts.

      In a nutshell, from Lomont, the initial guess is computed "by multiplying the exponent by -1/2, and then picking bits to minimize error." I have not worked through all the details on why this is a good initial guess, but will reemphasize that getting a good result from Newton-Raphson DEPENDS on the initial guess. So, it is not surprising that someone hunted for a general algorithm to find a good initial guess.

      --
      Computational Chemistry products and services.
    36. Re:What's with use of Pointers? by arniebuteft · · Score: 5, Informative
      The article at http://www.math.purdue.edu/~clomont/Math/Papers/20 03/InvSqrt.pdf really explains it better, but the point is to trick the code into performing integer operations on a float. You start the function with a float, which is 32 bits, arranged in a very specific sequence: first bit is sign (0 for positive, 1 for neg), next 8 bits are exponent (offset by 127 to allow positive and negative exponents), and the last 23 bits are the mantissa (normalized to assume the decimal point has a binary 1 to its left). So, the simple good old number 21 in float is the sequence 0 10000011 01010000000000000000000, for the sign, exponent, and mantissa (omit the spaces of course). That same number 21, stored as a 32-bit integer is 00000000 00000000 00000000 00010101 (again, omit the spaces).

      The trick of this function is to take the 32 bits of data that are really a float, but process it as if it's an integer. So you take that cumbersome number 21 as a float, then BAM! presto, turn it directly to an integer not through type conversion, but by simply treating those same 32 bits as if they were representing an integer all along.

      Let's use the number 21 as an example in the function call.

      The binary representation of 21 as a float is 01000001 10101000 00000000 00000000 (broken into 8-bit words for clarity). The function then goes to create a new integer i, whose value is also 01000001 10101000 00000000 00000000 (which happens to be 1101529088 in decimal). The magical line of the code, i = 0x5f3759df - (i>>1), takes that integer i, shifts its bits one to the right (turning our 01000001 10101000 00000000 00000000 into 00100000 11010100 00000000 00000000, or 550764544 in decimal), then subtracts it (still doing integer math here) from 0x5f3759df (which is 01011111 00110111 01011001 11011111 or 1597463007 in decimal), and winds up with 00111110 01100011 01011001 11011111 (or 1046698463 in decimal).

      Now, for its next trick, it takes that wonky integer 1046698463, and turns it back into a floating point number, by the same trick used above, i.e. simply by looking at those same 32 bits, and pretending they're a float, not an int. The binary representation of 1046698463, 00111110 01100011 01011001 11011111, is the same as 0.22202251851558685 in float.

      From here on out, it's all floating math. Apply the Newton-Rhapson method (thats the next line), we get x = 0.22202251851558685 * (1.5 - ( (21*.5) * 0.22202251851558685^2 )) = 0.218117811. We return this value at the closing of the function. As it turns out, the inverse square root of 21 is 0.21821789... (thanks Google calc). So, I have no idea WHY the Float to Int to Float trick works, but it works very well.

      Whew!

    37. Re:What's with use of Pointers? by fireboy1919 · · Score: 1

      The Newton method is something generally taught in trigonometry in high school. At least, it was taught to me, and I was going to school in Florida, which is a low man on the totem pole.

      If you went to high school, took trig, and don't remember, then I'd say you probably don't remember much of it.

      --
      Mod me down and I will become more powerful than you can possibly imagine!
    38. Re:What's with use of Pointers? by HTH+NE1 · · Score: 2, Interesting

      Because in C's own weird way, that's the only way of referring to a float as an int without changing the bits.

      <yoda>No. There is another.</yoda>

      You can also lie to va_arg() about the type of the argument to achieve the same thing, but it's not as efficient.

      --
      Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
    39. Re:What's with use of Pointers? by Goaway · · Score: 1

      Seven lines of code is clearer than one line now? And if your "compiler doesn't understand", hwo on earth would doing TWO assignments instead of ONE be faster?

    40. Re:What's with use of Pointers? by ChadN · · Score: 1

      Think of the magic value as actually two magic values stuck together. One part chosen to fix up the exponent part of the float, and one part to help fix up the mantissa part of the float.

      The right shift reduces the mantissa by two (effectively an approximate square root of the value). The magic value is chosen so that the exponent is always made negative. Thus, the shift performs a sort-of square root, and the subtraction performs a sort-of inversion of the square root.

      If a '1' bit is shifted into the mantissa, that effectively increases the value of the mantissa by 1/2. That can be compensated for, somewhat, by choosing a magic number that sometimes forces a borrow from the exponent bit during the subtraction, thus reducing the exponent bits by one, and reducing the resulting float value by half. But sometimes the mantissa part is not increased by half (ie. a zero bit is shifted in from the exponent bits). Basically, you need to do an optimization in order to find what value for the mantissa bits is best (ie. gives the best approximation over all possible cases). The analysis is a bit hairy (and discussed in the pdf linked to by others), but doable.

      In the end, you convert back to a float, which is a pretty good starting point (the main effect being the exponent value of the input argument has been halved and negated), and use that for one iteration of Newton-Raphson square root approximation. Newton-Raphson is highly sensitive to the starting point (ie. it works much better and faster with a good first guess), which makes the integer shift-and-subtract step so valuable.

      Here is a discussion of a related method that is used for (non inverted) square roots:

      http://en.wikipedia.org/wiki/Methods_of_computing_ square_roots#Approximations_that_depend_on_IEEE_re presentation

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    41. Re:What's with use of Pointers? by ChadN · · Score: 1

      Since the starting value is passed as an argument, and is a float (not a union), you would have to do a little extra work to define the passed in argument as part of a union (semantically). In the end, it is probably not worth it.

      If you knew your compiler could pass values by register (rather than on the stack), it might conceivably save a copy or two by avoiding use of the address-of operator. At that point, though, you would probably just write the function in assembly language.

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    42. Re:What's with use of Pointers? by gnasher719 · · Score: 1

      Just follow it step by step.

      In an IEEE 32-bit floating point number, the highest bit is the sign bit. The next eight bits are the exponent adjusted by 0x7f, and the remaining 23 bits are the lower 23 bits of the mantissa. If you have a number x = 2^k * (1 + m / 2^23), then the floating point representation is highest bit = 0, next 8 bit = k + 0x7f, remaining 23 bits = m. That means you get

              i = ((k + 0x7f) * 2^23) + m = (k > 1) and interprets it as a floating-point number. First we examine what happens if we multiply x by 4: The exponent of x, that is k, is increased by two. i is increased by (2 > 1) = 0x1fc00000 + m/2, and l = 0x5f3759df - (i >> 1) = 0x3f7759df - m/2 = 0x3f000000 + 0x007759df - m/2. m/2 is between 0 and 0x003fffff, so the exponent field is equal to 0x3f000000. Interpreted as a floating-point number, it is (1/2) + (1 + (0x007759df - m/2) / 2^23) = (1/2) * (1.932430 - m/2^24) = (1/2) * (1.932430 - (x - 1)*2^23/2^24) = 1.216215 - x/4 after some length calculation. Now if you draw graphs of 1/sqrt (x) and 1.216215 - x/4 for 1 0x003759df.

      However, with all that said, on a modern CPU this code might not be running too fast, because you need to write a floating-point register to memory, read back to an integer register, write back to memory, and read back to a floating-point register; that is not usually that a modern CPU likes to do. Both PowerPC and Intel/AMD CPUs have a vector instruction that will calculate an approximate inverse square root for four floating-point values simultaneously, with higher precision than this code does.

    43. Re:What's with use of Pointers? by Anonymous Coward · · Score: 0
      real computing science degree

      Why would you need a degree to use high-school maths? IIRC it was covered in 6th form (i.e. for 16yo kids). Of course, proving it would be a more advanced exercise :-)

      I remember the teacher explaining it as: draw a graph of the function. To find a zero crossing (root), take a tangent to the graph by differentiating the function. Something simple like x^(-0.5) - y = 0 should converge on the root.

      Dang, I need to read up on it again!

    44. Re:What's with use of Pointers? by 91degrees · · Score: 1

      Seven lines of code is clearer than one line now?

      Yes. Unless you like code like *a += (*b-=3).

      And if your "compiler doesn't understand", hwo on earth would doing TWO assignments instead of ONE be faster?

      A good optimiser will eliminate simple assignments. It's just assigning a new variable idnetifier to the same register or address.

      Taking the address of a local variable may make the compiler put the variable into local memory. It would then have to be copied back. So not only could this require 2 copies, but it would also have to do this via memory which will usually be slower than via a register.

    45. Re:What's with use of Pointers? by prockcore · · Score: 1

      The way I always did it was:

      union { int dw; float f; } df; df.f = 1.5f; int i = df.dw;

    46. Re:What's with use of Pointers? by Solra+Bizna · · Score: 1
      Both PowerPC and Intel/AMD CPUs have a vector instruction that will calculate an approximate inverse square root for four floating-point values simultaneously, with higher precision than this code does.

      What's more, the PowerPC (at least, the 604 and above) has a scalar instruction to do the job. I've used it several times in my graphics code. frsqrte. In fact, on the 750, it executes in the same amount of time as a floating point add.

      It's one of the few instructions GCC isn't smart enough to use automatically. (another is rlwimi, which is without a doubt, the coolest instruction ever)

      -:sigma.SB

      --
      WARN
      THERE IS ANOTHER SYSTEM
    47. Re:What's with use of Pointers? by radarsat1 · · Score: 1
      It's one of the few instructions GCC isn't smart enough to use automatically. (another is rlwimi, which is without a doubt, the coolest instruction ever)


      Wouldn't the optimization be in the PPC-specific libc code for the sqrt() function, rather than being something that GCC does?
    48. Re:What's with use of Pointers? by radarsat1 · · Score: 1

      As an aside, is an excellent illustration of the value of a real computing science degree (assuming you take it at an institution where they actually expect you to learn math, numerical methods, etc).


      Sadly, I have one. ;-)

      Don't get me wrong, I learned a hell of a lot and wouldn't exchange it for anything in the world, but frankly I wasn't the best in th e math classes, I'll admit.

      However, I don't remember covering this particular algorithm. (I never took the advanced algorithms class though.)
    49. Re:What's with use of Pointers? by Abcd1234 · · Score: 1

      However, I don't remember covering this particular algorithm. (I never took the advanced algorithms class though.)

      Interesting. I know, in my degree program, numerical methods were a required course. In it, we covered all sorts of numerical methods, including estimation techniques such as Newton-Raphson's, and also delved into computing error, how to handle error, etc. But, as I say, it *really* depends on the degree program...

    50. Re:What's with use of Pointers? by gnasher719 · · Score: 1

      '' Definitely not clearer to someone well-versed in C, but also is not considered portable (I think -- it's been a few years since I was up on the C Standard). I believe C doesn't guarantee that all the union elements will be on the same alignment. The compiler will (probably) choose to do it that way, but it doesn't have to. ''

      How would you suggest that elements of a union could possible have different alignment, since they have the same address?

      The reason why it is non-portable is that the C Standard explicitely says that assigning to one element of a union and then reading another element is undefined behavior (with the exception of certain cases with elements that are structs). The "* (int *) &x" is non-portable for exactly the same reason (because the C Standard says so).

    51. Re:What's with use of Pointers? by gnasher719 · · Score: 1

      '' Just curious, can this be done in Java? ''

      Yes. Even better, in Java it is portable and guaranteed to work.
      Java has explicit functions to get the bit-representations of a float or double in a 32 bit or 64 bit integer and to create a float or double value from its bit-representation.

    52. Re:What's with use of Pointers? by Reality+Master+101 · · Score: 1

      How would you suggest that elements of a union could possible have different alignment, since they have the same address?

      Correct me if I'm wrong (and I could be -- my C is rusty, and please forgive any syntax mistakes), but for:

      union { char a[5]; int b; } u;

      &u.a is not guaranteed to be the same as &u.b. That way, you could potentially pack unions/structs that have character data and don't need to be aligned. In other words, 'a' doesn't need to be aligned if the preceding data caused the union to fall on an odd address, but b could force an alignment.

      --
      Sometimes it's best to just let stupid people be stupid.
    53. Re:What's with use of Pointers? by ebyrob · · Score: 1

      Wait.

      Are you claiming you have a method in strait C for taking a float (that isn't dereferenced), treating it as in int (without casting), and avoiding the possibility of memory operations?

      If so, please share.

    54. Re:What's with use of Pointers? by clarkcox3 · · Score: 1
      As an aside, is an excellent illustration of the value of a real computing science degree (assuming you take it at an institution where they actually expect you to learn math, numerical methods, etc).


      Isn't Newton-Raphson iteration taught in high school calculus?

      --
      There are no tiger attacks in my area and it's all because this rock I'm holding keeps the tigers away.
    55. Re:What's with use of Pointers? by Kuciwalker · · Score: 1

      At CMU, oddly enough it *isn't* required for the CS degree but I do have to take it for my Math double.

    56. Re:What's with use of Pointers? by 91degrees · · Score: 1

      You're right. I tried this and get exactly the same code with both methods.

    57. Re:What's with use of Pointers? by CryoPenguin · · Score: 1

      Clearer? I don't think so, but that's just an opinion.
      Faster? Not on x86. Integers and floats use different registers, and it's not possible to convert one to the other without going through memory anyway. (SSE can do it, but then SSE has a single instruction for InvSqrt so it wouldn't need the function at all.)

    58. Re:What's with use of Pointers? by dozer · · Score: 1

      Try it. I think you'll find you're wrong. On any post-1980s compiler both techniques will either produce the exact same code or, if one is slower, then it's the union that's causing the register to spill. Since nobody uses unions, nobody really optimizes for them but, believe you me, C compilers optimize the hell out of pointers.

      And, the address-of operator implies no such thing if it's optimized away (and it is). Maybe you were thinking of the volatile keyword?

    59. Re:What's with use of Pointers? by Anonymous Coward · · Score: 0


      Well, no. There is another way. The right way of doing this in C without lying to the compiler is to use a union.
      union U1
      { int asInt;
          float asFloat;} V1;

      This tells the compiler that V1 is a variable that has size = max(sizeof(int),sizeof(float)) and whose memory can be accessed as an int or a float.

      U1 V1;
      V1.asFloat = x;
      x=x/2;
      v1.asInt = 0x5f3759df - (V1.asInt >> 1);
      x = v1.asFloat*( 1.5f - x*v1.asFloat*v1.asFloat )

      In this day and age of strict pointers, this is a lot safer, as the compiler is allowed to assume that an int pointer doesn't point to the same place as a float pointer, and can rearrange code to modify the float before you read the integer. This doesn't look like a problem in this piece of code as there are other dataflow dependencies, but you never know.
      </pedantic>

      Basically a union says to the compiler: look man, I'm going to put in an int into this variable and then use it as float, that's cool. I know what I'm doing. Don't worry about it.

    60. Re:What's with use of Pointers? by mark-t · · Score: 1

      Actually, I _did_ try it... just now, using GCC 3.4.6, and used the -S option to analyze the assembly output of both. Regardless of whether -O1, -O2, -O3, or -Os was used, using a union always used one less mov instruction than the code described in this article did when compiled with the same optimization flag (it was otherwise identical except for which registers it used in a few of the instructions). Compiling both without any optimization flags on the command line, the union method saved _eight_ instructions over the code described in this article. So as I expected, a decent compiler _will_ optimize code that uses unions. The address-of/recast/dereference approach is an old hack for converting between the binary representation of data types that saves a programmer declaring a union that is probably only being used once (and therefore saves typing) but it's definitely no faster.

    61. Re:What's with use of Pointers? by hao2lian · · Score: 1

      I can't speak for all high school courses, but all AP Calculus (AB probably, maybe BC) cover Newton's method, usually part of the "applications of derivatives" section, a section that also entails throwing students that barely comprehend the material into the deep end of related rates.

      --
      Pelé!
    62. Re:What's with use of Pointers? by mark-t · · Score: 1

      (void*)&u.a is the same as (void*)&u.b is the same as (void*)&u. The only reason why &u.a and &u.b could be different according to the standard is because they are different types of pointers and some architectures may use different representations for pointers of different types. Casting them to any common pointer type that does not have stricter alignment restrictions than the union itself will yield the same address for each.

    63. Re:What's with use of Pointers? by mark-t · · Score: 1

      float InvSqrt (float x) {
      union { float x; int i; } value;
      float xhalf;
      value.x=x;
      xhalf = 0.5f*x;
      value.i = 0x5f3759df - (value.i>>1);
      value.x = value.x*(1.5f - xhalf*value.x*value.x);
      return value.x;
      }
      I compared the results compiled by GCC for the above code against the code for the InvSqrt code given in the article, and the code I just described above always saved at _least_ one instruction over the article's code, regardless of which optimization flags were used. Score one point for writing cleaner code. The only advantage the article's code has is that you don't have to create a union type to hold the value, but it's definitely no faster than using a union.

    64. Re:What's with use of Pointers? by mark-t · · Score: 1

      Actually it _IS_ faster on an x86... I tested it with gcc 3.4.6 The union method, regardless of which optimization flags were used, was always at _LEAST_ one instruction less than the assembly produced by the code described by the article when compiled with the same optimization flags.

    65. Re:What's with use of Pointers? by mark-t · · Score: 1

      Oops... I made a typo in the above code. The line that reads xhalf = 0.5f*x; should actually read xhalf = 0.5f*value.x;

  7. nope by rhombic · · Score: 1

    If you RTFA, the first person the author asks about it is J.C., and he says "nope, not me, maybe this other guy". Specifically NOT taking credit for this snip of code.

    --
    1984 was supposed to be a warning, not an instruction manual.
  8. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  9. x * x, right? by h4ter · · Score: 0, Redundant

    The inverse square root is 1/(sqrt(x)), not x^2 (which is, I admit, the first thing I thought of, wondering why anyone would be so excited about a faster way of getting it).

    1. Re:x * x, right? by Eudial · · Score: 2, Informative
      The inverse square root is 1/(sqrt(x)), not x^2 (which is, I admit, the first thing I thought of, wondering why anyone would be so excited about a faster way of getting it).


      Well, you're sort of right.

      f^-1(x) = x^2 is the inverse of f(x) = sqrt(x) where x > 0

      --
      GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
    2. Re:x * x, right? by Anonymous Coward · · Score: 0

      no. (sqrt(x))^2 = x holds for all complex x, not just the positive reals.

    3. Re:x * x, right? by Anonymous Coward · · Score: 0

      Which is why the complex number system kicks the real number system's butt all over the map. *high fives*

      (Incidentally, (sqrt(x))^2 = x for 0, too, not just positive reals like the GP's x > 0, but that's getting pedantic.)

  10. Old and busted: Duff's device by Stavr0 · · Score: 3, Funny
    New hotness: Fast InvSqrt()

    Naah, just kidding. They both deserve a spot in the Clever Hacks Hall of Fame

  11. InvSquirt!? by paniq · · Score: 0, Offtopic

    Does InvSqurt stand for Inverted Squirt? Is this when I bring my Zune back to the store after I found out what a piece of shit it is?

    --
    Do not trust this signature.
  12. excellent thread by Anonymous Coward · · Score: 0

    great read, enjoyed it. brought back many good memories.

    that just goes to show you tweeds in VM land that you need to get down and dirty sometimes.

  13. Hmm... by porkmusket · · Score: 2, Funny

    Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.

    1. Re:Hmm... by gstoddart · · Score: 2, Funny
      Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.

      Ummmm .... you mean, squirt his kids back to him? :-P
      --
      Lost at C:>. Found at C.
  14. Poor function name by bloobloo · · Score: 4, Insightful

    The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.

    1. Re:Poor function name by Anonymous Coward · · Score: 1, Informative

      The square root can be written as x^(1/2). The inverse square root can be written as 1/(x^(1/2)). This equals x^(-1/2).

      (-1/2) != (2).

      x^(-1/2) != x^(2).

      QED.

    2. Re:Poor function name by bloobloo · · Score: 1

      As I said, the inverse of the square root of a number under the operation of multiplication is the reciprocal of the square root of the number. This function is named specifically because of that property.

      But it's a function name. One would expect InvSqrt(n) to do the opposite that Sqrt(n) would do, i.e. squaring. RecipSqrt(n) would be a better name although I do admit it would take 2 more key presses!

    3. Re:Poor function name by Anonymous Coward · · Score: 0

      An easy counter-example is x=1. Your jump from step 2 to step 3 is incorrect in general. Also, step 2 fails to account for small values of 2.

    4. Re:Poor function name by truedfx · · Score: 3, Funny
      x^(-1/2) != x^(2).
      Er... no. It's entirely possible for x^(-1/2) and x^(2) to be equal. This happens when x=1. (I know what you meant, but what you posted is very different.)
    5. Re:Poor function name by Anonymous Coward · · Score: 0
      The inverse of the square root is the square.
      Or perhaps "InvSqrt()" translates to "Inversion of the Square Root". At least it wasn't called "asdfqwq()" (I've written code like that).

      But really, the definition of any term is always dependent upon the author. Context helps identify the desired definition, which why we look at the code. Summary comments would help considerably in this case.
    6. Re:Poor function name by Jeff+DeMaagd · · Score: 1

      To me, InvSqrt looks like a reasonable and intuitive function name. I really don't understand your point, I would never assume that it would return the square, that's thinking too hard about it.

    7. Re:Poor function name by Neoncow · · Score: 1

      It's not too hard to think about inverses. Take: y = x^(1/2) for example. To find the inverse, simply swap x and y, then solve for y.

      x = y ^(1/2)
      x^2 = y

      Tada! Hope that helps.

      For reference, the definition of a reciprocal is when you swap the numerator and denominator of a fraction.
      Thus x^(1/2) becomes 1/( x^(1/2) )

    8. Re:Poor function name by rg3 · · Score: 2, Insightful

      The word "inverse" has several uses in mathematics. One of them is the one you mentioned, which is to say "inverse function". In that sense, working with positive numbers, the inverse function of the square root is the square, yes. However, there is also the concept of "inverse element" for sets and operations. It's a very generic concept that relates to an operation and the identity element for that operation. For the sum, the identify element is 0, given that x + 0 = x. The inverse element of another element regarding a given operation (and it's usually written x^(-1) for the element x), is one such that x op. x^(-1) = i, being i the identity element.

      Again, going back to the addition operation, the identity element is 0, x + x^(-1) = 0. In this case, x^(-1) is -x. For the multiplication operation, the identify element is 1 and the inverse element x^(-1) of any x is 1/x. In my math classes in high school, the teacher always called -x "opposite x" and 1/x "inverse x". So, basically, if you read "square root" not as a function, but as a number, the "inverse square root" is 1/sqrt(x), and the "inverse function for the square root function", x^2.

      All of this to say what? That it's not that easy and it can be discussed.

  15. It was fast by KalvinB · · Score: 3, Informative

    http://www.icarusindie.com/DoItYourSelf/rtsr/index .php?section=ffi&page=sqrt

    That page compares the time it takes to calculate the sqrt various ways including Carmacks. Short version is that modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have the sqrt function available.

    His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.

    1. Re:It was fast by Anonymous Coward · · Score: 0

      That's just the square root, we're looking at calculating the inverse too.

    2. Re:It was fast by systemeng · · Score: 5, Insightful

      First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.

    3. Re:It was fast by 91degrees · · Score: 1

      Interesting. But only to be expected with decreasing silicon cost. Low precision invsqrt operations such as those on vector units (e.g. 3dNow!) work in a similar way to this code. Doing it in hardware means you can make certain optimisations that aren't available to software.

    4. Re:It was fast by imsabbel · · Score: 1

      Thats notwithstanding, its still depricated.

      Telling a HUGE paragraph on how old cpus are old doesnt change it.

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    5. Re:It was fast by NFNNMIDATA · · Score: 1

      Wtf? You're using an altered version of the InvSqrt function to calc Sqrt and then comparing the result to a hardware call? Huh? That seems kind of silly. You need to compare apples to apples, you are comparing apples to oranges with red paint on them.

    6. Re:It was fast by adam31 · · Score: 4, Informative
      Okay, let's try x86...

      rsqrtss xmm1, xmm0
      about 5 cycles. And it can pipeline.

      Not a fan of x86? Maybe altivec...
      vrsqrtefp V2, V1
      depends, but 12 cycles probably and pipelined.

      On PS3's SPU it's rsqrte (6 cycles), on 3dNow it's pfrsqrt (8 cycles) both pipelined. Even PS2 had rsqrt (13 cycles). There's just no reason for software reciprocal square root. It's a cool trick, but it's not even useful anymore.

    7. Re:It was fast by systemeng · · Score: 1
      I agree with the imsabbel that for the purposes of graphics and places where you have a vector processor, it is long dead. My examples are from scientific computing: converting digital maps from one type to another. These are double precision calculations which cannot be done on either the vector processor or in single precision. The final result typically must be good to beyond 1.0e-13 and single precision won't work because intermediate calculations are multiplied huge numbers on the order of the radius of the earth squared.

      If you are stuck with the FPU: Doing good numerical math is not dead! In the context of graphics with vectorized GPU's and CPU's that implement this operation, newton-raphson square root is unnecessary.

    8. Re:It was fast by Hamilton+Lovecraft · · Score: 2, Informative

      RSQRTSS is an approximation using a lookup table, accurate to about 12 bits. It neatly replaces the integer trick in the q3 version, but you may still want a NR step to tighten it up afterwards. Also, a few of us are still worried about non-SSE platforms.

      --
      step 3: god dammit, it doesn't work
    9. Re:It was fast by SETIGuy · · Score: 1
      Uhm, aren't those approximations? You still need to do 2 or 3 iterations of newton-raphson to get full float precision. Do you know why these instructions return approximations? Is it maybe because they are a hardware implementation of

      *(int *)&f=0x5f3759df-((*(int *)&f)>>1);
      f=f*(1.5f+0.5*x*f*f);
      ?
    10. Re:It was fast by systemeng · · Score: 3, Informative

      If you have need of a low precision result and you have a vector processor, absolutely. You're right and thanks for the hard numbers. My work is in scientific computing converting digital maps where my final result must be acctuate to better than 1.0e-13 and intermediates are multiplied by numbers on the order of the square of the earth's radius in meters. I'm stuck with the FPU where things aren't so rosy. We usually combine the newton-raphson iteration of 1/sqrt(f(x)) with the newton raphson solution for f(x) where f(x) is something that can only be solved numerically. Once you're stuck with newton-raphson solution, you might as well get the most out of it by using the method to solve 1/(sqrt(f(x)) and get the inverse and the root for free.

    11. Re:It was fast by julesh · · Score: 1

      Err... OK. I'd like to know the author of that page's testing methodology, 'cause I really can't see how the 'faster_sqrtf' function presented could possibly take 426 cycles to execute: it's a memory reference followed by 3 standard integer operations. On the 1.2GHz Opteron described, it should take 3 cycles (during each of which there are 2 spare execution units for out-of-order execution of other code) plus a memory access if there's a cache miss, which with 133MHz memory should take 9 cycles, for a grand total of 12 cycles. I'm assuming the code is already in the cache, because in any situation where you care that much it will be. It'll also be inlined into the code that's calling it, so there shouldn't be a function call overhead. FSQRT has a latency of 35 cycles + memory access overhead. RSQRTSS has a latency of 3 cycles + memory access overhead. So this approach *should* be the same speed as that, and substantially faster than using C's sqrtf() function.

      My guess as to the mistake the author made: the C compiler inlined sqrtf(), but he didn't enable inlining for the code for the other approaches, meaning that there's a function call overhead involved, including a probable pipeline stall for a mispredicted return and a potential pipeline stall for an indirect function call if his compiler was producing position independent call included in the results for the two faster implementations.

      One thing I will grant though: the SSE RSQRTSS instrunction (which isn't available on the target architecture of Quake 3) *is* faster than calculating an inverse square root using the method from Quake3. But then you'll need to calculate 1/, which takes an additional 20 cycles. I think the end result will be pretty much neck-and-neck, if tested properly.

  16. Because that's what he wanted to do. by mcg1969 · · Score: 1

    OK, my subject is flippant, but it is also serious. The alternative you are proposing simply extracts an integer approximation to the float x. The code as written does something different: it extracts the binary representation of the floating-point number. That is, it is extracting the raw bits involved.

  17. Re: A better question: by Anonymous Coward · · Score: 2, Interesting

    Here's a more interesting question: "How does this function compute 1/x^(1/2)?"
    I'm asking, because I have no idea how it works. ;)

  18. Cool Journey by locotx · · Score: 1

    This was a great article. Especially for the script kiddies and wanna be HaXoRz. It's great to enlighten others about the art of true technology. Artcle included the following: * Mystery (who wrote this) * A Puzzle to solve (how did they come up with some a clevery solution) * Games (Quake, Doom, 3D graphics) * Math (Inverse Square-Root) * History (Who's who of 3D programming, VooDoo->3dFx->nVidia) * Lessons (Skill, Assembly Language, Research, OpenSource is good) Kinda brought back feelings of the "Revenge of the Geeks" series by PBS, when you get to place faces, names, ideas, feelings and thoughts to technical acheivements.

  19. Re:It's Funny. Laugh. by Anonymous Coward · · Score: 1, Informative

    Most /.ers are familiar with Fermat's last theorem. If nothing else, there was a lot of media hype when Andrew Wiles announced his solution. Beyond that, it's been in a couple episodes of Star Trek. Explaining it here is like explaining the existence of the aurora to eskimos.

  20. Mirrordot the airticle cut-and-paste by zoftie · · Score: 5, Informative

    Introduction
    Note!

    This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.
    Origin of Quake3's Fast InvSqrt()

    To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.

    InvSqrt()

    Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist.
    How the code works

    The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.

    Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU.

    But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.
    So who did write it? John Carmack?

    While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.

    Asking John whether it was him or Michael returned a "not quite".

    -----Original Message-----
    From: John Carmack
    Sent: 26 April 2004 19:51
    Subject: Re: Origin of fast approximated inverse square root

    At 06:38 PM 4/26/2004 +0100, you wrote:

    >Hi John,
    >
    >There's a discussion on Beyond3D.com's forums about who the author of
    >the following is:
    >
    >float InvSqrt (float x){
    > float xhalf = 0.5f*x;
    > int i = *(int*)
    > i = 0x5f3759df - (i>>1);
    > x = *(float*)
    > x = x*(1.5f - xhalf*x*x);
    > return x;
    >}
    >
    >Is that something we can attribute to you? Analysis shows it to be
    >extremely clever in its method and supposedly from the Q3 source.
    >Most people say it's your work, a few say it's Michael Abrash's. Do
    >you know who's responsible, possibly with a history of sorts?

    Not me,

    1. Re:Mirrordot the airticle cut-and-paste by Sargeant+Slaughter · · Score: 1

      Wow! Very cool, makes me wish I had finished getting my CS degree and become a programmer...

      --
      I hear and I forget. I see and I remember. I do and I understand. -Confucius
  21. InvSquirt!?-Sperm. by Anonymous Coward · · Score: 1, Funny

    Nope. It's when it goes in instead of out.

  22. It Was Obviously... by eno2001 · · Score: 3, Funny

    ...the work of John Romero. Apparently he was going to call it the MakeYouMyBitch() function. ;P

    --
    -"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
    1. Re:It Was Obviously... by Chris+Burke · · Score: 4, Funny

      He calls every function that. Have you seen his code? Freaking unreadable.

      int MakeYouMyBitch7 () {
          int my_bitch = MakeYouMyBitch() * MakeYouMyBitch2();
          return MakeYouMyBitch36(my_bitch);
      }

      Just terrible.

      --

      The enemies of Democracy are
    2. Re:It Was Obviously... by LittleBigLui · · Score: 1
      Just terrible.

      It's code like that that makes people whip out their katanas and scream "die, die"!
      --
      Free as in mason.
    3. Re:It Was Obviously... by Chris+Burke · · Score: 1

      It's code like that that makes people whip out their katanas and scream "die, die"!

      And the resulting binaries seem to have the same effect on people!

      --

      The enemies of Democracy are
    4. Re:It Was Obviously... by Anonymous Coward · · Score: 0

      Your use of my_* immediately made me think of Perl's take on lexicals via the my operator:

      sub make_you_my_bitch_7 {
          my $bitch = make_you_my_bitch() * make_you_my_bitch_2();
          return make_you_my_bitch_36( $bitch );
      }

  23. Fast functions by LiquidCoooled · · Score: 1

    I haven't seen code like this since I stopped coding Mandelbrot sets in 68k assembler.
    As a young newt I looked at the code inside fractint with awe and discovered some similar marvellous optimisations.
    Building on those and converting to 68k I made what was the fastest mandelbrot calculation I could.

    Another blast from the past.

    --
    liqbase :: faster than paper
  24. Also in Jim Blinn's Corner by Matchstick · · Score: 1

    Specifically, the compilation _Notation, Notation, Notation_, page 130.

  25. It might be damn smart.. by kan0r · · Score: 5, Insightful
    But the first thing I thought when I saw this was: "Damn, that code is a mess!"
    Seriously, try looking away from the genius who obviously wrote it.
    • There is no single comment which would make reading and understanding what happens here much easier!
    • Introduction of a magic number with no explanation whatsoever
    • Magic pointer arithmetics without demystification
    • Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
    I know it is good code. But it is simply bad code!
    1. Re:It might be damn smart.. by Anonymous Coward · · Score: 0

      It is called job security. :) Be happy there is even formatting.

    2. Re:It might be damn smart.. by kan0r · · Score: 1

      The guy who wrote this won't probably have a problem finding a job anyway.
      However, I *still* would like to suggest "The Practice of Programming" by Kernighan/Pike to the author. ;-)

    3. Re:It might be damn smart.. by Anonymous Coward · · Score: 2, Insightful

      There is no single comment which would make reading and understanding what happens here much easier

      Exactly. It's very simple code, and if you can't tell how it works by looking at it you need to first brush up on your applied math and IEEE 754. Then it will be obvious.

    4. Re:It might be damn smart.. by Anonymous Coward · · Score: 0

      At this level of play you don't use line-level comments. This ain't your father's COBOL-mobile.

    5. Re:It might be damn smart.. by An+ominous+Cow+art · · Score: 1

      Unfair moderation, you make valid points.

    6. Re:It might be damn smart.. by davidbrit2 · · Score: 1

      Well, it could certainly use a comment to explain the magic behind the bit fiddling, but it's not hard to figure out what it's doing just by looking at it for a few minutes. Hell, I don't do any 3D programming, and I spend most of my time in C# and SQL, and I could still figure out what it was doing, and that it was probably a floating point binary format trick - just not precisely why it worked. :)

      Portability I'll grant you. I'm not sure what kind of results you'd get on a 64 bit architecture, or one with different endianness. Maybe I'll try it on my PowerPC iBook for laughs just to see how "wrong" the results are...

    7. Re:It might be damn smart.. by Tyler+Durden · · Score: 2, Insightful

      Ummm... how is endianness going to affect the calculation at all? A 64 bit architecture should royally screw up the calculation, but given when the code was written a 32 bit architecture would be a perfectly reasonable assumption.

      If I wanted to be picky, I'd harp on the magic number or at least add a comment at the top saying it used Newton-Raphson iteration. As far as the bit fiddling goes, anyone who does 3D programming in C will instantly recognize what it does.

      --
      Happy people make bad consumers.
    8. Re:It might be damn smart.. by Anonymous Coward · · Score: 0

      Yeah, Like im so sure YOU could do better than a PROFESSIONAL GAME CODER.

    9. Re:It might be damn smart.. by d_strand · · Score: 1

      aww come on!

      the function is a couple of lines. Its name tells you everything you need to know about it. Portability was not an issue since it's targeted at x86. And the pointer operation is standard stuff if you've done any C programming.

    10. Re:It might be damn smart.. by Goaway · · Score: 2, Informative

      Anybody who does 3D programming in C at that level will instantly recognize Newton-Raphson.

    11. Re:It might be damn smart.. by Anonymous Coward · · Score: 0

      /* does stuff */

    12. Re:It might be damn smart.. by exp(pi*sqrt(163)) · · Score: 2, Funny

      Yeah, the computing industry is full of people who resent smart programmers who don't program by the book. That's fine. Every programmer eventually finds their own level and there's no shortage of startup businesses who need web page designers.

      --
      Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
    13. Re:It might be damn smart.. by Anonymous Coward · · Score: 2, Informative

      The actual code in Quake 3 is commented quite well:

      long i;
      float x2, y;
      const float threehalfs = 1.5F;

      x2 = number * 0.5F;
      y = number;
      i = * ( long * ) // evil floating point bit level hacking
      i = 0x5f3759df - ( i >> 1 ); // what the fuck?
      y = * ( float * )
      y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration

    14. Re:It might be damn smart.. by Anonymous Coward · · Score: 0

      What about Java programmers? We're still cool, right!?

    15. Re:It might be damn smart.. by CryoPenguin · · Score: 2, Informative

      A 64 bit architecture wouldn't necessarily harm anything.
      The only non-portable assumptions made by the code are: int is 32 bits, and float is stored in IEEE-754 format.
      The first assumption is still true in 64 bit gcc, and even if it isn't you can just write int32_t instead.

    16. Re:It might be damn smart.. by Psycosys · · Score: 1

      I just ran it on a G4 iMac. With one iteration: InvSqrt(4.000000) = 0.499154 1/sqrt(4.000000) = 0.500000 InvSqrt(10.000000) = 0.315686 1/sqrt(10.000000) = 0.316228 InvSqrt(2.000000) = 0.706930 1/sqrt(2.000000) = 0.707107 InvSqrt(100.000000) = 0.099845 1/sqrt(100.000000) = 0.100000 InvSqrt(0.100000) = 3.157232 1/sqrt(0.100000) = 3.162278 With two iterations: InvSqrt(4.000000) = 0.499998 1/sqrt(4.000000) = 0.500000 InvSqrt(10.000000) = 0.316226 1/sqrt(10.000000) = 0.316228 InvSqrt(2.000000) = 0.707107 1/sqrt(2.000000) = 0.707107 InvSqrt(100.000000) = 0.100000 1/sqrt(100.000000) = 0.100000 InvSqrt(0.100000) = 3.162266 1/sqrt(0.100000) = 3.162278

    17. Re:It might be damn smart.. by Psycosys · · Score: 1

      Hmmm, I should have previewed. Try two!

      I just ran it on a G4 iMac.

      With one iteration:
      InvSqrt(4.000000) = 0.499154
      1/sqrt(4.000000) = 0.500000

      InvSqrt(10.000000) = 0.315686
      1/sqrt(10.000000) = 0.316228

      InvSqrt(2.000000) = 0.706930
      1/sqrt(2.000000) = 0.707107

      InvSqrt(100.000000) = 0.099845
      1/sqrt(100.000000) = 0.100000

      InvSqrt(0.100000) = 3.157232
      1/sqrt(0.100000) = 3.162278

      With two iterations:
      InvSqrt(4.000000) = 0.499998
      1/sqrt(4.000000) = 0.500000

      InvSqrt(10.000000) = 0.316226
      1/sqrt(10.000000) = 0.316228

      InvSqrt(2.000000) = 0.707107
      1/sqrt(2.000000) = 0.707107

      InvSqrt(100.000000) = 0.100000
      1/sqrt(100.000000) = 0.100000

      InvSqrt(0.100000) = 3.162266
      1/sqrt(0.100000) = 3.162278

      Doesn't look too "wrong" to me.

    18. Re:It might be damn smart.. by Tyler+Durden · · Score: 1

      OK cool. I was wondering about this after I wrote my post.

      Thanks!

      --
      Happy people make bad consumers.
    19. Re:It might be damn smart.. by dkf · · Score: 1
      float is stored in IEEE-754 format
      I write code that gets ported to some very odd architectures indeed, and nearly everything these days is IEEE-754 for float, especially when the values go to memory. Things get odder when you're purely in registers and using doubles, but neither of those are the case here.

      I'd still love some analysis of whether this hack is such a good win on a modern processor; it's not beyond the bounds of possibility that a good FPU/compiler/libm might be able to beat the hack starting from idiomatic code...

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    20. Re:It might be damn smart.. by Anonymous Coward · · Score: 1, Insightful

      Java is a mediocre language for mediocre programmers to write mediocre programs in. Sorry be the one to tell you.

      You can change, you know. It's not too late!

    21. Re:It might be damn smart.. by CryoPenguin · · Score: 1
      Ok, I wrote some code to benchmark various versions: http://akuvian.org/src/invsqrt.tar.gz.
      invsqrt_q3 is the version from the article. It gets miscompiled unless I add -fno-strict-aliasing.
      invsqrt_union is the same except with a union instead of pointer casts. This is the more correct way to write it in C, and fixes the aliasing issue.
      invsqrt_c is the trivial 1.0/sqrtf(x).
      For the benchmark, all of these were inlined inside a loop, so the SSE and 3DNOW versions do not include any time spent moving the result to or from the general floating-point registers.

      athlon64:
        30.15 cycles in invsqrt_c
        34.02 cycles in invsqrt_q3
        34.02 cycles in invsqrt_union
          3.02 cycles in invsqrt_sse
          3.02 cycles in invsqrt_3dnow

      pentium4:
        46.21 cycles in invsqrt_c
        70.03 cycles in invsqrt_q3
        84.90 cycles in invsqrt_union
          4.06 cycles in invsqrt_sse

      So I'd say that, no, the hack is no longer a win on modern processors.
    22. Re:It might be damn smart.. by Anonymous Coward · · Score: 0

      There is no single comment which would make reading and understanding what happens here much easier!

      Sorry, I forgot the comments. Here they are:
      float invSqrt(float x)
      {
      float xhalf = 0.5f*x;
      int i = *(int *)&x;
      i = 0x5f3759df - (i >> 1); /*** DRUNK!!! FIX LATER ***/
      x = *(float *)&i;
      x = x * (1.5f - xhalf * x * x);
      return x;
      }
  26. Error! by jrmiller84 · · Score: 3, Funny

    float InvSqrt(float x) { float xhalf = 0.5f*x; int i = *(int*) i = 0x5f3759df - (i >> 1); x = *(float*) x = x*(1.5f - xhalf*x*x); return x; } Asking John whether it was him or Michael returned a "not quite". But it's supposed to return a float!

    --
    I will forever be a student.
    1. Re:Error! by jrmiller84 · · Score: 1

      I receive an F for formatting. But it still would have compiled!

      --
      I will forever be a student.
    2. Re:Error! by ralmeida · · Score: 1

      Glad it's not Python, then.

      --
      This space left intentionally blank.
    3. Re:Error! by rmdir+-r+* · · Score: 1
      But it's supposed to return a float!

      *cough*. It does. TFAs version is slightly different. Observe:

      x = *(float*)&i;

      You missed the cast :).

    4. Re:Error! by Anonymous Coward · · Score: 0

      Now THAT'S funny :)

  27. Re:Bad programmer, no cookie by petermgreen · · Score: 1


    All floating point operations are approximations, writing good floating point code is all about understanding the approximations and tradeoffs you are making.

    and floading point operations can be one of the slowest parts of code

    i have to agree on the lack of comments in the source for games, it can make them very hard to penatrate (particularlly when combined with heavy use of macros)

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  28. Evil math--it approximates the inverse square root by Anonymous Coward · · Score: 2, Informative

    They're computing the inverse square root, usually written as something like x^(-2) (which is the same as 1/sqrt(x) of course) via, umm, err, magic :-)

    Seriously, they're using some evil numerical techniques that approximate the actual value of that function to withing a few percent very quickly. You could do more steps of Newton's root finding method or other complex things, but that would be slow and this is only meant to find things that are "good enough" for you to draw on someone's screen.

    This function is used in graphics programming, specifically in things like the Quake 3 source code (as well as other places). First, it calculates x/2, then it gets the bits of the float x as an integer (without modifying them), then it does the "magic" step with the magic number they calculated via evil mathematics to give the best starting point they can for Newton's method using the x >> 1 to divide the float-as-integer in two (and make it positive, since your sign bit falls into the exponent of the float! ... that's one of the crazy parts the PDF analyzes). Then it changes the int-float back into a plain float and does one step of Newton's method using the guess derived from the magic part.

    Don't mind me. I may have a degree in mathematics, and I even (somehow!) made it through at least one class on numerical analysis, but all I can tell you is that this stuff is evil :-) Read the PDF linked in the article if you want someone who still knows what they're talking about to explain it; I'm just trying my best to remember anything after all these years. I swear that analysis class gave me some Pavlovian instinct to shudder in dread whenever I see them pulling out the strange substitutions and Taylor expansions so they can calculate the error...

  29. Article Text by AngusL · · Score: 0, Redundant

    Introduction Note! This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy. Origin of Quake3's Fast InvSqrt() To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense. float InvSqrt (float x){ float xhalf = 0.5f*x; int i = *(int*) i = 0x5f3759df - (i>>1); x = *(float*) x = x*(1.5f - xhalf*x*x); return x; } Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist. How the code works The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it. Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU. But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation. So who did write it? John Carmack? While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time. Asking John whether it was him or Michael returned a "not quite". -----Original Message----- From: John Carmack Sent: 26 April 2004 19:51 Subject: Re: Origin of fast approximated inverse square root At 06:38 PM 4/26/2004 +0100, you wrote: >Hi John, > >There's a discussion on Beyond3D.com's forums about who the author of >the following is: > >float InvSqrt (float x){ > float xhalf = 0.5f*x; > int i = *(int*) > i = 0x5f3759df - (i>>1); > x = *(float*) > x = x*(1.5f - xhalf*x*x); > return x; >} > >Is that something we can attribute to you? Analysis shows it to be >extremely clever in its method and supposedly from the Q3 source. >Most people say it's your work, a few say it's Michael Abrash's. Do >you know who's responsible, possibly with a history of sorts? No

    1. Re:Article Text by AngusL · · Score: 1

      Well, I made a mess of that. Poor formatting and not anonymous. I tried.

  30. hakmem by trb · · Score: 2, Interesting
    The article barely mentions HAKMEM, but the invsqrt hack is reminiscent of the HAKMEM programming hacks, which were published in 1972. Several of these hacks use bit fiddling with magic constants to perform tasks in straight-line code, that you would ordinarily think of doing with iteration.

    HAKMEM is classic bathroom reading for hackers. If you want to do it up old-school, print a copy from original scans, double-sided.

    1. Re:hakmem by Hamilton+Lovecraft · · Score: 1

      This is the most satisfying way to get a hard copy of HAKMEM.

      --
      step 3: god dammit, it doesn't work
  31. Really old news... by Jahz · · Score: 1

    As the article says, this snippet has been around since the original Quake came about. Back then (a decade?) the hardware was so much slower than it is today that this approx function was absolutely needed to write the game. It's not new, nor mysterious. In fact I first encountered this code on a forum where somebody linked to the top10 most interesting code snippets (or something to that end)... I've even used it my own code here and there. The crazy hex number exploits IEEE floating point.
     
    A not that is not mentioned here: you can repeat the IEEE magic number line again for a more accurate result :)

    --
    There are 10 types of people in the world. Those who understand binary and those who do not.
  32. O_o by Rankiri · · Score: 1

    Look son, crazy people...

    1. Re:O_o by Anonymous Coward · · Score: 0

      Can't mod you, but I thought that was funny.

  33. Real men used no floating-point! by descubes · · Score: 0

    Back around 1990, I wrote a game called Alpha Waves (Preview). Back then, the problem was not using integers to approximate floating point math, it was avoiding integer multiplications, because those were expensive compared to additions. For those of you with a curious mind, you can take a look at http://cc3d.free.fr/cube.s for the original source code.

    So here is a quizz, the first one answering wins the admiration of the crowd:

    - How did that engine avoid multiplications?

    - How did it compute sine and cosines?

    - What low-level hardware trick did it use in two-player mode, and how could that be used for performance tuning?

    I believe Alpha Waves was the first video game on personal computers with full-screen "real" 3D (i.e. not scaled sprites), and I also believe it was the first 3D game with two simultaneous players sharing a screen. Would anybody have data to confirm or infirm that?

    --
    -- Physics for Dummies

    --
    -- Did you try Tao3D? http://tao3d.sourceforge.net
    1. Re:Real men used no floating-point! by TrappedByMyself · · Score: 1

      Just off the top of my head...

      - How did that engine avoid multiplications?

      Creative bit shifts and additions?

      - How did it compute sine and cosines?

      lookup tables?

      - What low-level hardware trick did it use in two-player mode, and how could that be used for performance tuning?

      Eh? need more info, or do I need to look at the links to understand the question?

      --

      Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
    2. Re:Real men used no floating-point! by descubes · · Score: 1

      >> How did that engine avoid multiplications?
      > Creative bit shifts and additions?

      That would still be multiplications by another name. And by the way, on a 68K, shifting left N bits took a time proportional to N.

      >>How did it compute sine and cosines?
      > lookup tables?

      Yeah, right. OK, there was a lookup table somewhere, that much is obvious. But I guess I should rephrase the question as: how do you compute a rotation efficiently using only integer arithmetic?

      >> What low-level hardware trick did it use in two-player mode, and how could that be used for performance tuning?
      > Eh? need more info, or do I need to look at the links to understand the question

      Of course, where would the reverse engineering fun be?

      Hint: there is a C++ port on SourceForge.

      --
      -- Did you try Tao3D? http://tao3d.sourceforge.net
    3. Re:Real men used no floating-point! by Jerry+Coffin · · Score: 1
      - How did that engine avoid multiplications?

      There are simply too many different possibilities for there to be one reasonable guess. You can use shifts and adds, but that's rarely faster than the hardware, and at that time pipelining usually wasn't very useful. Table lookups of various sorts were fairly common, frequently with some bit-masking or -shifting to keep the table size reasonable.

      - How did it compute sine and cosines?

      CORDIC?

      Since you haven't even specified what hardware it was for, attempting to guess at the low-level hardware trick(s) you used would be pretty pointless.

      I believe Alpha Waves was the first video game on personal computers with full-screen "real" 3D (i.e. not scaled sprites), and I also believe it was the first 3D game with two simultaneous players sharing a screen. Would anybody have data to confirm or infirm that?

      "infirm" ? Do you mean "deny", perhaps?

      It wasn't the first game on a personal computer with real 3D -- I wrote some real 3D code that ran on a C64 around 1982 or '83. OTOH, the "ran" there is an exaggeration -- "walked" would be closer, and "stumbled" would probably be the right word. The code was all right, but there's only so much you can do on an 8-bit, 1 MHz processor. A couple of us played it enough to say it worked, but that's about it.

      Oh, it would also depend a bit on what you're talking about as 3D -- this was wireframes only. I wrote some lighting code, but never used it except to create a couple of static images (one frame per night...)

      --
      The universe is a figment of its own imagination.
    4. Re:Real men used no floating-point! by Anonymous Coward · · Score: 0

      I believe Alpha Waves was the first video game on personal computers with full-screen "real" 3D

      Really? I remember playing 3D Starstrike, a real 3D wireframe Star Wars ripoff released in 1985. In 1986 we had Starstrike II, now in solid, flat lit 3D. Had to wait a few more years before texturing became viable.

      And did I mention this was running at playable speeds on 8 bit Z80 machines? (Spectrum & Amstrad CPC)

      I probably shouldn't mention the wireframe hacking on Apple ][ some of us were doing in 1980...

    5. Re:Real men used no floating-point! by descubes · · Score: 1

      How did that engine avoid multiplications?

      There are simply too many different possibilities for there to be one reasonable guess. You can use shifts and adds, but that's rarely faster than the hardware, and at that time pipelining usually wasn't very useful. Table lookups of various sorts were fairly common, frequently with some bit-masking or -shifting to keep the table size reasonable.

      Well, that's why I gave the source code.

      Anyway, the answer was: the engine was pre-computing 3 axis vectors. That required a few mulitplications. Then, it would add them to create vectors like 1Z, 2Z, 3Z, 1X, 2X, 3X. Finally, all objects were defined using displacements like 1X 2Z -1Y and so on. As a result, the final rendering was done with practically only additions. If memory serves me right, there were only about 6 multiplications per object. That trick, and a very optimized polygon-filling routine, allowed Alpha Waves to display about 10-20 frames per second full screen of 3D graphics on a 8MHz 16-bit CPU. As I wrote in the parent post, I believe it was the first game to ever do that (3D games at the time typically displayed quarter-frame graphics, or full-screen vector graphics).

      How did it compute sine and cosines?

      CORDIC?

      Don't be silly! CORDIC is a floating-point algorithm.

      The answer was that angles were represented using 256 units for a rotation, which allowed a byte indexing in a table (with an offseting by 64 to select between sin and cos). Each entry in the table was between +/-32767. Multiplying that by a 16-bit coordinate would give a 32-bit value. Shifting the value down 15 times was not an option, since the 68K did not have a barrel shifter, so that would have been slow. Instead, I used the "swap" instruction, that swapped the high and low half of a 32-bit word. So the computation multiplied by sin(x)/2, not by sin(x), and I had to correct for that. As a matter of fact, it was not even exactly sin(x), because 32767 / 65536 is not exactly 0.5...

      Since you haven't even specified what hardware it was for, attempting to guess at the low-level hardware trick(s) you used would be pretty pointless.

      I did give enough pointers to answer the question. Of course, that means you are curious enough to actually follow a link. The original source code was for Atari ST. The hardware trick I referred to was that there were 16 registers to change the color palette. Using an interrupt, I was changing the color palette in the middle of the screen, so that the "top" and the "bottom" half of the screen could show users that were in different rooms, because the "mood" of each room was given by a different color palette.

      The palette-changing trick was relatively frequently used at the time. What made it useful for performance tuning was that you could change the screen border color on entry to a routine, and change it again on exit, and the height of the color band on the side of the screen would tell you how much time you spent in the routine.

      I believe Alpha Waves was the first video game on personal computers with full-screen "real" 3D (i.e. not scaled sprites), and I also believe it was the first 3D game with two simultaneous players sharing a screen. Would anybody have data to confirm or infirm that?

      "infirm" ? Do you mean "deny", perhaps?

      Probably

      It wasn't the first game on a personal computer with real 3D -- I wrote some real 3D code that ran on a C64 around 1982 or '83. OTOH, the "ran" there is an exaggeration -- "walked" would be closer, and "stumbled" would probably be the right word. The code was all right, but there's only so much you can do on an 8-bit, 1 MHz processor. A couple of us played it enough to say it worked, but that's about it.
      Oh, it would also depend a bit on what you're talking about as 3D -- this was wireframes only. I wrote some lighting code, but never used it except to cre

      --
      -- Did you try Tao3D? http://tao3d.sourceforge.net
    6. Re:Real men used no floating-point! by descubes · · Score: 1

      Really? I remember playing 3D Starstrike, a real 3D wireframe Star Wars ripoff released in 1985. In 1986 we had Starstrike II, now in solid, flat lit 3D. Had to wait a few more years before texturing became viable. Most games of the time (practically all of them flight or space simulators) were displaying 3D on quarter or half screens. The games you refer to were no exception. As an aside, the 8-bit variants were generally displaying a single shaded object at a time, others being wireframe. Another trick was to display scaled sprites (most car racing games were doing that).

      It may seem like a small thing today, but filling the screen with flat-shaded 3D graphics gave a much more immersive experience that had a serious "wow" factor at the time. I still remember the Infogrames guy staying in front of the computer playing my demo for maybe two hours straight, amazed at how fluid it was. Infogrames knew all about the games of the time, and they knew of nothing like it. Now, maybe there was a game they did not know about, I've always been curious.

      Of course, two years later, everybody was doing it. Though the particular gameplay remains somewhat unique to this day, and good enough that my kids still ask me to play it from time to time, swithing away from today's ultra-realist 3D...
      --
      -- Did you try Tao3D? http://tao3d.sourceforge.net
    7. Re:Real men used no floating-point! by Anonymous Coward · · Score: 0

      Blame my high school Computer Studies teacher if I got this wrong, but..
      Isn't _ALL_ multiplication on modern cpus achieved via addition anyway? Your question doesn't seem logical by my point of view.

    8. Re:Real men used no floating-point! by Jerry+Coffin · · Score: 1
      Don't be silly! CORDIC is a floating-point algorithm.

      Don't be silly! CORDIC is an algorithm. Other than being something binary, the representation you use has no relevance at all (beyond the fact that the table of secant constants needs to be represented in the format of the result).

      In fact, CORDIC is primarily useful when you don't have floating point hardware. If you have FP hardware (at least multipliers), you're frequently better off implementing one of the typical power series. In particular, CORDIC gives one more bit for each iteration, while a Taylor series (for only the most obvious example) doubles the number of bits with each term. If you only a few bits (e.g. 8 or 10) CORDIC works very well -- but if you have FP hardware and want something like the usual 64 bit result, 5 terms of the Taylor series will get you there faster and a Chebychev polynomial is generally faster still. Of course, back in the '50s when CORDIC was invented, the number of gates to do fast FP multiplication just wasn't available, so CORDIC ruled...

      I did give enough pointers to answer the question.

      At that point it's not a guess any more -- it's just a job of reverse engineering some ancient, obsolete source code. Unless you had far more interesting tricks than anything you've mentioned here, there's not much point or fun in that.

      --
      The universe is a figment of its own imagination.
  34. Intersection of hardware and software by HomelessInLaJolla · · Score: 1

    To all those wondering why John bothers to push out the source to id's game engines after the fact, the snippet of code at the very top of this article is a poster child for why. Not only do you get well-programmed and well-optimised 3D engines to modify and learn from, you get gems like the fast invsqrt function to show you that it's not all about the 3D hardware, and that software is arguably even more of a factor when analysing 3D performance
    When this software code it turned into a hardware circuit and patented then the magic number on this line,"i = 0x5f3759df - (i>>1);" will be the focus of the patent on the hardware circuit. It becomes very important to know who managed to skillfully approximate that mathematical technique with the code.

    Note that the final magic number isn't the one implicated in the paper on lamont.org though the paper does test it with two other possible candidates. Gary Tarolli mentions that he may have changed which particular magic number was used in the final code but that he didn't write the code itself. Likely, if this particular fast square root function is embedded as a circuit and the circuit patented, the legal documents will attribute the code to Gary since he made the most significant contribution of actually choosing which number was used in the code which was released.

    The question still remains, though: who wrote the routine?
    --
    the NPG electrode was replaced with carbon blac
    1. Re:Intersection of hardware and software by gnasher719 · · Score: 1

      '' When this software code it turned into a hardware circuit and patented then the magic number on this line,"i = 0x5f3759df - (i>>1);" will be the focus of the patent on the hardware circuit. It becomes very important to know who managed to skillfully approximate that mathematical technique with the code. ''

      That is of course pure nonsense, because a hardware implementation will just take the lowest bit of the exponent and a few bits of the mantissa as an index into a lookup-table to get two coefficients for a stepwise linear function.

  35. Alternate mod type by JBHarris · · Score: 2, Funny

    I wish there was an option to mod some of these posts "+1 Whoa".

  36. Future Crew? by SignalFreq · · Score: 1

    Perhaps this is from the old days of the Future Crew demo scene? They did some amazing things with limited CPU cycles back then (1988-1993).

    1. Re:Future Crew? by oc255 · · Score: 1

      Great reference to the famous (to me) Future Crew but it's looking like it predated graphics. We'll likely never know, like who wrote "Apple Pie" in the cooking world.

      Never mind this comment, this thread and TFA humbles me down to the carpet and I just want to listen.

  37. This was done in the 60's and 70's by Anonymous Coward · · Score: 1, Interesting

    I know my Dad used this technique in the 60's or 70's for vector normalization. It was quite valuable on systems that had no hardware floating point divide (and nothing in that era had a hardware square-root).

    Being non-iterative it also has a deterministic speed, which is valuable in
    real-time applications.

    Integer adds/subtracts on floating-point representations were also sometimes used to approximate multiplication and division (by adding/subtracting the
    exponent)

  38. Correction by DeadCatX2 · · Score: 5, Informative

    x^(-2) (which is the same as 1/sqrt(x) of course) Actually x^(-2) == 1/(x^2).

    I believe you meant to say x^(-1/2)

    Too bad the people modding you up don't have math degrees. =P
    --
    :(){ :|:& };:
    1. Re:Correction by SpeedBump0619 · · Score: 1
      Too bad the people modding you up don't have math degrees. =P

      Or, for instance, 7th grade algebra.
    2. Re:Correction by Anonymous Coward · · Score: 0

      7th grade algebra? You mean 9th grade algebra unless you're a year ahead then maybe you took it in 8th but I doubt 7th grade.

    3. Re:Correction by mkw87 · · Score: 1

      or logical thinking skills

      --
      Arguing with an engineer is like wrestling a pig in mud. Soon, you realize the pig is dirty, and he likes it.
    4. Re:Correction by iq+in+binary · · Score: 1

      Actually x^(-2) == 1/(x^2).

      I believe you meant to say x^(-1/2)

      Too bad the people modding you up don't have math degrees. =P
       


      Degrees? Highschool algebra would teach you that much ;) Well, in decent highschools........

      --
      Of all the Universal Constants, here's one I know: Nice guys finish last ;)
    5. Re:Correction by lkeagle · · Score: 3, Insightful

      You live in America, don't you?

    6. Re:Correction by Jesselnz · · Score: 1

      I live in America, and I learned that in middle school.

    7. Re:Correction by lkeagle · · Score: 1

      You're by far the exception, not the norm.

      Some American students don't get to it at all even by the time they're finished with high school.

    8. Re:Correction by WilliamSChips · · Score: 1

      I'm American and learned calculus in sixth grade. :P

      --
      Please, for the good of Humanity, vote Obama.
    9. Re:Correction by Anonymous Coward · · Score: 0

      Did you also learn how God created the universe, and that the planet is only 4000 years old? And that dinosaurs were placed on earth by God for some unknown purpose, perhaps to trick us?

      The American education system is laughable, and the whole world knows it. You're too busy spending money on killing anyone who's off-white and has an accent to teach your kids.

    10. Re:Correction by WilliamSChips · · Score: 1

      Actually, I learned Evolution. We have to give these stupid disclaimers, and there are still students who got offended, but I live pretty much on the outskirt of the Bible Belt.

      --
      Please, for the good of Humanity, vote Obama.
    11. Re:Correction by Anonymous Coward · · Score: 0

      Some American students don't get to it at all even by the time they're finished with high school.

      We call them illegal immigrants.

  39. Why was parent modded troll? by DeadCatX2 · · Score: 1

    We called it Numerical Analysis at Penn State. I didn't have to take it, I was only getting a minor in Comp Sci, and a major in Comp Eng.

    But...I learned how to approximate using Newton's method in Calc 1. ...in high school.

    Actually, I think the genius is the guy who came up with the magic bit sequence.

    --
    :(){ :|:& };:
    1. Re:Why was parent modded troll? by ebyrob · · Score: 1

      Sorry. You're right. We need a moderation titled "Well crafted Troll" so people will quit turning off their brain when they see a post marked "Troll".

      Note: You answered your own question with that last line. This beautiful hack has about as much to do with Newton's Method as it does with C language structure. They're both in there, but they're really just scenery.

    2. Re:Why was parent modded troll? by DeadCatX2 · · Score: 1

      I don't think OP deserved to be downmodded, though. He may have been arrogant, but he was no worse than the typical background noise that exists at score 1. I don't think he was being belligerent enough to warrant a troll moderation.

      I personally think people are WAY to quick to downmod.

      --
      :(){ :|:& };:
  40. It may in part be related to something I did ... by Ninja+Programmer · · Score: 5, Informative

    Here's an old version of one of my webpages:

            http://web.archive.org/web/19990210111728/www.geoc ities.com/SiliconValley/9498/sqroot.html

    And here's an updated version of the same page:

            http://www.azillionmonkeys.com/qed/sqroot.html

    It isn't an exact rendering of the code in question, but it explains enough for any skilled hacker to 1) understand what's going on and 2) modify the code to create the resulting code that's in the Quake 3 source. Furthermore this web page has existed since about 1997 (archive.org doesn't go back that far for some reason.)

    Now *IF* in fact the code origin comes from someone who took ideas from my site, I should point out that *I* am not the originator of the idea either (though I did write the relevant code). Bruce Holloway (who I credit on the page) was the first person to point out this technique to me at around the 1997 timeframe (prior to this, I created my own method which is similar, but not really as fast). (Vesa Karvonen informed by of the technique (through a code snippet with no explanation) at roughly the same time as well.) It was a technique well known to hard core 3D accelerator and CPU implementors, and follows an intentional design idea from the IEEE-754 specification.

    Prof. William Kahan, one of the key people who specified the IEEE-754 standard (the standard for floating point the many CPUs use, starting with Intel's 8087 coprocessor) apparently presented this idea, and is the source for where Bruce Holloway got the idea. The IEEE-754 standard came out around the 1982 time frame. Though, its very likely that these ideas originate from even earlier in computing history.

  41. Hardware not that much faster, if at all. by SETIGuy · · Score: 1
    Some timings (in 20 second runs on exponentially distributed random floats) for reciprocal sqrt for 32 bit floats with differences only allowed in last bit including proper handling of sqrt(-0.0) (== -0.000)

    • P4 (f15:m2:s4) 2.65 GHz. Carmack: 24.4 ns, x87: 36.0 ns, SSE: 30.8 ns
    • P4 (f15:m2:s9) 3.00 GHz (2 threads). Carmack: 16.7 ns, x87: 26.9 ns, SSE: 23.9 ns
    • Opteron 285 (f15:m33:s2) 2.60 GHz. Carmack: 12.9 ns, x87: 21.0 ns, SSE: 16.0 ns
    • Opteron 246 (f15:m5:s10) 2.00 GHz. Carmack: 15.5 ns, x87: 27.7 ns, SSE: 20.9 ns
    • Opteron 244 (f15:m5:s10) 1.80 GHz. Carmack: 19.3 ns, x87: 31.3 ns, SSE: 23.2 ns
    • PPC G5 1.80 GHz. Carmack: 45.7 ns, FPU 34.6 ns
    • UltraSPARC II 400 MHz. Carmack: 165.2 ns, FPU 163.7 ns

    On intel machines, Carmack is often, but not always, faster then the FPU. In fact, for some it can be faster to use Carmack to compute the non-reciprocal square root. And, no. I don't quite know why G5 sucks so bad in this test. The same source code is used for all of them.

  42. Re:Bad programmer, no cookie by DeadCatX2 · · Score: 2, Interesting

    It is probably a premature optimization that will slow computations down. I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

    This is John Carmack we're talking about. I imagine he knows how to use a code profiler.

    The approximations it involve are likely to cause bugs. Floating point always involves approximations. If you don't like them, use a better approximation (double precision). But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

    I mean, it's not like they're running scientific simulations with this. It's Quake3.

    Furthermore, converting floating-points to integers and then doing calculations on them is a trick that has been known for years, long before Quake3. They weren't converting floats to ints. They manipulated the bitwise representation of the float. I'm not sure if that's what you meant, though.
    --
    :(){ :|:& };:
  43. Re:Evil math--it approximates the inverse square r by sillybilly · · Score: 1

    I think the original constant might have been derived with simple Monte Carlo methods, and the method itself is probably from some book from long time ago, from the 50's when vacuum tube computers first came onto the scene and such speed gains by optimizing binary code to the limit by phd mathematicians was at the forefront of research. To quote the pdf article you cite, "However the analysis should be thoroughly tested on a real machine, since hardware often does not play well with theory." That's nonsense - the machine is predictable, therefore plays well with theory, it's you who get overly engulfed and lost in the derivations by following too many threads of thought and losing track of your assumptions and approximations, it's the theory that's got the problem, and the machine is just an easy and lazy way to double check your thoughts, but you could technically do the same calculations the machine does with a paper and pencil, it would just take longer. I think this article is a bait, to see who will point out subtle issues in the report that might have been directly planted, and such people who dare figure out wat's wrong will instantly win a prize of a dozen shitflies settling down onto them from the NSA/FBI/CIA/MSFT/Dept. of Homeland security, and instantly win a "slavejob-dreamjob" at a fancy place or more like a "fancy prison" of none of their choice, to continue developing their talent and put it to good work for Da Man.

  44. Re: A better question: by Jerry+Coffin · · Score: 5, Informative
    How does this function compute 1/x^(1/2)?


    It starts by taking a guess at the right answer, and then improving the guess until it's accurate enough to use.

    The first step depends heavily on the fact that a floating point number on a computer is represented as a significand (aka mantissa) and an exponent (a power of two). For the moment, consider taking just the square root of X instead of its inverse. You could separate out the exponent part of the floating point number, divide it by two, and then put the result back together with the original significand, and have a reasonable starting point.

    From there, you could improve your guesses to get a better approximation. The simplest version of that would be like a high-low game -- you split the difference between the current guess and the previous guess, and then add or subtract that depending on whether your previous guess was high or low. Eventually, you'll get arbitrarily close to the correct answer.

    This can take quite a few iterations to get to the right answer though. To improve that, Newton-Raphson looks at the curve of the function you're working with, and projects a line tangent to the curve at the point of the current guess. Where that line crosses the origin gives you the next guess. That's probably a lot easier to understand from picture.

    In this case, we're looking for the inverse square root, which changes the curve, but not the basic idea. As a general rule, the closer your first guess, the fewer iterations you need to get some particular level of accuracy. That's the point of the:

    i = 0x5f3759df - (i>>1);

    While the originator of this constant is unknown, and some of it is rather obscure, the basic idea of most of it is fairly simple: we start by shifting the original number right a bit. This divides both the mantissa and the exponent part by two, with the possibility that IF the exponent was odd, it shifts a bit from the exponent into the mantissa. The subtraction from the magic number then does a couple of things. For one thing, if a bit from the exponent was shifted into the mantissa, it removes it. The rest of the subtraction is trickier. If memory serves, it's based on the harmonic mean of the difference between sqrt(x) and (x/2) for every possible floating point number of the size you're using.

    This is where the fact that it's 1/sqrt(x) instead of sqrt(x) means a lot: 1/sqrt(x) is a curve, but it's a fairly flat curve -- much flatter than sqrt(x). The result is that we can approximate a point on the curve fairly accurately with a line. In this case, it's really two lines, which gets it a bit closer still.

    From there, the number has had a bit of extra tweaking done -- it doesn't actually give the most accurate first guess, but its errors are often enough in the opposite direction from those you get in the Newton-Raphson iteration steps that it gives slightly more accurate final results.
    --
    The universe is a figment of its own imagination.
  45. Possible clues by Anonymous Coward · · Score: 0

    I saw the method used in the SGI OpenGL sample implementation. It was quite a while ago.

    http://oss.sgi.com/projects/ogl-sample/

    Anyone have access to the version control for this at SGI?

  46. Re:It may in part be related to something I did .. by Enzo+the+Baker · · Score: 1

    It is quite possible several people came up with similar ideas independently. Particularly if they were exposed to similar IEEE-754 hacking tricks, such as the fast log/exp.

    --
    I may twist orthodoxy to partly justify a tyrant. But I can easily make up a German philosophy to justify him entirely.
  47. Re:It may in part be related to something I did .. by tknd · · Score: 3, Funny

    (I'm (sorry))

    You don't (happen to (program (lisp))) do you?

  48. mod parent up! by DrEasy · · Score: 1

    Thank you very much! Nothing beats an example.

    --
    "In our tactical decisions, we are operating contrary to our strategic interest."
  49. Clever trick! by kent.dickey · · Score: 5, Informative
    To summarize, the article is about a piece of code to approximate 1/sqrtf(f):

    float InvSqrt (float x){
      float xhalf = 0.5f*x;
      int i = *(int*)&x;
      i = 0x5f3759df - (i>>1);
      x = *(float*)&i;
      x = x*(1.5f - xhalf*x*x);
      return x;
    }
    The trick is the "i = 0x5f3759df" line. It's certainly a magic number.

    The algorithm is simple Newton-Raphson -- make a good initial guess, then iterate making the guess better. I think Newton-Raphson on 1/sqrt picks up 5-6 bits each try in the line "x = x*(1.5f - xhalf*x*x)". It can be repeated to get a more accurate result each time it's repeated.

    The problem with Newton-Raphson is making a good first guess--otherwise, you need an extra iteration or two. And that's what the magic number is doing, making a good first guess.

    So let's work out what a good first guess would look like for 1/sqrt(f), to see where this code came from.

    Floating Point numbers are stored with a mantissa and an exponent: f = mantissa * (2 ^ exponent), where exponent is 8-bits wide and the mantissa is 23-bits wide.

    Let's take an example: 1/sqrt(16) would have f = 1.0 * (2 ^ 4). We want the result 0.25 which is f = 1.0 * ( 2 ^ -2).

    So our first guess should take our exponent, negate it, and cut it in half. (Try more examples to see that this works--it's basically the definition of 1/sqrt(f)). We'll ignore the mantissa--if we can just get within a factor of 2 of the answer in one step, we're doing pretty well.

    Unfortunately, the exponent is stored in FP numbers in an offset format. In memory,

    exp_field = (actual_exp + 127) << 23
    The mantissa is in the low 23 bits, and the most-significant bit is the sign (which will be 0 if we're taking roots). For now, let's just assume we have our exponents as 8-bit values, to work out what we need to do with the +127 offset.

    We want new_actual_exp = -(actual_exp)/2. But in memory, exp = (actual_exp + 127). Or, actual_exp = exp - 127.

    Substituting gives (new_exp - 127) = -(exp - 127)/2. Simplify this to: new_exp = 127 - (exp - 127)/2 => new_exp = 3*127/2 - (exp / 2).

    Now the exponent is shifted 23 places in memory, so let's write out our code (and ignore the mantissa completely for now...):

    i = ((3*127)/2) << 23) - (i >> 1);
    rewriting as hex:

    i = 0x5f400000 - (i >> 1);
    Well, first, it's arguable whether it should be 0x5f000000 or 0x5f400000 (The "4" is actually in the mantissa). I'm guessing resolving that dilemma led to the original author discovering that choosing a particular pattern of bits in the mantissa can help make the initial guess even more accurate, leading to the 0x5f3759df constant.

    I haven't worked it out, but Chris Lomont http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf shows this first guess is accurate to about 4-5 bits of significance for all floating point values. That's a good result, considering that mucking with the exponents was just hoping to get us within 1-2 bits of significance.
    1. Re:Clever trick! by fph+il+quozientatore · · Score: 1

      I think Newton-Raphson on 1/sqrt picks up 5-6 bits each try in the line "x = x*(1.5f - xhalf*x*x)" No, it doesn't. Newton's method converges quadratically, so the number of correct bits doubles at each iteration.
      --
      My first program:

      Hell Segmentation fault

  50. Ooops! by Anonymous Coward · · Score: 0

    You're correct, 1/sqrt(x) == x^(-1/2). I'm not very alert today, sorry.

    The rest is correct, though :P

    1. Re:Ooops! by DeadCatX2 · · Score: 1

      Sad, my correction has a higher score than your obviously more informative post.

      See, I wouldn't mind being Informative and Funny for a total score of 5, but to be all informative? I didn't intend for your post to get modded down. You gave WAY more shit, and yeah you were correct; even without reading the article I recognized that what you said sounds like a great method of acquiring a good guess for the Newton-Raphson method I learned in Calc.

      Such is the horror of slashdot's moderation system. They're giving mod points to people, who probably don't read articles, to moderate a discussion of other people (would that be (~RTFA)^2?), according to rules that they probably haven't read.

      I don't know what might be better, or if this is like capitalism and the least evil?

      --
      :(){ :|:& };:
  51. not that accurate by quakehead3 · · Score: 1

    I compiled the function and tested it with few numbers...and compared the output with 1/sqrt(x)

    InvSart(4) = 0.499154
    1/sqrt(4) = 0.5

    InvSart(10) = 0.315686
    1/sqrt(10) = 0.316228

    InvSart(2) = 0.70693
    1/sqrt(2) = 0.707107

    InvSart(100) = 0.0998449
    1/sqrt(100) = 0.1

    so it's pretty accurate to around three figures...

    1. Re:not that accurate by Anonymous Coward · · Score: 0

      It's iterative, so you can run that last bit several times to get better and better approximations. This is still quite a savings when you consider that the same floating point operations used to take hundreds of cycles to execute. Modern SSE units on the x86 can do a much better job than x87 code (I think someone earlier quoted a figure like 5 cycles for the reciprocal of a square root on the 128-bit XMM registers), which is why this sort of thing (and the x87 instruction set) is mostly obsolete.

  52. Re:Evil math--it approximates the inverse square r by Anonymous Coward · · Score: 0

    > To quote the pdf article you cite, "However the analysis should be thoroughly tested on a real machine, since hardware often does not play well with theory." That's nonsense - the machine is predictable,

    Err, the reason they said that is because the method makes assumptions about the way a float is represented in binary. If you change how it works (or want to use doubles instead of floats, etc.), you'll have to rethink that method or your errors won't be nice and small. Any particular machine may be predictable, but the way we work with floating point numbers on a computer might *not* be if we discover some new & better way to model rational binary numbers.

  53. Similar solution for x^(-1/4) by jothaxe · · Score: 2, Interesting

    I think I have found a similar solution for the reciprocal of the FOURTH root. The code looks almost the same, but it requires a new constant and you shift the bits 2 places instead of just 1. float Inv4thrt(float x) { float x0 = x; int i = *(int*) // get bits for floating value i = 0x4F541EEF - (i>>2); // gives initial guess y0 x = *(float*) // convert bits back to float x = (x/4.0f)*(5 - x0 * x*x*x*x); // Newton step, repeating increases accuracy return x; } Note that the constant I have here is not very carefully tuned so the error may be slightly larger than with the sqrt. Pretty cool, I think.

  54. lousy code by idlake · · Score: 1

    The code may or may not be a good way of solving the problem, but it's lousy code nonetheless:

    * it's badly named: "reciprocal square root" would be better (however, that's a problem with graphics lingo)
    * it lacks a comment as to where it came from
    * it lacks a comment as to how it works or how the constant was determined
    * it lacks a comment as to how well it works
    * it lacks a comment as to the range of arguments it's valid for
    * it lacks (optional) error checking
    * it lacks a comment as to what environment it works in
    * it lacks a straightforward base implementation
    * there don't seem to be any test cases either

    The code posted in the article is actually not exactly the Q3 source code. The Q3 source code actually contains two slightly different versions of the function, one called "Q_rsqrt" commented (with "// what the fuck") and a single iteration, and second one that's uncommented and named incorrectly "SquareRootFloat" and uses two iterations. Neither of them say what the function actually does, and the fact that there are two of them is also not so great.

    Yeah, that kind of code causes me "to go wobbly in the knees"--by annoying the hell out of me--because the programmer who wrote it is wasting my time. Pretty predictably, sooner or later, this code will break something, and it will first take time to figure out that this code is responsible, then it will take some time to figure out what it is supposed to do, and finally, it will be a pain to figure out how to fix it. Or, I'd just rewrite it.

    1. Re:lousy code by Anonymous Coward · · Score: 0

      And that kids is the smell of jealousy.

    2. Re:lousy code by Anonymous Coward · · Score: 0

      And that kind of attitude, kids, is why your games and desktop machines have so many bugs.

    3. Re:lousy code by WilliamSChips · · Score: 1

      This is game code from a Real Programmer.

      --
      Please, for the good of Humanity, vote Obama.
  55. TeX by Anonymous Coward · · Score: 0
    x^(-2) (which is the same as 1/sqrt(x) of course)

    Actually x^(-2) == 1/(x^2).

    I believe you meant to say x^(-1/2)


    The format choices for Slashdot comments are:
    • Plaint Old Text
    • HTML Formatted
    • Extrans
    • Code


    For this topic it seems we need TeX.

  56. Re:Bad programmer, no cookie by Chris+Burke · · Score: 4, Insightful

    I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

    I've written enough 3D graphics code -- including a textured polygon rasterizer that would probably cry and try to delete itself if it saw something like Quake 3 -- to know that they didn't have to run a profiler to know that they'd be spending too much time doing 1/sqrt(x) if they didn't have a highly optimized routine for it. It's an inherent operation in so many 3D calculations it isn't funny, and while by the time Quake 3 came out hardware floating point units were pretty fast, FP divides and FP square roots were very lengthy operations that more importantly couldn't be pipelined.

    But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

    Yeah, pretty much. Back when I wrote my code (pentium days) you had an FP unit but it wasn't very good, so I used fixed point math (using integer instructions) which had sufficient precision for a 320x200 display. Getting enough performance out of the core algorithms was still hard enough that I had to take a lot of shortcuts, like instead of doing the right thing by using a divide every pixel to calculate which texel to use, I used a divide every 8 pixels and linearly interpolated in between. I'm sure that Quake (the contemporary 3D engine of the day which would also make my code cry) contained many more clever optimizations and approximations, because it wouldn't have been possible on the hardware of that day without them.

    In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. The non-approximation instruction uses a lookup table to get an initial guess, then uses a couple iterations of Newton's Method to refine the result. The short cut instructions simply return the value in the lookup table.

    So yeah, basically the AC OP has no idea what he is talking about, and from that basis of ignorance is denigrating what is in reality a very clever and extremely useful hack.

    --

    The enemies of Democracy are
  57. Re:It may in part be related to something I did .. by waveclaw · · Score: 2, Insightful
    Though, its very likely that these ideas originate from even earlier in computing history.

    Attribution.

    That is why most, if not all, software patents are bogus. Just because you reinvent something published by a PhD working in a committee that disbanded 10 years before you knew 'C' came after 'B' in the alphabet does not make you reinvention patent worthy. The history of invsqrt() crosses disciplines of hardware and software design, spec development, graphics and math theory. With such a fascinating function having a hard to track history it is no wonder that things like James W. Hunt's 1976 diff concept can be patented in 2006 by Microsoft. (As mentioned by QuantumG on slashdot.)

    The trouble had in tracking the history of InvSqrt() is really sad. Computing is an industry that hypes how digital storage of information and perfect copies eliminates the isse of decay. While the ever expanding secondary storage (just now getting to really usefull) means oblivating the scaling issues with retaining the meta-information that denotes this very history. I guess the moral of the story is: sign your contributions. It won't take up that much space in the comments.

    "Don't have good ideas if you aren't willing to be responsible for them."

    --Alan Perlis

    And will the real Fast InvSqrt() author please stand up?
    --

    "You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
  58. Same trick with log(x) and pow(x,y) by adam31 · · Score: 1
    Probably no one cares, since log and pow aren't as useful as rsqrt, but I wrote a FastLog (base 2) that uses basically the same principle.

    inline float FastLog2( float x )
    {
    unsigned int fint = (*(unsigned int*)&x & 0x7FFFFFFF);
    float exp = (float)( fint >> 23) - 127.0f;
    fint &= ~0x40000000;
    unsigned int mant = fint | 0x3f800000;
    float fmant = *(float *)&mant - 1.0f;

    // cheaper, slightly faster approximation:
    // return exp + 1.3333f*fmant - 0.33333f*fmant*fmant;

    // better, slightly slower approximation:
    float t0 = 0.1566666f * fmant + .5746666f;
    return exp + t0*fmant + (1.0f-t0)*fmant*(2.0f-fmant);
    }
    I guess the useful piece is that you can do basically the same thing with exp2(x) and combine it with log2(x) to get a decent powf(x,y) approximation.

    inline float FastExp2( float x )
    {
    unsigned int exp = ((unsigned int)x + 127) << 23;
    float fexp = *(float *)&exp;
    float fmant = x - (float)(unsigned int)x;
    float t0 = (.695f - .08f*fmant);
    float curve = 1.0f + t0*fmant + (1.0f-t0)*fmant*fmant;

    return fexp * curve;
    }

    inline float FastPow( float x, float y )
    {
    float logx = FastLog2( x );
    float answer = FastExp2( y*logx );

    return answer;
    }
    Enjoy!
  59. oh thats in the invsqrt.readme by cheekyboy · · Score: 1

    Dude, ever think that full docs might me in a .txt or .doc file?

    If we could do html documents inside source comment blocks that would be awesome instead of crappy ascii comments with no
    ability to do images fonts or diagrams

    --
    Liberty freedom are no1, not dicks in suits.
    1. Re:oh thats in the invsqrt.readme by Anonymous Coward · · Score: 0
      If we could do html documents inside source comment blocks that would be awesome instead of crappy ascii comments with no
      ability to do images fonts or diagrams
      <sarcasm>Yes, because we all want to make it even harder to write comments consistently.</sarcasm>

      Brief, ASCII, and to the point. Diagrams and crap are someone else's job. Heck, that's why I use LaTeX for writing all my documents--because I don't have to worry about making it look "pretty." That's someone else's job. (Possibly me with the "typesetting hat" on, but at least the activities are separate.)
  60. breaks on gcc by idlake · · Score: 1

    Apart from the unnecessary obscurity of the code, the code actually simply computes the wrong values with current versions of gcc and optimization turned on. See here:

    http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg02943.ht ml

    http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg02957.ht ml

    It also makes unstated assumptions about the values it gets called with; call it with something else and you get bad results.

    1. Re:breaks on gcc by Anonymous Coward · · Score: 0

      Arse biscuits!

      I inserted some printf statements and then the expected values came out. So much for optimization.

    2. Re:breaks on gcc by Anonymous Coward · · Score: 0

      I inserted some printf statements and then the expected values came out. So much for optimization.

      And your point is what? Are you trying to be funny, or are you really that dumb?

    3. Re:breaks on gcc by Anonymous Coward · · Score: 0

      Would it also break when converting via a union:
      union {
          float f;
          int i;
      } u;
      u.f=1.0; /* Now use u.i */

    4. Re:breaks on gcc by Anonymous Coward · · Score: 0

      It doesn't break on gcc 4.1, but it could break on other versions of gcc or other C compilers. It definitely will break on some processors other than x86.

      It's also only 17% faster than the portable, always-correct version on my machine, and may well be slower on other hardware.

      Overall, it's just not portable. It's OK to write such code if it's properly commented, if there is a portable implementation, and if there is a unit test for it, but none of those exist here. ATLAS is a good example of how this sort of thing can be done properly.

      The portable way of writing such code is via the IEEE C99 floating point bit-access functions, but that may or may not defeat the purpose (performance).

      Rather than being an example of coding prowess at ID, it seems to be an example of mindlessly copying SGI code to the x86 without understanding it.

  61. In other words... by redblue · · Score: 1

    Java sucks, right?

  62. Neat approximation.. by d_jedi · · Score: 1

    But it IS just an approximation. It doesn't give the right answers.

    Ex: InvSqrt(25) returns 0.19969 instead of the expected result of 0.2. (ie. sqrt(25) = 5, 1/5 = 0.2, for those wondering about the math).

    --
    I am the maverick of Slashdot
  63. My current explanation / understanding by kazad · · Score: 1

    The code is extremely clever. Net: It finds the inverse square root [1/sqrt(n)] using a great initial guess and one iteration of Newton's approximation method. It avoids excessive division, the square root operation, and multiplication, which are computationally expensive.

    I'm not an expert, but heres how I understand it:

    1. Background: Newton's method finds roots of any function
    ------

    What does factoring an equation have to do with finding 1/sqrt(n)? A lot. Give me a number n. I now make the function

    f(x) = 1/sqrt(x) - n

    Notice that when you find an x where f(x) = 0, it means x is the inverse square root of n:

    f(x) = 0
    1/sqrt(x) - n = 0
    1/sqrt(x) = n
    x = 1/sqrt(n)

    In other words, I need to find the root of that equation. Newton's method lets you do this by picking a starting value, seeing how far off you are, and getting closer and closer with each iteration. There's more info online. With Newton's method, call your initial guess "g". An better approximation for the root is

    guess_new = g - f(g)/f'(g)

    In our case, f(x) = 1/x^2 - i (where i is the initial value, as seen in the code). We use the power rule to see that f'(x) = -2x^-3, and plug it into the guess_new equation above:

    guess_new = x - (1/x^2 - i)/-2x^-3
    guess_new = x(1.5 - ix^2)

    which is exactly what the code above has. If you keep plugging "guess_new" back in the equation, you can get closer and closer to the answer.

    Here is a demo of multiple iterations to find inverse square: http://tinyurl.com/vh7hg/ Try plugging in different initial guesses (.2, .4, .8) to see how it converges. With me so far? Newton's method finds roots, and finds them fast if given a good guess.

    2. Now our problem becomes: How can we make a good guess?
    ------

    If we had a lot of time, we could just pick a random number and keep iterating using the method above. But that would be slow - we want a *good* guess.

    Well, our best guess for the inverse square root is the inverse square root itself! What's a good way to get 1/sqrt(n)?

    This is the first level of magic. Assume you have a number in exponent form, like this:

    10^6 = 1 million

    If you want to find the regular square root of 1 million, just divide the exponent by 2.
    sqrt(10^6) = 10^(6/2) = 10^3 = 1 thousand.

    If you want the *inverse* square root, divide the exponent by -2 to flip the sign.

    invsqrt(10^6) = 10^(6/-2) = 10^-3 = 1/thousand

    Ok so far? Our goal is to divide the exponent of i (our number) by -2 to get a really awesome guess for Newton's approximation method.

    3. Floats are stored in mantissa-exponent form
    ------

    This is the key. Floating-point numbers have an explicit exponent and mantissa component. Theoretically, we could mask out the bits for the exponent and do division.

    But division is expensive; the code uses another clever hack. Shifting bits is the same as dividing by 2 (or 4, 16, or any other power; the remainder is truncated, which is OK for an approximation).

    So we can divide by 2 easily. And if we want a negative number, instead of multiplying by -1 (expensive), we can just subtract the number from "0" (cheap).

    The program converts the floating point into an integer (using the pointer tricks), shifts the bits by 1 to halve the exponent, and subtracts from "0" (the magic number - hold on) to negate it.

    4. Why the magic number 0x5f37...?
    ------

    We can't just subtract from zero, there's too much going on. First, by shifting the bits we mave move some of the exponent bits into the mantissa. Also, there are different cases of odd/even exponents. The paper goes into lots of special cases, I didn't really understand them all first time around. But the magic number tries to minimize errors, and there can be several magic numbers used.

    5. What's the result?
    ------

    The result is that you get a great initial value to

    1. Re:My current explanation / understanding by Chinju · · Score: 1

      This is a nice explanation, though I do wish I understood more of the details of that magic number, where perhaps the bulk of the "WTF?" factor lies. It's a much better explanation, at any rate, then the (complete lack of an) explanation in the original code, which is appallingly comment-free. That flies sometimes, but given how many people are scratching their heads trying to reverse-engineer the thinking here, this is clearly not one of those times.

    2. Re:My current explanation / understanding by kazad · · Score: 1
      Thanks, totally agree. The lack of comments is pretty disturbing, esp. if this is a core routine. It's not immediately obvious if it works on 32- and 64-bit architectures, big or little-endian machines, etc. Although in this case, even if it were commented, the comment might look something like this

      // Computes inverse square: 1/sqrt(x)
      // using Newton's method of approximating roots and a magic number as an initial guess.
      // See this master's thesis paper and the IEEE floating point spec on why I chose this number: ...
      :)
  64. Re:Evil math--it approximates the inverse square r by sillybilly · · Score: 1

    I see. So you say the unpredictable part is the assumption of floating point representation in sign, mantissa and exponent terms, which can vary from architecture to architecture.

  65. Re: A better question: by Anonymous Coward · · Score: 0

    This function does not aproximate UNTIL you get good answer. The thing is that this initial guess is so good that just one approximation gives you almost two decimal points for any float. That is cute and you do not need more precision in games. No branching, you nail it every time.

  66. Lookup Table? by Tablizer · · Score: 1

    I'm gonna float (pun) an idea here: How about using a look-up table, and then extrapolate linearly between the 2 nearest matches in the table?

    -Tablizer

    1. Re:Lookup Table? by mikera · · Score: 2, Informative

      In modern processors, lookups from memory are far too slow. This is especially true if you are doing lots of random lookups that might go outside the cache.

      Performance-wise, you are much better doing several calculations than a single lookup.

    2. Re:Lookup Table? by Peeteriz · · Score: 1

      Actually, it might be not "several calculations" but even "several thousand calculations" - if your lookup table isn't in the cache, then it's not so fast anymore.

  67. From _your_ fortune file, this is even more amazin by IBitOBear · · Score: 1

    So, if you do

    fortune -m 'bits in'

    on any modern linux with fortune installed you will get the following,
    (along with usually at least one other tidbit)

    #define BITCOUNT(x)     (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
    #define  BX_(x)         ((x) - (((x)>>1)&0x77777777)                    \
                                 - (((x)>>2)&0x33333333)                    \
                                 - (((x)>>3)&0x11111111))

    It works, and I have used it to good advantage in circumstances where
    nothing else would _quite_ do...

    What, you ask, does it do?  It gives you the number of bits which are set in a
    given byte/word/dword...

    So BITCOUNT(0x81) == 2

    and so on.

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  68. Re: A better question: by KIAaze · · Score: 1

    1)I read a lot of comments explaining the algorithm andeverytime they say it uses an iteration (like the Newton-Raphson method) to approximate 1/sqrt(x). But the problem is I don't see where the iteration is! I have already done some C and C++ programming, but I don't understand how this code snippet can iterate without any loop. I don't see any "for(...)" in it. Can you do iterations simply with this line?: i = 0x5f3759df - (i>>1); As far as I know ">>1" simply shifts the bits to the right and 0x5f3759df is simply a number in its hexadecimal form. So this line looks like a simple substraction to me. So my question is: How does it iterate? 2)There is some C-code I didn't understand: float xhalf = 0.5f*x; x = x*(1.5f - xhalf*x*x); What does the "f" after the number do?

  69. Re: A better question: by KIAaze · · Score: 1

    Sorry, first time I was posting.
    I didn't pay attention to the fact that it was in HTML by default.
    Here it is again but in an easier to read form:

    1)I read a lot of comments explaining the algorithm and everytime they say it uses an iteration (like the Newton-Raphson method) to approximate 1/sqrt(x).

    But the problem is I don't see where the iteration is! I have already done some C and C++ programming, but I don't understand how this code snippet can iterate without any loop.
    I don't see any "for(...)" in it.
    Can you do iterations simply with this line?:
    i = 0x5f3759df - (i>>1); As far as I know ">>1" simply shifts the bits to the right and 0x5f3759df is simply a number in its hexadecimal form. So this line looks like a simple substraction to me.

    So my question is: How does it iterate?

    2)There is some C-code I didn't understand:
    float xhalf = 0.5f*x;
    x = x*(1.5f - xhalf*x*x);

    What does the "f" after the number do?

  70. Re:From _your_ fortune file, this is even more ama by Anonymous Coward · · Score: 0

    This was explained in Dr. Dobb's Journal magazine's feature called Algorithm Alley, can't remember the year or month. Tried googling in Yahoo and Google but didn't help.

    But effectively that calculated the number of bits by parallelizing over the whole number - calculate number of bits in whole, calculate number of bits in two halves, calculate number of bits in four quads, etc. up to "calculate number of bits in 1 bit" (which is trivial). Then you sum them together.

  71. Re: A better question: by dysan27 · · Score: 1
    The line

    x = x*(1.5f - xhalf*x*x);
    is the "iteration". Because this code was designed to run fast and not accurate, in many parts of graphics applications "close enough" is alright, we only need to run the line once, and so leave out the actual loop. Also due to other optimizations that is not the actual line you would have in the loop, but it is where the "iteration" is.
  72. The f after 0.5 says 0.5 is a float not a double by Sits · · Score: 2, Informative

    Answer to 2)
    The f after 0.5 and the 1.5 simply means "use the floating point version of this number". Multiplying two numbers which have different types (e.g. multiplying an int and a float) or doing assignment between different types means that one of the numbers will have to be automatically converted to the other's type and which conversion happens depends on the types of both numbers. By default, I think type of the literal 0.5 (without any letters after it) is a double. By adding the f (assuming x is a float) the types of the operands are the same so there is no implicit type conversion thus eliminating the chance of the wrong conversion happening automatically. You care about what type of multiply is done because certain types can lead to unwanted loss of precision.

    As for 1)
    Well if you iterate by one step you don't need a loop. Presumably most looping iteration happens because your first guess wasn't accurate enough. If your first guess WAS accurate enough why would you needlessly calculate more accuracy? I'm only speculating though - I don't know if this is really the case.

  73. Re: A better question: by Otto · · Score: 1

    The actual approach in the algorithm is to iterate, with each iteration getting closer to the final step. However, the magic constant sets the starting point for this iteration so close that iterating is not actually required unless you want a lot more (unnecessary) accuracy.

    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
  74. Re:Bad programmer, no cookie by descubes · · Score: 2, Informative

    In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. That's also the only way to compute it on Itanium: you have to iterate using the floating-point reciprocal square root approximation instruction. There is no instruction giving you the final result.
    --
    -- Did you try Tao3D? http://tao3d.sourceforge.net
  75. Can xhalf=0.5f*x be made faster? by Bromskloss · · Score: 1

    Or is it fast already? Does the hardware just decrease the exponent with one when it sees that the value to multiply x with is 0.5? Otherwise, it sounds like that's something you would want to do manually, right?

    --
    Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
  76. Re: A better question: by ja · · Score: 1

    The 'f' in 0.5f keeps the compiler away from casting 0.5 to the default 64bit double. So we get to stay in the cheaper 32 bit domain for our calculation.

    There is only one iteration, and as many other have said, this is because the initial guess is so clever so one iteration will be sufficient for the job in question. If you wanted to do a second iteration you would repeat the last line rather than set up a loop. So there would not be any 'for' or 'while' statements in that case either.

    --

    send + more == money? ...
  77. Don't know why by 12357bd · · Score: 1

    Really, by the comments on this page, it's amazing how people use functions they don't understand! What's this? Faith?
    Programming should not be based on faith but on reasoning.
    The article even fails to mention that the function is just a rough approximation at 1/sqrt(), this function would be better labeled RoughInvSqrt(). And yes bitwise manipulation of float point values is cool, yes, that's programming.

    --
    What's in a sig?
    1. Re:Don't know why by WilliamSChips · · Score: 1

      Unless you audit every line in every piece of code you ever use you have some degree of faith.

      --
      Please, for the good of Humanity, vote Obama.
  78. 0x5f3759df? by Anonymous Coward · · Score: 0

    I think he just ran simulations with constants and he really didn't use any math to come up with 0x5f3759df. Someone mentioned that 0x5f375a86 appears to perform slightly better. Can't it be that this person simply had some code that executed InvSqrt() all the time until the he found a good constant?

  79. Re:Evil math--it approximates the inverse square r by DeadCatX2 · · Score: 1

    So you say the unpredictable part is the assumption of floating point representation in sign, mantissa and exponent terms, which can vary from architecture to architecture. I highly doubt that the 32-bit floating point representation of numbers is going to vary from architecture to architecture. AFAIK any architecture that uses 32-bit floats uses the IEEE 754 standard.

    Now, IF they make a better floating point standard (I doubt it; this one is perfect in many ways), the only thing you really need is a new magic number.

    And IF they wanted to use double precision numbers...they probably care about precision and wouldn't be using a routine that is only good for approximation.
    --
    :(){ :|:& };:
  80. Article copied from Code Maestro? by TomerM · · Score: 1

    The original article that came out more than a year and a half ago, right after Quake's source code has been revealed, has been published at Code Maestro: http://www.codemaestro.com/reviews/review00000105. html
    If someone doesn't have any good ideas of his own for writing articles, perhaps he should have picked another profession...

  81. Re:Evil math--it approximates the inverse square r by sillybilly · · Score: 1

    This routine is good for precision. In the article he mentions some swede who had to get 48 significant digits for a thermo calculation for a doctorate student. Note that the 2nd iteration in the pdf is orders of magnitude more precise then the first, so all you need is enough iterations. You can always do a (do while current-previous(or a few previous moving average) epsilon), and build a loop, but that has to test a lot of conditionals ready for a branch(deep cpu pipeline predictors don't like branches) as opposed to just lining up 1 or 2 iterations in the code that makes it fly through the pipeline. True, when you do a lot of iterations, the speed gain from getting the first iteration so accurate doesn't count that much. It would save say 2 cycles out of a 20 cycle step, as opposed to making all the difference when there is only one iteration carried out, without a loop, as it is in the sample code given.

  82. Cleve Moler of Mathworks showed me this in 1987 by gvwalsh · · Score: 1

    I can shed some more light on the 1/sqrt(x) topic. I coded a vectorized version of this function for the Ardent Titan "graphics supercomputer" in 1987. I did not originate the idea but got it from Cleve Moler, original founder of MathWorks and author of MatLab. I did some work with Gary Tarolli when he was at Stellar, a competitor of Ardent which eventually merged with it to form Stardent computer. He may have picked it up there. The function was widely used in the Ardent 3D code base. I also had a 1/cuberoot(x) version which worked in a similar but slightly more complex manner. It was not used in 3D graphics but did find application in some mechanical engineering analysis programs. - Greg Walsh

  83. Doesn't work any more by Markus+Registrada · · Score: 2, Interesting
    Nowadays (e.g. since 3.x Gcc) that trick doesn't work any more, as written. To get reliable results you would need to say something more like

    float inv_sqrt(float x) { /* death to StudlyCaps! */
    float xhalf = 0.5f*x;
    union { float x; int i; } ix;
    ix.x = x;
    ix.i = 0x5f3759df - (ix.i >> 1);
    x = ix.x;
    x *= 1.5f - xhalf * x * x;
    return x;
    }

    The point is that pointer-based type punning isn't really allowed by the language definition. The optimizer can take advantage of that to emit simpler code that, in this case, produces the wrong answer. Using a union short-circuits that kind of optimization.

    This isn't to say any particular Gcc produces wrong answers with the code as written, but if you complained about one that did, they would just laugh at you and (if they liked you enough) tell you to rewrite it like the above instead.

  84. Re:From _your_ fortune file, this is even more ama by bensch128 · · Score: 1

    Funny, this was on an extremely hard interview I had recently.

    I couldn't answer the question...

    Ben

  85. For C code? Are you joking? by patio11 · · Score: 1

    They already spotted you about four letters more in that function name than most C programmers would be comfortable with. Count yourself lucky -- most would have made it something like isr(x). Of course, I'm a crankly Java programmer and it would have been public static double InverseSquareRoot(double number) had I had my druthers. And the code would have been "return 1.0 / Math.sqrt(x);" because I am a Java programmer and if I wanted speed I would be using a language which thinks 7 characters is too long for an identifier.

  86. Stupid HTML by DeadCatX2 · · Score: 1

    That last part didn't render right.

    (1 = mantissa 2; the non-fractional portion is also implied, so the bits of the mantissa are actually just the fractional portion)

    Should be

    (1 <= mantissa < 2; the non-fractional portion is also implied, so the bits of the mantissa are actually just the fractional portion)

    --
    :(){ :|:& };: