Origin of Quake3's Fast InvSqrt()

← Back to Stories (view on slashdot.org)

Origin of Quake3's Fast InvSqrt()

Posted by Zonk on Friday December 1, 2006 @07:20AM from the i-know-you-were-dying-inside-without-this dept.

geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power." From the article: ""

19 of 402 comments (clear)

Min score:

Reason:

Sort:

And so why do we care? by Codename46 · 2006-12-01 07:27 · Score: 2, Insightful

From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.

[Insert rant about software patents]
Re:What's with use of Pointers? by Anonymous Coward · 2006-12-01 07:42 · Score: 1, Insightful

> int i = (int)x;

converts a float to an int, potentially cutting off the digits past the decimal point and/or leading digits that don't fit into the 32 bit integer range.

> int i = *(int*);

reinterpretes the bit pattern of the float number as an integer, so the exponent and the sign bit become part of the number.
Re:What's with use of Pointers? by MeanMF · 2006-12-01 07:43 · Score: 2, Insightful

int i = (int)x would convert the floating point number to its rounded integer equivalent.

int i = *(int*)x assigns i the 32-bit value that represents how x is stored in memory as a float.
Poor function name by bloobloo · 2006-12-01 07:43 · Score: 4, Insightful

The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.
1. Re:Poor function name by rg3 · 2006-12-01 23:07 · Score: 2, Insightful
  
  The word "inverse" has several uses in mathematics. One of them is the one you mentioned, which is to say "inverse function". In that sense, working with positive numbers, the inverse function of the square root is the square, yes. However, there is also the concept of "inverse element" for sets and operations. It's a very generic concept that relates to an operation and the identity element for that operation. For the sum, the identify element is 0, given that x + 0 = x. The inverse element of another element regarding a given operation (and it's usually written x^(-1) for the element x), is one such that x op. x^(-1) = i, being i the identity element.
  
  Again, going back to the addition operation, the identity element is 0, x + x^(-1) = 0. In this case, x^(-1) is -x. For the multiplication operation, the identify element is 1 and the inverse element x^(-1) of any x is 1/x. In my math classes in high school, the teacher always called -x "opposite x" and 1/x "inverse x". So, basically, if you read "square root" not as a function, but as a number, the "inverse square root" is 1/sqrt(x), and the "inverse function for the square root function", x^2.
  
  All of this to say what? That it's not that easy and it can be discussed.
It might be damn smart.. by kan0r · 2006-12-01 08:01 · Score: 5, Insightful
But the first thing I thought when I saw this was: "Damn, that code is a mess!"
Seriously, try looking away from the genius who obviously wrote it.
- There is no single comment which would make reading and understanding what happens here much easier!
- Introduction of a magic number with no explanation whatsoever
- Magic pointer arithmetics without demystification
- Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
I know it is good code. But it is simply bad code!
1. Re:It might be damn smart.. by Anonymous Coward · 2006-12-01 08:41 · Score: 2, Insightful
  
  There is no single comment which would make reading and understanding what happens here much easier
  
  Exactly. It's very simple code, and if you can't tell how it works by looking at it you need to first brush up on your applied math and IEEE 754. Then it will be obvious.
2. Re:It might be damn smart.. by Tyler+Durden · 2006-12-01 09:38 · Score: 2, Insightful
  
  Ummm... how is endianness going to affect the calculation at all? A 64 bit architecture should royally screw up the calculation, but given when the code was written a 32 bit architecture would be a perfectly reasonable assumption.
  
  If I wanted to be picky, I'd harp on the magic number or at least add a comment at the top saying it used Newton-Raphson iteration. As far as the bit fiddling goes, anyone who does 3D programming in C will instantly recognize what it does.
  
  --
  Happy people make bad consumers.
3. Re:It might be damn smart.. by Anonymous Coward · 2006-12-03 00:36 · Score: 1, Insightful
  
  Java is a mediocre language for mediocre programmers to write mediocre programs in. Sorry be the one to tell you.
  
  You can change, you know. It's not too late!
Re:It was fast by systemeng · 2006-12-01 08:19 · Score: 5, Insightful

First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.
Re:A famous quote by jmv · 2006-12-01 10:27 · Score: 4, Insightful

The real genius here is Isaac Newton. 'course, that's not news to anyone.

You mean that Newton thought about taking advantage of the IEEE float format to initialize the algorithm using "i = 0x5f3759df - (i>>1);"? Wow, now that's a clever guy!

--
Opus: the Swiss army knife of audio codec
Re:A famous quote by rblancarte · 2006-12-01 11:10 · Score: 2, Insightful

I think the point of the parent post was to point out that the summary should have enough information that you can decide if you want to RTFA or not. You should not have to RTFA to decide if you want to RTFA article or not.

As far as the name of the function - true, but you never know, it could be something different. A small explaination would have helped.

RonB

--
It is human nature to take shortcuts in thinking.
Re:A famous quote by saforrest · 2006-12-01 12:16 · Score: 3, Insightful

The result of taking the square root of a number and then squaring it will be the absolute value of the original number.

Really? What if the number is negative?

I think you mean to say "the original number", not "the absolute value of the original number". When given a negative argument, this composition will either return an error (because there is no support for complex numbers) or a negative result equal to the input.
Re:A famous quote by Tim+Browse · 2006-12-01 12:22 · Score: 2, Insightful

Carmack left school after two semesters to work as a freelance programmer. I don't think he made it to Numerical Methods or the equivalent. That makes this code all that much more impressive.
RTFA much?
Re:Correction by lkeagle · 2006-12-01 13:27 · Score: 3, Insightful

You live in America, don't you?
Re:Bad programmer, no cookie by Chris+Burke · 2006-12-01 13:50 · Score: 4, Insightful

I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

I've written enough 3D graphics code -- including a textured polygon rasterizer that would probably cry and try to delete itself if it saw something like Quake 3 -- to know that they didn't have to run a profiler to know that they'd be spending too much time doing 1/sqrt(x) if they didn't have a highly optimized routine for it. It's an inherent operation in so many 3D calculations it isn't funny, and while by the time Quake 3 came out hardware floating point units were pretty fast, FP divides and FP square roots were very lengthy operations that more importantly couldn't be pipelined.

But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

Yeah, pretty much. Back when I wrote my code (pentium days) you had an FP unit but it wasn't very good, so I used fixed point math (using integer instructions) which had sufficient precision for a 320x200 display. Getting enough performance out of the core algorithms was still hard enough that I had to take a lot of shortcuts, like instead of doing the right thing by using a divide every pixel to calculate which texel to use, I used a divide every 8 pixels and linearly interpolated in between. I'm sure that Quake (the contemporary 3D engine of the day which would also make my code cry) contained many more clever optimizations and approximations, because it wouldn't have been possible on the hardware of that day without them.

In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. The non-approximation instruction uses a lookup table to get an initial guess, then uses a couple iterations of Newton's Method to refine the result. The short cut instructions simply return the value in the lookup table.

So yeah, basically the AC OP has no idea what he is talking about, and from that basis of ignorance is denigrating what is in reality a very clever and extremely useful hack.

--

The enemies of Democracy are
Re:It may in part be related to something I did .. by waveclaw · 2006-12-01 14:34 · Score: 2, Insightful

Though, its very likely that these ideas originate from even earlier in computing history.

Attribution.

That is why most, if not all, software patents are bogus. Just because you reinvent something published by a PhD working in a committee that disbanded 10 years before you knew 'C' came after 'B' in the alphabet does not make you reinvention patent worthy. The history of invsqrt() crosses disciplines of hardware and software design, spec development, graphics and math theory. With such a fascinating function having a hard to track history it is no wonder that things like James W. Hunt's 1976 diff concept can be patented in 2006 by Microsoft. (As mentioned by QuantumG on slashdot.)

The trouble had in tracking the history of InvSqrt() is really sad. Computing is an industry that hypes how digital storage of information and perfect copies eliminates the isse of decay. While the ever expanding secondary storage (just now getting to really usefull) means oblivating the scaling issues with retaining the meta-information that denotes this very history. I guess the moral of the story is: sign your contributions. It won't take up that much space in the comments.

"Don't have good ideas if you aren't willing to be responsible for them."

--Alan Perlis

And will the real Fast InvSqrt() author please stand up?

--

"You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
Re:A famous quote by johnw · 2006-12-01 22:44 · Score: 2, Insightful

And badly named! It's not the inverse square root (which would simply be the square); it's the reciprocal of the square root.
Re:A famous quote by Mr+Chund+Man · 2006-12-02 01:58 · Score: 2, Insightful

I don't think that's quite what he meant...