Slashdot Mirror


User: systemeng

systemeng's activity in the archive.

Stories
0
Comments
182

Comments · 182

  1. Re:It was fast on Origin of Quake3's Fast InvSqrt() · · Score: 3, Informative

    If you need only a low-precision result and you have a vector processor, absolutely. You're right, and thanks for the hard numbers. My work is in scientific computing, converting digital maps, where my final result must be accurate to better than 1.0e-13 and intermediates are multiplied by numbers on the order of the square of the earth's radius in meters. I'm stuck with the FPU, where things aren't so rosy. We usually combine the Newton-Raphson iteration for 1/sqrt(f(x)) with the Newton-Raphson solution for f(x), where f(x) is something that can only be solved numerically. Once you're stuck with a Newton-Raphson solution, you might as well get the most out of it by using the method to solve 1/sqrt(f(x)) and get the inverse and the root for free.
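
    A minimal sketch of the division-free Newton-Raphson iteration the comment refers to, written in Python for readability rather than speed. It solves 1/y^2 - a = 0, which gives the update y <- y * (1.5 - 0.5 * a * y * y), and then recovers sqrt(a) as a * y. The function names, the starting guess taken from the binary exponent, and the tolerance are assumptions for illustration; the coupling with a second Newton-Raphson solve for f(x) is not shown.

```python
import math


def inv_sqrt_newton(a, y0, tol=1e-15, max_iter=50):
    """Newton-Raphson for y = 1/sqrt(a) using only multiplies and subtracts.

    Converges when 0 < y0 * sqrt(a) < sqrt(3). tol and max_iter are
    illustrative defaults, not values taken from the comment.
    """
    if a <= 0.0:
        raise ValueError("a must be positive")
    y = y0
    for _ in range(max_iter):
        y_next = y * (1.5 - 0.5 * a * y * y)
        if abs(y_next - y) <= tol * abs(y_next):
            return y_next
        y = y_next
    return y


def inv_sqrt_and_sqrt(a):
    """Return (1/sqrt(a), sqrt(a)) from a single Newton-Raphson solve."""
    # a = mantissa * 2**exponent with 0.5 <= mantissa < 1, so the guess
    # 2**-(exponent // 2) keeps y0 * sqrt(a) inside [0.707, 1.415).
    _, exponent = math.frexp(a)
    y0 = math.ldexp(1.0, -(exponent // 2))
    y = inv_sqrt_newton(a, y0)
    # The root costs one extra multiply once the inverse root is known.
    return y, a * y
```

    For example, `inv_sqrt_and_sqrt(6378137.0 ** 2)` returns values close to 1/6378137 and 6378137; the WGS 84 semi-major axis in meters is used here only as an assumed stand-in for "the earth's radius".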

  2. Re:It was fast on Origin of Quake3's Fast InvSqrt() · · Score: 1
    I agree with imsabbel that for graphics, and anywhere you have a vector processor, it is long dead. My examples are from scientific computing: converting digital maps from one type to another. These are double-precision calculations that cannot be done on the vector processor or in single precision. The final result typically must be good to beyond 1.0e-13, and single precision won't work because intermediate calculations are multiplied by huge numbers on the order of the radius of the earth squared.

    If you are stuck with the FPU, doing good numerical math is not dead! In the context of graphics, with vectorized GPUs and CPUs that implement this operation, a Newton-Raphson square root is unnecessary.
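
    To make the single- versus double-precision point concrete, here is a small sketch that rounds a double to the nearest IEEE 754 single with Python's struct module and measures the relative error. The helper names are hypothetical, and the WGS 84 semi-major axis (6378137 m) is an assumed stand-in for the earth's radius; the comment itself gives no specific value.

```python
import struct


def to_float32(x):
    """Round a Python float (an IEEE 754 double) to the nearest single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_relative_error(x):
    """Relative error introduced by storing x in single precision."""
    return abs(to_float32(x) - x) / abs(x)


# Assumed illustrative magnitude: square of the WGS 84 semi-major axis, in m^2.
EARTH_RADIUS_M = 6378137.0
RADIUS_SQUARED = EARTH_RADIUS_M * EARTH_RADIUS_M
```

    A single carries a 24-bit significand, so its relative rounding error can reach 2^-24 (about 6e-8), several orders of magnitude coarser than a 1.0e-13 target. `float32_relative_error(RADIUS_SQUARED)` can be compared against that target directly.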

  3. Re:It was fast on Origin of Quake3's Fast InvSqrt() · · Score: 5, Insightful

    First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularly nasty function because both the divide and the square root stall the floating-point pipeline on IA-32 processors. As a result, instead of shooting out one result per cycle as pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for 43 cycles for the square root (P4). This is a big hit to realtime performance, and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA-32 processors are superscalar and have multiple integer units that can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation, and the remaining floating-point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call, and the stack push overhead will eat you alive. If it is implemented inline, it's about 6 times as fast as simply calling the processor's assembly instructions for root and divide in sequence, with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32, but 1.0/sqrt can be computed faster with Newton-Raphson iteration in one fell swoop than by composition of the operations.

    I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better, but in general a complicated function, especially one built from trig functions, that can be computed in one iteration of well-designed Newton-Raphson will be much faster than the composition of the CPU's implementations of the component functions. In short, don't write off careful numerics: they can provide great speed improvements. Just don't use them in code that people will want to understand later unless you document exactly what you did and why.
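
    A sketch of the trick under discussion, ported to Python purely to show the logic: an integer operation on the float's bit pattern produces a rough guess for 1/sqrt(x), and multiply-only Newton-Raphson steps refine it. The magic constant 0x5F3759DF comes from the published Quake III Arena source rather than from this comment, and the helper names are hypothetical. Python evaluates the Newton step in double precision, so results will not be bit-identical to the C float version, and nothing here says anything about speed.

```python
import struct

MAGIC = 0x5F3759DF  # constant from the published Quake III Arena source


def float_bits(x):
    """Bit pattern of x stored as an IEEE 754 single, as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_float(i):
    """Single-precision value whose bit pattern is the unsigned int i."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def fast_inv_sqrt(x, iterations=1):
    """Approximate 1/sqrt(x) for positive, finite x."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Integer-unit part: shift and subtract on the raw bits gives a
    # first guess that is within a few percent of the true value.
    y = bits_float(MAGIC - (float_bits(x) >> 1))
    # Floating-point part: each Newton-Raphson step uses multiplies and
    # one subtract, with no divide and no square root.
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y
```

    With one refinement step the result is typically within a fraction of a percent of the true value, which matches the "isn't as accurate" penalty mentioned above; a second step tightens it further at the cost of a few more multiplies.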

  4. Re:What's with use of Pointers? on Origin of Quake3's Fast InvSqrt() · · Score: 1

    The reason is that they want to use the bit values of the floating-point number in the integer unit of the processor, not convert the number to its integer value.
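
    The distinction can be shown with a short sketch: converting a float to an integer truncates its value, while reinterpreting it exposes the sign, exponent, and mantissa fields of the IEEE 754 single-precision encoding. The function names are hypothetical, and Python's struct module stands in for the pointer cast used in C.

```python
import struct


def float_to_bits(x):
    """Reinterpret the single-precision encoding of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def split_fields(x):
    """Return (sign, biased exponent, mantissa) of x as a single."""
    bits = float_to_bits(x)
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23 explicit fraction bits
    return sign, exponent, mantissa
```

    For 1.5, `int(1.5)` is 1, whereas `float_to_bits(1.5)` is 0x3FC00000: sign 0, biased exponent 127, mantissa 0x400000. Integer shifts and subtracts on that pattern act on the exponent and mantissa directly, which is what the pointer cast is for.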

  5. For long lasting phones, try a Blackberry on Why Do Gadgets Break? · · Score: 1

    I accidentally dropped my last BlackBerry in a toilet last year. After I immediately removed the battery and propped it in a hotel heater vent for 2 hours, it revived and worked just fine even a year later. This doesn't even count the 50 or 100 times it fell out of its belt holster. It was much more expensive than many of the phones with gimmicks, but I haven't seen an equal from Motorola or Samsung that comes even close to this kind of longevity.

  6. Re:I am not an expert..... on Web Geniuses Or Web Dimwits? · · Score: 1

    Experts, when properly applied, do form a usable consensus. The RAND Corporation studied using groups of experts to predict enemy attacks in the 1940s. They came up with the Delphi Method of estimation (see http://en.wikipedia.org/wiki/Delphi_method). In tasks such as project management, the Delphi Method has been shown to be quite effective at predicting things like the true completion time of a complex project. While not a be-all and end-all, the Delphi Method is one of the best uses of experts in prediction.

  7. Standards for Safety Critical Software on Industrial Strength Open Source Code? · · Score: 1

    I don't have FDA experience, but for industrial machinery, chemical plant safety systems, and the like, the standard followed is IEC 61508. Unlike ISO 9000, IEC 61508 has various hard requirements for failure probabilities and for measuring those probabilities. It also specifies things like the highest reliability that can be assumed for a device in engineering calculations and what has to be done to mitigate failures. It occupies 7 binders on my shelf and provides the basis for how to develop systems that have a verifiably low chance of killing people. A fascinating intro to IEC 61508 is presented in the online edition of Embedded Systems Programming: http://www.embedded.com//showArticle.jhtml?articleID=19201765