Slashdot Mirror


High performance FFT on GPUs

A reader writes: "The UNC GAMMA group has recently released a high performance FFT library which can handle large 1-D FFTs. According to their webpage, the FFT library is able to achieve 4x higher computational performance on a $500 NVIDIA 7900 GPU than optimized Intel Math Kernel FFT routines running on high-end Intel and AMD CPUs costing $1500-$2000. The library is supported on both Linux and Windows platforms and is tested to work on many programmable GPUs. There is also a link to download the library freely for non-commercial use."

4 of 274 comments

  1. FFT=Fast Fourier Transforms by amliebsch · · Score: 4, Interesting

    Isn't that what SETI@home uses for the bulk of its signal analysis? Would be kind of neat to leverage the millions of idle GPUs out there.

    --
    If you don't know where you are going, you will wind up somewhere else.
  2. Re:It's nice... by john.r.strohm · · Score: 4, Interesting

    Depending on what you're doing, for an FFT, you probably don't need 64-bit floating point, and you DON'T need full IEEE-754 compliance.

    If you are taking data off of some kind of sensor, there are damned few sensors with 24 good bits of data out of the noise floor. Radars work just fine with 16-bit A/D converters.

    IEEE-754 compliance helps you in the ill-defined corners of the number space. FFTs inherently work in the well-behaved arena of simple trig functions and three-function (add/subtract/multiply) math.

    I'm currently doing FFTs with 16-bit fractional arithmetic in Blackfin DSP. For what I'm doing with the results, it is good enough.

    Not to mention you could use a "GPU farm" to do a fast search, and take any "interesting" data regions and feed those to a 64-bit, fully-IEEE-754 compliant, slow-as-molasses-in-January x86 FFT.

    Eventually, with some more articles like this one and yesterday's Cell piece, people will start to figure out that the x86 architecture is brain-dead and needs to be put out of its misery.
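
    To make the 16-bit fractional arithmetic point concrete, here is a minimal sketch in plain Python. It assumes the common Q15 layout (a value x stored as round(x * 32768) in a signed 16-bit integer) and a textbook radix-2 FFT that halves the data at every stage to avoid overflow. The names to_q15, q15_mul, sat16 and fft_q15 are made up for this illustration; this is not Blackfin code and not the GPU library from the story. It compares the fixed-point result against a double-precision DFT of the same signal.

```python
import cmath
import math

Q15_ONE = 1 << 15  # assumed Q15 scale: x is stored as round(x * 32768)

def sat16(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, v))

def to_q15(x):
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return sat16(int(round(x * Q15_ONE)))

def q15_mul(a, b):
    """Multiply two Q15 integers with rounding and saturation."""
    return sat16((a * b + (1 << 14)) >> 15)

def fft_q15(re, im):
    """Radix-2 decimation-in-time FFT on Q15 lists.

    Every stage halves its outputs, so the result is X[k] / N.
    The length must be a power of two.
    """
    n = len(re)
    re, im = list(re), list(im)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]
    # Butterfly stages using only add, subtract, multiply and shift.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                angle = -2 * math.pi * k / size
                wr, wi = to_q15(math.cos(angle)), to_q15(math.sin(angle))
                a, b = start + k, start + k + half
                tr = q15_mul(wr, re[b]) - q15_mul(wi, im[b])
                ti = q15_mul(wr, im[b]) + q15_mul(wi, re[b])
                re[a], re[b] = sat16((re[a] + tr) >> 1), sat16((re[a] - tr) >> 1)
                im[a], im[b] = sat16((im[a] + ti) >> 1), sat16((im[a] - ti) >> 1)
        size *= 2
    return re, im

def dft_scaled(x):
    """Double-precision reference: direct DFT divided by N."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

N = 64
signal = [0.5 * math.cos(2 * math.pi * 5 * t / N) for t in range(N)]  # test tone in bin 5
re, im = fft_q15([to_q15(s) for s in signal], [0] * N)
ref = dft_scaled(signal)
max_err = max(abs(complex(r, i) / Q15_ONE - z) for r, i, z in zip(re, im, ref))
print(f"max error vs. double-precision DFT: {max_err:.2e}")
```

    For a clean 64-point tone the difference from the double-precision reference should stay within a few 16-bit steps. Whether that is good enough depends, as the comment says, on what you do with the results.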

  3. Cray-1 comparison by Mostly a lurker · · Score: 5, Interesting
    The Cray-1A supercomputer, weighing in at 5.5 tons, had an absolute maximum peak performance of 250 megaflops. It, of course, cost millions and the power requirements (including for cooling) were in excess of 200 kW. I remember marveling at the advanced nature of this technological achievement.

    Thirty years later, a $500 GPU, weighing less than 1 pound, can produce 6 gigaflops. People complain about its power and cooling needs, but they are rather below 200 kW! We sometimes forget just how amazing the developments in computing have been over the last three decades.
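
    The ratios behind that comparison are easy to work out. This small sketch uses only the figures quoted above; the one added assumption is that "tons" means short tons of 2000 lb. Because the comment gives bounds (power "in excess of", weight "less than"), the per-watt and per-pound results are bounds as well.

```python
# Figures quoted in the comment; variable names are illustrative only.
CRAY_PEAK_FLOPS = 250e6      # 250 megaflops peak
CRAY_POWER_WATTS = 200e3     # "in excess of 200 kW", so a lower bound
CRAY_WEIGHT_LB = 5.5 * 2000  # assumption: short tons of 2000 lb
GPU_FLOPS = 6e9              # 6 gigaflops
GPU_WEIGHT_LB = 1.0          # "less than 1 pound", so an upper bound

speedup = GPU_FLOPS / CRAY_PEAK_FLOPS
cray_flops_per_watt = CRAY_PEAK_FLOPS / CRAY_POWER_WATTS   # at most this
weight_ratio = CRAY_WEIGHT_LB / GPU_WEIGHT_LB               # at least this
flops_per_lb_gain = speedup * weight_ratio                  # at least this

print(f"raw throughput gain:    {speedup:.0f}x")
print(f"Cray-1A FLOPS per watt: at most {cray_flops_per_watt:.0f}")
print(f"FLOPS per pound gain:   at least {flops_per_lb_gain:,.0f}x")
```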

  4. Re:Any 64 bit GPU's? by TheRaven64 · · Score: 4, Interesting
    more and more game developers are pushing for 64 bit color accuracy, which will necessitate a transition to fully 64bit GPUs in the not distant future

    Current generation GPUs handle 64-bit and 128-bit colours already. A 64-bit colour value is just four channels of 16-bit floats (halfs in Cg parlance). A 128-bit colour value is a vector of four 32-bit floats.

    If game developers wanted 256-bit colour, then GPUs would need to support 64-bit floating point arithmetic. This is unlikely to happen, however, since 64-bit colour (which is really 48-bit colour with a 16-bit alpha channel) gives more colours than the human eye can distinguish. In fact, even with 64- or 128-bit colour for the intermediate results, current cards only have a 10-bit DAC for converting the colour value to an analogue quantity that can be displayed on an analogue screen.

    It is worth noting that Pixar's RenderMan software doesn't use more than 128-bit colour, and films like Toy Story were rendered using 64-bit mode.
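
    As a rough illustration of those bit counts, the sketch below packs one RGBA pixel as four 16-bit half floats and as four 32-bit floats using Python's struct module, then quantizes the colour channels to 10-bit codes the way a 10-bit DAC stage would. The pixel values and the to_dac10 helper are hypothetical, and the linear mapping to 0..1023 is an assumption; real hardware may apply gamma or dithering first.

```python
import struct

rgba = (1.0, 0.5, 0.25, 1.0)  # hypothetical pixel: red, green, blue, alpha

half_pixel = struct.pack("<4e", *rgba)   # four 16-bit half floats
float_pixel = struct.pack("<4f", *rgba)  # four 32-bit floats
print(len(half_pixel) * 8, len(float_pixel) * 8)  # prints "64 128"

def to_dac10(channel):
    """Quantize a channel in [0.0, 1.0] to a 10-bit code (0..1023)."""
    clamped = min(1.0, max(0.0, channel))
    return round(clamped * 1023)

# Only the three colour channels reach the display; alpha is used for blending.
dac_codes = [to_dac10(c) for c in rgba[:3]]
print(dac_codes)
```

    Whatever precision the intermediate results carry, each displayed channel ends up as one of 1024 levels in this model.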

    --
    I am TheRaven on Soylent News