High performance FFT on GPUs
A reader writes: "The UNC GAMMA group has recently released a high performance FFT library which can handle large 1-D FFTs. According to their webpage, the FFT library is able to achieve 4x higher computational performance on a $500 NVIDIA 7900 GPU than optimized Intel Math Kernel FFT routines running on high-end Intel and AMD CPUs costing $1500-$2000. The library is supported for both Linux and Windows platforms and is tested to work on many programmable GPUs. There is also a link to download the library freely for non-commerical use."
Why use a GPU for Final Fantasy Tactics? Couldn't you just use the GBA?
Note to mods: I'm probably being sarcastic.
if you're only considering 32-bit floating point numbers and don't need full IEEE-754 compliance.
Fast Fourier Transform
Isn't that what SETI@home uses for the bulk of its signal analysis? Would be kind of neat to leverage the millions of idle GPU's out there.
If you don't know where you are going, you will wind up somewhere else.
RTFA? Or just look at the pretty diagrams.
They're running against dual-processor systems (Opteron and Xeon).
grammar-lesson free since 1999. (rescinded - 2005)
Thirty years later, a $500 GPU, weighing less than 1 pound, can produce 6 gigaflops. People complain about its power and cooling needs, but they are rather below 200 kW! We sometimes forget just how amazing the developments in computing have been over the last three decades.
I'm sorry, but playing Quake at really high framerates does not count as research. He's not fooling anyone.
The business cards which list him as 'Profess0r of Pwnage' probably aren't helping either.
It's also bad when he refers to the undergrads as 'n00bs' during his lectures to them.
an FFT is a transform that turns a signal (like an audio file) into its frequency components (like a spectrograph). It's used for MP3 compression, sound EQs, jpeg compression, mpeg4 compression, and a number of other things (I use FFTW for tuning my guitar).
FFTW is the 'Fastest Fourier Transform in the West', a cute name for the work of a number of graduate students who use several techniques to turn the FFT from 'Numerical Recipes in C' into a freaking speed daemon.
GPUFFTW is much the same thing, but ported to your video card's GPU - which is generally more optimized for doing the 'apply a floating point matrix to an array' thing - thus speedin the FFTW up even more while relieving the main processor from doing the work.
If you don't have a high-powered video card, this means nothing for you. If you do, it means the above operations (compression, spectrum analysis, etc) can be done faster and without eating up processes.
110100 1101000 1101000 1100110 0 1101111 1101000 1100011 1
FFT is a data compression and encryption standard used by a wide variety of extraterrestrial civilizations. Seti@home spends most of its time running FFT code to look for signals. If we managed to communicate with any of these aliens we could ask them what it stands for.
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
Current generation GPUs handle 64bit and 128bit colours already. A 64-bit colour value is just four channels of 16-bit floats (halfs in Cg parlance). A 128-bit colour value is a vector of four 32-bit colour values.
If game developers wanted 256-bit colour, then GPUs would need to support 64-bit floating point arithmetic. This is unlikely to happen, however, since 64-bit colour (which is really 48-bit colour with a 16-bit alpha channel) gives more colours than the human eye can distinguish. In fact, even with 64- or 128-bit colour for the intermediate results, current cards only have a 10-bit DAC for converting the colour value to an analogue quantity that can be displayed on an analogue screen.
It is worth noting that Pixar's RenderMan software doesn't use more than 128-bit colour, and films like Toy Story were rendered using 64-bit mode.
I am TheRaven on Soylent News
Apparently nobody knows what an FFT is. Here's the best description I can give without descending into math too much.
The Fast Fourier Transform is an algorithm to turn a set of data (as amplitude vs. time) into a set of waves (as amplitude vs. frequency). Say that I have a recording of a piano playing an A at 440 Hz. If I plot the actual data that the sound card records, it'll come out something like this picture. There's a large fading-out, then the 440 Hz wave, then a couple of overtones at multiples of 440 Hz. The Fourier series will have a strong spike at 440 Hz, then smaller spikes at higher frequencies: something like this plot. (Of course, that's not at 440, but you get the idea.)
The reason we like Fourier transforms is that once you have that second plot, it's extremely easy to tell what the frequency of the wave is, for example - just look for the biggest spike. It's a much more efficient way to store musical data, and it allows for, e.g., pitch transformations (compute the FFT, add your pitch change to the result, and compute the inverse FFT which uses almost the same formula). It's good for data compression because it can tell us which frequencies are important and which are imperceptible - and it's much smaller to say "Play 440 Hz, plus half an 880 Hz, plus..." than to specify each height at each sampling interval.
The FFT is a very mathematics-heavy algorithm, which makes it well suited for a GPU (a math-oriented device, because it performs a lot of vector and floating-point calculations for graphics rendering) as opposed to a general-purpose CPU (which is more suited for data transfer and processing, memory access, logic structures, integer calculations, etc.) We're starting to see a lot of use of the GPU as the modern equivalent of the old math coprocessor.
If you're looking for more information, Wikipedia's FFT article is a good technical description of the algorithm itself. This article has some good diagrams and examples, but his explanation is a little non-traditional.
Finally I have a good excuse to give the IT department why I need to upgrade my video card. I need to do FFTs faster (it has nothing at all to do with Doom3).
Support Right To Repair Legislation.
Typically, when doing these measurements, the GAMMA group counts the upload/download time as part of the computation time. So, the 4x-5x speedup you're seeing is end to end, with results starting and ending in main memory.
Our tests on nVidia 5600 series AGP cards (this was several years ago) showed that the net SETI@home throughput using the GPU was at best 1/5 of what we could obtain with the CPU. This was primarily due to transfers out of graphics memory and into main memory.
PCI Express allows for symmetric bandwidth to graphics memory and graphics memories are now typically larger than the size of our working set. The difficulty will be in benchmarking to see which is faster for a specific GPU/CPU combination.
At any rate it's a fairly simple job to swap FFT routines in SETI@home. The source is available. Someone may have done it by now...
Support SETI@home
In floating-point arithmetic, the algorithm was proved in 1966 to have an upper bound for the error that grows only as O(log N), and the mean (rms) error grows only as O(log N). (See this page for more info.) (Errors in fixed-point arithmetic are worse, going as N.)
Even in single precision, the errors for their FFT sizes are probably quite reasonable, assuming they haven't done something silly like use an unstable trigonometric recurrence.
If a thing is not diminished by being shared, it is not rightly owned if it is only owned & not shared. S. Augustine