FFTs Using AltiVec on Linux and Mac OS X
GregAllen writes "After searching for a good open-source AltiVec implementation of the Fast Fourier Transform (FFT), I have put together the AltiVec FFT page. It has all of the source, some benchmarks, and runs on Mac OS X or LinuxPPC. I even compared it to the venerable (but scalar) FFTW."
even faster mp3 / dvd / video variant - encoding/decoding!
Won't the *aa be pleased.
'Consumers shouldn't be allowed to have machines that can encode media files quickly since we're the only ones who are allowed to produce this stuff...'
'and no, why would consumers need machines that can decode a media file faster than the realtime playback speed? if you're playing something back, you won't need to do anything else...'
from the article:
"Split data seems to be much faster with SIMD architectures"
caching associativity and memory interleaving anyone?
What exactly is a FFT, and why would I want to do one?
So unless you're writing something for OSX and linux, want to use the same libs on both and are willing to sacrifice a 2x speed increase, you'd be well advised not to use this library package for FFTs on OSX.
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
I use Gentoo Linux on my ibook and since I can compile all my code for my g3 cpu, I notice a huge speed increase between linux and macosx. Since My cpu doesnt take advantage of altivec apps that are compiled for altivec run slow on a g3. Apps that are directly compiled for a g3 will run 20-30% faster. Of course apple could make Macosx faster for g3 people if they recompiled the OS with out altivec support, but they cant do that. With My g4 system I dont see a huge speed difference in linux unless I use more aggressive gcc optimizations. Anyway, Linux on a g3 will always run faster than macosx on a g3.
keanmarine.com
It's important to point out that the "traditional way" to store complex numbers (and do an FFT) is with interleaved data. If your application uses interleaved data (and most do), then this library is for you. Split data is being used more because of the obvious performance improvements, but you have to change your code (and your mindset) to use it.
I've updated it to say: as much as twice as fast as vBigDSP on split data (but about the same on interleaved data).
I'd like to get an open-source split-data version to add to the site, so Linux users get good split-data FFTs, too.
Please help find my missing daughter: FindSabrina.org
I wonder when this benchmark was published, because the FFTW homepage already offers a 3.0beta for download, including SIMD support (SSE/SSE2/3DNow!/Altivec) support.
For applications where raw speed is not very important I recommend everyone to use the fftw library, it is already installed on a lot of systems and easy in use. Very fast are Intel's SDK version and DJ Bernstein's (only 256 points).
Although people often use the terms interchangably, strictly speaking the Fourier Transform is the mathematical operation (like "sort") and the FFT is a particular way of computing it (like "quicksort").
.jpg, or an .mp3, or a DVD. When you view or listen or watch you are using an Inverse FFT.
... though again, the DCT and the FT are essentially the same operation -- the DCT is just the real part of the FT (the FT is complex).
The FFT is a particularly FAST algorithm for computing Fourier Transforms, with O(n log n) instead of O(n^2) for the naive algorithm -- thus the name. I believe that there is a comparably fast and very similar algorithm for computing the DCT, which doesn't really have a separate name.
You use a FFT anytime you ENCODE a
True, though an inverse Fourier tranform is simply the Fourier transform run through a complex complement and multiplication by a constant. The FFT and the IFFT are essentially the same algorithm.
And to be very specific, I think all your examples use DCT (Discrete Cosine Transform) and not FFT.
Right
Well, FFTW 3.0 beta has been released, and it includes Altivec support. I would be very surprised if this new FFTW wasn't extremely competitive with all the benchmarks presented on this page, if not better.
Anyone want to add it to the benchmark?
Have you tried djbfft? You might want to, as it's quality software from a hardcore mathematician.
--
Twoflower
I always just use MATLAB. Does exactly what I need with speed and grace. I just hope that they come out with an OS X graphical version soon...
I've updated it to say: as much as twice as fast as vBigDSP on split data (but about the same on interleaved data).
There's a rather obvious reason for this, if you read the docs on vDSP.
All of the functions are provided in vector and scalar forms, with a set of criteria listed that must be met for the vector form of the function to be used. You'll see the requirement "Stride must be equal to 1" quite often.
In other words, the function is only vectorized for split data, and runs a slower scalar version for interleaved data.
So, by using interleaved data, you are benchmarking a scalar vDSP against your other scalar algorithms.
by using interleaved data, you are benchmarking a scalar vDSP against your other scalar algorithms
This is clearly not the case. On an 800 MHz G4, the peak scalar performance is 1600 MFLOPS. You're lucky to get 800 MFLOPS, just like the (very good) scalar FFTW. That's exactly why we have SIMD/AltiVec.
I'm using Apple's ctoz() and ztoc() to convert from interleaved to split and back, and getting "about the same as vBigDSP" -- that's 1-2 GFLOPS. Without these conversions, that curve peaks at about 4 GFLOPS.
You also could have checked your incorrect assumption if you had looked at the provided source code.
Please help find my missing daughter: FindSabrina.org