Slashdot Mirror


FFTs Using AltiVec on Linux and Mac OS X

GregAllen writes "After searching for a good open-source AltiVec implementation of the Fast Fourier Transform (FFT), I have put together the AltiVec FFT page. It has all of the source, some benchmarks, and runs on Mac OS X or LinuxPPC. I even compared it to the venerable (but scalar) FFTW."

33 comments

  1. the obligatory... by ndevice · · Score: 2, Insightful

    even faster mp3 / dvd / video variant - encoding/decoding!

    Won't the *aa be pleased.

    'Consumers shouldn't be allowed to have machines that can encode media files quickly since we're the only ones who are allowed to produce this stuff...'

    'and no, why would consumers need machines that can decode a media file faster than the realtime playback speed? if you're playing something back, you won't need to do anything else...'

    1. Re:the obligatory... by Twirlip+of+the+Mists · · Score: 1

      iTunes already uses a multithreaded, vectorized MP3 encoder. It's very fast.

      --

      I write in my journal
  2. memory interleaving by ndevice · · Score: 2, Interesting

    from the article:
    "Split data seems to be much faster with SIMD architectures"

    caching associativity and memory interleaving anyone?

    1. Re:memory interleaving by Pathwalker · · Score: 1

      caching associativity and memory interleaving anyone?

      I think the main reason is probably that vDSP only uses vectorized functions on split data, if it has to stride over the data set, it uses a scalar version of the function.

      Take a look at the docs for vDSP - for each function it lists the critera used to select the vector/scaler version of each function. You'll see the requirement that "Stride is equal to 1" pop up all the time - those functions only use altivec if the data has been split.

  3. What's a FFT by redcliffe · · Score: 2

    What exactly is a FFT, and why would I want to do one?

    1. Re:What's a FFT by ndevice · · Score: 5, Informative

      an fft is a fast fourier transform. It is commonly used to transform data from the time domain to the frequency domain.

      What this means is that is if you have a time-varying signal like audio for example, you can run an fft on it and get a frequency analysis on the sound. i.e. from the sound in a waveform format, after an fft, you can pick out which are the dominant frequencies, and what the relationship between frequencies are.

      Although the fft is strictly not necessary because it is, after all, just a transform, it turns out that many media compression techniques use them because humans aren't as discriminating in the frequency domain so if you do lossy compression in the frequency domain, we won't notice/mind as much.

      fyi, you do an fft any time you view a .jpg, or listen to an .mp3, or watch a dvd, etc. Lots of uses for them, and that doesn't even get into the engineering/scientific uses...

    2. Re:What's a FFT by Anonymous Coward · · Score: 2, Informative

      FFT = Fast Fourier Transform. It's used to convert a signal from time domain to frequency domain, which is closer to the way human perception works. For example: To record audio data, air pressure changes over time are sampled. An FFT of a sequence of samples gives the freqencies which are in that block. Most current media compression schemes make use of this transformation to identify the data components which are least noticeable when left out or encoded with reduced precision.

    3. Re:What's a FFT by tuxedobob · · Score: 1

      Such as SETI@home. It runs FFT for a split-second before doing the more CPU intensive tasks.

    4. Re:What's a FFT by nathanh · · Score: 4, Informative
      fyi, you do an fft any time you view a .jpg, or listen to an .mp3, or watch a dvd, etc.

      You use a FFT anytime you ENCODE a .jpg, or an .mp3, or a DVD. When you view or listen or watch you are using an Inverse FFT.

      And to be very specific, I think all your examples use DCT (Discrete Cosine Transform) and not FFT. JPEG definitely does. Very good overview here

    5. Re:What's a FFT by ndevice · · Score: 1

      oops, my mistake. Got my series mixed up there.

    6. Re:What's a FFT by jellisky · · Score: 1

      One quick correction:

      "It is commonly used to transform data from the time domain to the frequency domain."

      Technically, it can take data from any well-behaved domain to the frequency domain, not just temporal data. I use FFTs on spatial data more than temporal data. Just had to nitpick that. :)

      -Jellisky

  4. Not really for OSX unless you're porting to linux by RalphBNumbers · · Score: 2, Informative
    Note that MacOS X has a library called vDSP which is as much as twice as fast as vBigDSP on split data. Source code is not available, making it difficult to port to Linux.


    So unless you're writing something for OSX and linux, want to use the same libs on both and are willing to sacrifice a 2x speed increase, you'd be well advised not to use this library package for FFTs on OSX.
    --
    "The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
  5. Linux is faster than macosx by dcstimm · · Score: 3, Informative

    I use Gentoo Linux on my ibook and since I can compile all my code for my g3 cpu, I notice a huge speed increase between linux and macosx. Since My cpu doesnt take advantage of altivec apps that are compiled for altivec run slow on a g3. Apps that are directly compiled for a g3 will run 20-30% faster. Of course apple could make Macosx faster for g3 people if they recompiled the OS with out altivec support, but they cant do that. With My g4 system I dont see a huge speed difference in linux unless I use more aggressive gcc optimizations. Anyway, Linux on a g3 will always run faster than macosx on a g3.

    1. Re:Linux is faster than macosx by Farley+Mullet · · Score: 2, Insightful

      This is probably because OS/X is heavily altivec-optimized, and g3's don't have altivec. So basically what you're saying is "a g3 optimized operating system is faster on my g3 than a g4 optimized operating system."

      Shocking, eh?

    2. Re:Linux is faster than macosx by n8_f · · Score: 1

      This is wrong. Apps that are compiled for Altivec don't run at all on G3s. That is why if you want to run Altivec code you have to detect at runtime what CPU you are on and run a non-Altivec version (or quit) if you are on a G3. So adding in optimizations for G4s do not make G3s run slower: they just run the non-optimized, plain PPC version.

      You notice a speed difference between OSX and Linux because you compiled Linux from source for the G3? Huh? Perhaps it is because you are running a completely different window manager (Aqua vs. KDE/Gnome), graphics system (Quartz vs. X11), operating system (BSD vs. GNU), and kernel (Mach vs. Linux). If you really want to measure the speed difference from compiling for the G3, take a binary PPC distro and test it against the exact same distro compiled from source. Debian should suffice. However, I really doubt you will see much difference.

      What you are really seeing isn't a huge difference between Linux and OSX on a G3 and a neglible difference between Linux and OSX on a G4, but a neglible difference between Linux on a G3 and a G4 and a huge difference between OSX on a G3 and a G4. And the reasons aren't suprising. The reason you don't see a large difference between Linux on a G3 and a G4 is because Linux is not (for the most part) optimized to take advantage of Altivec, the G4's main advantage over the G3, and because even if it was, Linux runs "fast enough" on a G3, so it doesn't need the extra speed those optimizations would provide (it is like the difference between 60fps and 120fps; not as dramatic as the difference between 15fps and 30fps). The reason for the large difference between OSX on a G3 and a G4 is that OSX is heavily optimized for the G4, and OSX needs the extra speed those optimizations provide. Aqua and Quartz are a lot more computationally complex than KDE/Gnome and X11. Altivec (or QuartzExtreme) put OSX into the "fast enough" category, and therefore equivalent to Linux.

      Nathan

    3. Re:Linux is faster than macosx by mkldev · · Score: 1
      Just to pick nits, I've never seen anyone (other than maybe a few open source projects during early development) include two copies of an application. Most apps only benefit from altivec enhancements in a handful of frequently used but easily isolated core processing routines. Thus, they simply take one conditional path on hardware with Altivec and a different path on hardware without it.

      Also you don't compile an application for Altivec. You write a separate version of those core routines that is completely rewritten to use vector operations instead of scalar.

      But you are right that, to my knowledge, no one has ever bothered to implement altivec emulation traps on the G3. It just doesn't seem that useful. :-)

      --
      120 character sigs suck. Make it 250.
  6. Re:Not really for OSX unless you're porting to lin by GregAllen · · Score: 4, Informative

    It's important to point out that the "traditional way" to store complex numbers (and do an FFT) is with interleaved data. If your application uses interleaved data (and most do), then this library is for you. Split data is being used more because of the obvious performance improvements, but you have to change your code (and your mindset) to use it.

    I've updated it to say: as much as twice as fast as vBigDSP on split data (but about the same on interleaved data).

    I'd like to get an open-source split-data version to add to the site, so Linux users get good split-data FFTs, too.

    --
    Please help find my missing daughter: FindSabrina.org
  7. Old story? by MS_is_the_best · · Score: 2, Informative

    I wonder when this benchmark was published, because the FFTW homepage already offers a 3.0beta for download, including SIMD support (SSE/SSE2/3DNow!/Altivec) support.

    For applications where raw speed is not very important I recommend everyone to use the fftw library, it is already installed on a lot of systems and easy in use. Very fast are Intel's SDK version and DJ Bernstein's (only 256 points).

  8. Further senseless nit-picking by melquiades · · Score: 4, Informative

    Although people often use the terms interchangably, strictly speaking the Fourier Transform is the mathematical operation (like "sort") and the FFT is a particular way of computing it (like "quicksort").

    The FFT is a particularly FAST algorithm for computing Fourier Transforms, with O(n log n) instead of O(n^2) for the naive algorithm -- thus the name. I believe that there is a comparably fast and very similar algorithm for computing the DCT, which doesn't really have a separate name.

    You use a FFT anytime you ENCODE a .jpg, or an .mp3, or a DVD. When you view or listen or watch you are using an Inverse FFT.

    True, though an inverse Fourier tranform is simply the Fourier transform run through a complex complement and multiplication by a constant. The FFT and the IFFT are essentially the same algorithm.

    And to be very specific, I think all your examples use DCT (Discrete Cosine Transform) and not FFT.

    Right ... though again, the DCT and the FT are essentially the same operation -- the DCT is just the real part of the FT (the FT is complex).

    1. Re:Further senseless nit-picking by plastik55 · · Score: 3, Informative
      Right ... though again, the DCT and the FT are essentially the same operation -- the DCT is just the real part of the FT (the FT is complex).


      No -- the DCT spaces its components at half-wavenumbers while the FT has components spaced at integer wavenumbers. So only every other DCT components would be equal to the real part of an FT component. (that isn't strictly true, either, since most DCTs have a half-sample offset compared to FT)


      Interestingly enough it wasn't until the '90s that a "fast" or O(n log n) algorithm for calculating the DCT was found--this is well after it had become widely used in compression.

      --

      I have a positive modifier on Troll. When I mod someone Troll their karma should go UP!

  9. FFTW 3.0 beta has been released by Dominic_Mazzoni · · Score: 3, Informative

    Well, FFTW 3.0 beta has been released, and it includes Altivec support. I would be very surprised if this new FFTW wasn't extremely competitive with all the benchmarks presented on this page, if not better.

    Anyone want to add it to the benchmark?

    1. Re:FFTW 3.0 beta has been released by GregAllen · · Score: 1

      FFTW 3.0 beta ... Anyone want to add it to the benchmark?

      Yes! I've run the benchmark, and will add the results shortly. Preliminary indication -- it's no faster than vBigDSP.

      --
      Please help find my missing daughter: FindSabrina.org
  10. Have you already seen djbfft? by twoflower · · Score: 2, Informative

    Have you tried djbfft? You might want to, as it's quality software from a hardcore mathematician.

    --


    --
    Twoflower
    1. Re:Have you already seen djbfft? by GregAllen · · Score: 1

      The djbfft website clearly emphasizes the Pentium, and makes no mention of AltiVec. I have personally found FFTW to be relatively fast on most (non-vector) platforms. However, the AltiVec FFTs smoke it.

      You have to give the FFTW people credit: they have provided their results, and made it easy for you to reproduce them. I guess that's why they're at MIT, and djbfft is at.... Where is it again?

      I don't really care who wrote my FFT, I just want it to be FAST and CORRECT.

      The djbfft website sounds like a lot of whining. If they think they've been wronged, they should provide a version of benchFFT that shows how it is better than FFTW, and provide a way for people to verify that.

      --
      Please help find my missing daughter: FindSabrina.org
    2. Re:Have you already seen djbfft? by twoflower · · Score: 1
      The djbfft website clearly emphasizes the Pentium, and makes no mention of AltiVec. I have personally found FFTW to be relatively fast on most (non-vector) platforms. However, the AltiVec FFTs smoke it.
      djbfft is faster than FFTW. It's also faster than many FFT libraries that sacrifice accuracy for speed. I asked if you had tried it, and you didn't answer.
      You have to give the FFTW people credit: they have provided their results, and made it easy for you to reproduce them. I guess that's why they're at MIT, and djbfft is at.... Where is it again?
      University of Illinois at Chicago. Don't try playing ego games; if you don't know who djb is, you need to do some more research before trying to slag him. He's one of the most important mathematicians in the world today.
      --


      --
      Twoflower
    3. Re:Have you already seen djbfft? by GregAllen · · Score: 1
      djbfft is faster than FFTW
      As is (almost) every AltiVec FFT benchmarked on the site -- FFTW is the slowest one listed. I included FFTW only because it is a "well known" scalar point of reference -- it is obviously very well known to djb. The topic of discussion is AltiVec, which djbfft clearly does not use. No matter how fast of a scalar implementation djbfft is, it cannot hope to compete with vBigDSP.

      I mean absolutely no disrespect to Dr. Bernstein. On his site he makes statements such as this: How many other high-performance libraries have been excluded from the FFTW benchmarks, or slowed down by the FFTW authors? Not good.

      I would suggest that he show head-to-head results in an easily digestable format. The FFTW people have made it very easy for him to do so. He could fix the compile flags on benchFFT that he is complaining about, and then post the results. As in "put your money where your mouth is."
      --
      Please help find my missing daughter: FindSabrina.org
  11. What about Matlab? by Anonymous Coward · · Score: 1, Interesting

    I always just use MATLAB. Does exactly what I need with speed and grace. I just hope that they come out with an OS X graphical version soon...

    1. Re:What about Matlab? by coult · · Score: 2, Informative

      Umm...not sure what is meant by "OS X graphical version", but I'm running Matlab for OS X and its the full version, does everything that it does on other platforms, including fancy graphics. It works really well with Apple's X11 because then all the graphics are hardware accelerated.

      --

      All is Number -Pythagoras.

  12. Re:Not really for OSX unless you're porting to lin by Pathwalker · · Score: 1

    I've updated it to say: as much as twice as fast as vBigDSP on split data (but about the same on interleaved data).

    There's a rather obvious reason for this, if you read the docs on vDSP.

    All of the functions are provided in vector and scalar forms, with a set of criteria listed that must be met for the vector form of the function to be used. You'll see the requirement "Stride must be equal to 1" quite often.
    In other words, the function is only vectorized for split data, and runs a slower scalar version for interleaved data.

    So, by using interleaved data, you are benchmarking a scalar vDSP against your other scalar algorithms.

  13. Re:Not really for OSX unless you're porting to lin by GregAllen · · Score: 2, Informative

    by using interleaved data, you are benchmarking a scalar vDSP against your other scalar algorithms

    This is clearly not the case. On an 800 MHz G4, the peak scalar performance is 1600 MFLOPS. You're lucky to get 800 MFLOPS, just like the (very good) scalar FFTW. That's exactly why we have SIMD/AltiVec.

    I'm using Apple's ctoz() and ztoc() to convert from interleaved to split and back, and getting "about the same as vBigDSP" -- that's 1-2 GFLOPS. Without these conversions, that curve peaks at about 4 GFLOPS.

    You also could have checked your incorrect assumption if you had looked at the provided source code.

    --
    Please help find my missing daughter: FindSabrina.org