Slashdot Mirror


Grand Unified Theory of SIMD

Glen Low writes "All of a sudden, there's going to be an Altivec unit in every pot: the Mac Mini, the Cell processor, the Xbox2. Yet programming for the PowerPC Altivec and Intel MMX/SSE SIMD (single instruction multiple data) units remains the black art of assembly language magicians. The macstl project tries to unify the architectures in a simple C++ template library. It just reached its 0.2 milestone and claims a 3.6x to 16.2x speed-up over hand-coded scalar loops. And of course it's all OSI-approved RPL goodness."

5 of 223 comments

  1. Altivec by BWJones · · Score: 5, Informative


    For those who want a little background on Altivec, Wiki of course has a description here. Apple, which now ships Altivec in every system it makes, has a pretty good page here, and Freescale (née Motorola) has one here.

    The benefits of Altivec can be truly astounding for those processes that can be "vectorized". After all, putting these kinds of calculations in hardware has got it all over software computation. It kind of reminds me of when I got one of those Photoshop accelerator hardware cards (Radius Photoengine with 4 DSPs on a daughter card linked to the Thunder series video card) for my IIci. Photoshop filter functions ran faster on that IIci than they did on much later PowerPC systems, simply because you now had four hardware DSPs running your image math.

    --
    Visit Jonesblog and say hello.
  2. License issues by IO+ERROR · · Score: 5, Informative
    Be careful; the "open source" license (PDF) is not GPL-compatible. I don't even think it's BSD-compatible on first reading.

    The Reciprocal Public License requires you to release all of your source code if you link to this library, even if your project is personal or used in-house only.

    --
    How am I supposed to fit a pithy, relevant quote into 120 characters?
  3. Re:16X increase? by LordRPI · · Score: 5, Informative

    The principle behind SIMD (Single Instruction, Multiple Data) is that you can process wide arrays of values in a single instruction. With the PowerPC version of SIMD, also known as AltiVec, you can issue an instruction and have it work on a 128-bit wide register. These registers may contain up to 4 32-bit numbers, 8 16-bit numbers or 16 8-bit numbers. For example, I can load two AltiVec registers with 16 unsigned chars each, add them together using vec_add() and have it return its results in an AltiVec register. So this in essence is adding 16 values at once, and in theory it's good enough for marketing to claim a 16X speedup, but this is rarely the case.
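
    A minimal sketch of the 16-lane add described above. To stay compilable off PowerPC it does not call the AltiVec vec_add() intrinsic; it assumes the GCC/Clang vector_size extension instead, which the compiler lowers to whatever 128-bit unit the target has. The names u8x16, add16 and lanes are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Assumption: GCC/Clang vector extension, not the AltiVec API itself.
// One 128-bit value holding sixteen unsigned 8-bit lanes.
typedef unsigned char u8x16 __attribute__((vector_size(16)));

// How many elements of a given width fit in one 128-bit register.
constexpr int lanes(int bits_per_element) { return 128 / bits_per_element; }

// Add sixteen unsigned chars from a and b lane by lane into out.
// Each lane wraps modulo 256, as an unsigned 8-bit add does.
void add16(const unsigned char* a, const unsigned char* b, unsigned char* out) {
    u8x16 va, vb;
    std::memcpy(&va, a, sizeof va);   // load 16 bytes into a vector value
    std::memcpy(&vb, b, sizeof vb);
    u8x16 sum = va + vb;              // one element-wise add over all 16 lanes
    std::memcpy(out, &sum, sizeof sum);
}
```

    Calling add16(a, b, out) on three 16-byte buffers replaces a 16-iteration scalar loop with a single vector add, which is where the "16X" figure comes from; the loads and stores around it are part of why real speedups are usually smaller.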

  4. Autovectorization being added in GCC 4.0 by shawnce · · Score: 5, Interesting

    For those who don't already know, autovectorization is being worked on for GCC by folks from IBM and others.

    GCC vectorization project (site seems offline atm), but the abstract from a recent GCC summit is up.

    Autovectorization Talk (google html view of pdf)
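
    For a rough idea of what an autovectorizer looks for, here is a sketch of a typical candidate loop: independent iterations, unit stride, no aliasing between inputs and output. The function name add_arrays is hypothetical, and __restrict is a compiler extension (GCC, Clang, MSVC), not standard C++. In GCC the relevant option is -ftree-vectorize (newer releases turn it on at -O3), with -maltivec added on PowerPC; whether the loop actually gets vectorized depends on the compiler version and target.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: element-wise sum of two float arrays.
// __restrict promises the compiler that a, b and out do not overlap,
// which removes one common obstacle to vectorizing the loop.
void add_arrays(const float* __restrict a, const float* __restrict b,
                float* __restrict out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];   // each iteration is independent of the others
    }
}
```

    The source stays plain scalar code; the compiler decides whether to process four floats per instruction, so the same function still builds and runs correctly where no SIMD unit is available.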

  5. Depends on what you are doing by dsci · · Score: 5, Insightful

    We write code for hardcore chemical simulations. The limits on what can be studied, i.e. the number of atoms/molecules or the timescales of the simulations, depend on one thing: speed.

    Faster computers mean better simulations. BUT if the code is not as fast as it can be on a particular architecture, your simulations are not going to be as complete as they can be, at least within a given time allotment.

    I recently applied some code optimizations to a Monte Carlo simulation and saw speedups of over 1000x. That's significant.

    It's naive to think that faster computers mean we should live with sloppy or unoptimized code. SIMD is a useful technique, and if it means the difference between me getting work done in one week rather than two or three, I think I'll take the one-week sim.

    --
    Computational Chemistry products and services.