Auto-Parallelizing Compiler From Codeplay
Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."
Forgive me if I'm wrong (I've not coded parallel things before), but if the code is re-entrant, doesn't that go a long way towards running it in parallel? Obviously there are other factors involved, like memory access, but those are already considered in re-entrant programming. I'm not sure what the difference is... please enlighten me :-)
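One way to see the difference, as a minimal sketch: re-entrancy is a property of a single function, while parallelizing a loop also depends on whether one iteration needs the result of another. Everything below (next_value, map_all, iterate, and the constants) is made up for illustration and has nothing to do with Sieve itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Re-entrant: touches only its argument and locals, no shared state.
unsigned next_value(unsigned x) {
    return x * 1664525u + 1013904223u;
}

// Independent iterations: out[i] depends only on in[i], so the
// iterations could in principle be handed to different cores.
std::vector<unsigned> map_all(const std::vector<unsigned>& in) {
    std::vector<unsigned> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = next_value(in[i]);
    return out;
}

// Chained iterations: each call needs the previous result, so the loop
// is inherently serial even though next_value is re-entrant.
unsigned iterate(unsigned seed, int steps) {
    assert(steps >= 0);
    unsigned x = seed;
    for (int i = 0; i < steps; ++i)
        x = next_value(x);
    return x;
}
```

Both loops call the same re-entrant function, but only the first has iterations that are independent of each other. So re-entrancy helps, and the data dependencies between iterations decide the rest.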
Frtprallps
is arle ot
I loved 'Clocks'. Oh wait, Codeplay...not Coldplay.
Nevermind.
Oh look. A duck.
The opposite of progress is congress
The compiler will put out code for x86, Ageia PhysX and Cell/PS3. There were three tests talked about today, CRC, Julia Ray Tracing and Matrix Multiply. All were run on 8 cores (2S Xeon 5300 CPUs) and showed 739, 789 and 660% speedups respectively.
That's great - but do the algorithms involved here naturally lend themselves to the parallelization techniques the compiler uses? Are there algorithms that are very poor choices for parallelization? For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues? A solution might be to divide the data set by the number of threads and then, once each subset is sorted, merge them in order - but that requires more code tweaking than the summary implies. So I wonder how different this is from OpenMP?
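The divide-then-merge idea can be sketched roughly like this. It is only an illustration under my own assumptions: chunked_sort and num_chunks are hypothetical names, the parallel part uses an OpenMP pragma rather than anything from Sieve, and the merge phase is left serial to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sort num_chunks disjoint slices independently,
// then merge neighbouring sorted runs until one run remains.
void chunked_sort(std::vector<int>& data, int num_chunks) {
    assert(num_chunks > 0);
    const std::size_t n = data.size();

    // bounds[i] is the start of chunk i; bounds[num_chunks] == n.
    std::vector<std::size_t> bounds(num_chunks + 1);
    for (int i = 0; i <= num_chunks; ++i)
        bounds[i] = n * static_cast<std::size_t>(i) / num_chunks;

    auto at = [&](int i) {
        return data.begin() + static_cast<std::ptrdiff_t>(bounds[i]);
    };

    // Phase 1: each iteration touches only its own slice, so threads
    // never exchange elements and no locking is needed.
    #pragma omp parallel for
    for (int i = 0; i < num_chunks; ++i)
        std::sort(at(i), at(i + 1));

    // Phase 2 (serial here): merge adjacent runs, doubling the run width.
    for (int width = 1; width < num_chunks; width *= 2) {
        for (int i = 0; i + width < num_chunks; i += 2 * width) {
            const int hi = std::min(i + 2 * width, num_chunks);
            std::inplace_merge(at(i), at(i + width), at(hi));
        }
    }
}
```

The sorting phase needs no synchronization because the slices are disjoint. The cost moves into the merge phase, which is the extra code tweaking mentioned above.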
I think anybody who is claiming to get decent automatic parallelization out of C/C++ is selling snake oil. Even if a strict reading of the C/C++ standard ends up letting you do something useful, in my experience, real C/C++ programmers make so many assumptions that you can't parallelize their programs without breaking them.
1. OpenMP is supported by Sun, Intel, IBM, $MS(?) etc, and implemented in gcc 4.2.
2. OpenMP has been used successfully for about 10 years now, and the spec is at release 2.5.
3. It is Open - the white paper for Codeplay mentions it being protected by patents. (boo hiss)
4. Did I mention that it is supported in gcc 4.2, which I built on my Powerbook last week, and that it is very cool?
So maybe Codeplay is a nice system. Maybe they even have users and can offer support. But if you are looking to make your C++ code run multi-threaded with the least amount of effort I've seen (it is still effort!), take a look at OpenMP. In my simple tests it was pretty easy to make use of OpenMP, and I am looking forward to trying it on a rather more complicated application.
Oh, are we having a contest for who can name the earliest auto-parallelizing C compiler? If so, I nominate the vc compiler on the Convex computers. The Convex C-1 was released in 1985 and I believe had a vectorizing compiler from the start, which would make sense since it had a single, big-ass vector processor (one instruction, crap loads of operands -- can't remember how many, but it was something like 64 separate values being added to another 64 separate values in one single instruction).
I personally remember watching somebody compile something with it. It was really neat to watch -- required no special pragmas or anything, just plain old regular C code, and it would produce an annotated copy of your file telling you which lines were fully vectorized, partly vectorized, etc. You could, of course, tweak the code to make it easier for the compiler to vectorize it, but even when you did, it was still plain old C code.
Intel's compiler (icc), available for Linux, Windows, and FreeBSD, extends OpenMP to clusters.
You can build your OpenMP code and it will run on clusters automatically. Intel's additional pragmas allow you to control which things you want parallelized over multiple machines vs. multiple CPUs (the former being fairly expensive to set up and keep in sync).
I've also seen messages on gcc's mailing list that talk about extending gcc's OpenMP implementation (moved from GOMP to mainstream in gcc-4.2) to clusters the same way.
Nothing in OpenMP prevents a particular implementation from offering multi-machine parallelization. Intel's is just the first compiler to get there...
The beauty of it all is that OpenMP is just compiler pragmas — you can always build the same code with them off (or with a non-supporting compiler), and it will still run serially.
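A minimal sketch of what that looks like in practice, assuming a compiler with OpenMP support (for gcc that means building with -fopenmp). The dot function is a made-up example, not taken from any of the products discussed here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical example: dot product of two equally sized vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;

    // With OpenMP enabled, iterations are split across threads and each
    // thread's partial sum is combined at the end. With OpenMP disabled,
    // the pragma is ignored and this is an ordinary serial loop.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Built without OpenMP, the same source compiles and gives the same answer, just on one core. Note that with floating-point data the parallel version may round slightly differently, since the additions happen in a different order.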
In Soviet Washington the swamp drains you.