Auto-Parallelizing Compiler From Codeplay

← Back to Stories (view on slashdot.org)

Auto-Parallelizing Compiler From Codeplay

Posted by ryuzaki0 on Friday March 9, 2007 @03:28PM from the code-writes-you dept.

Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."

9 of 147 comments (clear)

Min score:

Reason:

Sort:

Reentrant? by Psychotria · 2007-03-09 15:33 · Score: 5, Interesting

Forgive me if I'm wrong (I've not coded parallel things before), but if the code is re-entrant, does this go a long way towards running the code in parallel? Obviously there are other factors involved here, like addressing memory, but this is thought of in re-entrant programming. I'm not sure what the difference is... please enlighten me :-)
1. Re:Reentrant? by Anonymous Coward · 2007-03-09 15:47 · Score: 5, Informative
  
  Reentrancy is a factor, because it's a class of dependencies, but there are many other dependencies.
  
  Consider a for loop: for (int i=0; i100; i++)doSomething(i);
  
  Can this be parallelized? Perhaps the author meant it like it's written there: First doSomething(0), then doSomething(1), then ... Or maybe he doesn't care about the order and doSomething just needs to run once for each i in 0..99. The art of automatic parallelization is to find overspecifications like the ordered loop where order isn't really necessary. If nothing in doSomething depends on the outcome of doSomething with a different i, they can be run in parallel and in any order. Suppose each doSomething involves a lengthy calculation and an output at the end. Then they can't simply run in parallel, because the output is a dependency: As written, the output from doSomething(0) comes before doSomething(1) and so on. But the compiler could still run the lengthy calculation in parallel and synchronize only the fast output at the end. The more of these opportunities for parallelism the compiler can find, the better it is.
2. Re:Reentrant? by jd · 2007-03-09 17:09 · Score: 5, Informative
  
  Simple version: Parallel code need not be re-entrant, but all re-entrant code is parallel.
  More complex version: There are four ways to run a program. These are "Single Instruction, Single Data" (ie: a single-threaded program), "Single Instruction, Multi Data" (SETI@Home would be an example of this), "Multi Instruction, Single Data" (a good way to program genetic algorithms) and "Multi Instruction, Multi Data" (traditional, hard-core parallelism).
  SIMD would need to be re-entrant to be parallel, otherwise you can't be running the same instructions. (Duh. :) SIMD is fashionable, but is limited to those cases where you are operating on the data in parallel. If you want to experiment with dynamic methods (herustics, genetic algorithms, self-learning networks) or where you want to apply multiple algorithms to the same data (eg: data-mining, using a range of specialist algorithms), then you're going to be running a vast number of completely different routines that may have no components in common. If so, you wouldn't care if they were re-entrant or not.
  In practice, you're likely to use a blend of SIMD, MISD and MIMD in any "real-world" program. People who write "pure" code of one type or another usually end up with something that is ugly, hard to maintain and feels wrong for the problem. On the other hand, it usually requires the fewest messaging and other communication libraries, as you're only doing one type of communication. You can also optimize the hell out of the network, which is very likely to saturate with many problems.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
FPP by DigitAl56K · 2007-03-09 15:35 · Score: 5, Funny

Frtprallps
is arle ot
This is Awesome by baldass_newbie · 2007-03-09 15:37 · Score: 5, Funny

I loved 'Clocks'. Oh wait, Codeplay...not Coldplay.
Nevermind.

Oh look. A duck.

--
The opposite of progress is congress
Interesting, but.. by DigitAl56K · 2007-03-09 15:49 · Score: 5, Insightful

The compiler will put out code for x86, Ageia PhysX and Cell/PS3. There were three tests talked about today, CRC, Julia Ray Tracing and Matrix Multiply. All were run on 8 cores (2S Xeon 5300 CPUs) and showed 739, 789 and 660% speedups respectively.

That's great - but do the algorithms involved here naturally lend themselves to the parallelization techniques the compiler uses? Are there algorithms that are very poor choices for parallelization? For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues? A solution might be to divide the data set by the number of threads and then after each set was sorted merge them in order - but that requires more code tweaking than the summary implies. So I wonder how different this is from Open/MT?
1. Re:Interesting, but.. by Anonymous Coward · 2007-03-09 16:11 · Score: 5, Interesting
  
  or example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues?
  
  Yes you can, take a look a Merge sort (or quick sort, same idea). You split up the large data set into smaller ones, sort those and recombine. That's perfect for parallization -- you just need a mechanism for passing out the orginal elements and then recombining them.
  
  So if you had to sort 1B elements maybe you get 100 computers and give them each 1/100th of the data set. THat's manageable for one computer to sort easily. THen just develop a service that hands you the next element from each machine, and you pull off the lowest one.
Prefer OpenMP by drerwk · 2007-03-09 15:57 · Score: 5, Informative

I have some small amount of experience with OpenMP http://openmp.org/ , which allows one to modify C++ or Fortran code using pragmas to direct the compiler regarding parallelization of the code. And the Codeplay white paper made this sound much like it implements one of the dozen or so OpenMP patterns. I am fairly skeptical that Codeplay has any advantage over OpenMP, but the white paper lists some purported advantages. I will not copy them here and take the fun out of reading them for yourself. I will list OpenMP advantages.
1: OpenMP is supported by Sun, Intel, IBM, $MS(?) etc, and implemented in gcc 4.2.
2: OpenMP has been used successfully for about 10 years now, and is on a 2.5 release of the SPEC.
3. It is Open - the white paper for Codeplay mentions it being protected by patents. (boo hiss)
4. Did I mention that it is supported in gcc 4.2 which I built it on my Powerbook last week and it is very cool?

So maybe Codeplay is a nice system. Maybe they even have users and can offer support. But if you are looking to make your C++ code run multi-threaded with the least amount of effort I've seen ( It is still effort! ) take a look at OpenMP. In my simple tests it was pretty easy to make use of OpenMP, and I am looking forward to trying it on a rather more complicated application.
Re:Been done... by adrianmonk · 2007-03-09 19:12 · Score: 5, Interesting

I have my 'Mips Pro Auto Parallellizing Option 7.2.1' cd sitting right next to my Irix 6.5 machine... and I know it's YEARS old

Oh, are we having a contest for who can name the earliest auto-parallelizing C compiler? If so, I nominate the vc compiler on the Convex computers. The Convex C-1 was released in 1985 and I believe had a vectorizing compiler from the start, which would make sense since it had a single, big-ass vector processor (one instruction, crap loads of operands -- can't remember how many, but it was something like 64 separate values being added to another 64 separate values in one single instruction).

I personally remember watching somebody compile something with it. It was really neat to watch -- required no special pragmas or anything, just plain old regular C code, and it would produce an annotated copy of your file telling you which lines were fully vectorized, partly vectorized, etc. You could, of course, tweak the code to make it easier for the compiler to vectorize it, but even when you did, it was still plain old C code.