Slashdot Mirror


Auto-Parallelizing Compiler From Codeplay

Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."

13 of 147 comments

  1. openMP by Anonymous Coward · · Score: 2, Informative

    And what is the difference between this and OpenMP?

    1. Re:openMP by ioshhdflwuegfh · · Score: 2, Informative

      "And what is the difference between this and OpenMP?"

      On page 7 of "The Codeplay Sieve C++ Parallel Programming System" (2006) you'll find a section that describes the "advantages" of Codeplay over OpenMP, but nothing terribly exciting. Codeplay does let you automate parallelization further, but at the same time it is limited to a narrower set of optimizations than OpenMP.
  2. Re:Reentrant? by Anonymous Coward · · Score: 5, Informative

    Reentrancy is a factor, because it's a class of dependencies, but there are many other dependencies.

    Consider a for loop: for (int i=0; i<100; i++) doSomething(i);

    Can this be parallelized? Perhaps the author meant it like it's written there: First doSomething(0), then doSomething(1), then ... Or maybe he doesn't care about the order and doSomething just needs to run once for each i in 0..99. The art of automatic parallelization is to find overspecifications like the ordered loop where order isn't really necessary. If nothing in doSomething depends on the outcome of doSomething with a different i, they can be run in parallel and in any order. Suppose each doSomething involves a lengthy calculation and an output at the end. Then they can't simply run in parallel, because the output is a dependency: As written, the output from doSomething(0) comes before doSomething(1) and so on. But the compiler could still run the lengthy calculation in parallel and synchronize only the fast output at the end. The more of these opportunities for parallelism the compiler can find, the better it is.
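
    A minimal sketch of that last point, assuming an OpenMP pragma as a stand-in for what an auto-parallelizing compiler might do on its own. The names lengthyCalculation and runAll are hypothetical and only illustrate the split between the order-independent calculation and the ordered output. Without OpenMP enabled the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the "lengthy calculation" inside doSomething(i).
long lengthyCalculation(int i) {
    long acc = 0;
    for (int k = 0; k <= i; ++k) {
        acc += static_cast<long>(k) * k;  // sum of squares 0..i
    }
    return acc;
}

// Computes every result in any order, then emits the output strictly in order.
std::string runAll(int n) {
    assert(n >= 0);
    std::vector<long> results(n);

    // Iterations are independent: each one writes only results[i].
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        results[i] = lengthyCalculation(i);
    }

    // The output is the ordering dependency, so this part stays serial.
    std::string out;
    for (int i = 0; i < n; ++i) {
        out += std::to_string(i) + ":" + std::to_string(results[i]) + "\n";
    }
    return out;
}
```

    Calling runAll(3) yields the lines 0:0, 1:1 and 2:5 in that order, regardless of which thread finished its calculation first.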

  3. Prefer OpenMP by drerwk · · Score: 5, Informative
    I have a small amount of experience with OpenMP (http://openmp.org/), which allows one to modify C++ or Fortran code using pragmas to direct the compiler regarding parallelization of the code. The Codeplay white paper made Sieve sound much like an implementation of one of the dozen or so OpenMP patterns. I am fairly skeptical that Codeplay has any advantage over OpenMP, but the white paper lists some purported advantages. I will not copy them here and take the fun out of reading them for yourself. I will list OpenMP advantages.
    1. OpenMP is supported by Sun, Intel, IBM, $MS(?) etc, and implemented in gcc 4.2.
    2. OpenMP has been used successfully for about 10 years now, and the spec is at release 2.5.
    3. It is open. The white paper for Codeplay mentions it being protected by patents. (boo hiss)
    4. Did I mention that it is supported in gcc 4.2, which I built on my Powerbook last week, and that it is very cool?

    So maybe Codeplay is a nice system. Maybe they even have users and can offer support. But if you are looking to make your C++ code run multi-threaded with the least amount of effort I've seen (it is still effort!), take a look at OpenMP. In my simple tests it was pretty easy to make use of OpenMP, and I am looking forward to trying it on a rather more complicated application.
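
    For readers who have not seen the pragma style, here is a small sketch of one common OpenMP pattern, a parallel loop with a sum reduction. The function name dotProduct is made up for illustration. With gcc the pragma takes effect when building with -fopenmp, and is ignored otherwise.

```cpp
#include <cassert>
#include <vector>

// Dot product of two equally sized vectors. The pragma asks an OpenMP
// compiler to split the loop across threads and to combine the per-thread
// partial sums into `total` at the end (a "reduction").
double dotProduct(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += a[i] * b[i];
    }
    return total;
}
```

    The loop body is ordinary C++. All of the parallelization is expressed in the single pragma line, which is what keeps the effort low for loops like this one.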

    1. Re:Prefer OpenMP by PhrostyMcByte · · Score: 4, Informative

      Don't forget the other end of the development spectrum - Visual C++ 2005 has built-in OpenMP support too.

  4. How long has Sun Studio had "-xautopar"? by Anonymous Coward · · Score: 2, Informative

    Yep, it's in there.

    And it works, too.

  5. Re:FPP by MarkRose · · Score: 2, Informative

    "first parallel post"

    --
    Be relentless!
  6. Re:Reentrant? by jd · · Score: 5, Informative
    Simple version: Parallel code need not be re-entrant, but all re-entrant code is parallel.

    More complex version: There are four ways to run a program. These are "Single Instruction, Single Data" (ie: a single-threaded program), "Single Instruction, Multi Data" (SETI@Home would be an example of this), "Multi Instruction, Single Data" (a good way to program genetic algorithms) and "Multi Instruction, Multi Data" (traditional, hard-core parallelism).

    SIMD would need to be re-entrant to be parallel, otherwise you can't be running the same instructions. (Duh. :) SIMD is fashionable, but is limited to those cases where you are operating on the data in parallel. If you want to experiment with dynamic methods (heuristics, genetic algorithms, self-learning networks) or where you want to apply multiple algorithms to the same data (eg: data-mining, using a range of specialist algorithms), then you're going to be running a vast number of completely different routines that may have no components in common. If so, you wouldn't care if they were re-entrant or not.

    In practice, you're likely to use a blend of SIMD, MISD and MIMD in any "real-world" program. People who write "pure" code of one type or another usually end up with something that is ugly, hard to maintain and feels wrong for the problem. On the other hand, it usually requires the fewest messaging and other communication libraries, as you're only doing one type of communication. You can also optimize the hell out of the network, which is very likely to saturate with many problems.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  7. Re:Reentrant? by prencher · · Score: 3, Informative
  8. Is this better than OpenMP? by Anonymous Coward · · Score: 2, Informative

    So, I fail to see what's new about this. As has been mentioned before, OpenMP auto-parallelizes for SMP systems quite well, as long as you know what you're doing. Like anything done in parallel, if you don't figure out where your data and algorithm dependencies are you'll hose your program. If Sieve does some sort of dependency analysis, that would be interesting, but I doubt it would catch all problems. In fact, I imagine it's provably impossible to auto-parallelize in the general case -- it will likely be proven equivalent to the halting problem eventually.
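
    To make "data dependencies" concrete, here is a small serial sketch (the function names are invented for illustration). It runs two loops in forward and in reversed iteration order: the first has independent iterations, so the order does not matter, while the second reads a value written by the previous iteration, so reordering it changes the result. That second kind of loop is the one that breaks when it is naively parallelized.

```cpp
#include <cassert>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so any order works.
std::vector<int> squares(const std::vector<int>& in, bool reversed) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n, 0);
    for (int step = 0; step < n; ++step) {
        const int i = reversed ? n - 1 - step : step;
        out[i] = in[i] * in[i];
    }
    return out;
}

// Loop-carried dependency: out[i] reads out[i - 1], which is only correct
// if iteration i - 1 has already run.
std::vector<int> runningSum(const std::vector<int>& in, bool reversed) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n, 0);
    for (int step = 0; step < n; ++step) {
        const int i = reversed ? n - 1 - step : step;
        assert(i >= 0 && i < n);
        out[i] = in[i] + (i > 0 ? out[i - 1] : 0);
    }
    return out;
}
```

    For the input {1, 2, 3}, squares gives {1, 4, 9} in either order, but runningSum gives {1, 3, 6} forward and {1, 2, 3} reversed.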

    What would be new is when someone substantially improves on MPI. Auto-parallelizing a FOR loop is amusing, doing the same for a complex algorithm moving data around in a cluster, well, that's a different sort of difficult.

    Anyway, no matter how many libraries and tools come out to ease the pain, parallel programming is frigging hard. In fact, the more automagic the compiler, the harder it will be to debug when the inevitable race condition sneaks through. Combine this with lowering the bar for parallel programming and letting more idiots in and we can look forward to some truly horrific code. If you make it so any idiot can code, any idiot will!

  9. Re:snake oil by ariels · · Score: 2, Informative
    TFA specifically mentions that you need to mark up your code with sieves:
    1. A sieve is defined as a block of code
      contained within a sieve {} marker and
      any functions that are marked with sieve.
    2. Inside a sieve, all side-effects are delayed
      until the end of the sieve.
    3. Side effects are defined as modifications
      of data that are declared outside the
      sieve
    The compiler can use this information to decide what parts of the code can safely be parallelized. Adding the "sieve" keyword can change the semantics of the code; adding it correctly is your responsibility.
    Not sure I find the particular concept appealing for programming -- just trying to straighten out the claim of the article.
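
    A rough emulation of rules 2 and 3 in plain C++, to show how delaying side effects can change what the code means. This is not Codeplay's syntax or API: DelayedWrites, immediate and delayed are hypothetical names, and a real sieve block would have the compiler do the queueing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: queues writes to data declared outside the block
// and applies them only when commit() is called at the end of the block.
class DelayedWrites {
public:
    ~DelayedWrites() { assert(pending_.empty()); }  // every block must commit

    void write(int& target, int value) { pending_.push_back({&target, value}); }

    void commit() {
        for (const Pending& p : pending_) {
            *p.target = p.value;
        }
        pending_.clear();
    }

private:
    struct Pending {
        int* target;
        int value;
    };
    std::vector<Pending> pending_;
};

// Ordinary semantics: the write is visible to the very next statement.
int immediate(int& total) {
    total = total + 1;
    return total;  // sees the updated value
}

// Sieve-like semantics: the write is delayed, so a read inside the block
// still sees the old value; the update lands when the block ends.
int delayed(int& total) {
    DelayedWrites sieve;
    sieve.write(total, total + 1);
    const int seen = total;  // still the old value
    sieve.commit();          // end of the block: side effects applied
    return seen;
}
```

    Starting from total = 5, immediate returns 6 while delayed returns 5, although both leave total at 6 afterwards. That difference is the change in semantics the programmer has to account for.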
    --
    2 dashes and a space, or just 2 dashes?
  10. OpenMP can support clusters by mi · · Score: 4, Informative

    Intel's compiler (icc), available for Linux, Windows, and FreeBSD, extends OpenMP to clusters.

    You can build your OpenMP code and it will run on clusters automatically. Intel's additional pragmas allow you to control which things you want parallelized over multiple machines vs. multiple CPUs (the former being fairly expensive to set up and keep in sync).

    I've also seen messages on gcc's mailing list that talk about extending gcc's OpenMP implementation (moved from GOMP to mainstream in gcc-4.2) to clusters the same way.

    Nothing in OpenMP prevents a particular implementation from offering multi-machine parallelization. Intel's is just the first compiler to get there...

    The beauty of it all is that OpenMP is just compiler pragmas — you can always build the same code with them off (or with a non-supporting compiler), and it will still run serially.
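
    A short sketch of that property. The function names are invented for illustration, while the _OPENMP macro and omp_get_max_threads() are part of standard OpenMP. The same source builds with or without OpenMP support; without it, the pragma is skipped and the loop runs serially.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>  // only present when the compiler has OpenMP enabled
#endif

// Number of threads a parallel region could use; 1 when OpenMP is off.
int availableThreads() {
#ifdef _OPENMP
    const int threads = omp_get_max_threads();
    assert(threads >= 1);
    return threads;
#else
    return 1;
#endif
}

// Multiplies every element by `factor`. With OpenMP the iterations are
// shared among threads; without it this is a plain serial loop.
void scaleInPlace(std::vector<int>& values, int factor) {
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        values[i] *= factor;
    }
}
```

    Either way scaleInPlace produces the same result; only availableThreads() reveals which kind of build you are running.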

    --
    In Soviet Washington the swamp drains you.
  11. Nothing new by UtilityFog · · Score: 2, Informative

    Cilk has been around for years; indeed, it won the ICFP 1998 programming contest.