Slashdot Mirror


Auto-Parallelizing Compiler From Codeplay

Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."

11 of 147 comments

  1. Reentrant? by Psychotria · · Score: 5, Interesting

    Forgive me if I'm wrong (I've not coded parallel things before), but if the code is re-entrant, does this go a long way towards running the code in parallel? Obviously there are other factors involved here, like addressing memory, but this is thought of in re-entrant programming. I'm not sure what the difference is... please enlighten me :-)

    1. Re:Reentrant? by 644bd346996 · · Score: 4, Interesting

      In the case of the for loop, that is really a symptom of the fact that C-style languages don't have syntax for saying "do this to each of these". So one must manually iterate over the elements. Java does have the for-each syntax, but it is just an abbreviation of the "for i from 0 to x" loop.

      Practically all for loops written are independent of order, so they could be trivially implemented using MapReduce. That one change would parallelize a lot of code, with no tricky compiler optimizations.
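A minimal sketch of the "do this to each of these" idea, assuming the loop body touches no shared state. The names `square`, `sequential_sum_of_squares` and `parallel_sum_of_squares` are made up for illustration, and Python's thread pool stands in for a real MapReduce framework. Note that CPython threads will not speed up CPU-bound work like this; the point is only that the loop can be restated as an order-independent map followed by a reduce.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def square(x):
    """Per-element work; it touches no shared state, so call order is irrelevant."""
    return x * x


def sequential_sum_of_squares(values):
    # The classic index loop: an iteration order is spelled out but never needed.
    total = 0
    for i in range(len(values)):
        total += square(values[i])
    return total


def parallel_sum_of_squares(values, workers=4):
    # "Map": the pool may run the calls in any order on any worker,
    # but Executor.map still yields the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        squares = list(pool.map(square, values))
    # "Reduce": combine the partial results with an associative operation.
    return reduce(operator.add, squares, 0)


data = list(range(1000))
assert sequential_sum_of_squares(data) == parallel_sum_of_squares(data)
```

A loop whose body reads a value written by an earlier iteration (a running prefix sum, say) would not fit this pattern without restructuring.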

  2. Yup by PhrostyMcByte · · Score: 3, Interesting

    For the majority of apps, OpenMP is enough. That is what this looks like - a proprietary OpenMP. It might make it easier than creating and managing your own threads but calling it "auto" parallelizing when you need to mark what to execute in parallel is a bit of a stretch.

    For apps that need more, it is probably a big enough requirement that someone knowledgeable is already on the coding team. Which isn't to say that a compiler/lang/lib lowering the "experience required" bar wouldn't be welcomed, just that I wish these people would work on solving some new problems instead of re-tackling old ones.

    The main purpose of these extensions seems to be finding a way to restrict the noob developer enough that they won't be able to abuse threading like some apps love to do. That is a very good thing in my book! (Think Freenet, where 200-600 threads is normal.)
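One way to picture that kind of restriction is a fixed-size worker pool: the work is split into hundreds of tasks, but the number of threads stays capped. This is only a sketch using Python's standard library, not how Sieve or OpenMP do it, and `handle` and `run_bounded` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(task_id):
    """Hypothetical unit of work; reports which thread ran it."""
    return task_id, threading.get_ident()


def run_bounded(task_count, max_threads=4):
    # The pool never creates more than max_threads threads,
    # no matter how many tasks are submitted to it.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(handle, range(task_count)))
    threads_used = {thread_id for _, thread_id in results}
    return len(results), len(threads_used)


done, threads = run_bounded(600)
assert done == 600 and threads <= 4
```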

  3. Re:Interesting, but.. by Anonymous Coward · · Score: 5, Interesting

    For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues?

    Yes you can; take a look at merge sort (or quicksort, same idea). You split up the large data set into smaller ones, sort those and recombine. That's perfect for parallelization -- you just need a mechanism for passing out the original elements and then recombining them.

    So if you had to sort 1B elements, maybe you get 100 computers and give them each 1/100th of the data set. That's manageable for one computer to sort easily. Then just develop a service that hands you the next element from each machine, and you pull off the lowest one.
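The split, sort and recombine scheme described above can be sketched on a single machine, with worker threads standing in for the 100 computers. `split` and `parallel_sort` are illustrative names, not part of any library; `heapq.merge` from the standard library plays the role of the service that repeatedly pulls the lowest next element from each sorted chunk.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split(values, parts):
    """Cut values into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(values) // parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sort(values, parts=4):
    chunks = split(values, parts)
    # Each chunk is sorted independently; no worker touches another's data,
    # so no synchronization is needed during this phase.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # k-way merge: repeatedly take the smallest head element among the chunks.
    return list(heapq.merge(*sorted_chunks))


assert parallel_sort([5, 3, 9, 1, 4, 8, 2, 7, 6, 0]) == list(range(10))
```

The only coordination points are handing out the chunks and the final merge, which is why the synchronization worry from the question does not apply to the sorting phase itself.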

  4. Been done... by TheRealMindChild · · Score: 3, Interesting

    I have my 'MIPSpro Auto Parallelizing Option 7.2.1' CD sitting right next to my Irix 6.5 machine... and I know it's YEARS old.

    --

    "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    1. Re:Been done... by adrianmonk · · Score: 5, Interesting

      I have my 'MIPSpro Auto Parallelizing Option 7.2.1' CD sitting right next to my Irix 6.5 machine... and I know it's YEARS old.

      Oh, are we having a contest for who can name the earliest auto-parallelizing C compiler? If so, I nominate the vc compiler on the Convex computers. The Convex C-1 was released in 1985 and I believe had a vectorizing compiler from the start, which would make sense since it had a single, big-ass vector processor (one instruction, crap loads of operands -- can't remember how many, but it was something like 64 separate values being added to another 64 separate values in one single instruction).

      I personally remember watching somebody compile something with it. It was really neat to watch -- required no special pragmas or anything, just plain old regular C code, and it would produce an annotated copy of your file telling you which lines were fully vectorized, partly vectorized, etc. You could, of course, tweak the code to make it easier for the compiler to vectorize it, but even when you did, it was still plain old C code.

  5. Re:Hey! Let's reinvent OpenMP! by grub · · Score: 2, Interesting

    Our SGI compilers at work come with an -apo (Auto-Parallelizing Option) command line option. That one option cost us a pretty penny. It's nice to see other people getting in on the action.

    Snippet from the manpage, highlighting is mine:

    -apo, -apokeep, -apolist
    For -n32 and -64, it invokes the Auto-Parallelizing Option
    (APO), which automatically converts sequential code into
    parallel code by inserting parallel directives where it is
    safe and beneficial to do so. Specifying -apo also sets the
    -mp option. Both -apokeep and -apolist produce a listing
    file, file.list. Specifying -apokeep retains file.anl and
    file.m, which can be used by the parallel analyzer, ProDev
    ProMP (see the EXAMPLES section). When the -IPA option is
    specified with -apokeep, the default settings for IPA
    suboptions are used with the exception of -IPA:inline, which
    is set to OFF.
    APO is invoked only if you are licensed for it. For licensing
    information, see your sales representative.

    For more information on APO, its directives, and command-line
    options, see MIPSpro C and C++ Pragmas.

    When specifying the -o32 option on the cc command line, -apo
    invokes the IRIS Power C analyzer (PCA). See the -pca option
    description.
    --
    Trolling is a art,
  6. Re:Interesting, but.. by DigitAl56K · · Score: 3, Interesting

    If you read my post, this is exactly what I suggested. The actual point was that it requires more than simply putting "beginning and end tags" on the code, i.e. it is not automatic.

    I would also ask this of Codeplay: If your compiler is automatic, why do we need to add beginning and end tags? :)

  7. Re:Prefer OpenMP by jd · · Score: 4, Interesting
    Personally, I would agree with you. I have to say I am not fond of OpenMP - I grew up on Occam, and these days Occam-Pi blows anything done in C out of the water. (You can write threads which can auto-migrate over a cluster, for example. Even OpenMOSIX won't work at a finer granularity than entire processes, and most compile-time parallelism is wholly static after the initial execution.)

    On the other hand, OpenMP is a far more solid, robust, established, reputable, reliable solution than Codeplay's. Codeplay's patent is also bothersome - there aren't many ways to produce an auto-parallelizing compiler and they've mostly been done. This means the patent either runs afoul of prior art (most likely), or is such "black magic" that no other compiler-writing company could expect to reproduce the results and would be buying the technology anyway. It also means they can't ship to Europe, because Europe doesn't allow software patents and has a reputation for reverse-engineering such code (think "ARC-4") or just pirating/using it anyway (think PGP version 2, the international version, which had patented code in it).

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  8. Wow - Deterministic Concurrency by Anonymous Coward · · Score: 1, Interesting

    Deterministic concurrency is a great aid for debugging - no more race conditions, no more heisenbugs, no more visibly different program behaviour on 1 core, 2-core, hyper-threading, Quad Core, 8 Core, and whatever the Intel and AMD road maps bring out in the future. Looks good for the sanity of all those programmers who have ever had problems manifest only on one machine after testing!

    This Sieve programming also seems to make it easier to target the PS3, which has gotten a bad rap as being notoriously difficult to program well. Who wants to break programs into tiny chunks that DMA work in and results out, instead of letting some automated system translate a higher level program into that low level programming model? It's about time that getting decent returns on parallelisation was easy. It's also time for the low level OS threading APIs (POSIX, Win32) to be forgotten and buried. No more locking, data races, deadlocks, and general programming complexity in order to get any speedup out of multi-core systems.

    I also like the idea of buying a Physics processor unit (PPU) and having an automatic speed boost in my programs.

  9. Re:Interesting, but.. by SSCGWLB · · Score: 2, Interesting

    You have a good point; both matrix multiply and ray tracing are embarrassingly parallel problems. They lend themselves to this type of optimization.

    Consider two NxN matrices, A and B, multiplied together to make a matrix C. Each element of C (Cij) is the dot product of row i of A and column j of B, i.e. the sum over k of Aik * Bkj, and depends on no other element of C. This is an almost trivial parallelization problem, commonly one of the first coding exercises learned in a parallel processing class.
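As a rough illustration of why matrix multiply parallelizes so easily: every row of C can be computed without looking at any other row, using C[i][j] = sum over k of A[i][k] * B[k][j]. The sketch below hands one row per task to a thread pool. `multiply_row` and `parallel_matmul` are hypothetical names, and plain Python lists are assumed rather than a numerics library.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row(a_row, b):
    """Compute one row of C: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    columns = zip(*b)  # iterate over the columns of B
    return [sum(x * y for x, y in zip(a_row, col)) for col in columns]


def parallel_matmul(a, b, workers=4):
    # Each task reads one row of A and all of B, and writes only its own
    # row of C, so the tasks never need to synchronize with each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: multiply_row(row, b), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
assert parallel_matmul(a, b) == [[19, 22], [43, 50]]
```

Splitting by row is just one choice; splitting by element or by block works the same way, since no output element depends on another.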

    IMHO, this is interesting but has a long way to go before it's useful for anything but a narrow set of problems.