Auto-Parallelizing Compiler From Codeplay
Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."
Forgive me if I'm wrong (I've not coded parallel things before), but if the code is re-entrant, does this go a long way towards running the code in parallel? Obviously there are other factors involved here, like addressing memory, but this is thought of in re-entrant programming. I'm not sure what the difference is... please enlighten me :-)
For the majority of apps, OpenMP is enough. That is what this looks like - a proprietary OpenMP. It might make it easier than creating and managing your own threads but calling it "auto" parallelizing when you need to mark what to execute in parallel is a bit of a stretch.
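To make the "mark what to execute in parallel" point concrete, here is a minimal sketch of what the OpenMP style of annotation looks like in C++. The function name sum_of_squares is made up for illustration, and this shows OpenMP, not Sieve's own syntax, which isn't given in this thread. If the compiler isn't run with OpenMP enabled (for example -fopenmp on GCC), the pragma is simply ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Hypothetical example function, not taken from Codeplay or the OpenMP docs.
// The pragma marks the loop as safe to split across threads; the reduction
// clause gives every thread a private partial sum that is combined at the end.
double sum_of_squares(const std::vector<double>& values) {
    double sum = 0.0;
    const long n = static_cast<long>(values.size());

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += values[i] * values[i];
    }
    return sum;
}
```

Calling sum_of_squares on {1.0, 2.0, 3.0, 4.0} gives 30.0 either way. The programmer still has to know that the iterations are independent; the compiler only takes their word for it.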
For apps that need more, it is probably a big enough requirement that someone knowledgeable is already on the coding team. Which isn't to say that a compiler/lang/lib lowering the "experience required" bar wouldn't be welcomed, just that I wish these people would work on solving some new problems instead of re-tackling old ones.
The main purpose of these extensions seems to be finding a way to restrict the noob developer enough that they won't be able to abuse threading like some apps love to do. That is a very good thing in my book! (Think Freenet, where 200-600 threads is normal.)
For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues?
Yes you can; take a look at merge sort (or quicksort, same idea). You split up the large data set into smaller ones, sort those and recombine. That's perfect for parallelization -- you just need a mechanism for passing out the original elements and then recombining them.
So if you had to sort 1B elements, maybe you get 100 computers and give them each 1/100th of the data set. That's a manageable amount for one computer to sort easily. Then just develop a service that hands you the next element from each machine, and you pull off the lowest one.
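A rough single-machine sketch of the same idea, with threads standing in for the 100 computers. The name parallel_chunk_sort and the choice of a min-heap for the "pull off the lowest one" step are my own assumptions, not anything from Sieve. Each thread sorts a disjoint slice, so there is no locking during the sort phase; the merge at the end is serial.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: split `data` into up to `workers` chunks, sort each
// chunk on its own thread, then merge the sorted chunks into one vector.
std::vector<int> parallel_chunk_sort(std::vector<int> data, std::size_t workers) {
    if (workers == 0) workers = 1;
    const std::size_t n = data.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    // Chunk boundaries as [begin, end) index pairs; every chunk is non-empty.
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        ranges.emplace_back(begin, std::min(begin + chunk, n));
    }

    // Phase 1: the ranges are disjoint, so the threads never touch the
    // same elements and need no synchronization beyond the final join.
    std::vector<std::thread> threads;
    for (const auto& r : ranges) {
        threads.emplace_back([&data, r] {
            std::sort(data.begin() + static_cast<std::ptrdiff_t>(r.first),
                      data.begin() + static_cast<std::ptrdiff_t>(r.second));
        });
    }
    for (auto& t : threads) t.join();

    // Phase 2: k-way merge. The min-heap holds the current head of each
    // chunk as (value, chunk index); we repeatedly pull off the lowest one.
    using Head = std::pair<int, std::size_t>;
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heads;
    std::vector<std::size_t> next(ranges.size());
    for (std::size_t c = 0; c < ranges.size(); ++c) {
        next[c] = ranges[c].first;
        heads.emplace(data[next[c]], c);
    }

    std::vector<int> out;
    out.reserve(n);
    while (!heads.empty()) {
        const Head lowest = heads.top();
        heads.pop();
        out.push_back(lowest.first);
        const std::size_t c = lowest.second;
        if (++next[c] < ranges[c].second) {
            heads.emplace(data[next[c]], c);  // advance that chunk's head
        }
    }
    return out;
}
```

For example, parallel_chunk_sort({9, 3, 7, 1}, 2) sorts {9, 3} and {7, 1} separately and merges them into {1, 3, 7, 9}. The serial merge is where the speed-up tails off as the chunk count grows.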
I have my 'Mips Pro Auto Parallellizing Option 7.2.1' cd sitting right next to my Irix 6.5 machine... and I know it's YEARS old
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Our SGI compilers at work come with an -apo (automatic parallelization optimization) command line option. That one option cost us a pretty penny. It's nice to see other people getting in on the action.
Snippet from the manpage, highlighting is mine:
Trolling is a art,
If you read my post, this is exactly what I suggested. The actual point was that it requires more than simply putting "beginning and end tags" on the code, i.e. it is not automatic.
:)
I would also ask this of Codeplay: If your compiler is automatic, why do we need to add beginning and end tags?
On the other hand, OpenMP is a far more solid, robust, established, reputable, reliable solution than Codeplay. The patent in Codeplay is also bothersome - there aren't many ways to produce an auto-parallelizing compiler and they've mostly been done. This means the patent either is invalidated by prior art (most likely), or is such "black magic" that no other compiler-writing company could expect to reproduce the results and would be buying the technology anyway. It also means they can't ship to Europe, because Europe doesn't allow software patents and has a reputation of reverse-engineering such code (think "ARC-4") or just pirating/using it anyway (think: Pretty Good Privacy version 2, international version, which had patented code in it).
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Deterministic concurrency is a great aid for debugging - no more race conditions, no more heisenbugs, no more visibly different program behaviour on 1 core, 2-core, hyper-threading, Quad Core, 8 Core, and whatever the Intel and AMD road maps bring out in the future. Looks good for the sanity of all those programmers who have ever had problems manifest only on one machine after testing!
This Sieve programming also seems to make it easier to target the PS3, which has gotten a bad rap as being notoriously difficult to program well. Who wants to break programs into tiny chunks that DMA work in and results out, instead of letting some automated system translate a higher level program into that low level programming model? It's about time that getting decent returns on parallelisation was easy. It's also time for the low level OS threading APIs (Posix, Win32) to be forgotten and buried. No more locking, data races, deadlocks, and general programming complexity in order to get any speed up out of multi-core systems.
I also like the idea of buying a Physics processor unit (PPU) and having an automatic speed boost in my programs.
You have a good point; both matrix multiply and ray tracing are embarrassingly parallel problems. They lend themselves to this type of optimization.
Consider two NxN matrices, A and B, multiplied together to make a matrix C. Each element of C (Cij) is the sum over k of Aik * Bkj, i.e. the dot product of row i of A with column j of B, and no element of C depends on any other. This is an almost trivial parallelization problem, commonly one of the first coding exercises learned in a parallel processing class.
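As a sketch of that classroom exercise (my own illustration, with the made-up name parallel_multiply and plain std::thread, not anything Sieve-specific): the rows of C are divided among the threads, and since each thread only writes its own rows and only reads A and B, no locks are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: C = A * B for square N x N matrices, with the rows
// of C divided among up to `workers` threads.
Matrix parallel_multiply(const Matrix& a, const Matrix& b, std::size_t workers) {
    const std::size_t n = a.size();
    assert(b.size() == n);  // this sketch assumes square matrices of equal size
    if (workers == 0) workers = 1;
    Matrix c(n, std::vector<double>(n, 0.0));

    // Computes rows [row_begin, row_end) of C. A and B are read-only and
    // each thread writes a disjoint set of rows, so there are no data races.
    auto compute_rows = [&a, &b, &c, n](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t i = row_begin; i < row_end; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                double sum = 0.0;
                for (std::size_t k = 0; k < n; ++k) {
                    sum += a[i][k] * b[k][j];  // C[i][j] = sum over k of A[i][k] * B[k][j]
                }
                c[i][j] = sum;
            }
        }
    };

    const std::size_t rows_per_worker = (n + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += rows_per_worker) {
        const std::size_t end = std::min(begin + rows_per_worker, n);
        threads.emplace_back(compute_rows, begin, end);
    }
    for (auto& t : threads) t.join();
    return c;
}
```

With A = {{1, 2}, {3, 4}} and B = {{5, 6}, {7, 8}}, this returns {{19, 22}, {43, 50}} for any worker count. Splitting by rows is just the simplest partition; a real implementation would also care about cache blocking.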
IMHO, this is interesting but has a long way to go before it's useful for anything but a narrow set of problems.