Auto-Parallelizing Compiler From Codeplay
Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."
The compiler will put out code for x86, Ageia PhysX and Cell/PS3. There were three tests talked about today, CRC, Julia Ray Tracing and Matrix Multiply. All were run on 8 cores (2S Xeon 5300 CPUs) and showed 739, 789 and 660% speedups respectively.
That's great - but do the algorithms involved here naturally lend themselves to the parallelization techniques the compiler uses? Are there algorithms that are very poor choices for parallelization? For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues? A solution might be to divide the data set by the number of threads and then after each set was sorted merge them in order - but that requires more code tweaking than the summary implies. So I wonder how different this is from Open/MT?
I think anybody who is claiming to get decent automatic parallelization out of C/C++ is selling snake oil. Even if a strict reading of the C/C++ standard ends up letting you do something useful, in my experience, real C/C++ programmers make so many assumptions that you can't parallelize their programs without breaking them.
I read the article, the information at the company's web site and even white papers written on the compiler. And although I did see one reference to "Multiple computers across a network (e.g. a "grid")" there was no other mention of it.
When I think of Parallelizing software, after getting over my humors mind thinking of a virus that paralyzes users, what comes to mind is clustering. When I think of clustering the train of thought directs me to Beowulf and MPI or it's predecessor PVM. Though I can find no information that supports the concept of clustering in any manner.
Again I did see a reference to: "Multiple computers across a network (e.g. a "grid")" but according to Wikipedia grid computing is defined "A grid uses the resources of many separate computers connected by a network (usually the Internet) to solve large-scale computation problems. Most use idle time on many thousands of computers throughout the world."
Well, that sounds like the distributed SETI project and the like, which would seem even more ambitious than a compiler that would help write MPI code for Beowulf clusters.
From all the examples this looks like a god compiler for writing code that will run more efficiently on multi-core and multi-processor systems but would not help you in writing parallel code for clustering.
Though, this brings up a concept that many people forget. Even people that I would consider to be rather intelligent on the subject of clustering often forget this. And that is that if you have an 8 computer cluster with each node running on a system with dual-core Intel CPU installed that if you write parallel code for it using MPI you are benefiting from 8 cores in parallel. Many people that write parallel code forget about multi-threading. To benefit from all 16 cores in a cluster I just described the code would have to be written multi-threaded and parallel. One of the main professors involved in a clustering project at my university stated to me that in their test environment they were using 8 dell systems with dual-core Intel CPU so in total they had the power of 16 cores. Since he has his Ph. D. and all I didn't feel the need to correct him and explain that unless his code was both parallel and multi-threaded he was only getting the benefit of 8 cores. I knew he was not multi-threading because they were not even writing the code in MPI rather they were using Python and batching processes to the cluster. From my knowledge Python cannot write multi-threaded applications. Even if it can I know they were not (from looking at their code).
Sometimes it's the simplest things that confuse the brightest of us....
Nick Powers
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...