Domain: openmp.org
Stories and comments across the archive that link to openmp.org.
Comments · 47
-
Re:Clang is Slower
The performance gaps in most tests were small except when it came to OpenMP which Clang does not support.
-
Re:Given how specialized the use case scenarios ar
Somehow, someone has to figure a good standardized way to develop and run massively concurrent software, preferably based on an open standard.
Someone already has. There's even an accelerator subcommittee working to add nice compiler directives so we can do away with the ugly low-level OpenCL and CUDA programming. Fusion will not work without a decent programming model and associated tools.
-
Re:Unix isn't there yet, and probably never will b
A good system would let me switch on a new system/pc and it would automatically share all its resources (storage, ram, cpu, I/O) with a defined cluster of other systems/PCs.
It's not entirely hopeless though: things like AFS, various distributed shared memory systems with a good API, task and process migration and so on have been around for quite some time.
-
Parallel programming is hard, film at 11.
The /. summary of TFA is almost exquisitely bad. It's not Windows or Linux that's not ready for multicore (as both have supported multi-processor machines for on the order of a decade or more), but rather the userspace applications that aren't ready. The reason is simple: Parallel programming is rather hard, and historically most ISVs haven't wanted to invest in it because they could rely on the processors getting faster every year or two... but no longer.
One area where I disagree with TFA is the claimed paucity of programming models and tools. Virtually every OS out there supports some kind of concurrent programming model, and often more than one depending on what language is used -- pthreads, Win32 threads, Java threads, OpenMP, MPI or Global Arrays on the high end, etc. Most debuggers (even gdb) also support debugging threaded programs, and if those don't have enough heft, there's always Totalview. The problem is that most ISVs have studiously avoided using any of these except when given no other choice.
--t
-
OpenMP makes parallel programming simple
OpenMP makes it simple to write code that takes advantage of multiple threads, and it's supported by gcc 4.x, Visual Studio (just not the Express editions I think), and other compilers.
-
Re:It's not for dumb people
then you want OpenMP bindings. It's simple (though, obviously, not as flexible and all-encompassing as writing everything yourself). E.g., if you want to run a for loop using multiple threads you do:
#pragma omp parallel for
for (i = 0; i < 10; i++)
{
    // do something
}
return 0;
Pretty easy. There are limitations - no sleep constructs for example, but that's because it's designed to process stuff in parallel, not be a threading library.
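For anyone who wants to compile the fragment above, here is a minimal sketch of the same idea as a complete function. The name `square_all` and the squaring body are assumptions standing in for "do something"; they are not from the comment.

```c
#include <assert.h>

/* Hypothetical helper (not from the original comment): squares each
 * element of `in` into `out`. Every iteration touches a different
 * element, so the iterations are independent and can safely be
 * divided among threads. */
void square_all(const int *in, int *out, int n)
{
    assert(n >= 0);
    int i;   /* the loop variable of a parallel for is private per thread */
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        out[i] = in[i] * in[i];   /* the "do something" part */
    }
}
```

With gcc this would typically be built with `-fopenmp`; without that flag the pragma is ignored and the loop simply runs on one thread.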
It's even available by default in MS VC2005 (you do need to enable it with the /openmp compiler flag) even if MS doesn't bother advertising it.
-
Re:How about "C++ threads considered harmful"?
Just use OpenMP. Makes optimizing a program for parallel execution very easy.
-
Re:Panic?
I'd like to ask a few related questions from a developer's point of view:
1) Is there a programming language that tries to make programming for multiple cores easier?
2) Is programming for parallel cores the same as parallel programming?
3) Is anybody aware of anything in this direction on the C++ front that does not rely on OS APIs?
1) Yes.
2) Maybe.
3) Yes.
-
Re:OpenMP?
My experience is with openMP http://www.openmp.org/ and MPI http://en.wikipedia.org/wiki/Message_Passing_Interface -- these have been used successfully for a number of years for large scale scientific computing. By large scale, I mean in the region of 100s -> 1000s of processors. (They also work well on desktop machines with multiple cores and/or several CPUs). MPI and openMP are reasonably straightforward to use, although getting your head around parallel programming takes a fair bit of effort at first...
-
Parallelization is easy
4 core CPU has no use at homes unless you are content creator. I'm software engineer, I don't think that any of my colleagues I work with knows how to write app that will take advantage of 2 cores; let alone 4.
Well, fortunately, some of this software has already been written just for you and your colleagues. Check out make(1) manual page — look for the -j option...
And no, it is not only for software engineering either. Every time I come back from vacation, I use make to convert my digital pictures from the lossless "raw" format of the camera to the lower resolution JPEG for the web-pages. Having four CPUs makes that process four times faster. Great idea, uhm?..
Your colleagues may be doofusen, but people, who will finally bring us reliable speech-generation and parsing (as an example) will certainly be smart enough to take full advantage of the multiple processors.
Meanwhile, you can schedule a meeting to discuss using OpenMP in your company's software... Compilers (including Visual Studio's and gcc) have been supporting this standard for some years now.
-
Try OpenMP!
This is so not true. Have you ever heard of OpenMP? It lets you trivially parallelize any for loops in your program (i.e. let each CPU handle a point in the loops). It is embarrassingly easy to implement. How would it benefit the user? Umm, how about searching your terabytes of images with face recognition software that will soon become available? How about all your Photoshop processing or, for finance nuts, updating your gigabytes of spreadsheet data? Those all run on for loops, you can bet.
Sorry? You don't use C or C++? Everything you've written is in Java/.NET and takes 2 seconds just to bring up a window on a modern CPU? Then maybe it's time to stop dissing lower level languages.
This is also a huge reason for GNOME to stay written in C like it sensibly is and stop talking to the devil/courting Mono.
-
Re:Not only a dupe... but of an old story
An important new feature of GCC 4.2 is its support of OpenMP. In this age of multi-core CPU, this is a must. It's even supported in MS Visual Studio 2005. OpenMP is the way to go IMHO, if you don't know yet what it is, you've got to check it out...
-
Just look around
A lot of effort is going into simplifying parallel programming, on both the micro scale and the component scale. Just look through the lambda-the-ultimate archives. The micro scale is mostly invisible to application programmers, and it is mostly handled by compilers when they use SSE and friends; one notable effort to make it more explicit is OpenMP. On the component scale, most efforts are based on the message-passing concurrency model (after you grok it, it is really really simple). The best effort that I have seen from a usability point of view is the E programming language. I tried to clone its core ideas in my pet project, AsyncObjects Framework, but its usability is worse than E's because of the framework clutter.
-
OpenMP anybody?
OpenMP is an open standard for multi-platform shared-memory parallel programming in C/C++ and Fortran. It is supported by GCC 4.2 and greater.
-
OpenMP can support clusters
Intel's compiler (icc), available for Linux, Windows, and FreeBSD extends OpenMP to clusters.
You can build your OpenMP code and it will run on clusters automatically. Intel's additional pragmas allow you to control, which things you want parallelized over multiple machines vs. multiple CPUs (the former being fairly expensive to setup and keep in sync).
I've also seen messages on gcc's mailing list, that talk about extending gcc's OpenMP implementation (moved from GOMP to mainstream in gcc-4.2) to clusters the same way.
Nothing in OpenMP prevents a particular implementation from offering multi-machine parallelization. Intel's is just the first compiler to get there...
The beauty of it all is that OpenMP is just compiler pragmas — you can always build the same code with them off (or with a non-supporting compiler), and it will still run serially.
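The "still runs serially" point can be sketched as follows. The function names `max_threads_available` and `scale` are made up for illustration; the only thing taken from the standard is the `_OPENMP` macro, which an OpenMP-enabled compiler defines, and `omp_get_max_threads()` from `omp.h`.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only pulled in when the compiler has OpenMP enabled */
#endif

/* Hypothetical helper: how many threads a parallel region may use.
 * Built without OpenMP, the answer is simply 1. */
int max_threads_available(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Hypothetical example loop: multiplies every element by `factor`.
 * With OpenMP off, the pragma is ignored and this is an ordinary
 * serial loop producing the same result. */
void scale(double *data, int n, double factor)
{
    assert(n >= 0);
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        data[i] *= factor;
}
```

The same source file should build either way; only calls into the OpenMP runtime library need the `#ifdef` guard, since the pragmas themselves are skipped by a non-supporting compiler.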
-
Prefer OpenMP
I have some small amount of experience with OpenMP http://openmp.org/ , which allows one to modify C++ or Fortran code using pragmas to direct the compiler regarding parallelization of the code. And the Codeplay white paper made this sound much like it implements one of the dozen or so OpenMP patterns. I am fairly skeptical that Codeplay has any advantage over OpenMP, but the white paper lists some purported advantages. I will not copy them here and take the fun out of reading them for yourself. I will list OpenMP advantages.
1. OpenMP is supported by Sun, Intel, IBM, $MS(?) etc, and implemented in gcc 4.2.
2. OpenMP has been used successfully for about 10 years now, and is on a 2.5 release of the spec.
3. It is Open - the white paper for Codeplay mentions it being protected by patents. (boo hiss)
4. Did I mention that it is supported in gcc 4.2, which I built on my Powerbook last week, and that it is very cool?
So maybe Codeplay is a nice system. Maybe they even have users and can offer support. But if you are looking to make your C++ code run multi-threaded with the least amount of effort I've seen ( It is still effort! ) take a look at OpenMP. In my simple tests it was pretty easy to make use of OpenMP, and I am looking forward to trying it on a rather more complicated application.
-
Re:Us coders are delaying the Singularity!
It isn't, if you use the right tools.
http://www.openmp.org/
OpenMP to the rescue!
-
Re:may not want to go back.. yeah right
Maybe if you were using kick-ass parallel grids like the ones I am for my simulations:
http://www.tacc.utexas.edu/services/userguides/lonestar/
http://www.tacc.utexas.edu/services/userguides/champion/
you might change your mind. Plus, multithreading using OpenMP (http://www.openmp.org/) is relatively easy. Message Passing (MPI) is trickier, but much more powerful.
I'm just running a proggie on the champion grid (above) that's using just about all 96 cpus off-and-on for 3 days straight (Floquet time evolution of a 2-boson interacting system). On a single cpu system it would've taken months to make just one run.
-
Hmmm
This would seem to be better for processes designed to only use one CPU, but it then prevents me from coding something in, say, OpenMP, in order to fine tune the parallelisation of my code (which would almost certainly work better than the generic optimizations that they would be putting in the CPU). Admittedly 95+% of programs aren't coded to be parallel, but this would still take away an option that would otherwise be there.
Perhaps there could be a documented way to access both CPUs directly? That may solve the problem.
-
Re:A summary of the idea here...
It's not that it can't be done, it's more that it hasn't been done. I'm not aware of any particular proof of why it's impossible, but I am aware of many of the reasons why it's extremely difficult. The most I've seen a compiler do on its own is recognize that a simple loop could be vectorized (Intel's compiler does this). But other than that, if you want parallel code, you need to parallelize it yourself.
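One of the reasons it is difficult can be shown with two small loops. This is only an illustrative sketch; both function names are made up, and whether a particular compiler actually vectorizes the first loop depends on its flags and version.

```c
#include <assert.h>

/* Independent iterations: a[i] depends only on b[i] and c[i].
 * `restrict` promises the arrays do not overlap, which is the kind of
 * information a compiler needs before it may vectorize on its own. */
void add_arrays(const float *restrict b, const float *restrict c,
                float *restrict a, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i needs the value written by
 * iteration i-1, so the iterations cannot simply be handed to
 * different processors. Parallelizing this takes a different
 * algorithm, not just a different schedule. */
void prefix_sum(int *a, int n)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

A tool can handle the first shape mechanically; the second is where the programmer usually has to step in and restructure the computation.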
Auto-parallelizing compilers have been a holy-grail type problem for a while. On the other hand, there are facilities such as OpenMP which can buy you a lot of power with little programmer cost. Doing complex tasks in OpenMP is complex just like any other programming model, but simple parallelization remains simple. There are many, many parallel languages, but I think OpenMP might become the standard shared-memory programming model because it fits on top of a language (C and Fortran) instead of supplanting it.
-
Re:Simple parallelism?
You want OpenMP.
-
Re:Simple parallelism?
Sounds kinda like what OpenMP can do; only I'm not sure if it will adjust to serial execution of the parallel section if not enough processors are available.
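On the serial-execution question: OpenMP does let the program ask for this explicitly. A minimal sketch, assuming the standard `if` clause and `omp_get_num_procs()`; the function names `processors_online` and `sum_to` are hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of processors the runtime reports,
 * or 1 when the code is built without OpenMP. */
static int processors_online(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Sums 1..n. When the `if` clause evaluates to false, the parallel
 * region is executed by a team of one thread, i.e. serially. */
long sum_to(int n)
{
    assert(n >= 0);
    long total = 0;
    int procs = processors_online();
    int i;
#pragma omp parallel for if(procs > 1) reduction(+:total)
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

How many threads the runtime picks when the region does run in parallel is up to the implementation and its settings; the sketch only shows the explicit opt-out.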
-
OpenMP
For multi-core you should certainly look at OpenMP http://www.openmp.org/ before you start pthreading your code yourself.
OpenMP is a set of compiler directives which allow the compiler to handle the messy aspects of thread parallelism so that you don't have to. Adding OpenMP directives to a code is much faster than adding pthreads to it.
-
Re:Compilers
Take a look at OpenMP.
It takes a commercial compiler, but it's straightforward, it's an open specification, it can be used "automagically", and it's portable across machines and languages.
It does not work on a clustered system, only on one that has local processors and memory.
-
Re:Compilers
I remember seeing something very similar to what you describe... A little poking around brings me to this page about implementing image-processing in hardware (originally seen on robots.net).
They talk about OpenMP (as The Boojum mentioned), and they use it in a way analogous to what you're describing there... an example: (Damnit... slashcode fuxors up the indenting...)
Listing 4: Implementation of replication sort
par (element=0; element<SIZE; element++) {
    seq {
        par (element2=0; element2<SIZE-1; element2++) {
            ifselect(element>element2) {
                if(uList[element] > uList[element2])
                    comp[element][element2] = 1;
            } else ifselect (element<=element2) {
                if(uList[element] >= uList[element2+1])
                    comp[element][element2] = 1;
            }
        }
        position[element] = SUM_OF_DIGITS(comp[element]);
        sList[position[element]]=uList[element];
    }
}
-
Re:Compilers
I'd love to see a language (or language extension) cleanly define a way to let me define a code block attributes which could affect how and where it gets executed. The runtime library could then distribute that block as the environment best allows.
Have a look at OpenMP. Granted, it's more for shared-memory systems than clusters, but it works similarly to what you describe.
-
Re:This just in...
The reason being that the C programming language lacks a way to properly (explicitly) note parallelism.
OpenMP pragmas. All the big names are starting to support them.
-
Re:well at least he seems to understand the proble
Can I use OpenMP? I
int AccumulateLoopCount(int N) {
    /* (I'm not actually an OpenMP programmer, so this syntax might be wrong...) */
    int accumulator = 0;
    #pragma omp parallel for reduction(+:accumulator)
    for (int i = 1; i < N; ++i) {
        accumulator += i;
    }
    return accumulator;
}
-
Re:Hyperthreading
There appears to have been an attempt at adding it (see GOMP), starting about 2 years ago, but it appears dead.
Given the multi-core trend at hand, it could really be useful to continue.
And if not OpenMP, then what? A high-level threading language or language extension is going to be necessary, and OpenMP
1) makes a good case for being the best solution to date, and
2) is open.
-
Re:Programs...?
I think the focus for most programmers (and game programmers included) is to go multi-threaded since both AMD and Intel are really pushing this solution for speeding up programs.
There is also a movement to make Multi Threading easier with things such as Open MP http://www.openmp.org/ . It looks like the future will be good for SMP applications.
-
Re:Am I Missing Something?
I don't think the optimization you speak of will work.
It actually works already. Get yourself Intel's or Sun's (for Solaris) compiler and see for yourself.
What happens if the code in the loop changes the value of i?
A fairly trivial problem given today's state of the art and science of compiler design. I trust, gcc will have this optimization soon too -- if it does not already.
Also, the compiler errs on the safe side, but there is a standard called OpenMP, which specifies compiler pragmas with which you can assure the compiler that certain things can be parallelized, even when that is not obvious to it.
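A sketch of what "assuring the compiler" can look like in practice. The function `scatter` and its permutation argument are invented for illustration. The point is that the compiler cannot know whether `perm` contains duplicate indices, so it has to assume two iterations might write the same slot; the pragma is the programmer vouching that they do not.

```c
#include <assert.h>

/* Hypothetical example: writes src[i] into dst[perm[i]].
 * The caller promises that `perm` holds each index 0..n-1 exactly
 * once, so no two iterations write the same element. The compiler
 * cannot prove that from the code, which is why it would not
 * parallelize this loop on its own. If the promise is broken, the
 * result is a data race. */
void scatter(const double *src, double *dst, const int *perm, int n)
{
    assert(n >= 0);
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        dst[perm[i]] = src[i];
}
```

The responsibility moves with the pragma: OpenMP does not check the claim, it just acts on it.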
-
Re:How do I code this thing??
What I think is they'll provide some c++ framework or perhaps some meta language so that programmers define small treatment units with clearly defined treatment, inputs and output streams, and interconnect them without having to write tons of boilerplate code, and with abstractions to be able not to care about the details of memory management and streaming from and toward other treatment units running on other SPEs.
That sounds remarkably like C++ std::valarray, of which my macstl is a SIMD-optimized version. Just write: v0 = sin (v1) + cos (v2) for arrays v0, v1, v2 and the library and compiler handles the rest, chunking the array for SIMD consumption. Now if only I can get a handle on a couple of Cells, then macstl will be able to run seamlessly on Altivec, MMX/SSE and the Cell architecture!
BTW, the OpenMP spec might help here too; it's largely implemented by compilers like IBM XLC++ (hmm...) and Intel ICC, though not yet in gcc.
Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com
-
Compiler technology - OpenMP
One question which was not addressed fully in the article was how do you compile/test programs for this thing. The answer is OpenMP. OpenMP is a multithreading API which can hide parallelization from the user almost completely. It's embarrassingly easy to use - only one line of code is enough to parallelize a loop. All thread creation/synchronisation remains hidden from the user. It's extremely efficient too - I was never able to achieve the same level of performance when doing the multithreading myself.
-
Re:Using Fortran, eh?
Hm. I'm not familiar with supercomputers... does Fortran have some sort of built-in support for being run on them?
Yes, it does. There is a Fortran 95 language variant known as High-Performance Fortran (HPF), specifically targeted at coding for parallel computers. Fortran also sits very well with OpenMP.
Fortran does not need a JIT, since it compiles straight to machine code rather than intermediate bytecode.
-
Re:There is a need for 64-bit home computers.
At issue is the necessity for the compiler to be able to find parallel sections of code. That's what things like OpenMP are for.
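One way to read this: instead of the compiler having to discover which pieces of code can run side by side, the programmer marks them. A minimal sketch using the OpenMP `sections` construct; the function `min_and_max` is a made-up example, not something from the thread.

```c
#include <assert.h>

/* Hypothetical example: finds the minimum and maximum of v[0..n-1].
 * The two scans are independent of each other, so each is marked as
 * its own section and may be run by a different thread. `lo` is only
 * written by the first section and `hi` only by the second, so the
 * sections do not interfere. */
void min_and_max(const int *v, int n, int *min_out, int *max_out)
{
    assert(n > 0);
    int lo = v[0];
    int hi = v[0];
#pragma omp parallel sections
    {
#pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (v[i] < lo)
                    lo = v[i];
        }
#pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (v[i] > hi)
                    hi = v[i];
        }
    }
    *min_out = lo;
    *max_out = hi;
}
```

Built without OpenMP support, the two blocks just run one after the other.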
-
Re:Intel C++ Compiler 7.1 Rules because of OpenMP
Intel's compiler rules on multiple processor systems (including fake-multiprocessor Hyperthreading systems) because of cool compiler-enabled parallel application support via OpenMP.
With Dual CPU systems and Hyperthreading, you can get near 2X advantages on quite a few algorithms.
I haven't kept up with Gnu's OpenMP stuff. I guess this is one such project. http://savannah.nongnu.org/projects/gomp/
-
Re:Performance comparisons
Of course the comparisons were made using gcc 2.95.x (or 2.8 in the second article).
gcc has come a long way in terms of performance since 2.95.x. 3.2.x now compiles code that runs as fast (or faster, depending on the code) as Intel's C/C++ compiler. Right now, the only reason for me to use Intel's compiler is OpenMP.
Now, if anyone has made some benchmarks using gcc 3.2.x, I'd be interested. (In fact, I may have to do some on Monday.)
Tony
-
Re:Underwhelmed
Maybe this is a feature of newer releases of gcc, but I've never heard of -j doing auto SMP. There is a -j option for parallel makes with gnu make, but this is only for the compilation and not runtime.
The Portland Group compiler and the Intel compiler do support some auto-parallelization via OpenMP and threads.
-
All this talk...
All this talk about SMP machines coming nicely from this chip (and IBM's supposed workstation/linux aspirations) has me wondering something. Has anyone thought about adding openMP support to gcc? I'd give my eye-teeth for that. (Well, at least my wisdom teeth.)
I already know about OmniMP and OdinMP, but I want openmp natively in the compiler. Anybody know more than me?
-
Experience with the Intel Compilers
Note that Intel's compilers are available free for non-commercial work. I've written reviews of the Intel compilers, and my experiences suggest that Intel does, indeed, produce faster code than does gcc -- although not to the extent claimed in the Open Magazine article.
I'm no fan of Intel per se; read my article and Intel's responses for a full story. However, I'm not going to ignore Intel's delivery of high-quality Linux compilers for C++ and Fortran.
I'm using the Intel compilers (C++ and Fortran) for development of multiprocessing applications; Intel supports OpenMP and gcc does not. Overall, I'm very happy with the Intel compilers, and recommend them to any serious developer. It never hurts to have more than one compiler, no matter what platform you're working on.
-
This isn't earth-shattering kids...
Gridware isn't all that new, and it isn't a reaction to Mosix or SETI@home.
Batch systems have been around a long time in the HPC world. Gridware was originally developed by GENIAS Software GmbH. GENIAS produced a batch scheduler called Codine, which was a commercial version of DQS. In fact, Sun's Grid Engine FAQ even states that Sun Grid Engine is a new name for CODINE.
Of course, DQS/Codine/Grid isn't the only batch-scheduling/cycle-scavenging game around. Other players are:
- Condor
- openPBS and its commercial version PBS Pro
- Load Leveler (which IIRC is IBM's commercial implementation derived from Condor)
- LSF which is the product Sun was previously co-marketing until they purchased Gridware (probably because of the high per CPU cost of LSF).
- and lots of others that I've forgotten, many based on the once-common NQS/NQE batch system.
- There are also systems like Legion that represent a sort of ``next step'' computing environment.
Many of these predate newcomers like SETI@home and Mosix by several years. Most also provide hooks into parallel computing APIs like MPI, PVM, openMP, or something similar.
Batch scheduling and cycle-scavenging are old concepts. Having wasted away my years as a graduate student submitting large quantum chem jobs to Crays, it's nice to see lots of groups continuing to squeeze every useful cycle out of existing hardware. Sun's recent announcements are just the latest update to an old product---not a new idea, and not a Mosix/SETI rip-off.
-
Re:What OS?
It runs IBM's latest AIX.
Each OS will only run 16 processors which make up a "node" of shared memory. There are hundreds of these nodes which are completely distributed memory over a high performance network.
The idea is to run a large problem using message passing (usually MPI) across nodes, and multi-threaded (often using OpenMP compiler directives - easier than Pthreads for scientific computing) inside a node. I am a developer on one of the large ASCI codes, and this is the model we are targeting, although currently we usually run using all MPI (no threads) just because the thread performance is not as good.
-
Re:F95 compiler
We've used the Portland Group Fortran & C compilers for a while. They are quite good...
I've had major troubles with their OpenMP parallel support. And their debugger lies to me.
-
Now if the'd...
In terms of the quality of machine-code they generate, Fujitsu produces some of the best linux/x86 compilers out there. (See URL http://www.tools.fujitsu.com/) -- and at reasonable prices, too! The only sticking point from my point of view is that they don't support the OpenMP standard for shared-memory parallel programming.
If anyone from Fujitsu is listening: I'll buy your compiler suite in an instant, if you'll give me support for OpenMP!
I do met and air quality modeling, have dual-processor RH6 desk-side boxes both at home and at the office, and
- need
-
What about Adaptive Communications Environment?
ACE is an opensource C++ framework that implements common concurrent design patterns tested in a variety of platforms using a common source tree, not to mention a Java version. As for multiple languages, that's a little harder because some languages make certain assumptions. Easier to write C++ wrappers around them.
Another future possibility may be OpenMP which allows a sequential and parallel shared memory version to reside in the same codebase using compiler extensions. Although there are specifications for several languages/platforms, I don't think anyone has tested for intervendor compatibility as yet. However, it is still evolving.
The major problem is that once you start wandering outside the most commonly used languages (C,C++,Java,Fortran) into more exotic variants (Amoeba, Occam, Z etc) you will be running across differences in conceptual models (actor, CSP, timed lambda calculus, etc) which is like mixing different mathematical coordinate systems ... ie not recommended unless you really grok the theory and got a firm grasp of what you're trying to do. Coding is complex enough without making life impossible for yourself. Keeping things simple will then become your best friend.
LL
-
Are they trying to duplicate SGI?
As Matt Welsh noted, it is not exactly a trivial problem. If you look very closely at the article, the LCC wants to occupy a happy ground between the share-nothing crowd (Microsoft, Tandem) and the share-everything (Oracle). The share-nothing paradigm is rather simplistic in its approach and reflects the fact that throwing together a bunch of machines with a cheap interconnect is a comparatively straightforward re-engineering approach. The share-everything approach comes from the extension of shared-bus architectures (e.g. Sun Starfire) which enforces a multiple lock strategy. Companies like SGI have thrown millions of R&D dollars into the middle-ground which is why their cc-NUMA architecture and cellular IRIX is quite popular. I wish the LCC luck but there is a reason why a successful working solution is expensive as it requires a savvy combination of hardware+software+smart routing (the SGI solution uses a cache directory). You are effectively paying for some very sophisticated know-how as part of every SGI machine.
Given the direction that SGI is heading (Linux for entry-level&apps + IRIX kernel extensions for high-end) I would wonder whether the LCC would produce anything practical in a realistic time-frame. This is not to decry their laudable efforts and I would hope businesses are patient enough to wait for robust and cheap solutions. If nothing else, it will hopefully offer a standardised set of software extensions (a la OpenMP) and coding practices so that a single source tree can support 1 to n processors.
Who knows, they might be able to come up with a few tricks that the pros have missed.
LL
-
Re:Linux and scaling...
Troy Baer wrote
Something to keep in mind about the Origin 2000 (SGI's 128-256 CPU boxes) is that they're not SMP systems. They're ccNUMA machines, and a lot of the "ccNUMAness" (including cache coherence, I think) is handled largely by the hardware.
The point of ccNUMA is to minimise the cost of porting software from uni-processors. The crux of the matter is that it is non-trivial to adapt programs to run on multiple processors efficiently. The ideal is to have a single source tree, add extensions such as OpenMP, then recompile. Kernels are a different matter as they have to be closer to the hardware. It is still a royal pain to code to the wire and manually manipulate the cache and bus protocols but that is what is needed for maximum performance. Apart from special cases such as national defence codes, the commercial imperative is time-to-market, which means a ccNUMA machine that can address 95% of the issues at reasonable cost would be preferred.
I wouldn't be surprised if you could boot the MIPS version of Linux on (for instance) an Origin with little or no modification. I don't know how well it would scale, though.
As far as I'm aware (correct me if I'm wrong), the SGI port of Linux has so far concentrated on older systems such as Indys and patches for their VisualWorkstation. I suspect it will take a while (2-5 years?) for them to get to the stage of having Linux+IRIX SMP extensions running on their highly scalable systems. Cellular IRIX is a single system image which is different from the way Linux is designed. Perhaps one conceptual integration approach is to follow how RTLinux works in having a separate real-time kernel embedded within the full Linux system. Also there are other multiprocessor optimisations like processor affinity which might take a while to enter into the kernel. SGI staff may be very enthusiastic and dedicated but there is a lot of work involved which will take time.
In other words, nice PR for SGI but don't hold your breath.
LL