Domain: pgroup.com
Stories and comments across the archive that link to pgroup.com.
Comments · 27
-
Re:For my next trick
I am not sure if CUDA programming can only be done in C++, but I think that if one absolutely needs CUDA, then he is already doing some low-level stuff.
CUDA Fortran is where it's at for hard-core number-crunching. See e.g. http://www.pgroup.com/resources/cudafortran.htm
-
Re:Completely different contract/machine/goals
That's similar to what PGI is doing. And you know what? It's not that simple. You seldom achieve competitive performance with this annotation-type parallelization, simply because the codes were written with different architectures in mind.
This is also the reason why the original design emphasized single-thread performance so much. The alternative to having POWER7 cores running at 5 GHz would have been to buy a BlueGene/Q with many more, but slower, cores. They didn't go down that avenue because they knew that their codes wouldn't scale well to that number of cores.
None of the supercomputer codes I know uses such a type of parallelization or accelerator offloading. And the reason for that is not that folks enjoy doing work that a tool could handle for them, but because the tools don't work as well as advertised.
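For readers who have not seen annotation-type parallelization, here is a minimal sketch in C using an OpenMP pragma. The function name axpy_annotated is hypothetical and not taken from any code discussed in this thread; the point is only to show how little the annotation changes.

```c
#include <stddef.h>

/* Scale-and-add over two arrays: y[i] = a * x[i] + y[i].
 * The pragma is the entire "annotation". A compiler with OpenMP enabled
 * (for example gcc with -fopenmp) splits the loop across threads; a
 * compiler without OpenMP ignores the pragma and runs the loop serially. */
void axpy_annotated(size_t n, double a, const double *x, double *y)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Note that the pragma leaves data layout and memory access patterns exactly as they were, which is consistent with the complaint above: a code designed around a different architecture keeps that design after it is annotated.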
-
Re:Will it support Fortran?
PGI makes a CUDA Fortran compiler, but with GPUs it's not as simple as just recompiling; the code has to be rewritten to take advantage of the accelerator and its unique architecture.
-
Re:I especially like..
GCC and PGI have support for AMD processors. AMD often used these compilers for benchmark result submissions in the past. But the Intel C++ compiler often produces faster benchmark results since it has more advanced static optimization support.
-
These are 32-bit kernel benchmarks
Phoronix stated in the second paragraph: "For our testing we had used the final Intel 32-bit releases of the four most recent Ubuntu releases except for Ubuntu 8.10 "Intrepid Ibex" where we used the Intrepid release candidate."
This benchmark would be more valuable if it compared how the Ubuntu repository fared when compiled with the different available compilers:
-Intel compiler
http://www.ubuntugeek.com/howto-install-intel-c-compiler-10-on-ubuntu-feisty-fawn.html
-GNU compiler
-SunStudio compiler
http://developers.sun.com/sunstudio/downloads/index.jsp
-Portland Group has a unified binary for both Intel 64-bit and AMD 64-bit
http://www.pgroup.com/about/why_pgi.htm
-Pathscale compiler
http://www.pathscale.com/node/189
It would also be nice to discuss benchmarks for:
-AMD 32-bit Kernel
-AMD 64-bit Kernel
-Intel 64-bit Kernel
-
#5 Is Wrong
5) AMD and Intel continue to beat the crap out of each other with customers gaining but wondering why there is no software that supports those new 8-way processors, as both compilers and third-party developers fail to keep up.
The compilers exist, in the sense that outfits like PathScale and PGI already have compilers that support OpenMP and some degree of automatic parallelization. They need a lot of work to scale to larger numbers of cores, but the primary roadblock here is integration with IDEs and moving these technologies into mainstream computing. If these companies and Microsoft figure out how to make these compilers pervasive with Visual C++, etc., things will change quite dramatically. I don't think this will happen in 2007, though. What will happen is that compiler vendors make significant strides improving access to parallel programming models, particularly with support for Co-Array Fortran and UPC.
-
Re:Why? A Better Question....
Why doesn't AMD release their *own* compiler?
AMD's Compiler would be here:
http://www.pgroup.com/
AMD worked closely with PGI while prepping the K8 core for launch to have a 64 bit compiler capable of the same auto-vectorization the icc can do.
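As a rough illustration of what auto-vectorization works on, here is a minimal sketch in C of a loop that compilers can typically turn into SIMD instructions. The function name add_arrays is hypothetical, and whether a particular compiler actually vectorizes it depends on its flags and target (gcc, for instance, enables its vectorizer at -O3).

```c
#include <stddef.h>

/* Element-wise add: dst[i] = a[i] + b[i].
 * restrict promises the compiler that dst, a and b never overlap, which
 * removes the aliasing doubt that often blocks automatic vectorization. */
void add_arrays(size_t n, float *restrict dst,
                const float *restrict a, const float *restrict b)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The source stays plain scalar C; any SIMD code is generated by the compiler, which is why the same loop can perform very differently depending on which compiler and target it is built with.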
Also, Intel does push icc as the best compiler for Windows/x86 Linux (which it quite possibly is, this story aside). So they are screwing the people who buy the compiler and later find out that it intentionally sucks for 15% of the market. And AMD then gets hosed by appearing slower than the P4 on apps built by Intel's compiler.
-
Complexity: It's not the code, it's the CPUs
Compilers often cannot optimize even simple code for a given architecture, because the architecture itself and its rules for execution are complex. For instance, depending on what tricks you use, the Stream benchmark on an Opteron can run at 1.5GB/s assuming correct code and alignment but no compiler flags, 2.2GB/s with the "obvious" flags on gcc (-O3 -m64), and over 3GB/s with the Portland Group's pgcc and multiple arcane flags.
Even a simple sequential read loop that exceeds the L1 cache can benefit greatly from the appropriate cache hints in the assembly (prefetchnta and its variants).
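A minimal sketch of such a cache hint from C, assuming a GCC-compatible compiler: __builtin_prefetch is a real GCC/Clang builtin, and with a locality argument of 0 GCC on x86 typically emits prefetchnta. The function name and the look-ahead distance are hypothetical and untuned.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 64  /* elements to look ahead; an untuned guess */

/* Sum an array sequentially while hinting that data further along will be
 * read once and not reused (non-temporal). */
double sum_with_prefetch(const double *data, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0 (read), locality = 0 (no reuse expected) */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
        }
#endif
        total += data[i];
    }
    return total;
}
```

The hint never changes the result, only (possibly) the speed, and the best look-ahead distance depends on the CPU and memory system, so it has to be measured rather than assumed.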
Toss in a second processor and a NUMA architecture, and everything gets even more fun.
For examples of the hacks you can do on Opterons that vastly improve simple C code speed, take a look at the stream.c source and see AMD's technical pubs no. 25112, 24592, and 32035.
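For context on what the GB/s figures measure, here is a minimal sketch of a STREAM-style triad kernel and the bandwidth arithmetic in C. This is not the actual stream.c code; the function names are hypothetical and timing is left to the caller.

```c
#include <stddef.h>

/* STREAM-style triad: a[i] = b[i] + scalar * c[i]. */
void triad(size_t n, double *a, const double *b, const double *c,
           double scalar)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}

/* Bandwidth in GB/s (10^9 bytes) for one triad pass that took `seconds`.
 * Each element moves three doubles: two reads and one write. */
double triad_gb_per_s(size_t n, double seconds)
{
    double bytes = 3.0 * (double)sizeof(double) * (double)n;
    return bytes / seconds / 1e9;
}
```

Because the loop does almost no arithmetic per byte moved, the number it reports is dominated by how well the generated code feeds the memory system, which is why alignment, flags, and prefetching move the result so much.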
--Pat / zippy@cs.brandeis.edu
-
Re:C/C++ vs. Fortran
Who said there are only 2 compilers? He only tested 2, that's all. Here's a bit of a list - and notice that some of these are targeted specifically to scientific computing:
1. GCC
2. Intel compiler (Intel only)
3. Comeau
4. PathScale (Opteron only)
5. Portland Group (PGI)
6. Borland
-
If you must have optimized support for AMD...
Portland Compiler Group has a compiler that's fairly decent and optimizes for both AMD64 and Intel 32-bit (although they're pushing the AMD64/clustering angle right now).
It's nice, but much less free. 15-day free trial. Prices are not bad, but it's still a little steep, even for an academic license.
This is not a plug... just in case anyone was wondering if there was something besides ICC you could try in order to benchmark GCC on something that ICC can't cope with.
-
Re:Answer: Compilers
Where is AMD's answer to Intel's compiler?
On the Portland Group's website. If you have the money, they're darn good compilers. Microway sells them as their preferred C/C++ suite, which says something... They support AMD64 too! :-)
The only downside, for some, is that they're Linux-only.
-
This greatly surprises me
As an employee of an atmospheric modelling group I am very surprised to hear this. Our atmospheric modelling program, the Regional Atmospheric Modelling System, is not I/O bound in the slightest and is instead very much CPU bound. We currently use 100bT for the interconnect on our cluster, and have tried moving to Gigabit with negligible performance gains.
The main area in which we saw benefit was switching from the Portland Group Fortran Compiler to the Intel Fortran Compiler, which cut the timestep (simulation time/real time) nearly in half.
Every cluster in the department is assembled from commodity x86 components. Groups here have been moving from proprietary Unix architectures to Linux/x86 systems and clusters. Our group started out on RS/6000s, then moved to SPARC, and is now moving to x86. In terms of price/performance there really is no comparison.
As for TCO, the lifetimes of clusters here are relatively short, one or two years at the most. Thus a high initial outlay cannot be offset by lower cost of operation.
-
Re:Underwhelmed
Maybe this is a feature of newer releases of gcc, but I've never heard of -j doing auto SMP. There is a -j option for parallel makes with GNU make, but this is only for the compilation and not runtime.
The Portland Group compiler and the Intel compiler do support some auto-parallelization via OpenMP and threads.
-
Portland Group Compiler....
I know it costs a little money, but I would be very interested to see the PGI compiler set tested up there as well. I've seen a lot of CFD codes get 10x speedups over g77 with pgf77. I do know that PGI has a 15-day trial license for their compiler; that should be long enough for a test run of almabench.
-
Still using Fortran
I worked as a SysAdmin and programmer at the University of Connecticut's Optical Fiber Research Manufacturing Laboratory for two years. Our graduate students, some of whom were programmers, wrote their numerical models exclusively in F77. Our reasons were:
* The base model had been written in F77 and the majority of relevant literature was also written for F77
* Easily understood by other researchers; increased chances of getting published
* Trivial to port and run on a Cray, Sun Workstation, or a Linux cluster [which we had].
* Variety of parallel programming packages available: HPF, MPI, PVM
* The professor said it was The Way To Do It. ;)
I personally spent almost a year writing F77 code with PVM. While F77 had some unpleasant limitations which have already been covered, I was glad to have the experience.
We used The Portland Group's compilers exclusively, and my benchmarks against g77 showed significant performance gains.
As part of my continuing work with the lab I am developing a parallel version of an extremely long [5 days on a dual AMD 1900+ !] and CPU-intensive algorithm, using MPI and F77. I have no doubt that F77 and F90 will be around and in use for a long time.
Joshua Thomas
formerly University of Connecticut
email: jthomas at poweronemedia dot com
-
Re:Don't use Fortran 90.
I can't believe the amount of crap being posted here, and this is fairly typical.
a) lack of dynamic memory allocation --- plain shit, nothing you can do about
You obviously don't know a thing about f90. Dynamic memory allocation is the first thing everyone converting an f77 code uses. Yes f77 is dated. Go pick up the f90 book by Cooper-Redwine and read the first chapter. Really, f90 is fairly clean.
b) everything is passed by reference - no recursion whatsoever
Inconvenient, but it makes for much easier optimization. For recursion, there are tricks in f90 now that allow you to overcome most of the obstacles.
The compilers are unbelievably crashy and bad. And I've tried both Sun and SGI f95 compilers... Successfully segfaulted them both within a few days of using them... Funnily enough NAG f95 compiler for linux seems to be quite stable.
I agree that Sun's f90 compilers are crap (although their f77 compilers are excellent), but SGI's is quite good. I use every advanced feature of F90 possible (derived types, pointers, allocatable arrays, operator overloading, etc.), and SGI's works quite well.
c) Surprisingly hard to interoperate with f77 or C, because modules get funny pre/suffixes all over the place...
I've never had problems with f77 but I agree about C. A bigger problem with f90 is that the array structure is compiler dependent. This was done to allow the compiler writers more flexibility in optimizing their code, especially for parallelization issues, but given the way high performance computing is moving (in the U.S. anyway), it is a pain.
If you want to use f90 on Linux, I highly recommend the Lahey-Fujitsu compiler. This produces nice fast code with good error checking. They seem to focus more on Windows than Linux which I dislike, but it is still a solid product.
Also quite good is Portland Group. Here are some recommended links from them that all of the "Fortran sucks" crowd should read: prentice
In the U.S., the HPC community is clearly moving towards using C/C++ especially for the libraries (such as POOMA, Blitz,... that the parent poster mentioned). I've seen codes that have been POOMAized and they've run much slower than the original f90 versions. The POOMA guys talk about the fact that they will get their libraries further optimized, but it still remains to be seen. I saw a talk by the BLITZ people on how they are TRYING to get C++ as fast as C, which makes me think they should just use f90. My big complaint with these library writers is that they typically work with relatively simple problems -e.g., Poisson's equation- and then think that it will work for everything. I have yet to see a fluid code that uses these libraries and really fly.
As scientific problems become harder, the associated numerical algorithms become harder, and the codes more sophisticated and flexible. So I agree that f77 is inadequate for any modern, real code, but that is hardly news. While the U.S. HPC community seems to be focussing on trying to get C/C++ up to snuff, it seems that Europe and Japan are pushing f90 or High Performance Fortran. I personally think we (the U.S.) are heading down the wrong path.
The basic reason why I use F90 is:
I'm a physicist, not a computer scientist. Yes, I agree that you can have fast code with C, but it is much harder.
-
Re:The end of gcc 'cause intel's compiler is faste
I completely agree. Other companies have produced optimized compilers which have had better performance than gcc, such as The Portland Group and Kai (which is now part of Intel... go figure) and they have not shut down gcc. Gcc is a good, free compiler. The rest may be faster, but not worth the price for most of us.
-
Re:F95 compiler
We've used the Portland Group Fortran & C compilers for a while. They are quite good; they have F90 (F95 adds only the forall, I believe, and I think that's supported too), HPF, and a lot of traditional extensions (VAX, Cray, SGI, etc.). Tech support is very quick and knowledgeable.
-
Re:Doesn't Linux already have multiple C++ compile
Right now the only real compiler on Linux is GNU C
Actually that isn't true. There are already a couple of (albeit lesser known) commercial compilers for Linux:
KAI C++ -- Commercial C++ compiler for Linux.
Portland Group -- Commercial C, C++ and Fortran compilers for Linux.
-
Re:compiler?
Granted, a lot of work has been done on gcc, but Borland's compiler is faster, produces smaller code, and produces faster code. (I have seen benchmarks in Windows that prove this, but cannot remember where they are right now.)
With Borland's new compiler, gcc will have some more competition, and I *hope* that means more people working on it, or a new wave of development going into it.
I love Borland's IDEs, and if this one is like Borland 5 I'll probably buy it just for that (I'm currently using VMware/NT4/Borland 5 for my C++ editing, then compiling with gcc on Linux). I also used VCL on C++Builder3 and it was _much_ nicer to use than MFC (IMHO). But Borland's C++ compiler is a bit buggy. Good overall, but there are a few problems. Hopefully those will be fixed in the new version, however.
BTW, there are other compilers available. KAI C++ is a very nice compiler (no IDE or anything, just command line), runs on Linux, NT, HP-UX, Solaris, Tru64, AIX, Cray Unicos, and Hitachi machines. They also make Fortran compilers, IIRC. The Portland Group also makes F77, F90, C, and C++ compilers for Win32, Linux/x86 and Solaris/x86. I'm a bit suspicious of their ISO C++ compliance (they say they're compliant with cfront 2 and 3, which are ancient), but it's hard to say. Both of these have limited-time trials available (I've tried KCC and it's great; Portland's looks interesting anyway: automatic threading for up to 4 CPUs, automatic SIMD use on P-III, etc.).
-
Alpha!
I would say, in order of preference:
1. Alpha
2. Ultra SPARC II
3. Athlon or G4 PPC
I leave the PIII completely out of the list because without SSE it's a piece of shit. And SSE is probably not much use in dedicated mathematical computations (nor are MMX and 3DNow, but the Athlon's FP unit beats the PIII even without 3DNow help).
Also, to really get good mathematical performance, you will need really efficient code, i.e. assembler or FORTRAN compiled with a well-optimized commercial compiler like Portland's.
-
Re:I'd like to see the US govt do the same
Is anyone out there actively lobbying the government to officially endorse OSS solutions rather than proprietary software? It seems to me as a taxpayer that I would like to see the vast number of government projects out there actively evaluate Linux as well as Solaris and NT as platforms.
A lot of us are trying, believe me. Most of us have OSS snuck into the background but contributing nonetheless... The more "public" uses are seen at NASA with their Beowulf cluster and at NIST.
There's a FOSE conference coming up next month and one of the seminar tracks includes a session on Linux. However, I wish that more on OSS was scheduled to be discussed there. The timing of FOSE is really good considering all that's going on right now, but it seems no one in the D.C./MD/VA area pushed for it at this year's conference...
-
Re:Beowulf != Supercomputer. So?
>> P.S. : "no real compiler support" - what are you talking about?
> Ideally, you have a compiler that takes care of using all the nodes and distributing the code.
> If you have to hand-code all that, it just takes too long and is error-prone (debugging
> distributed code is a really ugly task). Something like High Performance Fortran.
Are you aware that there are HPF compilers for Linux being used in Beowulf clusters?
Check out:
There are other commercial products, plus some educational type compilers.
- Darren
-
Ha Ha
Actually there are a number of F90 compilers under Linux. We've been using the Portland Group compiler for more than a year. There are several others. See the Linux Fortran Information Page for all the details.
-
fortran and Linux
I have to admit that I am biased (in large part due to my employer). I would not consider a dual PII generally a good multiprocessor for numerically intensive codes. This has nothing to do with Intel or Linux, or g77/pgf77/absoft, etc.
What I have found on my codes is that small (actually tiny) problems run well on pentia. But reasonable research sized problems cause it to huff and puff. Machines like the alpha or the R10k (and R12k) kick serious butt on the larger problem sizes. What is just insanely cool is to watch your code (efficiently) use all 32 processors, and get something like a 28-30x speedup.
But, as I said, I am biased.
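To put the 28-30x on 32 processors in perspective, here is a minimal sketch of the arithmetic in C. The function names are hypothetical; the second function simply solves Amdahl's law, S = 1 / (f + (1 - f) / p), for the serial fraction f, which is a simplifying model and not something measured from the code described here.

```c
/* Parallel efficiency: fraction of ideal linear speedup achieved. */
double parallel_efficiency(double speedup, int procs)
{
    return speedup / (double)procs;
}

/* Serial fraction f implied by Amdahl's law for a measured speedup S
 * on p processors: f = (1/S - 1/p) / (1 - 1/p). */
double implied_serial_fraction(double speedup, int procs)
{
    double inv_p = 1.0 / (double)procs;
    return (1.0 / speedup - inv_p) / (1.0 - inv_p);
}
```

A 28x speedup on 32 processors is 87.5% efficiency and a 30x speedup is 93.75%; under the Amdahl model those correspond to roughly 0.46% and 0.22% of the work being serial.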
Back to fortran. Jeff Templon has an excellent page on Linux and Fortran. Better is the big fortran link page. This is really a nice resource and is a nice intro to the general Fortran Market set up by Walt Brainerd. I strongly advise visiting this site if you need to think Fortran.
Ok, now some thoughts. Craig Burley and crew have done a positively bang up job on g77. It is IMO a useful productive research tool... with a caveat or two.
First, it really is just a front end to the gcc back end, so there are many... gcc-isms... floating about.
Second, while optimization is OK, it is generally tied to the gcc optimization, which has traditionally not been very good. The egcs project has had a much better track record on getting real optimization into the compiler. Folks, if your runs can take years, 5% DOES matter. Optimization on pentia is not just -O, you need things like
-O3 -malign-double -malign-functions=2 -funroll-loops -ffast-math
among others for decent performance.
Third, and most important for me, neither it nor egcs knows anything whatsoever about multiprocessing.
In short, g77 and egcs in general are awesome tools. But unless you work on small problems, they are not suitable. You will need some better tools, and that involves passing over some money in this case.
I like the Portland Group tools, though the KAI tools are effectively identical to what you use on big supers like Origins. Unfortunately, I do not think KAI supports Linux any longer. Maybe we can all write them a nice letter on how they could drop support for some underused platform for computation (some come to mind here :-) ) in favor of Linux. Market size and all that.
As the author of the referenced article wrote, most fortran users want all the speed they can get, so you need to look at what your code spends the most time doing, and figure out if it is doing it the right and most efficient way, or if your system is correctly designed for speed, or if you are hitting one area of your system really hard, and thus causing a bottleneck. In short, if you need to design for speed, start out with a workstation design, and not a PC design. You likely will need massive memory and IO bandwidth to complement an insanely fast CPU. Putting an Alpha into a PC architecture should be considered a capital crime. It makes much more sense to put it into something like a DS20, a T3E or some other design (I can fantasize about an Alpha in an Octane or an Origin, that would be a complete screamer... a memory and IO bus capable of feeding the processor at its full speed... shudder).
The language and its implementation are important, but so is the fundamental system design. You need to avoid bottlenecks everywhere.
Joe
-
Try this...
PGCC is a commercial C++ compiler that supports precompiled headers. They also have an NT version as well as a Linux (and other UN*Xes) version, which could make your cross-platform issues easier to deal with.
Remember, gcc != Linux. Please consider not insulting your target audience next time you ask for help.
-
SMP & C++
Portland Group sells one. However, C++ (and C) usually result in much slower code than FORTRAN 77 or Fortran 90. Besides, Fortran 90 is much nicer to program in for numerical programs.
For this kind of task, I would strongly advise against buying AMD chips - their FPU is very slow compared to Intel's. You would be better off with a PII or Celeron. K7 might improve things, but it isn't released yet.