A C++ Library That Brings Legacy Fortran Codes To Supercomputers
gentryx writes "In scientific computing a huge pile of code is still written in Fortran. One reason for this is that codes often evolve over the course of decades and rewriting them from scratch is both risky and costly. While OpenMP and OpenACC are readily available for Fortran, only a few tools support authors in porting their codes to MPI clusters, let alone supercomputers. A recent blog post details how LibGeoDecomp (Library for Geometric Decomposition codes), albeit written in C++, can be used to port such codes to state-of-the-art HPC systems. Source code modification is required, but mostly limited to restructuring into a new pattern of subroutines."
...like rice, is not countable. At least not since I learned the word.
how tenaciously researchers cling to their old, kludgy Fortran code.
If your application iteratively updates elements or cells depending only on cells within a fixed neighborhood radius, then LibGeoDecomp may be just the tool you've been looking for to cut down execution times from hours and days to minutes.
Gee, that seems like an extremely limited problem space, and doesn't measure up at all to the title of this Slashdot submission. It might really be a useful tool, but when I clicked to this article I expected to read about something much more general purpose, in terms of 'bringing Legacy Fortran to Supercomputers'.
By the way, regarding the use of the word 'codes': I don't think English is the first language of this developer. Cut some slack.
RETURN without GOSUB in line 1050
I think I speak for many geeks when I say....
KHHHAAAAAAAAAAAAAANNNNNN!!!!
That is all.
#fuckbeta #iamslashdot #dicemustdie
High-performance Fortran compilers for supercomputers and clusters have been around since before a good portion of the posters here were born. In fact, they often beat compilers for other languages. In certain disciplines like atmospheric science, Fortran is the language for supercomputing problems, even today. Misleading title.
Seems to me that there are bigger problems when porting Fortran code to C++, like lack of a multidimensional array type in C++, lack of all the other Fortran libraries, and the fact that Fortran code usually still seems to give faster executables than comparable C++ code on numerical applications.
And why would you fuck about with C++ when there is so much missing - just get a book and learn FORTRAN if you need to work in the scientific computing environment.
Modern supercomputers all have perfectly adequate Fortran compilers from a variety of vendors, so I'm not sure what problem this library is trying to solve.
And yes, in HPC-world an application is known as 'a code'. Believe it or not the ones I work with have configuration files known as 'input decks', a terminology dating back to the days of punch card input.
Isn't it rich?
Are we a pair?
Me here at last on the ground,
You in mid-air.
Send in the codes.
Isn't it bliss?
Don't you approve?
One who keeps tearing around,
One who can't move.
Where are the codes?
Send in the codes.
Just when I'd stopped
Opening doors,
Finally knowing
The one that I wanted was yours,
Making my entrance again
With my usual flair,
Sure of my lines,
No one is there.
Don't you love farce?
My fault, I fear.
I thought that you'd want what I want -
Sorry, my dear.
And where are the codes?
Quick, send in the codes.
Don't bother, they're here.
Isn't it rich?
Isn't it queer?
Losing my timing this late
In my career?
And where are the codes?
There ought to be codes.
Well, maybe next year . . .
The IEEE and Los Alamos National Laboratory seem to have a different opinion on this. And even the Oxford dictionary knows the use of codes. But surely those guys can't even spell gigahertz.
Computer simulation made easy -- LibGeoDecomp
I took a look at TFA and followed up by reading the description of LibGeoDecomp:
If your application iteratively updates elements or cells depending only on cells within a fixed neighborhood radius, then LibGeoDecomp may be just the tool you've been looking for to cut down execution times from hours and days to minutes.
Gee, that seems like an extremely limited problem space, and doesn't measure up at all to the title of this Slashdot submission. It might really be a useful tool, but when I clicked to this article I expected to read about something much more general purpose, in terms of 'bringing Legacy Fortran to Supercomputers'.
Correct. We didn't try to come up with a solution for every (Fortran) program in the world. Because that would either take forever or the solution would suck in the end. Instead we tried to build something which is applicable to a certain class of applications which is important to us. So, what's in this class of iterative algorithms which can be limited to neighborhood access only?
It's interesting that almost(!) all computer simulation codes fall in one of the categories above. And supercomputers are chiefly used for simulations.
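For readers wondering what "neighborhood access only" buys you, here is a minimal sketch in Python (not LibGeoDecomp's API; the function names `step` and `step_decomposed` are made up for illustration). It assumes a 1D grid with a radius-1 neighborhood and fixed boundary cells. Because each cell only reads its direct neighbours, the grid can be cut into subdomains that each need just one copied "ghost" cell from the neighbouring subdomain, and the result matches the serial update.

```python
def step(cells):
    """Serial update: every interior cell becomes the mean of itself
    and its two direct neighbours (neighborhood radius 1)."""
    new = cells[:]  # the two boundary cells keep their values
    for i in range(1, len(cells) - 1):
        new[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return new


def step_decomposed(cells, split):
    """Same update, but the grid is cut into two subdomains at `split`.
    Each subdomain carries one ghost cell copied from its neighbour,
    which is all the communication a radius-1 stencil needs."""
    left = cells[:split + 1]   # owns 0..split-1, ghost cell = cells[split]
    right = cells[split - 1:]  # ghost cell = cells[split-1], owns split..end
    new_left = step(left)      # on a cluster these two calls would run
    new_right = step(right)    # on different nodes
    # Drop the ghost cells and glue the owned parts back together.
    return new_left[:split] + new_right[1:]


grid = [0.0] * 4 + [9.0] + [0.0] * 4
print(step(grid) == step_decomposed(grid, 4))
```

On a real machine the ghost cells would be exchanged via MPI after every step; the sketch only shows why that exchange is sufficient.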
By the way, regarding the use of the word 'codes': I don't think English is the first language of this developer. Cut some slack.
Thanks :-) You're correct, I'm from Germany. I learned my English in zeh interwebs.
Computer simulation made easy -- LibGeoDecomp
Not only does it cost a LOT to port this stuff and risk errors in doing so, but the cruftier it is the harder (and more expensive and error-prone) it is to port it.
If, instead, you can get the new machines to run the old code, why port it? Decades of Moore's Law made the performance improve by orders of magnitude, and the behavior is otherwise unchanged.
If you have an application where most of the work is done in a library that is largely parallelizable, and with a few tiny tweaks you can plug in a modern multiprocessor-capable library and run it on a cluster, you get another factor of almost as-many-processors-as-I-decide-to-throw-at-it, with small effort and negligible chance of breaking the legacy code.
What a deal!
And it's one less reason to touch the tarbaby of the rest of the working legacy code.
Let the COMPUTER do the work. People are for setting it up - with as little effort as practical - and moving on to something else that is important and can't yet be automated.
Eventually somebody will teach the computers to convert the Fortran to a readable and easily understandable modern language - while both keeping the behavior identical and highlighting likely bugs and opportunities for refactoring. Until then, keeping such applications in the legacy language (unless there's a really good reason to pay to port them) is often the better approach - both for economy and reliability.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
We're using Boost Multi-array as a multi-dimensional array, so that's not really a problem. And since we call back the original Fortran code users are still free to use their original libraries (some restrictions apply -- not all of these libraries will be able to handle the scale of current supercomputers).
Regarding the speed issue: yeah, that's nonsense today. It all boils down to writing C++ in a way that the compiler can understand the code well enough to vectorize it.
Computer simulation made easy -- LibGeoDecomp
...and has done for years.
We write a scientific code for solving quantum mechanics for solids and use both OpenMP and MPI in hybrid. Typically we run it on a few hundred processors across a cluster. A colleague extended our code to run on 260 000 cores sustaining 1.2 petaflops and won a supercomputer prize for this. All in Fortran -- and this is not unusual.
Fortran gets a lot of bad press, but when you have a set of highly complex equations that you have to codify, it's a good friend. The main reason is that (when well written) it's very easy to read. It also has lots of libraries, it's damn fast, the numerics are great and the parallelism is all worked out. The bad press is largely due to the earlier versions of Fortran (66 and 77), which were limited and clunky.
In short, the MPI parallelism in Fortran90 is mature and used extensively for scientific codes.
You do know that Fortran 2008 has better support for parallelism and concurrency than C++, don't you? Or do you still think everyone is using F77?
I, too, work in HPC, and while I found "codes" very jarring to begin with, I've learned to live with it. I am not sure the "code" vs. "codes" issue is more grammatically problematic than "people" vs. "peoples". A people (countable) is made up of people (uncountable). Similarly "a code" (countable, but nonstandard) is made up of code (uncountable). Personally I would use "a program" or "a library" instead of "a code", though.
Another related issue is whether "data" is countable or not. I'm used to it being uncountable, with there being more or less of it, but not "several data". But scientific journals in my field prefer the countable version "a datum", "several data", which is arguably more historically correct. That, too, took some getting used to.
As long as you need a double C (and forget about C++) standard committee tracker diploma to write a fucking function processing three non-aliased arguments with two-dimensional runtime-sized arrays, a C compiler supporting the most recent standards will produce much worse code than a Fortran compiler compiling some 50-year old Fortran subroutines written by a reasonably good mathematician without much programming experience.
Of course, if we are talking about earlier C standards, not even a standard-following geek has a chance to come close. And if we are talking about C++, we are still waiting for standards that allow efficient code generation for the most basic numerical subroutines.
Try it: write a simplistic Fortran subroutine doing a matrix multiplication for variable-sized multidimensional arrays. Easy to write in a Fortran-IV subset of the language. A moderately talented squirrel could do it between focusing on its nuts.
Then try to coax a C or C++ compiler into generating anything closely efficient. You'll need to revert to recent language standards, and your generated code will still have a much worse product of runtime times incomprehensibility.
That's why extern "Fortran" is still the most important numeric C code ingredient. Because the old libraries were written by genius mathematicians and lousy programmers, and you need genius programmers to get the stuff close to good in C/C++. But double geniuses are hard to come by.
You don't have to rewrite your code entirely, just a little bit.
You only have to restructure the subroutines and change the syntax.
Well, that sounds like rewriting to me. Just because there is a library that might implement the same semantics as FORTRAN's math does not mean that it isn't a rewrite, coming with all the risks for new errors and gotchas that that implies.
I do not fail; I succeed at finding out what does not work.
You never want a compiler to vectorize code. You want interfaces to vector hardware that you use to vectorize operations on your data. Just like you don't want compilers to provide multidimensional arrays - memory isn't multidimensional, so there's no natural layout. Instead you implement the arrays you need - even if they look the same, the complexity contract and implementation are completely different for statically dimensioned arrays (e.g. template params in C++) vs dynamically dimensioned ones (can be resized); for arrays sparsely populated by an entire row in a dimension, by specific dimension, or by any dimension (for instance only having data in rows 0, 5, 10383484387373 and columns -4948484, 0, 338383 - implying sparsely populating only the intersecting cells); for arrays where indexes are arbitrary types (say complex), etc. NONE of this has a natural representation. Just like vectored operations in a NUMA architecture require careful data management for maximum throughput - so if you want to apply this to a sparse data set for instance you need to think through how this is to be done rather than just think a compiler can spit it out for you (other than in the most trivial demos that lack real-world requirements).
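To make the layout point concrete for dense arrays: the two common conventions map the same (i, j) index to different positions in flat memory. The sketch below uses zero-based indices and made-up helper names; it is only an illustration of the two mappings, not anyone's library code. Fortran stores dense arrays column-major, C and C++ built-in arrays are row-major.

```python
def row_major_offset(i, j, rows, cols):
    """C-style layout: elements of one row are adjacent in memory.
    `rows` is unused but kept so both helpers share a signature."""
    return i * cols + j


def column_major_offset(i, j, rows, cols):
    """Fortran-style layout: elements of one column are adjacent in memory.
    `cols` is unused but kept so both helpers share a signature."""
    return j * rows + i


# Element (1, 2) of a 3 x 4 array lands at different flat positions.
print(row_major_offset(1, 2, 3, 4), column_major_offset(1, 2, 3, 4))
```

Which index should run in the innermost loop depends on which of the two layouts the data uses, so a loop nest that is cache-friendly in one language may need its loops swapped in the other.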
Your argument seems to focus mainly on how well a compiler can optimize a given code. But writing efficient software takes more. Ever tried to implement an AMR or 3D cache blocking in Fortran? It's a pain. Object orientation gives your programmers a huge boost in efficiency. And if they can use this efficiency to implement algorithms which converge faster, then this will make your code ultimately run faster. Even the last piece, the arithmetic kernel, can be done efficiently in C++ if you adopt modern libraries like Boost SIMD.
Computer simulation made easy -- LibGeoDecomp
Just asking because otherwise you'd have a better view on how intrusive (or not) this restructuring is. To give some numbers: a while ago we ported a simulation (video here) to the library. The simulation model was about 5000 lines of code. Not much, but the code was highly condensed and had been carefully modeled in the course of 3 years. We ended up having to change less than 100 lines to make it work with LibGeoDecomp. That's a far cry from a rewrite.
Computer simulation made easy -- LibGeoDecomp
Someone has some legacy Fortran code and a task of modifying it. There are two approaches: Port it or work on the existing source. Porting it allows for hiring from a very large (but shallow*) pool of programmers familiar with 'current' languages like C++. Working with the existing code means having to locate resources in a much smaller market. The former are cheap. The latter much more expensive. What to do?
*Good programmers can probably pick up a book and teach themselves Fortran pretty easily. But even in the C++ world, these people are more highly paid. There exists a large supply of people who know one language, but not the concepts of programming in general and are not cross trainable. These people work cheap**.
**Putting this class of people on such a project probably signals disaster.
Have gnu, will travel.
Reminds me of a recent experience writing a new system to replace a legacy system.
A key part of one of the homegrown network protocols was a CRC. This sounds OK, but the implementation was wrong. I spent a lot of time trying to reverse-engineer just what the original engineers had implemented. The fact that it was written in ADSP2181 assembler didn't help. It had never been an issue before because both ends of the link used the same wrong implementation, so the errors cancelled out.
I ended up writing an instruction-level simulation of the ADSP2181 processor (only needed a handful of instructions) and executing the original code directly. It works fine. Performance isn't an issue, though moving from a 33 MHz DSP chip to an eight-core 2.8 GHz box certainly helps in that department. :-)
...laura
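The "both ends share the same wrong implementation" effect is easy to reproduce. The sketch below is not the ADSP2181 code from the story; it is a plain bitwise CRC-16 in Python, and the "bug" is a hypothetical one chosen for illustration: using an initial value of 0x0000 where the spec is assumed to call for 0xFFFF. The helper names `make_frame` and `check_frame` are made up.

```python
def crc16(data: bytes, init: int = 0xFFFF, poly: int = 0x1021) -> int:
    """Bitwise CRC-16, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_frame(payload: bytes, init: int) -> bytes:
    """Sender side: append the checksum to the payload."""
    return payload + crc16(payload, init).to_bytes(2, "big")


def check_frame(frame: bytes, init: int) -> bool:
    """Receiver side: recompute the checksum and compare."""
    payload, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    return crc16(payload, init) == received


SPEC_INIT = 0xFFFF   # what the (assumed) spec asks for
BUGGY_INIT = 0x0000  # hypothetical legacy mistake

frame = make_frame(b"123456789", BUGGY_INIT)
print(check_frame(frame, BUGGY_INIT))  # both ends share the mistake
print(check_frame(frame, SPEC_INIT))   # a correct replacement disagrees
```

A replacement system therefore has to reproduce the legacy behaviour bit for bit, which is why running the original code in a small simulator can be the cheaper option.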
It is true that there are a lot of legacy Fortran codes in scientific computing, but chances are that they are already parallel, so this tool won't be of much use to those supporting them. OpenMP and MPI have been in use in Fortran codes for decades. The summary seems to think that legacy Fortran codes need saving and porting. They don't. They are just fine, number crunching faster than you can say DO CONCURRENT.
Having said that, LibGeoDecomp seems quite nice if you find a piece of serial code and you want to make a rough parallel version of it without much hassle. But if you are writing new code, you can parallelize it natively. Nevertheless, I believe that we must focus our resources on developing the current compilers. The Compaq compiler died in the hands of HP and people moved mostly to the Intel compiler, since the open-source community was focused on C++ at the time and gcc was stuck with the obsolete g77. Then g95 came along and brought us all the cool stuff of Fortran 90/95 while gfortran was being developed. Now gfortran seems decent, but it still has to match the speed of ifort in order to sit at the cool kids' table. Also, we need the features of the latest Fortran standards. I would gladly use a compiler that is feature-complete, even if the executables are relatively slow, because I will be able to switch into the mindset of the Fortran 2008 standard and stop doing things the Fortran 95 way while coding. They will then have all the time they need to make it more efficient.
In my personal experience...
Most of the physics code in FORTRAN that I've dealt with are things like relativistically invariant P-P and N-P particle collision simulations in order to test models based on the simultaneous solution to 12 or more Feynman-Dyson diagrams. It's what was used to predict the energy range for the W particle, and again for the Higgs Boson, and do it rather reliably.
The most important part of this code was reproducibility of results, so even though we were running Monte Carlo simulations of collisions, and then post-constraining the resulting pair productions by the angles and momentum division between the resulting particles, the random number stream had to be reproducible. So the major constraint here was that for a reproducible random stream of numbers, you had to start with the same algorithm and seed, and the number generation had to occur linearly - i.e. it was impossible to functionally decompose the random number stream to multiple nodes, unless you generated and stored a random number stream sufficient to generate the necessary number of conforming events to get a statistically valid sample size.
So, it was linear there, and it was linear in several of the sets of matrix math as it was run through the diagrams to filter out non-conforming pair production events.
So we had about 7 linearity choke-points, one of which could probably be worked around by pre-generating a massive number of PRNG output far in excess of what would be eventually needed, and 6 of which could not.
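The pre-generation workaround for the random number choke-point can be sketched as follows. This is Python with stand-in physics, and it rests on a loud assumption: every event consumes a fixed, known number of draws (`NUMBERS_PER_EVENT`, a made-up constant). If the draw count per event varies, as it would with rejection sampling, this simple slicing no longer reproduces the serial stream.

```python
import random

NUMBERS_PER_EVENT = 3  # assumption: fixed number of draws per event


def simulate_event(draws):
    """Stand-in for the real physics; any pure function of the draws works."""
    return sum(draws)


def run_serial(seed, n_events):
    """Reference run: one generator, consumed strictly in order."""
    rng = random.Random(seed)
    return [simulate_event([rng.random() for _ in range(NUMBERS_PER_EVENT)])
            for _ in range(n_events)]


def run_chunked(seed, n_events, n_workers):
    """Generate the whole stream once, then hand each worker a contiguous
    slice. The workers run one after another here; on a cluster each slice
    would be shipped to a different node."""
    rng = random.Random(seed)
    stream = [rng.random() for _ in range(n_events * NUMBERS_PER_EVENT)]
    per_worker = -(-n_events // n_workers)  # ceiling division
    results = []
    for w in range(n_workers):
        first = w * per_worker
        last = min(first + per_worker, n_events)
        for e in range(first, last):
            draws = stream[e * NUMBERS_PER_EVENT:(e + 1) * NUMBERS_PER_EVENT]
            results.append(simulate_event(draws))
    return results


print(run_serial(42, 10) == run_chunked(42, 10, 4))
```

The cost is the one named above: the stream has to be generated and stored up front, in excess of what will eventually be needed.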
The "add a bunch of PCs together and call it a supercomputer" approach to HPC only works on highly parallelizable problems, and given that we've had that particular capability for decades, the most interesting unsolved problems these days are not subject to parallel decomposition (at least not without some corresponding breakthroughs in mathematics).
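The usual back-of-the-envelope model for this limit is Amdahl's law: if a fraction p of the runtime can be parallelized and the rest stays serial, the speedup on n processors is 1 / ((1 - p) + p / n). The Python sketch below just evaluates that formula; the 90% figure is an arbitrary example, not a measurement from the code described above.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: upper bound on speedup when only `parallel_fraction`
    of the original runtime benefits from `processors` workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# With 90% parallel work, the speedup can never reach 10x,
# no matter how many machines are added.
for n in (8, 64, 1024, 1_000_000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

A handful of serial choke-points therefore caps the benefit of adding nodes long before the hardware runs out.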
I converted a crap-load of FORTRAN code to C in order to be able to optimize it for Weitek vector processors plugged into Sun hardware, including the entire Berkeley Physics package, since that got us a better vector processor than was on the Cray and CDC hardware at Los Alamos where the code was running previously, but adding a bunch of machines together would not have improved the calculation times.
Frankly, it seems to me that the available HPC hardware being inherently massively parallel has had a profound effect on constraining the problems we try to solve, and that there are huge areas that remain unexplored for what amounts to the equivalent of someone looking for their contact lens under the streetlight, rather than in the alley where they lost it, "because the light's better".
That's not what I call "limited". More like a rewrite, or at least a salvage operation.
Boost Multi-array doesn't support most modern Fortran array features, so it's useless for porting modern Fortran code to C++: you end up having to rewrite most of the code from scratch.
That just shows that with enough effort, you can create efficient special purpose libraries in C++; of course you can. The question is whether straightforward, boring numerical code compiles into fast executables. If you write it using Boost multi-array, it ends up being much slower (not to mention more tedious) than equivalent Fortran code.
I most certainly do.
There is a natural layout that handles 99% of all numerical needs. Numerical programmers understand it, and so do compilers.
You listed a bunch of exceptional cases that should indeed be handled by libraries. But not to support common cases well because of exceptional cases is stupid.
Care to back up those claims with actual code/numbers? I'm just asking because my FUD alarm just rang. Part of my job is performance engineering. My experience is that if you use C++ correctly, you get code which at least matches Fortran code.
Computer simulation made easy -- LibGeoDecomp
The story makes it sound like there's no support for Fortran in MPI but there totally is: https://computing.llnl.gov/tutorials/mpi/ I recognize that some kind of abstraction layer in a support library on top of MPI might be useful, but let's call it what it is.
All true.
But there is a C++ library, ObjexxFCL, that has Fortran 2008 array semantics and speed. It is used in conjunction with a Fortran to C++ conversion system.
Sorry, I should probably have added a disclaimer that I'm involved in the development of the library as my signature apparently doesn't make it obvious enough: I'm the project lead.
So far we've built about a dozen applications with LibGeoDecomp, including porting a dozen large scientific codes towards it. You're right that porting a code usually involves debugging. But that's inevitable when parallelizing a previously sequential code anyway. We don't claim to do magic, we just have some cool tricks up our sleeves. And that's a Good Thing(tm). Because those who claim to cast magic usually disperse just b/s while clever tricks can save you weeks (months even) of work. Here is what you don't have to do if you use LibGeoDecomp:
As said, parallelizing a sequential code will almost always involve some sort of debugging, no matter which tool you use. But the library also brings a couple of facilities to ease that transition: 1. you can first adopt the SerialSimulator which performs no parallelization at all, but allows you to check the data transfer and callbacks. 2. you can then transition to those parallelizations which run on a single node only (e.g. the CacheBlockingSimulator or the CudaSimulator) to check that there are no race conditions before (3.) you finally move to large scale systems using e.g. the HiParSimulator (used for full system runs on JUQUEEN, an IBM BG/Q and ATM the fastest European machine) or the HpxSimulator (used for runs on TACC's Intel Xeon Phi equipped Stampede; BTW: it's built on HPX, a parallel runtime for C++). 4. Finally you can piggy-back the TestCell onto your model, which will use checksums to validate the data the library gives back to your code.
Computer simulation made easy -- LibGeoDecomp
So you said Fortran codes were faster than C++ codes and now that's not the point any longer as they really aren't? Great, thanks!
The links you provided show that Fortran has some convenience functions for selecting parts of arrays and applying arithmetic to them. What I didn't see is anything you can't do with Boost Multi-Array and Boost SIMD.
Computer simulation made easy -- LibGeoDecomp
If your code is already parallelized, LibGeoDecomp might not have a terrible lot to offer for you. The blog post was by no means directed against Fortran as a language. Instead it advocates a way for folks to bring their existing, sequential Fortran codes to supercomputers without having to spend months doing the parallelization manually.
Computer simulation made easy -- LibGeoDecomp
You're right: the current compute architectures we see in HPC are geared at data parallel problems of massive size. Clock speeds are stagnating, sometimes even stepping down (e.g. NVIDIA Kepler has its cores actually clocked slower than Fermi with its hot clock for the shaders). Your description sounds like you'd benefit from a singular core which is tuned for single thread performance (e.g. with really big caches, a large out of order execution window) and runs at 5-10 GHz (which might require liquid nitrogen cooling).
But then again this is another niche, probably even smaller than the current HPC market, so it might not be commercially viable to develop products for it.
Computer simulation made easy -- LibGeoDecomp
There is no absence of tools to support continued use of Fortran. There is an excellent free GNU compiler for Fortran that implements all of Fortran 95 (as best I can recall) and most of Fortran 2003. This includes the C interop features, which (possibly with a little bit of interface coding) allow Fortran to be called from C and vice versa without compiler-specific hacking. Intel sells a commercial compiler for Windows that plugs into Visual Studio. I think there's even a .NET implementation.