NASA Runs Competition To Help Make Old Fortran Code Faster (bbc.com)
NASA is seeking help from coders to speed up the software it uses to design experimental aircraft. From a report on BBC: It is running a competition that will share $55,000 between the top two people who can make its FUN3D software run up to 10,000 times faster. The FUN3D code is used to model how air flows around simulated aircraft in a supercomputer. The software was developed in the 1980s and is written in an older computer programming language called Fortran. "This is the ultimate 'geek' dream assignment," said Doug Rohn, head of NASA's transformative aeronautics concepts program that makes heavy use of the FUN3D code. In a statement, Mr Rohn said the software is used on the agency's Pleiades supercomputer to test early designs of futuristic aircraft. The software suite tests them using computational fluid dynamics, which make heavy use of complicated mathematical formulae and data structures to see how well the designs work.
If this was written in COBOL, the replacement code would be in C#.
But a bit of googling shows that there's still more than enough justification to call it the best programming language for physics simulations.
So... there will be Fortran programmers out there. I'd suspect, though, given that it's maintained a niche in high-end physics simulation, that anyone who would program in Fortran at the level required here currently has a job doing just that, and won't have time for a major side project with an unknown probability of paying off.
VS
Given the popularity of Fortran these days amongst 'geeks' (whatever they mean by that), this challenge is essentially limited to people already working on it.
I understand why the BBC may want to explain what FORTRAN is, but for Slashdot to spell it out reveals clumsy copy-pasting and lousy editing.
What's with the "up to"? If I make it only twice faster, will I get anything? What if I make it 20,000 times faster — will my entry be disqualified for exceeding the specified maximum improvement?
In Soviet Washington the swamp drains you.
Hahahahahahaha.
What compiler is used on Pleiades?
"I don't know, therefore Aliens" Wafflebox1
"If you can make my simulation code run 10,000 faster, I'll give you Fifty Five Thousand dollars!"
1. Run it on better hardware.
2. Re-write the compiler to optimize this code in the best way possible.
3. Re-write the code so it provides optimal input to the compiler.
4. Come up with a new algorithm.
5 and beyond: Left as an exercise to the reader.
Assuming any improvements from #1 and #2 don't "count" for this contest, that leaves you with 3 and 4.
Unless the code is brain-dead there is no way you'll get anywhere close to a 10,000x improvement JUST by #3. You MIGHT get it with a combination of #3 and #1 and/or #2 vs. just #1 and #2 alone. That is to say, changes in hardware and compilers may give an opportunity to re-factor the code to get huge improvements vs. un-modified code on new compilers and new hardware.
The big win will be in #4, but only if there are better algorithms out there or someone can come up with one. As with re-factoring the code, changes in hardware and corresponding changes in compilers may turn an algorithm that was inefficient in the 1980s into something that is best-in-class today.
5 and beyond are open-ended and the sky is the limit.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
"This is the ultimate 'geek' dream assignment,"
Actually it sounds like what I call "work."
"First they came for the slanderers and i said nothing."
I entered the contest and made some modifications but it runs well over 10,000 times faster. Disqualified again! >:(
Anons need not reply. Questions end with a question mark.
I never got the "up to 10 000 times faster" part either.
Can it become 10 000 times faster?
On what? Same processor architecture? Running the same algorithms?
Or do they need to be replaced by something faster which does it in some smarter way? Hopefully with as good results? Almost as good?
I'd find it hard to believe that you could do better than 2x performance of half decent fortran code.
love is just extroverted narcissism
The 10,000 times faster is this clearly unattainable goal, but just like the NP-Complete problems used in cryptography, no one knows for sure if P == NP or if there is some clever hack.
If you submit a solution anywhere near a 10,000 speedup, these guys with HKs wearing Ninja suits will come to your house, slap a bag over your head, and you will wake up on this island where you will be assigned a number and where this menacing beach-ball device will prevent you from ever returning home.
Old FORTRAN's default statically allocated arrays are one less indirection to access.
Tiny change compared to what can be lost in 'simple' object access. Which can cause a context switch, worst case, modern crap used by the unaware.
In some C dialects '[' and ']' are just macros that expand into pointer math.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Um, towards the bottom of the article it mentions "10 times faster" which is probably achievable and probably what they actually are looking for. They even mention that rewriting an inner loop, shaving a few milliseconds off will give a substantial speed increase.
Way back I worked at Alliant Computer Systems (https://en.wikipedia.org/wiki/Alliant_Computer_Systems), which built a hardware/software system that would automatically unroll loops in Fortran code and run them in parallel on up to 8 processors at a time (and each of the 8 also had vector hardware). It was very fast on the right Fortran code... It had hardware support for concurrency control when one iteration of the loop depended on a computation from another iteration. It was all done in custom hardware, which was killed (like many supercomputer companies) by the advent of high performance microprocessors.
Fortran has been a staple of high performance computing applications for decades and will remain one. There are several off-the-shelf tools available for profiling, optimization and vectorization, many of them from hardware vendors and tuned to a specific architecture. This task would normally be better accomplished in-house, but the contest also makes a clever and probably lower-cost recruiting tool.
I am not a Fortran programmer, but I know how to make some code run faster (on the very same hardware).
One, is a better compiler with machine code optimizations that lead to (average) faster code execution.
Two, automated source code analysis, refactorization and optimization.
Three, hire better programmers, provided that Fortran can allow for higher effectiveness.
Four, move to a language that can take advantage of multithreading. Like Fortran 2008+ or, better, C.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
WTF is up with slashdot?
Not even a link to the actual contest page. Not that anybody in their right mind would touch it, but I might sign up, just to look at how bad it is, perhaps to laugh/cry a little. Bet I've seen worse. Bet it doesn't use assigned GOTOs (Fortran 'feature' GO TO IntVar, where IntVar holds a statement label set with ASSIGN).
I bet hidden at the bottom is a linear problem (LP) solver that bangs on sub simulations, getting those results to converge across cells faster would be another trick worth trying. Just understanding how the aero problem gets approximated to an LP would likely take longer than the contest duration though.
This whole contest is bait for the retired old bastards that wrote the thing in the first place. They think one of them might have a trick or two, still up his sleeve.
Nowadays? When dealing with a complex piece of software about which nobody else will probably care? When trying to optimise very specific parts which were most likely developed by applying well-known theories?
Come on! This is the worst kind of discrimination! This is discrimination against me! LOL
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Old FORTRAN's default statically allocated arrays are one less indirection to access.
Also, FORTRAN does not allow pointer aliasing, so memory accesses can be cached in registers. That is a barrier to optimization in C/C++. C has the "restrict" keyword which helps, but it is rarely used and is not supported by old compilers.
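To make the aliasing point concrete, here is a minimal sketch in Python, used purely as an illustration language (the function name add_twice and the list values are made up for this example). When two arguments may refer to the same storage, a value read through one cannot safely be kept in a register across a write through the other, because the answer changes. Fortran compilers are allowed to assume a modified dummy argument is not aliased by another argument; a C compiler can only assume that when told so with restrict.

```python
def add_twice(dst, src):
    """Add src[0] to dst[0] two times, re-reading src[0] each time."""
    dst[0] += src[0]
    dst[0] += src[0]  # must re-read src[0]: the line above may have changed it


# Distinct storage: src[0] stays 1, so dst[0] becomes 1 + 1 + 1.
a, b = [1], [1]
add_twice(a, b)
print(a[0])

# Aliased storage: dst and src are the same list, so the first add
# also changes src[0], and the second add sees the updated value.
c = [1]
add_twice(c, c)
print(c[0])
```

The first call gives 3 and the aliased call gives 4. A compiler that cached src[0] in a register would produce 3 in both cases, which is only correct if it knows the two arguments never overlap.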
$55,000 is much cheaper than hiring developers to do it. It's akin to companies having a contest for a new marketing logo, and the winner gets their work used. The compensation is a line on a resume.
I don't think that the solution to the problem has anything to do with what language is used to solve the problem. From the article:
"The software suite tests them using computational fluid dynamics, which make heavy use of complicated mathematical formulae and data structures to see how well the designs work."
When I was an undergraduate I wrote a FORTRAN program for a genetics professor to calculate the distribution of butterfly markings in a wild butterfly population (I also caught butterflies, tagged them, recording their markings, and released them). The statistical problem was solved by solving partial differential equations by approximation. The equation had two complicated halves. I guessed a number to be the solution and then plugged it into both sides of the equation getting two different answers. I then plugged the difference of the two answers into another equation which gave me a guess as to where the final answer lay. I plugged the difference equation result into the first two equations and came up with a new narrower answer spread. I kept doing this loop until the difference result was less than the number of significant digits I wanted in the answer. This program was a real CPU burner, enough so that Dr. West had to have a serious discussion with the computer center about his computer budget.
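The guess, compare, refine loop described above can be sketched roughly as follows. This is only an illustration under assumptions: the original equations are not given, so the sketch uses the secant method (one common way to turn the difference between the two sides into the next guess) on a stand-in equation cos(x) = x. The names solve_by_iteration, lhs, rhs and the tolerance are all hypothetical.

```python
import math


def solve_by_iteration(lhs, rhs, guess_a, guess_b, tol=1e-10, max_iter=100):
    """Find x with lhs(x) == rhs(x) by repeatedly refining two guesses.

    Uses the secant method on gap(x) = lhs(x) - rhs(x).
    Returns (x, number_of_refinement_steps).
    """
    def gap(x):
        return lhs(x) - rhs(x)

    x_prev, x_curr = guess_a, guess_b
    g_prev, g_curr = gap(x_prev), gap(x_curr)
    for steps in range(max_iter):
        if abs(g_curr) < tol:
            # The two sides agree to within the requested tolerance.
            return x_curr, steps
        if g_curr == g_prev:
            raise ZeroDivisionError("secant step undefined: flat gap")
        # Use the difference between the sides to pick the next guess.
        x_next = x_curr - g_curr * (x_curr - x_prev) / (g_curr - g_prev)
        x_prev, g_prev = x_curr, g_curr
        x_curr, g_curr = x_next, gap(x_next)
    raise RuntimeError("did not converge within max_iter steps")


# Stand-in problem: where does cos(x) equal x?
root, steps = solve_by_iteration(math.cos, lambda x: x, 0.0, 1.0)
print(root, steps)
```

The CPU cost is roughly the number of refinement steps times the cost of evaluating both sides, which is why a better update rule (fewer steps) can matter more than faster hardware.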
Years later I was the manager of a mainframe computer center. I once toured the much larger McDonnell Douglas computer center in St Louis to see what I could learn about managing a computer center. One of the large applications at McDonnell Douglas was solving partial differential equations by approximation to measure airflow across a wing. Each cross section on a wing had a different answer so the more cross sections they solved the more accurate their answer was. Therefore they divided up the problem across several computers and ran several calculations in parallel.
I think that the problem described by NASA is similar to my population genetics problem or McDonnell Douglas' wing air flow problem. I can easily see that by cutting down the number of iterations needed to arrive at a significant answer you can save large amounts of CPU time. From the article:
"Significant improvements could be gained just by simplifying a heavily used sub-routine so it runs a few milliseconds faster, said Nasa on the webpage describing the competition. If the routine is called millions of times during a simulation this could "significantly" trim testing times, it added."
So if I were working on the problem I would look for an answer by speeding up the approximation calculation rather than speeding up the hardware or programming language.
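A back-of-the-envelope sketch of the arithmetic in the quoted passage, with made-up numbers: the call count, the milliseconds saved and the runtime fraction below are assumptions for illustration, not figures from NASA. The second function is Amdahl's law, which bounds the overall speedup when only one part of the program gets faster.

```python
def time_saved_seconds(calls, saved_ms_per_call):
    """Total wall time saved when each call gets saved_ms_per_call faster."""
    return calls * saved_ms_per_call / 1000.0


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical: a routine called 5 million times, 2 ms saved per call.
saved = time_saved_seconds(5_000_000, 2.0)
print(saved / 3600.0, "hours saved")

# Hypothetical: the routine is 90% of the runtime and becomes 10x faster.
print(amdahl_speedup(0.9, 10.0), "x overall")
```

Even an infinitely fast routine that accounts for 90% of the runtime caps the overall gain at 10x, so very large factors need either a much smaller untouched remainder or a different algorithm.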
--------------
Steve Stites
A lot of work has already been done on this. My college roommate twentymumble years ago worked on a sizable project to accelerate FORTRAN for parallel processing during his senior year, and lots of effort went into that project back then.
Socialism: a lie told by totalitarians and believed by fools.
You can do all kinds of things worse than 'pointer aliasing' in FORTRAN. Turn off array bounds checking and access neighboring structures etc. But if you do that, it's on you to understand your compiler, memory allocation/layout and hardware.
1. They're almost assuredly already using ifort.
2. Hopefully they've run VTune against it, but if they're engineers/physicists, who knows.
3. Yes. Programmers that really know how to write HPC applications are expensive and hard to come by, though. The main reason why Fortran is the language of choice for HPC is because Fortran does not allow aliasing, which enables deeper compiler optimization. However, C (but not C++) can replicate some of this behavior with the restrict keyword.
4. Usually multithreading creates performance problems. Massively parallel programs running on general purpose cores almost always run faster with one process per core (I guess I'm assuming that it's already an MPI program), unless you can eliminate mutexes. This is not true when running on MIC, since the per-process overhead of MPI itself becomes a bottleneck.
The answer is probably rewrite the inner loop for CUDA. Maybe also some kind of adaptive mesh refinement if it's not already doing it.
Don't listen to him, he'll be stone dead in a minute.
System Architecture
Manufacturer: SGI
161 racks (11,472 nodes)
7.25 Pflop/s peak cluster
5.95 Pflop/s LINPACK rating (#13 on November 2016 TOP500 list)
175 Tflop/s HPCG rating (#9 on November 2016 HPCG list)
Total CPU cores: 246,048
Total memory: 938 TB
2 racks (64 nodes total) enhanced with NVIDIA graphics processing units (GPUs)
184,320 CUDA cores
0.275 Pflop/s total
1 rack (32 nodes total) enhanced with Intel Xeon Phi co-processors (MICs)
3,840 MIC cores
0.064 Pflop/s total
Operating Environment
Operating system: SUSE® Linux®
Job scheduler: Altair PBS Professional®
Compilers: Intel and GNU C, C++ and Fortran
MPI: SGI MPT
Full specs here.
Sounds like stone soup to me. CUDA cores, Phi coprocessors, SGI interconnects, Linux OS because nothing else in the whole wide world could talk to all of that...
Ick. Keep your prize money.
Weaselmancer
ridiculous.
In 1986 the state-of-the-art CPU generation was the i386 (Motorola and other makers had similarly powered CPUs available), but it was new and the i286 was much more common. The Pentium 200s were about 79x faster than those (based on the popular NSI). After that, the improvements were mostly in clock rate, with the latest i7s clocking the CPU roughly 20x faster than the Pentium 200.
So that means CPU-bound Fortran code should be executing roughly 1600x faster just by recompiling it on an I7. That's before any parallelization (note that modern Fortran compilers have parallel loop constructs, so that wouldn't be tough to add, if the algorithm allows for it).
So it's tempting to think you could get most (perhaps all) of the way to this $55,000 prize with a $400 CPU and a copy of the Intel Fortran compiler.
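Spelling out that estimate as a quick sketch, using only the two factors quoted in the comment (79x and 20x); the variable names are made up, and the 10,000x target is the contest's "up to" figure. The comment rounds the product up to 1600.

```python
# Figures as quoted in the comment above, not independently measured.
pentium_vs_386 = 79    # Pentium 200 vs. i386-era CPU
i7_vs_pentium = 20     # i7 vs. Pentium 200, mostly clock rate

hardware_speedup = pentium_vs_386 * i7_vs_pentium
print(hardware_speedup)            # 1580

# What would still have to come from parallelism or better algorithms
# to reach the contest's "up to 10,000x" figure.
target = 10_000
remaining_factor = target / hardware_speedup
print(round(remaining_factor, 2))  # 6.33
```

This ignores that the contest baseline is presumably the code as it runs today on Pleiades, not on 1980s hardware, so the hardware factor would not count toward the prize.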
Because "save state on stack, call, set up stack frame, process, return" is so much more efficient than "jump", amirite?
This whole contest is bait for the retired old bastards that wrote the thing in the first place. They think one of them might have a trick or two, still up his sleeve.
Are you suggesting that the original authors wrote the code up to 10,000 times slower than they were capable of?
"You know that job you did? Do it again, only better this time. You could win a shiny!"
He's getting rather old, but he's a good mouse.
Porting this creaky code to node.js running in Docker should give at least a 10x performance boost.
Which NASA's management still doesn't understand, those perennial idiots with their failure to go metric
Huh? What are you talking about? I'm not sure what they did back in the Apollo days, but they've been all-metric for some time now at least. If you're talking about that failed Mars orbiter mission, that was the fault of some stupid defense contractor they got some data from. NASA's failure was in not verifying the units of the data, but the contractor was also at fault for not providing any kind of units. Basically, both of them were stupid for providing, and accepting, a bunch of numbers with the units just assumed on each side. But it wasn't because NASA "failed to go metric" (the defense contractors they frequently work with are the ones who did).
Finally, NASA has had a lot of successes: the Mercury and Apollo programs of course, but even in recent years there's been a bunch of successful probes and landers: the Mars landers, the Juno probe, the New Horizons probe, etc. If you're criticizing them for not getting bigger missions done and not having a way of sending humans into space any more, you can blame Congress and the White House for that. No one can get big missions done when the requirements are constantly changing and you're being jerked around with your budget and being told "work on this program now! No wait, cancel that, work on this other program now!"; anyone who's worked in engineering should know that.
-O3.
$$pls.
Sometimes, years after you finish a job (that you never really liked the solution to) a better solution comes to you.
Also: In the 1980s they didn't have a 10,000 node cluster, Multicore CPUs, SSI, Nvidia, ASICs, FPGAs or Gigabytes of RAM. They barely had math coprocessors and blitters.
If at the end of a job, you don't know how to do it better, that proves you haven't learned anything since you started.
Now it's like 'the AIDS' and 'the diabetes'!
Oh no, he's got 'the C++' on his resume, don't touch it.
I thought: ah, a fun challenge.
Then I read the last sentence in the BBC article:
>The sensitive nature of the code means the competition is only open to US citizens who are over 18.
They just don't get it.
I'd find it hard to believe that you could do better than 2x performance of half decent fortran code.
They aren't really asking for basic code optimizations (although they will take them), they are hoping that someone might totally rework the code and do stuff like...
* Implement new algorithmic developments in such areas as grid adaptation, higher-order methods and efficient solution techniques for high performance computing hardware.
* Optimize inter-node processing in order to reduce overall model computation time and parallelization efficiency.
When you try to download their software, you are taken to this page which at the bottom contains the following text:
By accessing and using this computer system, you are consenting to system monitoring, including the monitoring of keystrokes. Unauthorized use of, or access to, this computer system may subject you to disciplinary action and criminal prosecution. [emphasis mine]
A keylogger for using your website? Microsoft hasn't even thought of that yet!
-- Political fascism requires a Fuhrer.
False. For a counterexample, you could have learned how to do something else better.
Contribute to civilization: ari.aynrand.org/donate
That's all we ever used in Physics and Blender uses it too, to give you an idea.
I'm probably being too sensitive here - not usually a trait that describes me - but is anyone else sick to freaking death of being called a 'coder'? I kind of hate to say this, but 'coder' sounds rather like 'data entry technician' - someone doing a mindless repetitive job.
SSI is about massive pipelines. SSI based solutions have produced orders of magnitude performance increases for things like video encoding.
If you could pull this off, it's a reputation cementing job. You wouldn't be 'famous', but in some circles you could pick jobs after.
But like I say, I bet the duration is too short to do anything but tune the super tight code. The compiler won't be that bad, they've been working on Fortran compilers for a long fucking time.