Why Scientists Are Still Using FORTRAN in 2014
New submitter InfoJunkie777 (1435969) writes "When you go to any place where 'cutting edge' scientific research is going on, strangely the computer language of choice is FORTRAN, the first commonly used computer language, invented in the 1950s. Its name stands for FORmula TRANslation, and no language since has been able to match its speed. But three new contenders are explored here. Your thoughts?"
This. I have many friends in the physics dept and the reason they're doing Fortran at all is that they're basing their own stuff off of existing Fortran stuff.
What amused me about the article was actually the Fortran versions they spoke about. F95? F03? F08? Let's be real: just about every Fortran code I've heard of is still limited to F77 (with some F90 if you're lucky). It just won't work on later versions, and it's deemed not worth porting over, so the entire codebase is stuck on almost 40-year-old code.
Huge libraries of FORTRAN code have been formally proven. New FORTRAN code can be formally proven. Due to the limitations of the language, it is possible to put the code through formal processes to prove the code is correct. In addition, again as a benefit of those limitations, it is very easy to auto-parallelize FORTRAN code.
"To those who are overly cautious, everything is impossible. "
APL-style languages should be even more optimizable, since they use higher-order array operators that make the control flow and data flow highly explicit without the need to recover information from loopy code using auto-vectorizers, and easily yield parallel code. By this logic, in our era of cheap vector/GPU hardware, APL-family languages should be even more popular than Fortran!
Ezekiel 23:20
Well, we live in a somewhat different world today, given that suitable HW for that is virtually everywhere. But just to be clear, I'm not suggesting anyone should adopt APL's "syntax". It's more about the array language design principles. Syntax-wise, I'd personally like something along the lines of Nile, with math operators where suitable, and with some type inference and general "in-language intelligence" thrown into the mix to make it concise. I realize that depriving people of their beloved imperative loops might seem cruel, but designing the language so that the obvious coding styles run easily on vector machines seems a bit saner to me than allowing people to write random loops and then either hoping that the vectorizer will sort it out (vectorizers are still very finicky about their input) or providing people with examples of what they should and shouldn't write if they want it to run fast.
Ezekiel 23:20
Also "legacy training". Student learns from prof. Student becomes prof. Cycle repeats.
Not really - even when I was a student we ditched F77 whenever we possibly could and used C or C++. The issue is more legacy programmers. Often the person in charge of a project is an older person who knows FORTRAN and does not want to spend the time to learn a new language like C (or even C++!). Hence they fall back on something more comfortable.
However, by now even this is no longer the case. The software in particle physics is almost exclusively C++ and/or Python. The only things that I am aware of which are still FORTRAN are some Monte-Carlo event generators which are written by theorists. My guess is that, as experimentalists, even older colleagues have to learn C++ and Python to use and program modern hardware. Theorists can get by using any language they want and so are slower to change. It has probably been at least 15 years since I wrote any FORTRAN myself, and even then what I wrote was the code needed to test the F77 interface to a rapid C I/O framework for events which was ~1-200 times faster than the F77 code it replaced.
Wow, faster AND more accurate. They must use some mystical floating-point instructions that only Fortran compiler writers know about.
On PPC implementations, head-tail floating point is typically used for "long double"; this leads to inaccuracies in calculations. 80 bit Intel floating point is also inaccurate. So are SSE "vector" instructions, since denormals, NaNs, INFs, and -0 are always suspect unless your compiler emits an extra instruction in order to trigger the "next instruction after" signalling of the condition, and for NaNs, you are still somewhat suspect there.
If it isn't IEEE-754 compliant, you pretty much can't trust it. FORTRAN goes way the heck out of its way, including issuing additional instructions and introducing pipeline stalls, in order to force IEEE-754 compliance.
Pretty much this accuracy only matters if you are doing Science(tm); if you are doing graphics, you are generally willing to eat the occasional FP induced artifact, because what you typically care about is the frame rate in your game, rather than being 100% accurate.
So, in closing, they're not using "some mystical floating-point instructions", they are just using accurate floating point, rather than approximate floating point.
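For anyone unfamiliar with the special values mentioned above (-0, NaNs, denormals), here is a minimal C sketch of how they behave under IEEE-754 arithmetic. The helper names are made up for illustration, and it assumes a compiler run with default floating-point flags (no -ffast-math or flush-to-zero). It only shows what the values are; it says nothing about how any particular Fortran or C compiler handles them.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helpers, named for illustration only. */

/* -0.0 compares equal to 0.0; only the sign bit tells them apart. */
bool is_negative_zero(double x) {
    return x == 0.0 && signbit(x);
}

/* Under IEEE-754 comparison rules, NaN is the only value
   that compares unequal to itself. */
bool is_nan_by_compare(double x) {
    return x != x;
}

/* Subnormal (denormal) values are nonzero values whose magnitude
   is below DBL_MIN, the smallest normal double. */
bool is_subnormal(double x) {
    return fpclassify(x) == FP_SUBNORMAL;
}
```

With these, is_negative_zero(-0.0) is true while is_negative_zero(0.0) is false, and halving DBL_MIN gives a subnormal rather than zero, provided the hardware is not in flush-to-zero mode.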
Isn't the main performance benefit that Fortran has always claimed over C/C++ the fact that an array is guaranteed to only be used from one thread at a time, and thus you don't have to re-read from memory to registers each time you want to do something with the data in the array? A capability that was formally added to C in C99 (and pretty much universally informally added to C++) with the restrict keyword?
Correct me if I'm wrong here, as I'm not a Fortran programmer.
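A rough C sketch of what restrict changes, for reference. The guarantee is usually framed in terms of aliasing (two arguments not overlapping in memory) rather than threads: Fortran lets the compiler assume that array arguments which get modified do not overlap other arguments, and C99's restrict lets the programmer promise the same thing. The function names below are invented for illustration, and whether a given compiler actually generates faster code from this is not shown here.

```c
#include <stddef.h>

/* Without restrict: dst, src and factor are all pointers to double, so the
   compiler must allow for a store to dst[i] changing *factor or a later
   src element, and may have to reload them from memory on each iteration. */
void scale_add(size_t n, double *dst, const double *src,
               const double *factor) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * *factor;
}

/* With restrict: the caller promises that the three pointers refer to
   non-overlapping memory, so the compiler is free to keep *factor in a
   register and to vectorize the loop. Calling this with overlapping
   arguments is undefined behavior. */
void scale_add_restrict(size_t n, double *restrict dst,
                        const double *restrict src,
                        const double *restrict factor) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * *factor;
}
```

When the arguments really are distinct arrays, both versions compute the same result; the difference is only in what the optimizer is allowed to assume.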
By a scallop's forelocks!