Goto Leads to Faster Code
pdoubleya writes "There's an article over at the NY Times (registration required) about Kazushige Goto, the author of the Goto Basic Linear Algebra Subroutines (BLAS, see the wiki); his BLAS implementation is used by 4 of the current 11 fastest computers in the world. Goto is known for painstaking effort in hand-optimizing his routines; in one case, 'when computer scientists at the University at Buffalo added Goto BLAS to their Pentium-based supercomputer, the calculating power of the system jumped from 1.5 trillion to 2 trillion mathematical operations per second out of a theoretical limit of 3 trillion.' To quote Jack Dongarra, from the University of Tennessee, 'I tell them that if they want the fastest they should still turn to Mr. Goto.'" Ever get the feeling someone wrote an article merely for the pun?
DEC had an ultra-optimized math library (calculations on arrays, Fourier transforms, etc.), improved over decades by generations of PhDs. There were different versions of the routines for the different generations of CPUs, for the different cache sizes of the same model, maybe even for various speeds of RAM. Needless to say, simply linking against that library instead of the standard one improved the speed of math-intensive code by a good 10 to 20 percent (those numbers are from my fuzzy memory, but that is far from insignificant).
Add to that compilers that produced top-notch machine code for the target architecture (images that ran twice as fast as the best gcc could give you) and CPUs that were spanking the rest of the world in floating-point performance, and you can understand why the scientific community kept using Alphas for so long.