Goto Leads to Faster Code

Posted by Hemos on Monday November 28, 2005 @03:00AM from the we're-punny-this-morning dept.

pdoubleya writes "There's an article over at the NY Times (registration required) about Kazushige Goto, the author of the Goto Basic Linear Algebra Subroutines (BLAS, see the wiki); his BLAS implementation is used by 4 of the current 11 fastest computers in the world. Goto is known for painstaking effort in hand-optimizing his routines; in one case, "when computer scientists at the University at Buffalo added Goto BLAS to their Pentium-based supercomputer, the calculating power of the system jumped from 1.5 trillion to 2 trillion mathematical operations per second out of a theoretical limit of 3 trillion." To quote Jack Dongarra, from the University of Tennessee, "I tell them that if they want the fastest they should still turn to Mr. Goto."" Ever get the feeling someone wrote an article merely for the pun?

DEC Math Library by boa13 · 2005-11-28 03:13 · Score: 5, Interesting

DEC had an ultra-optimized math library (calculations on arrays, Fourier transforms, etc.), improved over decades by generations of PhDs. There were different versions of the routines for the different generations of CPUs, for the different cache sizes of the same model, maybe even for various speeds of RAM. Needless to say, simply linking against that library instead of the standard one improved the speed of math-intensive code by a good 10 to 20 percent (those numbers are from my fuzzy memory, but that's far from insignificant).

Add to that compilers that were producing top-notch machine language for the target architecture (producing images that ran twice as fast as what gcc gave you at best), CPUs that were spanking the rest of the world as far as floating-point performance was concerned, and you can understand why the scientific community has kept using Alphas for so long.
1. Re:DEC Math Library by Fess_Longhair · 2005-11-28 03:36 · Score: 2, Interesting
  
  The library is still alive. Apparently a group of the DEC developers are now at Intel, and the library is called the Math Kernel Library (MKL). http://www.intel.com/cd/software/products/asmo-na/eng/perflib/219769.htm
Re:No you idiots - it's not about GOTO statements by 91degrees · 2005-11-28 03:25 · Score: 3, Interesting

No, it is about structured programming. At least indirectly through use of the pun. It's more on topic than a lot of the discussion on this site.
Where does the slashdot effect come from? by tehanu · 2005-11-28 03:32 · Score: 4, Interesting

A lot of people complain about people never reading the actual articles before they comment, but it seems worse than that. People don't even bother reading the blurbs.

I wonder where the slashdot effect comes from then?
Re:30%+ Improvement by mangu · 2005-11-28 03:41 · Score: 4, Interesting

to me says more about the previous implementation than it does about Goto's work

Which only goes to show that you haven't considered the implications of optimization in modern processors. A Pentium 4 can operate above 3 GHz. This means that light can travel no more than 10 centimeters in the duration of one clock pulse. With the spacing in the motherboard, this isn't enough for a pulse to go from the CPU to the RAM and come back. Even if the memory could operate at the same rate as the CPU, the computation would still be limited by light speed alone.
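The 10 centimeter figure can be checked with a few lines of C. This is only a sketch of the arithmetic: the function name `light_cm_per_cycle` is made up for illustration, and it uses the vacuum speed of light, so real signals in copper traces cover less distance per cycle.

```c
/* Distance light covers in a vacuum during one clock period, in centimeters.
 * light_cm_per_cycle is a hypothetical helper, not part of any library. */
double light_cm_per_cycle(double clock_hz)
{
    const double c_m_per_s = 299792458.0;   /* speed of light in a vacuum */
    return c_m_per_s / clock_hz * 100.0;    /* meters per cycle -> centimeters */
}
```

At 3 GHz this comes out just under 10 cm per cycle, so a round trip in a single cycle would allow at most about 5 cm each way.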

Optimization to get the full advantage of a Pentium 4 doing floating point calculations is one of the most difficult tasks one can do in computing. A P4 can do, in one clock pulse, four multiplications and four additions. To get 100% of this speed one needs to have a sophisticated handling of cache memory, among other requirements.
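To give a rough idea of what "sophisticated handling of cache memory" means, here is a minimal sketch of loop blocking (tiling) for a matrix multiply. It is not Goto's method, which the comments here describe as hand-tuned assembly; the function name `matmul_blocked` and the tile size `BLOCK` are assumptions picked for illustration, and a real library would tune the tile size per processor.

```c
#include <stddef.h>

/* Hypothetical tile edge; a tuned library would choose this per CPU. */
#define BLOCK 32

/* C += A * B for n x n row-major matrices, processed one tile at a time
 * so that the working set of the inner loops stays small. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = (kk + BLOCK < n) ? kk + BLOCK : n;
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The result is the same as the plain triple loop; only the order in which memory is touched changes. The caller is expected to zero `c` first if a plain product is wanted.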
Re:No you idiots - it's not about GOTO statements by Anonymous Coward · 2005-11-28 04:13 · Score: 1, Interesting

The funny thing is . . . it's about extreme optimization of library code for math. Presumably, these optimizations make extensive use of assembly code, which, in turn, presumably makes extensive use of the JMP command or its equivalent. Which is, in fact, a GOTO.
Not to mention that code using GOTOs instead of more elegant control structures does, in fact, run faster since the machine doesn't have to worry about maintaining a call stack, etc.
Re:30%+ Improvement by ultranova · 2005-11-28 04:28 · Score: 2, Interesting

I know all my linear algebra software is written around an assembly language core, hand tuned for each new version of a half dozen processors, and designed from the start to minimize TLB misses instead of just naively trying to fit a dataset into L1 or L2 cache. I don't know why those retards at the universities and national labs were ever using anything else!

Probably because, being scientists first and programmers second, they simply don't have the time necessary to learn the characteristics of the processor to the degree necessary to even match, let alone beat, the output generated by the compiler. It could also be that the algorithms are under active development, in which case writing them in assembler doesn't really make sense, since it will increase the time needed to write and test new versions. And if the scientists find that some function is unacceptably slow, and can't figure out a more efficient algorithm, they can always just hire a code monkey to hand-tune it with assembler.

Speaking of compiler optimizations, if simply replacing control structures with goto made the function 30% faster, then either the compiler truly sucks, or the previous implementation was something horrible.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:I failed a coding test because of this guy by thesnarky1 · 2005-11-28 04:36 · Score: 2, Interesting

As few lines as possible?
I'm surprised the teacher didn't get obfuscated codes, could'a been a fun way to practice for the IOCCC!
While it might not be smart, whenever I get assignments like this from my teachers, I always will write exactly to the letter. No one SAID I had to have line breaks at the end of a "line". So it'll be an 80 char row, no extra spacing. I won't go so far as to obfuscate it horribly, just take out the spacing.
I have had teachers give that assignment, and I've given them right back something in as few lines as possible. Most just kind of glare at me and grade it anyway, but one professor just laughed, realized his mistake, and gave me credit for the shortest answer, much to the chagrin of the other students in the class.

--
Want to find other gamers to play board and role playing game
Re:Its better if you have a f***ing clue by samschof · 2005-11-28 04:39 · Score: 2, Interesting

There is still a lot of tweaking going on in scientific computing. I work on large-scale parallel computational fluid dynamics codes where performance can be tricky, especially when dealing with parallel iterative solvers. We often have to build and try a variety of preconditioners and simply measure which gives us the best performance. One preconditioner may lower the iteration count substantially, but it is quite expensive to apply; another may require excessive communication. You may also find that what works great on 32 processors deteriorates quickly once you scale to 256 processors. The point is that you rarely know in advance what the optimal strategies are.
Of course, these "tweaks" are related to the global numerical scheme. Reordering a loop here and there and then running to see if it made a difference is simply not practical, as you point out.
Sam
Computed goto by apankrat · 2005-11-28 06:32 · Score: 4, Interesting

Not as helpful as computed goto

Seriously. Computed goto is very useful for low-level
optimizations in things like high-throughput ethernet
drivers and such. It basically eliminates conditional
checks in cases where the condition stays the same
for a particular set of data. So instead of
    if (context->condition) foo(context); else bar(context);
one would have
    /* one-time initialization; &&label is the GNU C label-address extension */
    if (context->condition) context->jmp = &&_foo;
    else context->jmp = &&_bar;
    ..
    goto *context->jmp;
    for (;;) {
      _foo: foo(context); break;
      _bar: bar(context); break;
    }
If the second part is executed in a loop, the savings of
not making an IF comparison accumulate fairly quickly.
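
For anyone who wants to try this, here is a self-contained sketch of the same idea. It relies on the GNU C "labels as values" extension (`&&label` and `goto *ptr`), so it assumes GCC or Clang and is not ISO C. The function name `process` and its add/subtract workload are made up for illustration. Whether it actually beats a plain `if` depends on the compiler and the branch predictor, so measure before relying on it.

```c
#include <stddef.h>

/* Sums or subtracts every item, depending on a condition that is fixed
 * for the whole data set. The branch target is picked once, up front.
 * Requires the GNU C labels-as-values extension (GCC, Clang). */
long process(int condition, const int *items, size_t n)
{
    /* One-time initialization: store the address of the chosen label. */
    void *jmp = condition ? &&do_add : &&do_sub;
    long total = 0;
    size_t i = 0;

next:
    if (i >= n)
        return total;
    goto *jmp;              /* indirect jump, no re-test of condition */

do_add:
    total += items[i++];
    goto next;

do_sub:
    total -= items[i++];
    goto next;
}
```

With items {1, 2, 3}, a nonzero condition adds them up and a zero condition subtracts them. Note that label addresses are only valid inside the function that defines the labels.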

--
3.243F6A8885A308D313
Re:This certainly is news to me. by masklinn · 2005-11-28 06:46 · Score: 2, Interesting

You wish. Truth is that real programmers use languages where GOTO is part of each instruction

--
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
GOTOs sometimes make the code *more* readable by randyflood · 2005-11-28 07:20 · Score: 4, Interesting

I, like everyone else, was trained *never* to use the dreaded goto statement. I'll grant that Pascal was more readable than Basic (with unlabeled gotos).

But sometimes it is actually better to use a goto to make the code more readable. The Linux kernel, for example, uses gotos. I was pretty sceptical at first because it had been drilled into my head how unreadable code was with gotos in it. But, reading the code, I have to admit it is much more readable for exception handling, for example.

If the goto would not make your code more readable, then don't use it. But in the cases where it would avoid a bunch of silliness trying to get out of a bunch of nested loops when some error happens, it makes a lot of sense.

Linus Torvalds (and others) explain the reasoning for this at:

http://kerneltrap.org/node/553

In short, there are both readability and efficiency reasons to use gotos.
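
A minimal sketch of the error-handling pattern being described, in plain C. This is not taken from the kernel source; the function `copy_prefix` and its label names are hypothetical. Each failure jumps to the label that releases exactly what has been acquired so far, so the cleanup code is written once instead of being repeated in every error branch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copies up to n bytes (n > 0 assumed) from src_path to dst_path.
 * Returns 0 on success, -1 on any failure.
 * Hypothetical example function, not from any real code base. */
int copy_prefix(const char *src_path, const char *dst_path, size_t n)
{
    int ret = -1;
    char *buf;
    FILE *src, *dst;
    size_t got;

    buf = malloc(n);
    if (!buf)
        goto out;
    src = fopen(src_path, "rb");
    if (!src)
        goto free_buf;
    dst = fopen(dst_path, "wb");
    if (!dst)
        goto close_src;

    got = fread(buf, 1, n, src);
    if (fwrite(buf, 1, got, dst) == got)
        ret = 0;

    /* Success falls through the same cleanup as the error paths. */
    if (fclose(dst) != 0)
        ret = -1;
close_src:
    fclose(src);
free_buf:
    free(buf);
out:
    return ret;
}
```

The same function without goto would need either nested if blocks three levels deep or duplicated cleanup calls in each failure branch.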

--
Randy.Flood@RHCE2B.COM
Re:GOTO considered harmful by Anonymous Coward · 2005-11-28 08:28 · Score: 1, Interesting

Don't forget the kitsch painter Odd Nerdrum! Odd nerd, yes indeed.