Goto Leads to Faster Code
pdoubleya writes "There's an article over at the NY Times (registration required) about Kazushige Goto, the author of the Goto Basic Linear Algebra Subroutines (BLAS, see the wiki); his BLAS implementation is used by 4 of the current 11 fastest computers in the world. Goto is known for painstaking effort in hand-optimizing his routines; in one case, "when computer scientists at the University at Buffalo added Goto BLAS to their Pentium-based supercomputer, the calculating power of the system jumped from 1.5 trillion to 2 trillion mathematical operations per second out of a theoretical limit of 3 trillion." To quote Jack Dongarra, from the University of Tennessee, "I tell them that if they want the fastest they should still turn to Mr. Goto."" Ever get the feeling someone wrote an article merely for the pun?
I'd always been told that use of Goto led to a case of the BLAS in my code!
Those who can, do. Those who can't, write technology blogs.
Ever get the feeling someone wrote an article merely for the pun?
Good thing the headline didn't contribute to that at all.
Not Buzzword 2.0 compliant. Please speak English.
Although he also writes fast code, Mr. Bluescreen was criticised for the poor stability of his code.
It was CIS 150, and C++ was the language of the day (Pascal before, Java after). I was taking an exam that was all coding. I remembered extensive use of GOTO from my Commodore days, so I used one in the test (the objective was to code something with as few lines as possible).
;)
I had the shortest working code in the class but the arse hole teacher failed me for it. Said something like "we don't teach goto for a reason. Yeah, it's in the book, but don't ever use it!"
Jerk. I should post his phone number on slashdot
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
...To see who actually reads the article.
;)
Judging from the replies...not many people
Seriously, though, how does a guy end up with a name like this in computer programming? It sounds made-up! Then again, I've heard some very, very odd names...
The World Wide Web is dying. Soon, we shall have only the Internet.
I like to see people replying to a post after reading only the subject line and then expressing an opinion that 1) everyone knows and no one would argue with and 2) has pretty much nothing to do with the article besides an easy pun.
Goto Considered Helpful?
-Loyal
I aim to misbehave.
This guy is clearly considered harmful.
http://alternatives.rzero.com/
DEC had an ultra-optimized math library (calculations on arrays, Fourier transforms, etc.), improved over decades by generations of PhDs. There were different versions of the routines for the different generations of CPUs, for the different cache sizes of the same model, maybe even for various speeds of RAM. Needless to say, simply linking against that library instead of the standard one improved the speed of math-intensive code by a good 10 to 20 percent (those numbers are from my fuzzy memory, but that's far from insignificant).
Add to that compilers that were producing top-notch machine language for the target architecture (producing images that ran twice as fast as what gcc gave you at best), CPUs that were spanking the rest of the world as far as floating-point performance was concerned, and you can understand why the scientific community has kept using Alphas for so long.
Point taken, but in the early days goto still made a lot of sense. Many conventional old practices have gone by the wayside with compilers that are smarter and better at optimizing, and with better standardization in languages overall.
The *first* time I learned C, goto was perfectly acceptable (yay K&R original C material).
But really, my point is that a computer doesn't see things in the sense of functions; it sees things in the sense of labels (memory addresses), and in a sense, programming using functions is simply another way of getting around labeling a routine.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
10 Print "oh Mr. K. GOTO"
20 I=I+1
30 If I < 5 Print "Domo" Else 50
40 GOTO 20
50 Print "I'm Kilroy! Kilroy! Kilroy! Kilroy!"
You might want to read up on this page for some human interaction hints.
Try out fish, the friendly interactive shell.
I believe you are referring to Kazushige's cousin, Mr. Gosub.
He is your goto-guy :)
The thing is, "Goto" isn't logical.
Your argument against Goto is even less logical. Goto is a conditional jump, where the condition is always true. It's an if (true) { do; }.
Our brains have plenty of Goto's hardcoded into them; "repeat" is typically implemented in a "goto" fashion, but you'll want to ignore that if you're a modern computer. The correct way is to instead unroll the loop and have no jump instruction at all (if you can get around it).
Sigh. Why don't they teach assembly anymore? It should be a pre-req to learning higher level languages.
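To make the point above concrete, here is a minimal C sketch (the function names are hypothetical, made up for illustration): the first function spells a loop out as a label plus a conditional backward jump, and the second unrolls the same loop by four so that the jump is taken a quarter as often. Optimizing compilers frequently do this kind of unrolling on their own, so treat it as an illustration rather than a recommendation.

```c
#include <stddef.h>

/* Sum n doubles with the loop written as a label and a conditional jump. */
double sum_with_goto(const double *x, size_t n)
{
    double total = 0.0;
    size_t i = 0;
top:
    if (i < n) {
        total += x[i];
        i++;
        goto top;   /* the backward jump that "for" and "while" hide */
    }
    return total;
}

/* Same sum, unrolled by four: one loop test per four additions. */
double sum_unrolled4(const double *x, size_t n)
{
    double total = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += x[i];
        total += x[i + 1];
        total += x[i + 2];
        total += x[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        total += x[i];
    return total;
}
```

Both functions add the elements in the same order, so they return the same value for the same input.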
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
No, it is about structured programming. At least indirectly through use of the pun. It's more on topic than a lot of the discussion on this site.
Considering the number of scientists who have been looking at this over a number of years, I think it really is a credit to Goto's work. Optimizing at this level is very challenging work on modern processors.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Hey kids! Your uncle Sammy here, with a fun rainy-day Slashdot activity for you!
First: take an article which revolves around a pun, just like this one, to deliver a message which has a different meaning than the headline would suggest.
Next: Pick a comment-karma threshold. Two or three ought to do it!
Last: Count how many of the people at that level have completely missed the point of the article: specifically, that the "Goto" in the writeup is not a GOTO statement, but rather the name of a programmer named Kazushige Goto; that this particular distinction is supposed to be considered a bit of ironic humor; and, that this is, in fact, the reason that Hemos posted it "from the we're-punny-this-morning dept."
Hours of fun for everyone!
0 END
Strange women lying in ponds distributing swords is no basis for a system of government.
A lot of people complain about people never reading the actual articles before they comment, but it seems worse than that. People don't even bother reading the blurbs.
I wonder where the slashdot effect comes from then?
To anybody who criticizes Goto Kazushige's Free Software credentials: he created a Linux/Alpha distribution called Stataboware, which among other things included an early version of his hand-tuned math library back in 1999 (it's now defunct, unfortunately).
10 Goto FirstPost
^C^C^C^C^C^C^C^C
From the article:
"Robert A. van de Geijin, a computer scientist who works with Mr. Goto at the Texas Center,..."
All right, a Japanese programmer named Goto, working with a non-Japanese guy named Geijin. That's too much.
Which is certainly good, but to me says more about the previous implementation than it does about Goto's work.
Yeah, that previous implementation must have totally sucked. I know all my linear algebra software is written around an assembly language core, hand tuned for each new version of a half dozen processors, and designed from the start to minimize TLB misses instead of just naively trying to fit a dataset into L1 or L2 cache. I don't know why those retards at the universities and national labs were ever using anything else!
(closing Slashdot, going back to working on my shamefully unoptimized C++ numerics code...)
Which only goes to show that you haven't considered the implications of optimization in modern processors. A Pentium 4 can operate above 3 GHz. This means that light can travel no more than 10 centimeters in the duration of one clock pulse. With the spacing in the motherboard, this isn't enough for a pulse to go from the CPU to the RAM and come back. Even if the memory could operate at the same rate as the CPU, the computation would still be limited by light speed alone.
Optimization to get the full advantage of a Pentium 4 doing floating point calculations is one of the most difficult tasks one can do in computing. A P4 can do, in one clock pulse, four multiplications and four additions. To get 100% of this speed one needs to have a sophisticated handling of cache memory, among other requirements.
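The 10-centimeter figure above is easy to check. A minimal sketch in C (the function name is hypothetical), assuming light in vacuum; signals in real circuit-board traces travel slower than that, which only tightens the limit:

```c
/* Distance light travels in vacuum during one clock period, in centimeters. */
double light_cm_per_cycle(double clock_hz)
{
    const double c_m_per_s = 299792458.0;   /* speed of light in vacuum */
    return c_m_per_s / clock_hz * 100.0;    /* meters per cycle, then to cm */
}
```

Calling light_cm_per_cycle(3.0e9) gives just under 10 cm, matching the estimate for a 3 GHz clock.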
Oh, Goat-toe hell you spoilsport!
After all, it was Donald Knuth himself who, in "Structured Programming with goto Statements" (Computing Surveys, sometime in 1974), wrote "At the [year] IFIPS Conference, I was introduced to Dr. Eiichi Goto, who cheerfully complained that he was always being eliminated."
(Apologies for errors, as my issues of CS are in storage and I'm doing this from memory.)
My favorite ever comment was, "If I ever saw this in the real world, I'd fire you" attached to an "A" test paper with a programming question on it I'd managed to reduce to one line of nearly incomprehensible recursion.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Yes, functions impose overhead. However:
- If your function is small enough and your compiler smart enough, it can inline the routine, removing the overhead while preserving readability.
- Nobody can say where the time-critical code is without profiling. Most of the (Fortran) code I handle spends 80% of the time zeroing arrays. There's not much to optimize in that procedure, and optimizing the remaining 20% by filling the code with gotos is only a waste of time.
- If the algorithm is slow, optimize. If it's still slow, change the algorithm.
- Last but not least: if you optimize the code to save a few hours of computational cost but end up with code that needs an additional month to debug, you are doomed to have a very bad time.
-- "If A equals success, then the formula is A=X+Y+Z. X is work. Y is play. Z is keep your mouth shut." - Einstein
If wife has headache, GOTO sleep
If boss is on vacation, GOTO strip bar for long lunch
If in-laws are coming over, GOTO work and pretend there is a critical problem that requires your presence all night
If technical conference is in Vegas, GOTO it
loads of examples.
If work is boring, GOTO slashdot to kill an hour or two
"I have as much authority as the pope, I just
don't have as many people who believe it" - George Carlin
I was involved with the design and the benchmarking effort of #5, #59, #67 and a few others in Top500. The performance of a supercomputer is determined by the number of real FLOPs achieved versus theoretically claimed.
Theoretical FLOPs per processor = Core(s) * Speed_Per_Core (in GHz) * 2. So for a dual 3.6 GHz Xeon, the theoretical figure is 2 * 3.6 * 2 = 14.4 GFLOPS.
An easy way to find out the actual number of FLOPS a computer can achieve is to ask it to solve a number of linear algebra problems and then look at the time it takes to finish solving them. The shorter the time, the better the FLOPS, obviously.
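The arithmetic above can be written down as a small C sketch. The function names are hypothetical; the factor of 2 floating-point operations per cycle is the one the formula above uses, passed in as a parameter because it differs between processors.

```c
/* Theoretical peak in GFLOPS, per the formula quoted above:
   cores * clock speed (GHz) * floating-point operations per cycle. */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    return cores * clock_ghz * flops_per_cycle;
}

/* Fraction of the theoretical peak that a benchmark run actually reached.
   Both arguments must be in the same unit. */
double efficiency(double achieved, double peak)
{
    return achieved / peak;
}
```

For the dual 3.6 GHz Xeon, peak_gflops(2, 3.6, 2) reproduces the 14.4 figure. Applied to the Buffalo numbers in the story summary, efficiency(1.5, 3.0) is one half and efficiency(2.0, 3.0) is about two thirds of the theoretical limit.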
Now, the reason we chose gotolib was:
1) It works with GCC
2) It is optimized to use the processor cache
3) And therefore fewer cache misses which translates to superior performance
4) And it's free (though the source is not exactly open).
Because it uses the processor cache so effectively, it results in a better number than a regular BLAS (which does not use processor cache).
Alternatively, I've also used Intel's MKL, which offers comparable performance, but it works best with ICC and it's not free. Btw, #59 was benchmarked using gotolib and MKL -- but if I remember correctly, the final result was derived using MKL.
In essence, if you want to use GCC and work with lots of number crunching, i.e. BLAS, gotolib is your best option.
Ever get the feeling someone wrote an article merely for the pun?
There's of course the famous Alpher-Bethe-Gamow paper: http://en.wikipedia.org/wiki/Alpher-Bethe-Gamow_p
Apologies if I implied criticism of his work, that wasn't the intention.
My point is simply that in the field of optimisation, all your gains come from succeeding where the last guy failed. This 30% improvement in performance is not the difference between Goto's approach and his nearest competitor, it is the difference between his work and the previous solution used on that particular machine.
It's the statistic I'm criticising - it simply isn't very meaningful.
(I used to do research into hardware optimisation. Improvements of more than about 5% in any field to which smart people devote a lot of thought always make me suspicious!)
We have given birth to a new acronym: RPFH Read Past the F**king Headline.
AT&ROFLMAO
http://en.wikipedia.org/wiki/Nobukazu_Takemura
http://news.com.com/Writing+the+fastest+code%2C+by+hand%2C+for+fun/2100-1022_3-5972844.html?tag=nefd.top
"the correct pronunciation of my name is more like "goat-toe.""
Is that anything like camel toe?
"As God is my witness, I thought turkeys could fly." A. Carlson
ATLAS is open-source and is a pretty good alternative. It is only a few percent slower than libgoto in most cases.
Save the bandwidth. Don't use sigs!
Probably because, being scientists first and programmers second, they simply don't have the time necessary to learn the characteristics of the processor to the degree necessary to even match, let alone overcome, the output generated by the compiler. It could also be that the algorithms are under active development, in which case writing them in assembler doesn't really make sense, since it will increase the time needed to write and test new versions. And if the scientists find that some function is unacceptably slow, and can't figure out a more efficient algorithm, they can always just hire a code monkey to hand-tune it with assembler.
Speaking of compiler optimizations, if simply replacing control structures with goto made the function 30% faster, then either the compiler truly sucks, or the previous implementation was something horrible.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Of course, these "tweaks" are related to the global numerical scheme. Reordering a loop here and there then running to see if it made a difference is simply not practical as you point out.
Sam
It's just two syllables... the first, "go", is pronounced like the "go" in gordon, not like the English word "go". The second, "to", is not like the English word "to", it is pronounced like the "to" in "tornado". Try saying "gohr-tohr". *
* Note: Pronunciation instruction may only apply if you live in the city of Boston. People living in other localities may need to contact the appropriate authorities for further instruction.
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
In CS courses below 300, we were told that goto was evil and should NEVER be used. They didn't even teach its use. In 300+ courses, we were given examples of why goto is sometimes the best approach producing easier to understand code that was also faster. In fact, a lot of time was spent by professors that did real work deprogramming brainwashed students who were taught that global variables should never EVER be used, goto is Satan, dynamic memory is for terrorists and all kinds of god awful ideas.
That is one of the problems with academia. There are too many Java hugging professors teaching the C/C++ courses and trying to push their own agenda. A Java loving professor completely deprived his students of an entire semester of C++ file/data structure instruction because of his Java pimping agenda. When the term project of "File and Data Structures in C and C++" is a Java project, you know there's a problem...
> What kind of dumbass, shitbox, stupid programmer are you?
Oh dear. That's the worst code I've seen for ages.
Why free bar in the case that its attempt to allocate memory fails? Shouldn't you instead be freeing foo?
Likewise for the attempt to malloc baz, where you attempt to free baz instead of both bar and foo.
> Excuse my extremely sloppy writing, I'm rushing, and don't have time to proof
> read & restructure.
Not only that, your code is terrible, and your quoted justification for using goto "in every 10 lines of code" shows that you shouldn't be let anywhere near a compiler.
And after all that, you manage to find it within yourself to abuse someone who clearly knows more about the subject than yourself! Do you perhaps have a very small penis?
Wait until you read in an interview about Mr. Kazushige Goto's favorite food.
Italian.
Pasta.
Specially Spaghetti.
[/me ducks and runs away....]
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Seriously. Computed goto is very useful for low-level
optimizations in things like high-throughput ethernet
drivers and such. It basically eliminates conditional
checks in cases where the condition stays the same
for a particular set of data. So instead of [...] one would have [...] If the second part is executed in a loop, the savings of
not making an IF comparison accumulate fairly quickly.
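The code that originally accompanied this comment did not survive, so what follows is only a sketch of the general idea, not the poster's example. It assumes the GNU C "labels as values" extension (the && operator on a label and goto through a pointer), which GCC and Clang accept but which is not standard C. The enum, the function names and the copy/negate handlers are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-batch mode: the same for every element in one call. */
enum mode { MODE_COPY = 0, MODE_NEGATE = 1 };

/* Plain version: the mode is compared again on every iteration. */
void process_with_if(const int *in, int *out, size_t n, enum mode m)
{
    for (size_t i = 0; i < n; i++) {
        if (m == MODE_NEGATE)
            out[i] = -in[i];
        else
            out[i] = in[i];
    }
}

/* Computed-goto version (GNU C extension): the handler's label address
   is looked up once, then every iteration jumps straight to it without
   comparing the mode again. */
void process_with_computed_goto(const int *in, int *out, size_t n, enum mode m)
{
    static void *const handlers[] = { &&do_copy, &&do_negate };
    void *const handler = handlers[m];
    size_t i = 0;

next:
    if (i >= n)
        return;
    goto *handler;

do_copy:
    out[i] = in[i];
    i++;
    goto next;

do_negate:
    out[i] = -in[i];
    i++;
    goto next;
}
```

Whether this is actually faster depends on the compiler and the processor. An optimizing compiler may already move a loop-invariant test like the one in process_with_if out of the loop, so it is worth measuring before adopting the pattern.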
3.243F6A8885A308D313
For more technical info, see his site at the Texas Advanced Computing Center. Pretty pictures and software tool downloads, even.
I can attest to the efficiency of these routines. When I benchmarked a 22 processor Opteron cluster w/ Myrinet, the use of Goto BLAS resulted in a near 20% drop in CPU utilization but yielded a ~2 GFlop gain in performance using HPL (performance was roughly 60 GFlops total. Given more time, I could have probably coaxed more out of Linpack). This compared to ATLAS, the self-tuning BLAS and LAPACK routines that I painstakingly recompiled at least a few dozen times. Generally, ATLAS yields very decent results even compared to some of the "drop-in" Lin-Alg. routines found with most high-end compilers like PGI (ACML, PGI-optimized BLAS/LAPACK/SCALAPACK) but so far, nothing I have tried rivals the performance, in the case of HPL, of Goto's implementation. Great work, man!
I, like everyone else, was trained *never* to use the dreaded goto statement. I'll grant that Pascal was more readable than Basic (with unlabeled gotos).
But sometimes it is actually better to use a goto to make the code more readable. The Linux kernel, for example, uses gotos. I was pretty sceptical at first because it had been drilled into my head how unreadable code was with gotos in it. But, reading the code, I have to admit it is much more readable for exception handling, for example.
If the goto would not make your code more readable, then don't use it. But in the cases where it would avoid a bunch of silliness trying to get out of a bunch of nested loops in case some error happened, it makes a lot of sense.
Linus Torvalds (and others) explain the reasoning for this at:
http://kerneltrap.org/node/553
In short, there are both readability and efficiency reasons to use gotos.
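As a rough illustration of the error-handling style described above, here is a minimal C sketch. It is not taken from the kernel; the function and the buffer names foo, bar and baz are placeholders. Each failure jumps to a label that releases only what has been allocated so far, in reverse order, and the success path falls through the same cleanup code.

```c
#include <stdlib.h>

/* Hypothetical example: allocate three buffers, use them, and release
   whatever was allocated if any step fails.
   Returns 0 on success and -1 if an allocation failed. */
int with_three_buffers(size_t n)
{
    int rc = -1;
    char *foo = NULL, *bar = NULL, *baz = NULL;

    foo = malloc(n);
    if (foo == NULL)
        goto out;          /* nothing to free yet */
    bar = malloc(n);
    if (bar == NULL)
        goto free_foo;     /* only foo exists */
    baz = malloc(n);
    if (baz == NULL)
        goto free_bar;     /* foo and bar exist */

    /* ... real work with foo, bar and baz would go here ... */
    rc = 0;

    free(baz);
free_bar:
    free(bar);
free_foo:
    free(foo);
out:
    return rc;
}
```

The alternative without goto is either nested if blocks that grow one level deeper per resource, or cleanup code repeated at every failure point, which is where mistakes about what to free tend to creep in.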
Randy.Flood@RHCE2B.COM
This always reminds me of how "Label not found" was translated as "Volumenaam niet gevonden" in the Dutch version of MS-DOS.
The translator apparently had seen the DIR output "Volume in drive A: has no label" and believed that the "label" is referring to a "volume label" and translated it as "volumenaam" ("volume name").
But when a .BAT file uses GOTO and specifies a nonexistent label, the translation to "volume name" is completely incorrect.
When I first got this error message running a .BAT file, it took me quite some time before I understood what was happening.