Cliff Click's Crash Course In Modern Hardware
Lord Straxus writes "In this presentation (video) from the JVM Languages Summit 2009, Cliff Click talks about why it's almost impossible to tell what an x86 chip is really doing to your code, due to all of the crazy kung-fu and ninjitsu it performs while the code is running. This talk is an excellent drill-down into the internals of the x86 chip, and it's a great way to get an understanding of what really goes on down at the hardware and why certain types of applications run so much faster than other types of applications. Dr. Cliff really knows his stuff!"
That's not entirely true. In performance-sensitive tight loops, it can still make sense to code in ASM to avoid pipeline bubbles and stalls in some very limited situations. Also, the compiler doesn't always take advantage of instructions that it could use.
However, determining that takes a lot of effort and a lot of instrumentation, and so you'd better really need that last bit of performance before you go after it.
The ringing of the division bell has begun... -PF
Sometimes it's just plain FUN FUN FUN to code in asm. You're right that most programmers will never have a need for it at all (with some exceptions, such as those messing with operating systems or embedded systems), although knowing some ASM can help a lot with debugging. I suppose one could (read: should) learn a little ASM to get a better idea of what the hardware is doing. That will let you optimize your code a little, or (more importantly) write it in a way that makes it easier for the compiler to optimize.
That's the main reason why I want to shoot people who write "clever" code on the first pass. Always make the rough draft of a program clean and readable. If (and only if!) you need to optimize it, use a profiler to see what actually needs work. If you do things like manually unroll loops where the body is only executed 23 times during the program's whole lifetime, or use shift to multiply because you read somewhere that it's fast, then don't be surprised when your coworkers revoke your oxygen bit.
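For what "use a profiler" can look like in practice, here is a minimal sketch using Python's standard cProfile and pstats modules. The functions hot, cold, and main are made up for illustration: cold has a body that runs only 23 times, hot does the real work. Treat it as one way to do this, not the only one.

```python
import cProfile
import io
import pstats


def hot(n):
    # Hypothetical hot spot: this is where the run time actually goes.
    return sum(i * i for i in range(n))


def cold():
    # Hypothetical "clever" code: shifts instead of multiplies,
    # but the body only runs 23 times in the whole program.
    return [x << 1 for x in range(23)]


def main():
    cold()
    return hot(200_000)


# Profile one run of main() and capture the report as text.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
report = out.getvalue()
print(report)
```

In the report, hot and its generator expression should dominate the cumulative time, while cold barely registers. That is the signal for where optimization effort might pay off.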
Dewey, what part of this looks like authorities should be involved?
Someone has to write those tools.
Yeah, but they can be written in a HLL, too. You don't have to write a program in highly-tuned assembler to make it emit highly-tuned assembler.
That's *generally* true. It's not *always* true.
There are a lot of purely compute-bound applications (think simulations of various sorts, etc.) for which the algorithmic optimizations have already been done, but it's still worth going for the last few percent of performance from "instruction fiddling". As another poster said: if your app runs for weeks at a time, a 1% improvement becomes significant in terms of time saved, and throwing more hardware at the problem isn't always feasible.
But it's certainly better to code in a high-level language first, test, tweak the algorithm as much as you can, PROFILE, and THEN start breaking out your assembler. There's no point rewriting 99% of your code in super-fast asm if the program only spends 1% of its CPU time there. Even if you make all that code 10x as fast, you've only saved 0.9% of the CPU time. :)
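The 0.9% figure is just Amdahl's law. A small sketch of the arithmetic, with function names made up for illustration:

```python
def time_saved(fraction, local_speedup):
    """Share of total run time saved when `fraction` of the run time
    is made `local_speedup` times faster."""
    return fraction * (1.0 - 1.0 / local_speedup)


def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup for the same change."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Code that accounts for 1% of CPU time, made 10x faster.
print(f"time saved: {time_saved(0.01, 10):.2%}")
print(f"overall speedup: {overall_speedup(0.01, 10):.4f}x")

# For contrast: the same 10x applied to a loop that takes 90% of the time.
print(f"time saved: {time_saved(0.90, 10):.2%}")
```

The first case saves 0.9% of the run time; the second saves 81%, which is why the profile has to come before the assembler.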
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
It all depends on your problem domain. As a high energy physicist, I write plenty of code that only I, a postdoc, and maybe a couple of other grad students will ever see, and I'm probably the only one who will ever actually use it. I'm designing a small cluster that will get built here in a month or a few, and some of my code will take up about 2 months of solid run time on it, then never see the light of day again. If I can spend 2 days getting a 5% performance improvement, even at the expense of locking the code to this cluster, it's a net win for us.
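A rough break-even check on those numbers, as a sketch. It assumes "2 months" means about 60 days of run time and that a 5% improvement means 5% less run time; the function name is made up for illustration.

```python
def net_days_saved(run_days, improvement, tuning_days):
    """Run-time days saved by the optimization, minus days spent tuning."""
    return run_days * improvement - tuning_days


# Assumed: ~60 days of cluster time, 5% less run time, 2 days of tuning.
saved = net_days_saved(run_days=60, improvement=0.05, tuning_days=2)
print(f"net days saved: {saved:.1f}")

# Break-even: the run has to last longer than tuning_days / improvement.
break_even_days = 2 / 0.05
print(f"break-even run length: {break_even_days:.0f} days")
```

Under those assumptions the tuning saves about 3 days of run time for 2 days of work, so roughly a day ahead, and it only pays off for runs longer than about 40 days.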
In short, I have no "customers", I know exactly what hardware my code will be running on, and it won't ever change (until they ditch the cluster in 4-5 years and make a new one, but I'll be long gone), and I don't even have to worry about maintaining the code years in the future.
All the same, I'll probably still write the code as cleanly as possible and run it through an optimizer, and leave it at that.
SIGSEGV caught, terminating
wait... not that kind of sig.
In general, modern compilers are good enough that you are much more likely to get better performance by spending the time finding a better algorithm than by hand-optimizing the code. Obviously, for things like H.264, where the algorithm is already set, this is not true, but that's a very small fraction of the code out there.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.