Understanding Bandwidth and Latency
M. Woodrow, Jr. writes "Ars has a very eye-opening article on the real causes of bandwidth and latency problems and why we shouldn't just drool endlessly over maximum throughput figures. In particular, I think the author's look into the PowerPC 970 and the P4's frontside bus is interesting, considering how we're constantly being told by marketers that more speed is always going to translate into massive performance gains. The issue is, of course, far more complex, and this article does a good job of thinking about the problem from an almost platform-agnostic point of view."
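For a rough feel of why peak throughput alone doesn't settle the question, here is a minimal sketch of the usual first-order model: transfer time is a fixed latency plus size divided by bandwidth. The bus numbers (100 ns latency, 1 GB/s and 2 GB/s peak) and the function names are made up for illustration and are not taken from the article or from any real PowerPC 970 or P4 specification.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed startup latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def speedup_from_faster_bus(size_bytes, latency_s, old_bw, new_bw):
    """How much faster one transfer gets when only peak bandwidth changes."""
    old = transfer_time(size_bytes, latency_s, old_bw)
    new = transfer_time(size_bytes, latency_s, new_bw)
    return old / new


# Hypothetical bus parameters, chosen only to make the arithmetic easy.
LATENCY = 100e-9   # 100 ns before the first byte arrives
OLD_BW = 1e9       # 1 GB/s peak
NEW_BW = 2e9       # 2 GB/s peak (the "twice as fast" marketing number)

if __name__ == "__main__":
    for size in (64, 4096, 1_048_576):
        gain = speedup_from_faster_bus(size, LATENCY, OLD_BW, NEW_BW)
        print(f"{size:>8} bytes: {gain:.2f}x faster")
```

Under these assumed numbers, a 64-byte transfer (about one cache line) gets only around 1.24x faster when peak bandwidth doubles, because the fixed latency dominates. Only large streaming transfers get close to the advertised 2x.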
You can sidestep the slow bus by keeping all the code in the cache, inside the processor, using self-modifying code in the most vital inner loops of your program.
Modifying the code in place keeps the processor from fetching further instructions over the slow bus during the innermost loops, thereby giving a gargantuan speed boost.
Self-modifying code is just another programming tactic that we sacrificed for "ease of use" long ago, and it's been all but forgotten.
No CS student should be able to graduate with a degree without being able to write self-modifying code, but since very few can nowadays, here's a link:
http://reguly.net/alvaro/cic/linux-asm/self.html