Programming As If Performance Mattered
Junks Jerzey writes "Saw the essay 'Programming as if Performance Mattered', by James Hague, mentioned at the Lambda the Ultimate programming language weblog. This is the first modern and sensible spin on how optimization has changed over the years. The big 'gotcha' in the middle caught me by surprise. An inspiring read." Hague begins: "Are particular performance problems perennial? Can we never escape them? This essay is an attempt to look at things from a different point of view, to put performance into proper perspective."
You can spend all your time optimizing for performance and when you finally release your product, your competition whose main objective was to get the product out the door faster, who uses a slower algorithm, is already first in mindshare with your customers. Not only that, the processors that you thought you would be targetting are already a generation behind and that algorithm that was going to hold back your competition runs perfectly fast on new processors.
Performance gains occur at the hardware level. Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.
I have been pwned because my
While the author's point seems to be that optimization and performance are not all that important, and that you can achieve better results with how you do things and not what you use, I tend to disagree with him.
The thing is, in real life applications, playing with a Targa file is not the same as service critical, 300 users, number crunching, data handling systems, where a small performance improvement must be multiplied by the number of users/uses, by many many hours of operation and by years in service to understand its true impact.
Just now I'm working on an econometric model for the Panama Canal (they're trying to make it bigger and need to figure out if it's worth the effort/investment) and playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.
There are two kinds of people in the world: Those with good memory.
66 fps on a 3 GHz machine, doing a 600x600 simple RLE decode...
Ok, it's not bad for a a language like Erlang, but it's not exactly fast.
The big point here for the author is "it's fast enough". Lots of micro- (and macro-) optimizations are done when it turns out they aren't needed. And writing in a high level language you're comfortable in is important, if it'll do the job. This is a good point.
On the other hand, even a fairly naive implementation in something like C or C++ (and perhaps Java) would probably have acheived the goal without having to make 5 optimization passes (and noticable time examining behavior).
And even today, optimizations often do matter. I'm working on code that does pretty hard-real-time processing on multiple threads and keeps them synchronized while communicating with the outside world. A mis-chosen image filter or copy algorithm can seriously trash the rest of the system (not overlapping DMA's, inconvenient ordering of operations, etc). The biggest trick is knowing _where_ they will matter, and generally writing not-horrible-performance (but very readable) code as a matter of course as a starting point.
Disclaimer: I was a hard-core ASM & C programmer who for years beta-tested 680x0 compilers by critiquing their optimizers.
Depends on how bad it is. I've seen stuff that runs so slow there really isn't a way to throw more hardware at it. Of course that was written by a guy who had two goals: 1) to make sure no one but him could support his work, and 2) to do as little work as possible.
I don't know. Clean, elegant, functional code is beautiful. If you're ever going to have to work on it again, I think it's better for it to be clean and optimized.
Also depends on the size of the app. With a small app, what excuse do you have for not optimizing? Wouldn't take that long. With a big project? Depends on your work environment.
The bosses will never know if its optimal or not. If you tell them you've maxed out the server, they just think you write big badass code. A lot of times though, there isn't time to thoroughly bug check a big app (That what users are for, eh?), more less optimize it.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
You assume that I made that reference to myself as being a bad programmer.
:)
The reason I made that statement was because just last week I was at Redmond for an interview for internship at Microsoft, and I was interviewed by the team that was trying to prevent just this sort of thing from happening.
The idea was to design heuristics-enabled compilers that would effectively detect any "bad-code" and help make managed code and pseudo-managed code the norm, or convert existing code into managed code.
I did not say that I was using a programming language that had such protections, merely that such programming languages have their own advantages. I was interviewed for creating compilers, linkers and OS-level protection that did not allow those troublesome things to exist - not use them - and hence my justification
That said, you may knowingly or unknowingly use a language designed for bad programmers even when you program C or C++ in upcoming versions of compilers that insist on managed code - they may just wrap up your code in a nice wrapper to prevent mishaps and hand it over to the linker after having taken care of your holes.
Less code does not equal faster code. You can usually get the best performance increases by using a better algorithm. For example, if you're doing a lot of random adding and deleting of entries, a hash table will be much faster than a linked list. This will have the greatest impact.
Other things that can help are doing your own memory management at times (i.e. freelists) since that will be faster than malloc/new, and will have less memory overhead. Also, design your storage to your data. If you know you'll allocate up to 64K of an item, and the item is small, allocate an array of 64K of them and maintain a freelist. This will use a lot less memory than dynamically allocating each item and will result in better locality.
I write code in the embedded space, where memory usage and performance are both equally important. Usually getting the last ounce of performance out of the compiler doesn't make much difference.
A good real-world example is that I replaced the malloc code provided by a popular embedded OS with DLMalloc, which glibc is based. The dlmalloc code is *much* more complicated, and the code path is much longer, but due to much better algorithms, operations that took an hour with the old simple malloc dropped down to 3 minutes. It went from exponential to linear time.
-Aaron
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
If the man was coming from a BASIC programming background, his belief may have been understandable.
Some (very old) BASIC interpreters used to parse each source line each time it was executed.
Doing it that way saved memory (no intermediate code to store).
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
The biggest problem I see with performance is lack of visiblity of performance factors. At the hardware level there is cache (which is supposed to be transparent) and really deep pipelined processors. This can have a major effect on what would otherwise be an optimal algorithm. And the hardware keeps changing, so what may have been optimal at one point will become suboptimal later on.
In software, the biggest problem is lack of performance directives. POSIX pthreads is one of the biggest offenders here. Best performance practices in pthreads are based on how common implementations work. POSIX allows implementations that would cause major performance problems for so called best pthread programming practices. Example, POSIX allows pthread_cond_signal implementations to wake all waiting threads, not just one. There are programs that depend on pthread_cond_signal to wake only one thread for performance in order to avoid "thundering herd" problems. So while standards allow portability of correct programs, whey do not necessarily allow portability of performance.
We need explicit performance directives.
Back in the late '80s, early '90s, worked on (DOS-based) commercial products written in APL. APL is an interpreted language, written in Greek & math symbols, some overstrikes.
Each byte of that 640K was precious, so it was common practice to "de-comment" the code before release; remove all comments, reduce whitespace, move multiple statements onto a single line, possibly shorten variable names. You could gain a substantial (for the time) amount of memory that way. You also dynamically imported/destroyed functions.
I regularly debugged client systems with "de-commented" APL; if you could read that, you could read anything!
Real opimizations come before you right your program. Take for example that loop that you removed an instruction or two from. Say it is a searching an array. and looks like:
There are two things wrong. One you cal strlen repetitively. Strlen() is theta(n) So you have a loop that executes n times at a cost of n . n*n=n^2. That's one of the slowest algorithms around. Maybe your compiler is smart enough to see that x is not being modified and will to a s=strlen(x); then compare against X for you, but probably not.
The other thing is when searching an array, try to give it structure. If your array contains sorted characters, then you can find it in log _2 (n). Of course, of you sort by frequency (most commonly accessed at the top) then your n^2 loop *might* do better.
The article is right: constant-time operations (type checking, etc) are asymtotically infitessimal in algorithms. The article's real problem is that it is n, but on a 2d image (x*y)=n you can't do any better. Note that it is not n^2, (though it makes a square picture) because you're operating on pixels. So that will be your unit of measure - NOT time.
Which is my 2nd point. Don't measure in time or instructions. Measure in OPERATIONS. Operations are not instructions or lines of code. An Operation is everytime you have to look at a unit. It is a logical unit of cost. Hardware can be changed out. We know that hardware (performance) doubles every 18 months. The constant-time instructions will get smaller. (Also clocks per cycle are irrelevant as well). But your loops will remain the biggest part of your program.
With that, here's a crash course in CS:
To learn more, take a datastructures class at your local university. Please review Calculus II before doing that though.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.