Programming As If Performance Mattered
Junks Jerzey writes "Saw the essay 'Programming as if Performance Mattered', by James Hague, mentioned at the Lambda the Ultimate programming language weblog. This is the first modern and sensible spin on how optimization has changed over the years. The big 'gotcha' in the middle caught me by surprise. An inspiring read." Hague begins: "Are particular performance problems perennial? Can we never escape them? This essay is an attempt to look at things from a different point of view, to put performance into proper perspective."
You can spend all your time optimizing for performance and when you finally release your product, your competition whose main objective was to get the product out the door faster, who uses a slower algorithm, is already first in mindshare with your customers. Not only that, the processors that you thought you would be targetting are already a generation behind and that algorithm that was going to hold back your competition runs perfectly fast on new processors.
Performance gains occur at the hardware level. Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.
I have been pwned because my
While the author's point seems to be that optimization and performance are not all that important, and that you can achieve better results with how you do things and not what you use, I tend to disagree with him.
The thing is, in real life applications, playing with a Targa file is not the same as service critical, 300 users, number crunching, data handling systems, where a small performance improvement must be multiplied by the number of users/uses, by many many hours of operation and by years in service to understand its true impact.
Just now I'm working on an econometric model for the Panama Canal (they're trying to make it bigger and need to figure out if it's worth the effort/investment) and playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.
There are two kinds of people in the world: Those with good memory.
66 fps on a 3 GHz machine, doing a 600x600 simple RLE decode...
Ok, it's not bad for a a language like Erlang, but it's not exactly fast.
The big point here for the author is "it's fast enough". Lots of micro- (and macro-) optimizations are done when it turns out they aren't needed. And writing in a high level language you're comfortable in is important, if it'll do the job. This is a good point.
On the other hand, even a fairly naive implementation in something like C or C++ (and perhaps Java) would probably have acheived the goal without having to make 5 optimization passes (and noticable time examining behavior).
And even today, optimizations often do matter. I'm working on code that does pretty hard-real-time processing on multiple threads and keeps them synchronized while communicating with the outside world. A mis-chosen image filter or copy algorithm can seriously trash the rest of the system (not overlapping DMA's, inconvenient ordering of operations, etc). The biggest trick is knowing _where_ they will matter, and generally writing not-horrible-performance (but very readable) code as a matter of course as a starting point.
Disclaimer: I was a hard-core ASM & C programmer who for years beta-tested 680x0 compilers by critiquing their optimizers.
You would think that with all the years put into developing computer languages, as well as the decades of software engineering, that these algorithms and techniques would make their way into best practices.
.Net's as well as Smalltalk's are all great. Taking the power of those libraries and making them usable across different languages, even making them scriptable would bring the speed optimizations in those libraries available to everyone.
This, of course, has already begun with many frequently used algorithms like sorting or hashing being made part of the language core libraries, but more than that, it seems that duplicating effort occurs much more often than simply that.
This is one instance where Microsoft has really come through. Their COM architecture allows for inter-language reuse of library code. By releasing a library which is binary compatible across different languages, as well as backwards compatible with itself (v2.0 supports v1.9), the COM object architecture takes much of the weight of programming difficult and repetitive tasks out of the hands of programmers and into the hands of library maintainers.
This kind of separation of job function allows library programmers the luxury of focusing on optimizing the library. It also allows the client programmer the luxury of ignoring that optimization and focusing on improving the speed and stability of his own program by improving the general structure of the system rather than the low level mundanities.
Large libraries like Java's and
I have been pwned because my
And, sure enough, there's a known, exploitable buffer overflow in Microsoft's RLE image decoder.
Depends on how bad it is. I've seen stuff that runs so slow there really isn't a way to throw more hardware at it. Of course that was written by a guy who had two goals: 1) to make sure no one but him could support his work, and 2) to do as little work as possible.
I don't know. Clean, elegant, functional code is beautiful. If you're ever going to have to work on it again, I think it's better for it to be clean and optimized.
Also depends on the size of the app. With a small app, what excuse do you have for not optimizing? Wouldn't take that long. With a big project? Depends on your work environment.
The bosses will never know if its optimal or not. If you tell them you've maxed out the server, they just think you write big badass code. A lot of times though, there isn't time to thoroughly bug check a big app (That what users are for, eh?), more less optimize it.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
You assume that I made that reference to myself as being a bad programmer.
:)
The reason I made that statement was because just last week I was at Redmond for an interview for internship at Microsoft, and I was interviewed by the team that was trying to prevent just this sort of thing from happening.
The idea was to design heuristics-enabled compilers that would effectively detect any "bad-code" and help make managed code and pseudo-managed code the norm, or convert existing code into managed code.
I did not say that I was using a programming language that had such protections, merely that such programming languages have their own advantages. I was interviewed for creating compilers, linkers and OS-level protection that did not allow those troublesome things to exist - not use them - and hence my justification
That said, you may knowingly or unknowingly use a language designed for bad programmers even when you program C or C++ in upcoming versions of compilers that insist on managed code - they may just wrap up your code in a nice wrapper to prevent mishaps and hand it over to the linker after having taken care of your holes.
Less code does not equal faster code. You can usually get the best performance increases by using a better algorithm. For example, if you're doing a lot of random adding and deleting of entries, a hash table will be much faster than a linked list. This will have the greatest impact.
Other things that can help are doing your own memory management at times (i.e. freelists) since that will be faster than malloc/new, and will have less memory overhead. Also, design your storage to your data. If you know you'll allocate up to 64K of an item, and the item is small, allocate an array of 64K of them and maintain a freelist. This will use a lot less memory than dynamically allocating each item and will result in better locality.
I write code in the embedded space, where memory usage and performance are both equally important. Usually getting the last ounce of performance out of the compiler doesn't make much difference.
A good real-world example is that I replaced the malloc code provided by a popular embedded OS with DLMalloc, which glibc is based. The dlmalloc code is *much* more complicated, and the code path is much longer, but due to much better algorithms, operations that took an hour with the old simple malloc dropped down to 3 minutes. It went from exponential to linear time.
-Aaron
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
If the man was coming from a BASIC programming background, his belief may have been understandable.
Some (very old) BASIC interpreters used to parse each source line each time it was executed.
Doing it that way saved memory (no intermediate code to store).
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
The biggest problem I see with performance is lack of visiblity of performance factors. At the hardware level there is cache (which is supposed to be transparent) and really deep pipelined processors. This can have a major effect on what would otherwise be an optimal algorithm. And the hardware keeps changing, so what may have been optimal at one point will become suboptimal later on.
In software, the biggest problem is lack of performance directives. POSIX pthreads is one of the biggest offenders here. Best performance practices in pthreads are based on how common implementations work. POSIX allows implementations that would cause major performance problems for so called best pthread programming practices. Example, POSIX allows pthread_cond_signal implementations to wake all waiting threads, not just one. There are programs that depend on pthread_cond_signal to wake only one thread for performance in order to avoid "thundering herd" problems. So while standards allow portability of correct programs, whey do not necessarily allow portability of performance.
We need explicit performance directives.
Back in the late '80s, early '90s, worked on (DOS-based) commercial products written in APL. APL is an interpreted language, written in Greek & math symbols, some overstrikes.
Each byte of that 640K was precious, so it was common practice to "de-comment" the code before release; remove all comments, reduce whitespace, move multiple statements onto a single line, possibly shorten variable names. You could gain a substantial (for the time) amount of memory that way. You also dynamically imported/destroyed functions.
I regularly debugged client systems with "de-commented" APL; if you could read that, you could read anything!
Real opimizations come before you right your program. Take for example that loop that you removed an instruction or two from. Say it is a searching an array. and looks like:
There are two things wrong. One you cal strlen repetitively. Strlen() is theta(n) So you have a loop that executes n times at a cost of n . n*n=n^2. That's one of the slowest algorithms around. Maybe your compiler is smart enough to see that x is not being modified and will to a s=strlen(x); then compare against X for you, but probably not.
The other thing is when searching an array, try to give it structure. If your array contains sorted characters, then you can find it in log _2 (n). Of course, of you sort by frequency (most commonly accessed at the top) then your n^2 loop *might* do better.
The article is right: constant-time operations (type checking, etc) are asymtotically infitessimal in algorithms. The article's real problem is that it is n, but on a 2d image (x*y)=n you can't do any better. Note that it is not n^2, (though it makes a square picture) because you're operating on pixels. So that will be your unit of measure - NOT time.
Which is my 2nd point. Don't measure in time or instructions. Measure in OPERATIONS. Operations are not instructions or lines of code. An Operation is everytime you have to look at a unit. It is a logical unit of cost. Hardware can be changed out. We know that hardware (performance) doubles every 18 months. The constant-time instructions will get smaller. (Also clocks per cycle are irrelevant as well). But your loops will remain the biggest part of your program.
With that, here's a crash course in CS:
To learn more, take a datastructures class at your local university. Please review Calculus II before doing that though.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
When I was finally allowed to program, I did it on a Commodore PET in elementary school. Of course I was writing games, and I optimized because games are not fun when they are slow. I was too lazy to type in the source from magazines, so all of my programs grew until they were usable, then grew some more as people played them and asked for features.
Junior High did not have computers yet. I finally convinced my family to get me a computer if I paid half. With my budget, that meant a VIC20 for under $100. The VIC20 had 4KB of RAM. You could buy a 16KB expansion, but I could not afford it.
The language was the same as the PET, so I tried to run my existing programs. They ran. I tried to modify them, save, and run them, and they would not work, even if the change was to remove code. I finally tried changing all the commands to "tokens" to shorten them. IIRC, a token was the first 2 characters of a command and an underscore. Since most of the commands were 4 letters, this saved quite a few characters. I also renamed all my variables to shorten them. Then I saved and the program ran. Yeah!
Then I made another change, and the problem reappeared.
I decided that:
10 The program loaded as written to the tape. (Hard drive? Floppy disk? Never heard of them.)
20 If the program fit in memory, it would run.
30 When the program was loaded for editing, all tokens were expanded to the full command.
40 The program was saved as text, except...
50 If the tokenized version of a command was encountered, then it was saved as the token. I never figured out if they were saved as the 2 Hex number, the dollar sign and the number, or the 3-character shorthand "token" I typed.
60 GOTO 10 (and see if it runs.)
So every time I wanted to modify the longer programs, I had to change every command to the "token" format. (About half of my programs were under the 4KB limit, about half could be "fixed" using this technique, and a few were large enough that I never got them working again.) Any changes to the longer programs required 20 minutes of "tokenizing" the commands before saving it. That killed much of the fun of programming. (Today I get upset if a build takes longer than a game of Solitaire, but "getting upset" means deciding to fix the build process.)
Commodore bought BASIC from MS, and then modified it, so I do not know who to blame for the hours I wasted on this, but Commodore is gone and MS continues to take the fun out of computers, so I blame MS.
---
My next venture into computers was the C64. They had "Sprites". Half of the code in my games was controlling the graphics, and this improvement to the platform made that code obsolete. For the challenge, I upgraded one game to use Sprites. They took much of the fun out of it, and (IIRC) you were limited to 4 of them, so you had to play games (pun intended) to write PacMan. (4 Ghosts and PacMan required 5 Sprites. The dots and cherries would be handled without Sprites. It was easy to write a 3-ghost PacMan game, and really difficult to write a 4-ghost game.)
Since the C64s at school did not have tape drives, my old programs had to be typed in, if I had a printout from elemetary school (no printer at home.) I already stated that I was lazy, so they are gone. Well, I still have the tapes, but they are 2 decades old, and my PC does not have a tape drive anyway.
I spend my life entertaining my brain.