Optimizations - Programmer vs. Compiler?
Saravana Kannan asks: "I have been coding in C for a while (10 yrs or so) and tend to use short code snippets. As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'. The reason someone might use the former code snippet is because they believe it would result in smaller machine code if the compiler does not do optimizations or is not smart enough to optimize the particular code snippet. IMHO the latter code snippet is clearer than the former, and I would use it in my code if I know for sure that the compiler will optimize it and produce machine code equivalent to the former code snippet. The previous example was easy. What about code that is more complex? Now that compilers have matured over years and have had many improvements, I ask the Slashdot crowd, what they believe the compiler can be trusted to optimize and what must be hand optimized?"
"How would your answer differ (in terms of the level of trust on the compiler) if I'm talking about compilers for Desktops vs. Embedded systems? Compilers for which of the following platforms do you think is more optimized at present - Desktops (because is more commonly used) or Embedded systems (because of need for maximum optimization)? Would be better if you could stick to free (as in beer) and Open Source compilers. Give examples of code optimizations that you think the compiler can/can't be trusted to do."
Donald Knuth wrote "We should forget about small efficiencies, about 97% of the time. Premature optimization is the root of all evil."
As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'.
Both forms resolve to the same opcode. Even under my 6502 compiler.
CMP register,val
JNE
Enjoy,
It's just the normal noises in here.
"ptr == 0" must give the same result as "ptr == NULL", always.
Each compiler is different. Some will optimise things other won't.
In general, however, systems are now fast enought that when in doubt, write the clearest code possible. I mean for most apps, speed is not critical, however for all apps stability and lack of bugs is important and obscure code leads to problems.
Also, for things that are time critical, it's generall just one or two little parts that make all the difference. You only need to worry about optimizing those inner loops where all the time is spent. Use a profiler, since programmers generally suck at identifying what needs optimising.
Keep it easy to read and maintain, unless speed is critical in a certian part. Then you can go nuts on hand optimization, but document it well.
Donald Knuth was quoating Tony Hoare when he said that.
*sigh* back to work...
There's nothing wrong with short variable names.
Thus quoth Linus in the Linux Coding Style guide.
LOCAL variable names should be short, and to the point. If you have
... ...
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood.
That last clause is an important one that often gets neglected. In fact, you should never, ever, call a variable loop_counter. That's as bad as pure reverse hungarian - it tells you how its used, not what it means.
I suggest that, for all non-trivial cases (and I'd prefer to see people err verbosely than compactly), you should use descriptive names. Not loop_counter, but maybe something like curRow? It doesn't have to be long, but at least then as the loop grows over time someone can understand a piece of code more easily than having to scroll back up to check that you are indeed in the "i" loop. Its even more critical when someone comes along and adds a nested (or containing) loop. Or whatever.
Same with "tmp". If its truly temporary, such as:
int tmp = getFooCount();
doSomething(tmp);
then it should be removed and rewritten as:
doSomething(getFooCount());
If its not that temporary, give it a real name. If you insist that it is temporary then you may have a scoping issue - having variables useful but only in part of your function could indicate that your function is doing too much work. If you insist its truly temporary, scope it down:
someRandomCode();
{
int foo = getFoo();
doSomething(foo);
doSomethingElse(foo);
}
moreCode();
At least now you've guaranteed that it is temporary. Better yet, just name it usefully.
You're special forces then? That's great! I just love your olympics!
- How often are functions called (and branches taken)
- Which functions take most of the time
- See the assembler code for each line with a mouse click (no need to guess anymore)
callgrind/kcachegrind is by far the easiest profiling solution I ever tried, and it seems answer more or less all of your questions.I think that most people forget that the reason that i, j, k, etc. are used for loop counters is that unless otherwise declared, I..N default to INTEGER in FORTRAN. This convention just carried over as programmers migrated from FORTRAN to other languages and has been passed down through the ages.
(S(SKK)(SKK))(S(SKK)(SKK))
And while we're at it - n^2 vs 2^n *is* a big deal when working on 1ghz+ systems... If we have 1000 objects, an operation that takes 10 cycles and an n^2 algorithm, then we get a runtime of 0.01 seconds (10 x 1000 x 1000 cycles ), if we have a 2^n algorithm then we get a runtime in the hours, and no amount of optimizing the code in the loop (even down to one instruction) is going to get us anywhere near the n^2 algorithm.
It's working on a 2d array of data and is presuming that it is ordered as such:
123
456
789
data[y][x];
This, in memory, is:
123456789
Similarly, the accsess order for the second loop is:
123456789
But for the first one, it is:
147258369
The first one hits memory sequentially, which is good for caching as each cache line stores a large chunk of sequential memory.
Considering hitiing the cache as oposed to hitting main memory is at least 100 times faster, you'll be lucky if the first loop is only 50 times slower.
This still presumes data stored in the specified order in memory (which is common for image formats, but not the only way things are done).
Realities just a bunch of bits.