Optimizations - Programmer vs. Compiler?

← Back to Stories (view on slashdot.org)

Optimizations - Programmer vs. Compiler?

Posted by Cliff on Friday February 25, 2005 @08:46AM from the who-can-obfuscate-better dept.

Saravana Kannan asks: "I have been coding in C for a while (10 yrs or so) and tend to use short code snippets. As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'. The reason someone might use the former code snippet is because they believe it would result in smaller machine code if the compiler does not do optimizations or is not smart enough to optimize the particular code snippet. IMHO the latter code snippet is clearer than the former, and I would use it in my code if I know for sure that the compiler will optimize it and produce machine code equivalent to the former code snippet. The previous example was easy. What about code that is more complex? Now that compilers have matured over years and have had many improvements, I ask the Slashdot crowd, what they believe the compiler can be trusted to optimize and what must be hand optimized?" "How would your answer differ (in terms of the level of trust on the compiler) if I'm talking about compilers for Desktops vs. Embedded systems? Compilers for which of the following platforms do you think is more optimized at present - Desktops (because is more commonly used) or Embedded systems (because of need for maximum optimization)? Would be better if you could stick to free (as in beer) and Open Source compilers. Give examples of code optimizations that you think the compiler can/can't be trusted to do."

18 of 1,422 comments (clear)

Min score:

Reason:

Sort:

Time to post the famous Knuth quote... by xlv · 2005-02-25 08:50 · Score: 4, Informative

Donald Knuth wrote "We should forget about small efficiencies, about 97% of the time. Premature optimization is the root of all evil."
Huh by NullProg · 2005-02-25 08:54 · Score: 4, Informative

As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'.

Both forms resolve to the same opcode. Even under my 6502 compiler.

CMP register,val
JNE

Enjoy,

--
It's just the normal noises in here.
Wrong, wrong, wrong by JoeBuck · 2005-02-25 08:54 · Score: 4, Informative

Don't give advice when you don't know C. C requires that when a 0 is converted to a pointer, the result is NULL, so it is absolutely false to claim that NULL could be defined as -1.
"ptr == 0" must give the same result as "ptr == NULL", always.
1. Re:Wrong, wrong, wrong by Anonymous Coward · 2005-02-25 09:17 · Score: 4, Informative
  
  From the ANSI C specification, section 6.2.2.3:
  
  An integral constant expression with the value 0, or such an expression cast to TypeIs_VoidPointer, is called a null pointer constant. If a null pointer constant is assigned to or compared for equality to a pointer, the constant is converted to a pointer of that type. Such a pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function. A null pointer constant has TypeIs_NULL.
  
  Once again, it should be said:
  
  Don't give advice not to give advice to not give advice when you don't know C when you don't know C when you don't know C.
2. Re:Wrong, wrong, wrong by swillden · 2005-02-25 10:31 · Score: 3, Informative
  
  Althoug the comparision (0 == ptr) might for some twisted reason com out as a non-zero value, you are not guaranteed that (!ptr) will have the same value.
  Wrong
  Don't give advice... ah, screw it. You get the idea.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Not a question that can be asked generally by Sycraft-fu · 2005-02-25 08:56 · Score: 4, Informative

Each compiler is different. Some will optimise things other won't.

In general, however, systems are now fast enought that when in doubt, write the clearest code possible. I mean for most apps, speed is not critical, however for all apps stability and lack of bugs is important and obscure code leads to problems.

Also, for things that are time critical, it's generall just one or two little parts that make all the difference. You only need to worry about optimizing those inner loops where all the time is spent. Use a profiler, since programmers generally suck at identifying what needs optimising.

Keep it easy to read and maintain, unless speed is critical in a certian part. Then you can go nuts on hand optimization, but document it well.
That's a Tony Hoare quote, not Donalded Knuth by Dan+Ost · 2005-02-25 08:59 · Score: 4, Informative

Donald Knuth was quoating Tony Hoare when he said that.

--

*sigh* back to work...
Re:NULL not always 0 by HeghmoH · 2005-02-25 09:04 · Score: 3, Informative

NULL isn't necessarily equal to 0 at the machine level. However, at the language level, 0 is always interpreted exactly the same as NULL if it's being converted to a pointer, and vice versa. This is explicitly spelled out in the C standard, so any compiler that doesn't obey this is breaking the standard. Saying !ptr and ptr == NULL is always identical on a conforming compiler (which they pretty much all are).

--
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
Re:You should always... by ron_ivi · 2005-02-25 09:21 · Score: 4, Informative

Using cryptic, short variable names also shaves valuable microseconds off compile time and run time.
There's nothing wrong with short variable names.
Thus quoth Linus in the Linux Coding Style guide.
Chapter 3: Naming C is a Spartan language, and so should your naming be. Unlike Modula-2 and Pascal programmers, C programmers do not use cute names like ThisVariableIsATemporaryCounter. A C programmer would call that variable "tmp", which is much easier to write, and not the least more difficult to understand. HOWEVER, while mixed-case names are frowned upon, descriptive names for global variables are a must. To call a global function "foo" is a shooting offense. GLOBAL variables (to be used only if you _really_ need them) need to have descriptive names, as do global functions. If you have a function that counts the number of active users, you should call that "count_active_users()" or similar, you should _not_ call it "cntusr()". Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer. No wonder MicroSoft makes buggy programs. LOCAL variable names should be short, and to the point. If you have some random integer loop counter, it should probably be called "i". Calling it "loop_counter" is non-productive, if there is no chance of it being mis-understood. Similarly, "tmp" can be just about any type of variable that is used to hold a temporary value. If you are afraid to mix up your local variable names, you have another problem, which is called the function-growth-hormone-imbalance syndrome. See next chapter.
Re:You should always... by rjstanford · 2005-02-25 09:56 · Score: 5, Informative

LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood.

That last clause is an important one that often gets neglected. In fact, you should never, ever, call a variable loop_counter. That's as bad as pure reverse hungarian - it tells you how its used, not what it means.

I suggest that, for all non-trivial cases (and I'd prefer to see people err verbosely than compactly), you should use descriptive names. Not loop_counter, but maybe something like curRow? It doesn't have to be long, but at least then as the loop grows over time someone can understand a piece of code more easily than having to scroll back up to check that you are indeed in the "i" loop. Its even more critical when someone comes along and adds a nested (or containing) loop. Or whatever.

Same with "tmp". If its truly temporary, such as:

int tmp = getFooCount();
doSomething(tmp);

then it should be removed and rewritten as:

doSomething(getFooCount());

If its not that temporary, give it a real name. If you insist that it is temporary then you may have a scoping issue - having variables useful but only in part of your function could indicate that your function is doing too much work. If you insist its truly temporary, scope it down: ...
someRandomCode();
{
int foo = getFoo();
doSomething(foo);
doSomethingElse(foo);
}
moreCode(); ...

At least now you've guaranteed that it is temporary. Better yet, just name it usefully.

--
You're special forces then? That's great! I just love your olympics!
valgrind by cyco/mico · 2005-02-25 10:12 · Score: 4, Informative
If in doubt, use valgrind and kcachegrind. One run with callgrind gives you all the information you want:
- How often are functions called (and branches taken)
- Which functions take most of the time
- See the assembler code for each line with a mouse click (no need to guess anymore)
callgrind/kcachegrind is by far the easiest profiling solution I ever tried, and it seems answer more or less all of your questions.
i, j, k, ... by bsd4me · 2005-02-25 10:17 · Score: 4, Informative

I think that most people forget that the reason that i, j, k, etc. are used for loop counters is that unless otherwise declared, I..N default to INTEGER in FORTRAN. This convention just carried over as programmers migrated from FORTRAN to other languages and has been passed down through the ages.

--
(S(SKK)(SKK))(S(SKK)(SKK))
illegal optimization there, for C at least by r00t · 2005-02-25 10:38 · Score: 3, Informative

If by "const references" you really mean a C++ reference, OK. If you mean a pointer though, and you use C, the compiler is prohibited from performing this optimization unless you also use the "restrict" keyword. Since you did mention "restrict", it appears that you are working with C. "restrict" is not a C++ keyword. BTW, inlinedamnit is __attribute__((__alwaysinline__)) for gcc and __forceinline for Microsoft.
Re:Algorithms, Not Stupid Processor Tricks by beelsebob · 2005-02-25 10:50 · Score: 4, Informative

That's not really the point being made though - the question really is "is it worth spending the next week trying to make optimisations, or coding a better algorithm to do this."
And while we're at it - n^2 vs 2^n *is* a big deal when working on 1ghz+ systems... If we have 1000 objects, an operation that takes 10 cycles and an n^2 algorithm, then we get a runtime of 0.01 seconds (10 x 1000 x 1000 cycles ), if we have a 2^n algorithm then we get a runtime in the hours, and no amount of optimizing the code in the loop (even down to one instruction) is going to get us anywhere near the n^2 algorithm.
The never overused example that I have by roman_mir · 2005-02-25 10:56 · Score: 3, Informative

I got this job as a contractor 4 years ago now where the project was developed by over 30 junior developers and one crazy overpaid lady (hey, Julia,) who wouldn't let people touch her code so fragile it was (and it was the main action executor,) she would rather fight you for hours than make one change in the code (she left 2 months before the project release.) Now, I have never witnessed such monstrocity of a code base before - the business rules were redefined about once every 2 weeks dor 1.5 years straight. You can imagine.

So, the client decided not to pay the last million of dollars because the performance was total shit. On a weblogic cluster of 2 Sun E45s they could only achieve 12 concurrent transactions per second. So the client decided they really did not want to pay and asked us to make it at least 200 concurrent transactions per second on the same hardware. If I may make a wild guess, I would say the client really did not want to pay the last million, no matter what, so they upped the numbers a bit from what they needed. But anyway.

Myself and another developer (hi, Paul) spent 1.5 months - removing unnecessary db calls (the app was incremental, every page would ask you more questions that needed to be stored, but the app would store all questions from all pages every time,) cached XML DOM trees instead of reparsing them on every request, removed most of the session object, reduced it from 1Mb to about 8Kb, removed some totally unnecessary and bizarre code (the app still worked,) desynchronized some of the calls with a message queue etc.

At the end the app was doing 320 and over concurrent transactions per second. The company got their last million.

The lesson? Build software that is really unoptimized first and then save everyone's ass by optimizing this piece of shit and earn total admiration of the management - you are a miracle worker now.

The reality? Don't bother trying to optimize code when the business requirements are constantly changing, the management has no idea how to manage an IT dep't, the coders are so nube - there is a scent of freshness in the air and there is a crazy deadline right in front of you. Don't optimize, if the performance becomes an issue, optimize then.

--
You can't handle the truth.
Re:Clear Code by GryMor · 2005-02-25 11:39 · Score: 5, Informative

It's working on a 2d array of data and is presuming that it is ordered as such:

123
456
789

data[y][x];

This, in memory, is:
123456789

Similarly, the accsess order for the second loop is:
123456789

But for the first one, it is:
147258369

The first one hits memory sequentially, which is good for caching as each cache line stores a large chunk of sequential memory.

Considering hitiing the cache as oposed to hitting main memory is at least 100 times faster, you'll be lucky if the first loop is only 50 times slower.

This still presumes data stored in the specified order in memory (which is common for image formats, but not the only way things are done).

--
Realities just a bunch of bits.
HLSL by Saville · 2005-02-25 17:56 · Score: 3, Informative

As far as I'm concerned compilers are better than 99% of the programmers out there. Just write clear code and let the compiler do it's trick. However there are a couple cases where things aren't automatically optimized that I can think of.

It's not really a coding trick like an XOR swap, but most compilers don't yet seem to fully unroll parallel loops into good SIMD instructions or multiple threads.

The only time I've needed to bother to look at assembly output in recent years (other than debugging a release mode program) is when writing HLSL shaders. HLSL is the high level shading languge (C like) for shaders that is part of Direct3D 9. HLSL can be compiled to SM1, SM2, or SM3 assembly.

With pixel shader 2.0 you've only got 64 instruction slots, and some important instructions like POW (power), NRM (normalize), and LRP (interpolate) take multiple slots. 64 slots is not enough for a modern shader. I curse ATI for setting it bar so low. ;)

There are two flaws I've found with the Dec 04 HLSL compiler in the DirectX. Sometimes it will not automatically detect a dot product opportunity. I had some colour code to convert to black and white in a shader and wrote it as y = colour.r*0.299 + colour.g*0.587 + colour.b*0.114; as I thought that was the most clear way to write it. Under certain circumstances the compiler didn't want to convert to a single dot instruction so I had to write as y = dot( colour.rgb, half3(.299h, .587h, .0114h)); I'm not sure if that bug still exists in the current compiler release or not.

Another is often a value in the range of -1 to +1 is passed in as a colour, which means it must be packed into 0-1 range. To get it back you've got to double it and add 1.
a = p*2+1; gets converted into a single MAD instruction which takes one slot.
a = (p-0.5)*2; gets converted into an ADD and then a MUL.

Also conventional wisdom says you've got to write assembly to get maximum performance out of pixel shader 1.1 as it is basically just eight instruction slots. I don't have any snippets to verify this though.

I think this thread demonstrates that either compilers are mature enough you don't need any code tricks to help them do their job or /. posters just aren't aware of the short commings of compilers (see first sentence of this post) and would rather post obvious advice than not post at all. :)
In most cases by Z00L00K · 2005-02-25 18:56 · Score: 3, Informative

I try to write code that is simple to read and avoid complex statements. This is often easily optimized by the compiler and more important readable for whoever is doing maintenance on the code.
For the rare performance critical parts it is however worth the effort to try various constructions to get the best performance out of the code. The most problematic issue is to identify the hotspots in the code and figure out which variables that should be declared as 'register' and those that shouldn't. Ordering of statements are also important in order to match the various performance improvments the CPU can offer. One very good document on this is actually found at AMD.
One code construct that I am using that I found is very useful is to place the matching '{' and '}' in the same column in the code. This eases the effort trying to find where a block begins.
Example:
if (a == b) {

printf("Hello!\n");

}
In my opinion this produces code that has an improved readability compared to the constructs placing the '{' on the same line as the if-statement where it is much easier to miss.

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.