Optimizing for memory usage will buy you much more than instruction optimization. Don't use too much memory. To quote Seymour Cray, "Memory is like orgasms, it's much better when you don't have to fake it."
Remember there is a memory speed hierarchy: registers -> cache -> physical memory -> virtual (paged) memory. A lot of instructions can be executed while waiting for a page of memory to be swapped in. Also, in SMP environments do not discount the effects of cache invalidation. The cache can be your friend or your worst enemy. Be smart about memory layout and usage.
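One common illustration of cache-friendly usage is array traversal order. The sketch below is not from the original text; the array size and function names are made up for the example. Both functions compute the same sum, but the first walks memory sequentially while the second strides across rows, which tends to miss the cache more often on large arrays. Whether the difference matters on a given machine is something to measure.

```c
#define ROWS 64
#define COLS 64

/* Walks the array in the order it is laid out in memory (row by row). */
long sum_row_major(int m[ROWS][COLS])
{
    long total = 0;
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}

/* Same result, but each step jumps COLS ints ahead in memory. */
long sum_col_major(int m[ROWS][COLS])
{
    long total = 0;
    int i, j;
    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}
```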
In laying out structures, place the most heavily used field first. This way accesses to this field are direct, i.e. *ptr instead of *(ptr+24). This can help tremendously in some usage models.
Place fields that are used together in the same cacheline. Also try to group fields that are read-only together.
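A minimal sketch of the layout advice above. The structure and its field names are hypothetical, and the 64-byte cache line used in the usage note is an assumption about the target machine, not a universal constant.

```c
#include <stddef.h>

/* Hypothetical connection record; field names are illustrative only. */
struct conn {
    /* Hot fields, touched on every packet: placed first and kept together. */
    unsigned long packets;   /* most heavily used, so it sits at offset 0 */
    unsigned long bytes;
    int           state;

    /* Fields that are read-only after setup, grouped together. */
    int           id;
    int           mtu;
    char          name[32];
};

/* Offset of the most heavily used field within the structure. */
size_t conn_hot_offset(void)
{
    return offsetof(struct conn, packets);
}

/* Number of bytes from the start of the struct to the end of the hot group. */
size_t conn_hot_span(void)
{
    return offsetof(struct conn, state) + sizeof(int);
}
```

If conn_hot_span() is no larger than the cache line size and the structure itself is allocated on a cache line boundary, the hot fields can be fetched with a single line fill.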
Also, when writing code to handle errors or special cases, design the code to test for the cases together. For example, if COND1 and COND2 are rare, it is better to rewrite:
    if (flag & COND1) {
        handle_cond1();
    }
    if (flag & COND2) {
        handle_cond2();
    }
as:
    if (flag & (COND1|COND2)) {
        if (flag & COND1) {
            handle_cond1();
        }
        if (flag & COND2) {
            handle_cond2();
        }
    }
In the normal case, only one test is performed to check for exceptions, not two.
In C code, all too often people don't take advantage of the return value of sprintf(), which is the number of characters written. I've seen use of several consecutive strdup()'s. This is something that could be coded a lot more efficiently with sprintf():
    char buf[SIZE];
    char *pc;

    pc = buf;
    pc += sprintf(pc, "%s", str1);
    /* other stuff here */
    pc += sprintf(pc, "%s", str2);
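The fragment above trusts that buf is large enough. A bounds-checked variant might look like the following sketch, which swaps in C99 snprintf() (an assumption, since the original uses sprintf()). The helper name buf_append and its interface are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Append src to the string being built in buf (capacity cap), starting at
   offset *len. Returns 0 on success, -1 if src does not fit or on error.
   On failure the string already in buf is left unchanged. */
int buf_append(char *buf, size_t cap, size_t *len, const char *src)
{
    int n;

    if (*len >= cap)
        return -1;
    /* snprintf returns the length it wanted to write, excluding the NUL. */
    n = snprintf(buf + *len, cap - *len, "%s", src);
    if (n < 0 || (size_t)n >= cap - *len) {
        buf[*len] = '\0';    /* undo any partial (truncated) append */
        return -1;
    }
    *len += (size_t)n;       /* the return value tells us where the end is */
    return 0;
}
```

As in the original fragment, the return value is what avoids rescanning the string to find its end before each append.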
Get to know your compiler. Some compilers code the if clause of an if-then-else statement inline and jump out of line for the else portion; some do the opposite. Look at the assembly that your compiler puts out and code around that. Keeping the common case on the inline path will help.
Using const can be a winner. Most compilers are smart enough to not save const variables on the stack. It also helps in interprocedural optimization, if your compiler supports it.
Make sure that your code is lint clean. This too can help with register allocation.
Get to know the pragmas your compiler supports. A lot of times you can provide hints to the compiler to help it optimize the code better.
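As one example of a compiler hint, GCC and Clang provide the __builtin_expect() extension (a builtin, not a pragma) for marking a branch as rarely taken. The sketch below wraps it in a macro, here called UNLIKELY, that falls back to a plain test on other compilers. The function sum_nonneg is hypothetical, and whether the hint changes the generated code at all depends on the compiler and its options.

```c
/* Branch hint: a GCC/Clang extension, with a no-op fallback elsewhere. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Sum n non-negative values; returns -1 if a negative value is found. */
int sum_nonneg(const int *v, int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            return -1;       /* rare error path */
        total += v[i];       /* common path */
    }
    return total;
}
```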
There is the old rule of thumb that states that functions should be no more than one page of source code long. This is especially helpful when all else fails and you need to hand code things in assembly. Hand coding a function 500 lines long is a lot more difficult than one 50 lines long. Usually only a small portion of the function really needs the optimization anyway.
Measure. Measure. Measure. The minimum number of instructions does not always mean the fastest execution time. Measure to make sure that your changes are actually improving the execution speed.
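A very small timing harness, as a sketch using the standard clock() function. The names time_calls and sample_work are made up for the example. clock() measures CPU time at a coarse resolution, so short functions need many repetitions, and a real profiler will usually tell you more.

```c
#include <time.h>

/* Run fn(arg) reps times and return the average CPU seconds per call. */
double time_calls(void (*fn)(void *), void *arg, int reps)
{
    clock_t start, end;
    int i;

    if (reps <= 0)
        return 0.0;
    start = clock();
    for (i = 0; i < reps; i++)
        fn(arg);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}

/* Hypothetical workload: sums 0..999 into the long that arg points to. */
void sample_work(void *arg)
{
    volatile long total = 0;   /* volatile so the loop is not optimized away */
    long i;

    for (i = 0; i < 1000; i++)
        total += i;
    *(long *)arg = total;
}
```

Time the code before and after each change with the same inputs, and compare the averages rather than single runs.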
Also, when you find something that you feel needs hand coding, leave the high-level code in place behind #ifdef's. This will make porting to another architecture easier. Also, another platform may have different semantics that allow the high-level code to run reasonably well without hand coding.
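A sketch of what that might look like. USE_HAND_CODED_SUM and sum_bytes_asm are hypothetical names: the flag would be defined only in builds for the platform that has the hand-coded routine, and every other build gets the portable C version.

```c
#include <stddef.h>

#ifdef USE_HAND_CODED_SUM
/* Hand-coded routine, provided separately (e.g. in an assembly file). */
unsigned long sum_bytes_asm(const unsigned char *p, size_t n);
#endif

/* Sum the n bytes starting at p. */
unsigned long sum_bytes(const unsigned char *p, size_t n)
{
#ifdef USE_HAND_CODED_SUM
    return sum_bytes_asm(p, n);
#else
    /* High-level version, kept as the portable reference implementation. */
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += p[i];
    return total;
#endif
}
```

Keeping the C version also gives you something to check the hand-coded version against.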
All too often people jump in head first and start tweaking pieces of code when the problem could be solved at a higher level. Hand coding an O(n**2) function in assembly is not going to change the fact that it is O(n**2).
The best piece of advice is to optimize the problem, not the solution. I was once asked to look at optimizing a program that was taking too long to run. Simplified, it sorted a ten-million-line file and then selected lines that matched certain criteria. The output was on the order of ten thousand lines. Reversing the order of the operations, select then sort, sped the program up more than any sorting algorithm ever could have.
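The select-then-sort idea can be sketched as follows, using integers instead of lines of text to keep it short. The selection criterion (multiples of 1000) and the function names are invented for the example and are not the criteria of the program described above.

```c
#include <stdlib.h>
#include <stddef.h>

/* Comparison function for qsort(): ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical selection criterion: keep multiples of 1000. */
static int wanted(int v)
{
    return v % 1000 == 0;
}

/* Copy the matching elements of in[0..n) to out, then sort only those.
   out must have room for n ints. Returns the number selected. */
size_t select_then_sort(const int *in, size_t n, int *out)
{
    size_t i, k = 0;

    for (i = 0; i < n; i++)          /* one linear pass over all the input */
        if (wanted(in[i]))
            out[k++] = in[i];
    qsort(out, k, sizeof out[0], cmp_int);   /* sort only the k survivors */
    return k;
}
```

The sort now runs over the selected items only, which in the case described above is about a thousandth of the input.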