Optimizations for Source-Based Distributions?

Posted by Cliff on Tuesday January 7, 2003 @12:32PM from the obsessive-compulsive-about-your-cycles dept.

Kenny Mann asks: "I currently run a Linux distribution called Lunar Linux, a source-based distribution branched from the original Sorcerer GNU Linux. I've done a bit of research on compiler optimizations and was wondering what kind of performance there really is to be had from setting these options. I know that the more options you use, the greater the chance of unexpected failures, so my next question is: what about optimizing your kernel?" Optimization is tricky, and I think the answer to this question is more complex than "yes, optimize" or "no, don't optimize". Rather, there might be classes of applications that are safe to optimize and classes that are not. How do the performance hounds out there feel about optimizing the kernel, however?

Re: Should be a database with this info. by OldMiner · 2003-01-07 13:34 · Score: 5, Informative

to find out what instabilities happen due to specific optimizations, and either fix GCC so it can more intelligently tell when to optimize and when not to

Having read the gcc man page only a couple of times, and not even the one for gcc 3.0, I'm woefully underqualified to comment on the subject, but I will anyhow. From what I've observed, there is no legal, non-self-modifying C++ code that gcc can't optimize without problems. The only time I have ever seen optimization issues was when inline assembler was involved, which I would hope was already optimized, considering the nature of the beast. Further, many of the optimizations that gcc performs are rather simple things such as loop unrolling, function inlining, delaying popping of the stack until after several function calls, etc.
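To make "loop unrolling" concrete, here is a rough hand-written sketch of the transformation. It is only an illustration: the function names are made up for this example, and the code gcc actually generates will differ by version, target and flags.

```c
#include <stddef.h>

/* Straightforward loop: one add and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The same loop unrolled by four, by hand: one loop test per four adds.
   A second loop picks up the 0-3 elements left over when n is not a
   multiple of four. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions return the same sum for any input; the unrolled one trades code size for fewer branches, which is why it can help or hurt depending on the CPU and cache.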

Perhaps one of the most notable points, for the beginner at least, is that you need to pass -O or gcc will not keep any variables in registers. It'll be memory, register, operation, and back to memory over and over again. (Or perhaps just a direct memory operation if you're on x86.) Despite my early teachers' insistence that compilers were simply too smart to need such hints, I tested and found that trivial, heavily looped programs often ran 3 times faster when I declared the loop counters as 'register'. The problem was that we were simply using "gcc source.c" to compile our programs. gcc produces very poor code unless at least -O is used.
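A minimal sketch of that kind of test, for anyone who wants to try it. The function names and the loop body are invented for this example, not taken from the original programs, and the timing helper is deliberately crude.

```c
#include <time.h>

/* Nested loops with ordinary local counters. Without -O, gcc is
   expected to keep these in memory. */
unsigned long count_plain(unsigned long n)
{
    unsigned long i, j, hits = 0;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            hits += (i ^ j) & 1;
    return hits;
}

/* The same loops with the counters declared 'register', which asks the
   compiler to keep them in registers even when not optimizing. */
unsigned long count_register(unsigned long n)
{
    register unsigned long i, j, hits = 0;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            hits += (i ^ j) & 1;
    return hits;
}

/* Crude timer: CPU seconds spent in one call of fn(n). */
double seconds_for(unsigned long (*fn)(unsigned long), unsigned long n)
{
    clock_t start = clock();
    volatile unsigned long sink = fn(n);  /* keep the call from being dropped */

    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call seconds_for(count_plain, n) and seconds_for(count_register, n) from your own main, building once with plain "gcc source.c" and once with "gcc -O source.c". How large the gap is, if any, will depend on the compiler version and the machine.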

But, anyhow, I think the largest issues would be with -m and -f flags which may change default or even standard behavior. For instance, -felide-constructors breaks ANSI C++ compliance, but isn't a bad idea if you create and destroy a large number of expensive objects. (Then again, you shouldn't do that.)

--
You like splinters in your crotch? -Jon Caldara
Optimization hints. NUT ALERT: be warned and read by jsse · 2003-01-07 14:13 · Score: 5, Interesting

To answer your question on the kernel first: the kernel's default optimization is good enough. E.g. it uses -Os instead of -O3, because some programs, like the kernel, usually run faster with a smaller memory footprint. You might want to optimize individual modules, though.

For the rest of the packages (I know you didn't ask, but that doesn't stop me :), you could try some crazy optimization. The hardest thing to decide is which gcc optimization flags work best for your system. Should you use all of them? Will these flags break your system?

Inspired by rocklinux, I've tried to benchmark individual optimization flags, i.e. test each flag and discard those which don't give your system a performance gain. Of course, the script used in the link above is pretty old and you must modify it for gcc 3.2+. Thanks to the lameass filter I won't post my script here.
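Since the script itself isn't posted, here is one possible reading of "test each flag and discard the ones that don't help", sketched as a greedy loop. This is not the rocklinux script or the poster's own: select_flags, measure_fn and demo_measure are hypothetical names, and demo_measure returns arbitrary placeholder numbers for placeholder strings (not real gcc flags) purely so the loop can be exercised. A real measure callback would rebuild a benchmark with the given flags and time it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: build and run a benchmark with the given flag
   string and return the run time in seconds (smaller is better). */
typedef double (*measure_fn)(const char *flags);

/* Greedy selection: start from 'base', try adding each candidate in
   turn, and keep it only if the measured time improves. Writes the
   resulting flag string to 'out' and returns the number of candidates
   kept, or -1 if 'base' does not fit in 'out'. */
int select_flags(const char *base, const char *const *candidates, int n,
                 measure_fn measure, char *out, size_t out_size)
{
    char trial[1024];
    double best;
    int i, kept = 0;

    if (strlen(base) >= out_size)
        return -1;
    strcpy(out, base);
    best = measure(out);

    for (i = 0; i < n; i++) {
        double t;
        int len = snprintf(trial, sizeof trial, "%s %s", out, candidates[i]);

        if (len < 0 || (size_t)len >= sizeof trial || (size_t)len >= out_size)
            continue;               /* would not fit: skip this candidate */
        t = measure(trial);
        if (t < best) {             /* faster: keep the flag */
            best = t;
            strcpy(out, trial);
            kept++;
        }                           /* otherwise discard it */
    }
    return kept;
}

/* Stand-in for a real benchmark. The strings and timings are made up
   for demonstration and say nothing about any real compiler flag. */
double demo_measure(const char *flags)
{
    double t = 10.0;

    if (strstr(flags, "flag-that-helps"))
        t -= 1.0;
    if (strstr(flags, "flag-that-hurts"))
        t += 1.0;
    return t;
}
```

A greedy pass like this is order-dependent and ignores interactions between flags, and a single timing run is noisy, so a real harness would want several runs per flag set and a correctness check of the benchmark's output as well.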

That sounds like a waste of time, but the result is satisfying. The maximum gain I could get was as much as 19% compared to plain -O3 optimization. Here are the results:

vendor_id : GenuineIntel
model name : Mobile Pentium MMX
flags : fpu vme de pse tsc msr mce cx8 mmx
gcc version 3.2 (i586-pc-linux-gnu)
Result: '-O3 -march=pentium-mmx -fomit-frame-pointer -finline-functions -fcse-follow-jumps -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -fno-cprop-registers -funroll-all-loops -maccumulate-outgoing-args -fschedule-insns'
Performance gain (compared to -O3 only): ~9.9%

vendor_id : GenuineIntel
model name : Pentium III (Coppermine)
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
gcc version 3.2 (i686-pc-linux-gnu)
Result: '-O3 -march=pentium3 -fomit-frame-pointer -finline-functions -funroll-loops'
Performance gain (compared to -O3 only): ~13.7%

vendor_id : AuthenticAMD
model name : AMD Athlon(TM) MP 2000+ (a dual CPU system)
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
vendor_id : AuthenticAMD
model name : AMD Athlon(TM) MP 2000+
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
gcc version 3.2 (i686-pc-linux-gnu)
Result: '-O3 -march=athlon-mp -fomit-frame-pointer -finline-functions -fforce-mem -s -funroll-loops -frerun-loop-opt -fdelete-null-pointer-checks -fprefetch-loop-arrays -ffast-math -maccumulate-outgoing-args -fschedule-insns'
Performance gain (compared to -O3 only): ~19.6%

19.6%!! If you ask me, it's worth it to optimize your desktop; but for a server, you'd rather have it running stable than running 19% faster, you can trust me on that. :)

PS. In the process of testing, I found that some flags are dangerous and should be used with care: -fmove-all-movables, -frename-registers and -malign-double. I suspect that they broke my fileutils, which corrupted my entire fs. Just be careful.
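One plausible reason -malign-double in particular deserves care, offered as background and not as a diagnosis of the corruption above: the GCC manual warns that on 32-bit x86 it changes how double and long long members are aligned inside structures, so code built with it is not binary-compatible with code built without it. A small sketch that reports the layout (the struct and function names are made up for this example):

```c
#include <stddef.h>

/* A structure mixing an int with a double. On 32-bit x86 the default
   ABI aligns the double member on a 4-byte boundary; with
   -malign-double gcc aligns it on 8 bytes, so the offset of 'value'
   and the total size both change. */
struct record {
    int    id;
    double value;
};

/* Offset of the double member under the flags this file was built with. */
size_t value_offset(void)
{
    return offsetof(struct record, value);
}

/* Total size of the structure under the same flags. */
size_t record_size(void)
{
    return sizeof(struct record);
}
```

If a program and a library it links against disagree on these two numbers, they will read each other's structures at the wrong offsets. On targets where double is already 8-byte aligned by default, the flag makes no difference to this struct.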