Application Optimization with Compilers for LOP
An anonymous reader writes "Interested in tuning your C/C++ applications for Linux on POWER? This article compares the optimization options for both Linux on POWER C/C++ compilers: GCC and IBM XL C/C++. This paper also reviews tactics, such as Interprocedural Analysis, Profile Directed Feedback, and High Order Transformations, which are used by one or both of the compilers to extract higher performance from the Power architecture."
The PowerPC CPU is a cool design - not only does it deliver great performance at lower clock speeds, but the entire design is great for compiler devs.
For one, it has true three-register operations, which means that every binary operation has a src1, src2, and dst. Also, all instructions are 32 bits wide - no exceptions (jmp offsets are easy to check for).
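To make the fixed-width point concrete, here is a rough sketch of pulling the three register fields out of an instruction word, and of checking a branch offset. The names (`xo_fields`, `decode_xo`, `branch_displacement`) are made up for illustration; the bit positions follow the PowerPC XO-form (`add rD,rA,rB`) and I-form (`b`, `bl`) layouts as I understand them.

```c
#include <stdint.h>

/* Fields of an XO-form instruction such as `add rD,rA,rB`.
   Hypothetical struct, not taken from any compiler source. */
struct xo_fields {
    unsigned opcode; /* primary opcode, top 6 bits */
    unsigned rd;     /* destination register (dst) */
    unsigned ra;     /* first source register (src1) */
    unsigned rb;     /* second source register (src2) */
    unsigned xo;     /* 9-bit extended opcode */
};

/* Every instruction is one 32-bit word, so decoding is a handful of
   shifts and masks with no length detection step. */
struct xo_fields decode_xo(uint32_t insn)
{
    struct xo_fields f;
    f.opcode = insn >> 26;
    f.rd     = (insn >> 21) & 0x1F;
    f.ra     = (insn >> 16) & 0x1F;
    f.rb     = (insn >> 11) & 0x1F;
    f.xo     = (insn >> 1) & 0x1FF;
    return f;
}

/* Byte displacement of an I-form branch: the 24-bit LI field sits in
   a fixed place, is implicitly shifted left by 2, and is sign-extended
   from 26 bits. */
int32_t branch_displacement(uint32_t insn)
{
    int32_t disp = (int32_t)(insn & 0x03FFFFFCu);
    if (disp & 0x02000000)
        disp -= 0x04000000;
    return disp;
}
```

For example, `0x7C642A14` is the encoding of `add r3,r4,r5`, so `decode_xo` should report dst 3, src1 4, and src2 5 for it.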
Because there are 32 registers (not a measly 8), most code can run very fast out of them, especially tight loops. Also, the cache touch instructions, which do not segfault on invalid addresses, let you start fetching arrays before their indexes are validated.
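A minimal sketch of that touch-before-validate trick, assuming GCC (or a compatible compiler) and its `__builtin_prefetch` builtin, which I believe normally turns into a touch instruction such as `dcbt` on POWER. The function name `checked_lookup` and its parameters are hypothetical, and whether this actually wins anything depends on the workload.

```c
#include <stddef.h>

/* Look up table[idx], returning `fallback` when idx is out of range.
   Hypothetical helper for illustration only. */
int checked_lookup(const int *table, size_t len, size_t idx, int fallback)
{
    /* Issue the touch first. It is only a hint, so an out-of-range
       idx cannot fault here; at worst the prefetch is wasted.
       Arguments: address, 0 = prefetch for read, 3 = high locality. */
    __builtin_prefetch(&table[idx], 0, 3);

    /* The real load happens only after the index is validated. */
    if (idx >= len)
        return fallback;
    return table[idx];
}
```

The address expression itself still has to be computable, so this fits array indexing; it would not save you from dereferencing a bad pointer just to form the address.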
All in all, I'd take PPC over x86 any day. Now if only they had a common FPU opcode set.
Quidquid latine dictum sit, altum videtur