Mac OS X Built For CISC, Not RISC
WCityMike writes "One of the programmers at Unsanity, maker of haxies, recently posted a rather shocking revelation on the company's weblog. He says that Mac OS X's Mach-O runtime ABI (Application Binary Interface) comes from a NeXTStep design for 68K processors, and is not designed for the PowerPC architecture. Had they used the latter, things would have been approximately 10-12 percent faster. And supposedly, they can't fix it now without breaking all existing applications." The developer mentions there are workarounds in the newest GCC, but only for newly compiled programs.
Keep in mind that the gcc3 optimizations mentioned would only affect applications, *not* the dynamic shared libraries, which are a significant part of the OS (think /System/Library/Frameworks).
It just so happens that a friend of mine has a copy of "PowerPC Microprocessor Family: Programming Environments for 32-bit Microprocessors" sitting on his desk, which I grabbed. Here is how PowerPC processors branch (from section 4.2.4.1 of said dead-tree document):
1. Branch relative addressing mode - the immediate displacement operand is sign extended and added to the current instruction address to produce the branch target address. So, PC relative addressing. There is no need for a programmatically accessible program counter because this is all done by the branch execution unit. Single 32-bit instruction.
2. Branch conditional to relative addressing mode - same as branch relative addressing, except that the branch is only executed if the proper condition codes are set. Single 32-bit instruction.
3. Branch to absolute addressing - the operand address is sign extended and used as the branch target. As the name implies, this is absolute addressing. Only problem is, the operand address is only 23 bits wide in a 32-bit implementation, and with the zero pad, it gives only 25 bits of absolute address (word alignment required). So, if you absolute address anything, you can only absolute address 25 bits worth of the address space.
4. Branch conditional to absolute - same as regular absolute addressing, except that you have to encode condition codes, so the operand address is now only 13 bits if I read the diagrams correctly, meaning that you can only absolutely address 15 bits of address space with the zero pad.
5. Branch conditional to link register - if you clobber the link register, you can branch to a 32-bit address. Of course, you have to clobber the link register, so I would think this would be most helpful in returning from a function call, not going to it, since the link register holds the return address. And if you use it to branch forward instead of returning, you have to load the link register first.
6. Branch conditional to count register - same as link register branching as above.
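To make the difference between the relative and absolute forms in the list above concrete, here is a minimal C sketch of the target calculation. The names branch_target and sign_extend are made up for illustration, and the width of the displacement field is a parameter rather than a fixed number, so check the real field widths against the manual before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 32 bits. */
static int32_t sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t sign = UINT32_C(1) << (bits - 1);
    uint32_t mask = (bits == 32) ? UINT32_MAX : ((UINT32_C(1) << bits) - 1);
    value &= mask;
    return (int32_t)((value ^ sign) - sign);
}

/*
 * Hypothetical model of the branch target calculation.
 *   cia        - address of the branch instruction itself
 *   field      - raw displacement field taken from the instruction
 *   field_bits - width of that field (assumed to be at most 30)
 *   absolute   - nonzero for the absolute forms, zero for the relative forms
 */
static uint32_t branch_target(uint32_t cia, uint32_t field,
                              unsigned field_bits, int absolute)
{
    assert((cia & 3u) == 0);          /* instructions are word aligned */
    assert(field_bits <= 30);
    /* Pad with two zero bits (word alignment), then sign-extend. */
    int32_t disp = sign_extend(field << 2, field_bits + 2);
    /* Absolute: the extended value is the target.
       Relative: it is added to the current instruction address. */
    return absolute ? (uint32_t)disp : cia + (uint32_t)disp;
}
```

The only difference between the two forms is whether the current instruction address is added in, which is why the relative form can reach code near any address while the absolute form is stuck with whatever range the field covers.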
All of that said, the reason that the Mac OS ABI uses PC relative addressing is that the only way to fully address a 32-bit address space is to do PC relative addressing. According to this book, there is no two-instruction-width branch, e.g. a branch instruction which encodes an entire 32-bit absolute address in two 32-bit words (one word for branch encoding and condition codes, one word for the whole 32-bit address).
This leads me to believe that there is no way to do all absolute addressing on PowerPC unless you implement new instructions (which will take more time to get to the processor, and to decode) or limit yourself to 15 or 25 bits of the address space.
So, the short version is that there is no way for the Mac OS ABI to do absolute addressing.
Outside of a dog, a book is a man's best friend. Inside a dog, its too dark to read.
1. Don't use externs or static variables.
2. If you are going to use an extern variable in a tight loop, use a local variable instead and assign it back after the loop.
3. Pass the option -mdynamic-no-pic to gcc if the source ends up in the final program; it does not work in a bundle or a dynamic library (or framework).
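Here is a rough C sketch of tip 2, assuming it is meant as "work on a local copy inside the loop and write it back once afterwards". The global total_bytes and both function names are made up for illustration. A good optimizer may do this hoisting by itself, so treat it as a sketch of the idea and not a guaranteed win.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global. Under a position-independent ABI, each access
   may need extra instructions to find its address first. */
long total_bytes = 0;

/* Touches the global on every iteration. */
void add_sizes_global(const long *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total_bytes += sizes[i];
}

/* Reads the global once, works on a local, writes it back once. */
void add_sizes_local(const long *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);
    long local = total_bytes;      /* one load of the global */
    for (size_t i = 0; i < n; i++)
        local += sizes[i];         /* the loop only touches the local */
    total_bytes = local;           /* one store after the loop */
}
```

Both functions leave the same value in total_bytes; the second one just keeps the global out of the loop body.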
The AIX ABI/PEF ABI uses a register called the TOC for PIC code but it is stored with the function reference so you lose one register if the Darwin ABI goes over to the PEF ABI. You get one more register to play around with if you do not use extern or static variables.
A 64-bit PowerPC chip doesn't need to "emulate" anything to run 32-bit code, unlike the Itanium, which uses completely different instructions. There should be no speed hit: the only real difference is that the CPU can perform calculations using all 64 bits. This also won't remove the speed hit caused by the ABI.
> So, the point is that in every case, some form of relative addressing is used. In order to make relocatable code, i.e. code that can be linked happily with other binary objects, you have to have some sort of reference address, and PC-relative addressing is the only way to do this.
This is wrong. The PowerPC ABI, as defined by IBM, uses r2 as a TOC (Table of Contents) pointer. The PC is never needed or used as all data space references are made relative to the TOC, not the PC. Apart from being faster, this has several other advantages, not the least of which is that one copy of code can have multiple data contexts without involving VM.
For example, take this C code:
int foo;
int bar(void) { return foo; }
with macho:
_bar:
mflr r0 ; save the caller's return address
bl *+4 ; "call" the next instruction so its address lands in lr
mflr r2 ; r2 = that address, used as the PC reference
mtlr r0 ; restore the return address
addis r3,r2,ha16(foo) ; add the high half of foo's offset
lwz r3,lo16(foo)(r3) ; load foo using the low half
blr
with IBM conventions:
.bar:
lwz r3,foo(rTOC)
blr
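For anyone who does not read PowerPC assembly, here is a loose C model of the TOC idea: the code receives its data base explicitly, and the function reference carries that base along with the code address. All of the names here (struct toc, struct descriptor, call_via) are invented for illustration and are not taken from any real ABI header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data area, standing in for what the TOC register points at. */
struct toc {
    int foo;
};

/* The code finds its globals through the base it is handed,
   much like "lwz r3,foo(rTOC)". */
static int bar(const struct toc *rtoc)
{
    assert(rtoc != NULL);
    return rtoc->foo;
}

/* A function reference that carries the code address together with
   the data base to use when running that code. */
struct descriptor {
    int (*code)(const struct toc *);
    const struct toc *toc;
};

/* Calling through a descriptor sets up the data base, then jumps. */
static int call_via(const struct descriptor *d)
{
    assert(d != NULL && d->code != NULL);
    return d->code(d->toc);
}
```

Two descriptors that point at the same bar but at different struct toc instances give you one copy of the code with two data contexts, which is the advantage claimed above.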
I remember those days, A4 and A5 worlds: one was used for globals in applications while the other was used for globals in resource code (i.e. extensions, control panels, and modules for programs). But Apple never used the PC as the basis for position independence on the 68K or the PPC until Darwin (Mac OS X). The problem with using the PC is that you have to do a branch and store the result whenever you need to get at externs and such. The problem with using a register that holds the value all the time is that you are wasting a register (68K case: A4 and A5; PPC case: RTOC, i.e. r2).
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck