Slashdot Mirror


Mac OS X Built For CISC, Not RISC

WCityMike writes "One of the programmers at Unsanity, maker of haxies, recently posted a rather shocking revelation on the company's weblog. He says that Mac OS X's Mach-O runtime ABI (Application Binary Interface) comes from a NeXTStep design for 68K processors, and is not designed for the PowerPC architecture. Had they used the latter, things would have been approximately 10-12 percent faster. And supposedly, they can't fix it now without breaking all existing applications." The developer mentions there are workarounds in the newest GCC, but only for newly compiled programs.

4 of 82 comments

  1. Re:I'm confused by nadador · · Score: 5, Informative

    It just so happens that a friend of mine has a copy of "PowerPC Microprocessor Family: Programming Environments for 32-bit Microprocessors" sitting on his desk, which I grabbed. Here is how PowerPC processors branch (from section 4.2.4.1 of said dead-tree document):

    1. Branch relative addressing mode - the immediate displacement operand is sign extended and added to the current instruction address to produce the branch target address. So, PC relative addressing. There is no need for a programmatically accessible program counter because this is all done by the branch execution unit. Single 32-bit instruction.

    2. Branch conditional to relative addressing mode - same as branch relative addressing, except that the branch is only executed if the proper condition codes are set. Single 32-bit instruction.

    3. Branch to absolute addressing - the operand address is sign extended and used as the branch target. As the name implies, this is absolute addressing. Only problem is, the operand address is only 23 bits wide in a 32-bit implementation, and with the zero pad, it gives only 25 bits of absolute address (word alignment required). So, if you absolute address anything, you can only absolute address 25 bits worth of the address space.

    4. Branch conditional to absolute - same as regular absolute addressing, except that you have to encode condition codes, so the operand address is now only 13 bits if I read the diagrams correctly, meaning that you can only absolutely address 15 bits of address space with the zero pad.

    5. Branch conditional to link register - if you clobber the link register, you can branch to a 32-bit address. Of course, you have to clobber the link register, so I would think this would be most helpful in returning from a function call, not going to it, since the link register holds the return address. And if you use it forward instead of returning, you have to load the link register first.

    6. Branch conditional to count register - same as link register branching as above.

    All of that said, the reason that the Mac OS ABI uses PC relative addressing is that the only way to fully address a 32-bit address space is to do PC relative addressing. According to this book, there is no two-instruction-width branch, i.e., a branch instruction which encodes an entire 32-bit absolute address in two 32-bit words (one word for branch encoding and condition codes, one word for the whole 32-bit address).

    This leads me to believe that there is no way to do all absolute addressing on PowerPC unless you implement new instructions (which will take more time to get to the processor, and to decode) or limit yourself to 15 or 25 bits of the address space.

    So, the short version is that there is no way for the Mac OS ABI to do absolute addressing.
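
    As a rough illustration of modes 1 and 3 above, here is a minimal C sketch that computes the target of an unconditional branch. It assumes the I-form layout (primary opcode 18, a 24-bit displacement field that includes the sign bit, then the AA and LK bits); the function name is hypothetical and not taken from the book.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the target of an I-form branch (b, ba, bl, bla).
 * insn is the 32-bit instruction word, cia is the address of the branch itself. */
uint32_t iform_branch_target(uint32_t insn, uint32_t cia)
{
    assert((insn >> 26) == 18u);          /* primary opcode 18 = I-form branch */

    /* Displacement field with the two zero pad bits appended: 26 bits. */
    uint32_t field = insn & 0x03FFFFFCu;

    /* Sign-extend from bit 25 using well-defined unsigned arithmetic. */
    uint32_t disp = (field ^ 0x02000000u) - 0x02000000u;

    uint32_t aa = (insn >> 1) & 1u;       /* AA bit: 1 = absolute, 0 = relative */

    /* Relative: add to the branch's own address. Absolute: use as-is. */
    return aa ? disp : cia + disp;
}
```

    With these assumptions, `iform_branch_target(0x48000008u, 0x1000u)` gives 0x1008 (a forward relative branch), while the same field with AA set ignores the current address entirely. The reach is the same number of bits either way; only the reference point changes.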

    --

    Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
  2. Some ways to get away from PIC code by norwoodites · · Score: 5, Informative

    1. Don't use externs or static variables.
    2. If you are going to use an extern variable in a tight loop, don't; use a local variable instead and assign it back to the extern after the loop.
    3. Pass the option -mdynamic-no-pic to gcc if the source is in the final program, because it does not work in a bundle or a dynamic library (or framework).

    The AIX ABI/PEF ABI uses a register called the TOC for PIC code but it is stored with the function reference so you lose one register if the Darwin ABI goes over to the PEF ABI. You get one more register to play around with if you do not use extern or static variables.
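
    Tip 2 can be sketched in C roughly as follows. The global `total` and the function name are made up for illustration, and whether a compiler already performs this hoisting on its own depends on optimization level and aliasing, so treat it as a sketch rather than a guaranteed win.

```c
#include <assert.h>
#include <stddef.h>

long total;  /* hypothetical global; stands in for an extern variable */

/* Accumulate into a local copy so the loop body never has to
 * re-derive the global's address through the PIC base register. */
void sum_into_global(const int *v, size_t n)
{
    assert(v != NULL || n == 0);

    long local = total;            /* one load of the global before the loop */
    for (size_t i = 0; i < n; i++) {
        local += v[i];             /* loop touches only the local */
    }
    total = local;                 /* one store back after the loop */
}
```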

  3. Re:I'm confused by Anonymous Coward · · Score: 5, Informative

    > So, the point is that in every case, some form of relative addressing is used. In order to make relocatable code, ie code that can be linked happily with other binary objects, you have to have some sort of reference address, and PC-relative addressing is the only way to do this.

    This is wrong. The PowerPC ABI, as defined by IBM, uses r2 as a TOC (Table of Contents) pointer. The PC is never needed or used as all data space references are made relative to the TOC, not the PC. Apart from being faster, this has several other advantages, not the least of which is that one copy of code can have multiple data contexts without involving VM.

    int foo;
    int bar(void) { return foo; }

    with macho:
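    _bar:
    mflr r0                  ; save the caller's return address
    bl *+4                   ; branch-and-link to the next instruction, so LR holds its address
    mflr r2                  ; r2 = that address, used as the PIC base
    mtlr r0                  ; restore the caller's return address
    addis r3,r2,ha16(foo)    ; add the high half of foo's offset to the base
    lwz r3,lo16(foo)(r3)     ; load foo using the low half of the offset
    blr                      ; return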

    with IBM conventions:
    .bar:
    lwz r3,foo(rTOC)
    blr
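
    The "multiple data contexts" point can be modeled loosely in C by passing the data base pointer explicitly. This is only an analogy for the rTOC convention, not how either ABI is implemented; `struct toc` and the extra parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a module's data area, reached through a base
 * pointer the way the IBM convention reaches data through rTOC (r2). */
struct toc {
    int foo;
};

/* The same code works for any data context: the base pointer is the only input. */
int bar(const struct toc *rtoc)
{
    assert(rtoc != NULL);
    return rtoc->foo;   /* conceptually: lwz r3,foo(rTOC) */
}
```

    Calling `bar` with two different `struct toc` instances returns each instance's own `foo`, which is the sense in which one copy of the code can serve several data contexts.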

  4. Re:what's it going to be like w/ the 970? by Lars+T. · · Score: 5, Informative
    From The PowerPC Compiler Writer's Guide (warning: PDF):
    Both 32-bit and 64-bit implementations support most of the instructions defined by the PowerPC architecture. The 64-bit implementations support all the application instructions supported by 32-bit implementations as well as the following application instructions: [...]

    The 64-bit implementations have two modes of operation determined by the 64-bit mode (SF) bit in the Machine State Register: 64-bit mode (SF set to 1) and 32-bit mode (SF cleared to 0), for compatibility with 32-bit implementations. Application code for 32-bit implementations executes without modification on 64-bit implementations running in 32-bit mode, yielding identical results. All 64-bit implementation instructions are available in both modes. Identical instructions, however, may produce different results in 32-bit and 64-bit modes:

    Addressing--Although effective addresses in 64-bit implementations have 64 bits, in 32-bit mode, the high-order 32 bits are ignored during data access and set to zero during instruction fetching. This modification of the high-order bits of the address might produce an unexpected jump following the transition from 64-bit mode to 32-bit mode.

    Status Bits--The register result of arithmetic and logical instructions is independent of mode, but setting of status bits depends on the mode. In particular, recording, carry-bit-setting, or overflow-bit-setting instruction forms write the status bits relative to the mode. Changing the mode in the middle of a code sequence that depends on one of these status bits can lead to unexpected results.

    Count Register--The entire 64-bit value in the Count Register of a 64-bit implementation is decremented, even though conditional branches in 32-bit mode only test the low-order 32 bits for zero.
    IOW, even if they use "32-bit compat mode", there should be no speed penalty whatsoever.
    --

    Lars T.

    To the guy who modded me down from perfect to terrible Karma - Apple haters still suck