Slashdot Mirror


Mac OS X Built For CISC, Not RISC

WCityMike writes "One of the programmers at Unsanity, maker of haxies, recently posted a rather shocking revelation on the company's weblog. He says that Mac OS X's Mach-O runtime ABI (Application Binary Interface) comes from a NeXTStep design for 68K processors, and is not designed for the PowerPC architecture. Had they used a PowerPC-native ABI, things would have been approximately 10-12 percent faster. And supposedly, they can't fix it now without breaking all existing applications." The developer mentions there are workarounds in the newest GCC, but only for newly compiled programs.

23 of 82 comments

  1. the return of "fat" binaries? by tps12 · · Score: 4, Funny

    This is good news for long-time Mac fans. Back in the day ("the day" was 1994 or so, IIRC) we Mac users took seeking out the correct 68k or PPC binaries as a sign of our superiority to PC users. While Windozers happily downloaded software that would run on circa 1987 hardware, we enlightened ones could narrow our searches to programs specifically compiled for our platforms. We could even get "fat" binaries, and optionally remove the unneeded binary code using a small freeware app.

    With OS X, I had hoped we would again have a situation where just using the Mac required that extra step of compatibility checking, setting us apart from the drooling masses of Gates-worshippers. Sadly, with the Classic compatibility layer, it did not come to pass. Hopefully this revelation will set things aright.

    --

    Karma: Good (despite my invention of the Karma: sig)
  2. Now we know why it was so slow, right? by Nutrimentia · · Score: 4, Funny

    But tell me, if they could slide a PPC ABI in with the new journaling system update, couldn't they just get the performance hit and gain to cancel out? It'd be like getting journaling for free! How hard can it be? 10.2.5 maybe?

  3. what's it going to be like w/ the 970? by banky · · Score: 5, Interesting

    Real World Tech talks about the 970 in depth... I wonder how the addition of 64-bit arch AND the 32-bit compat mode will affect things.

    Like the Itanium, with its poor backwards compat performance? Or will it be speedy?

    --
    ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
    1. Re:what's it going to be like w/ the 970? by Lars+T. · · Score: 5, Informative
      From The PowerPC Compiler Writer's Guide (warning: PDF):
      Both 32-bit and 64-bit implementations support most of the instructions defined by the PowerPC architecture. The 64-bit implementations support all the application instructions supported by 32-bit implementations as well as the following application instructions: [...]

      The 64-bit implementations have two modes of operation determined by the 64-bit mode (SF) bit in the Machine State Register: 64-bit mode (SF set to 1) and 32-bit mode (SF cleared to 0), for compatibility with 32-bit implementations. Application code for 32-bit implementations executes without modification on 64-bit implementations running in 32-bit mode, yielding identical results. All 64-bit implementation instructions are available in both modes. Identical instructions, however, may produce different results in 32-bit and 64-bit modes:

      Addressing--Although effective addresses in 64-bit implementations have 64 bits, in 32-bit mode, the high-order 32 bits are ignored during data access and set to zero during instruction fetching. This modification of the high-order bits of the address might produce an unexpected jump following the transition from 64-bit mode to 32-bit mode.

      Status Bits--The register result of arithmetic and logical instructions is independent of mode, but setting of status bits depends on the mode. In particular, recording, carry-bit-setting, or overflow-bit-setting instruction forms write the status bits relative to the mode. Changing the mode in the middle of a code sequence that depends on one of these status bits can lead to unexpected results.

      Count Register--The entire 64-bit value in the Count Register of a 64-bit implementation is decremented, even though conditional branches in 32-bit mode only test the low-order 32 bits for zero.
      IOW, even if they use "32-bit compat mode", there should be no speed penalty whatsoever.
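      The two mode-dependent rules in the quote that are easiest to trip over (effective-address truncation and the Count Register test) can be modelled in a few lines of C. This is a hypothetical software model for illustration only, not how the hardware or any Apple code is written; `sf` stands in for the MSR[SF] bit, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of two mode-dependent behaviours.
   sf = 1: 64-bit mode, sf = 0: 32-bit mode (stands in for MSR[SF]). */

/* Data access: in 32-bit mode the high-order 32 bits of the
   effective address are ignored. */
static uint64_t data_ea(uint64_t ea, int sf)
{
    return sf ? ea : (ea & 0xFFFFFFFFull);
}

/* bdnz-style step: the full 64-bit Count Register is decremented,
   but in 32-bit mode only its low-order 32 bits are tested for zero.
   Returns 1 if the branch would be taken. */
static int ctr_step_taken(uint64_t *ctr, int sf)
{
    assert(ctr != NULL);
    *ctr -= 1;
    uint64_t tested = sf ? *ctr : (*ctr & 0xFFFFFFFFull);
    return tested != 0;
}
```

With CTR = 0x100000001, one decrement leaves 0x100000000: 64-bit mode sees a non-zero count and branches, while 32-bit mode sees zero and falls through. Code written for 32-bit implementations never puts anything in the high word, which is why it runs unchanged.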
      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  4. i have a funny feeling that by paradesign · · Score: 4, Insightful

    that apple knows what they're doing, and probably had a very good reason for doing what they did. I mean it's not like they haven't had the past four years to change it or anything, but whatever.

    --
    I want 2D games back.
    1. Re:i have a funny feeling that by mikedaisey · · Score: 5, Insightful


      I would agree with you, but this is a legacy decision that would have saved them crucial months early in the OSX creation process...so I can just as easily see them making the choice then, to get a shipping system out the door, THEN discovering that OSX's biggest problem is that it isn't fast enough, and now trying to retrofit a solution. That matches all the information I've gleaned from mac sites in the past.

      It sucks, but if they hadn't gotten X out the door when they did Apple would have been dead in the water--it was already horrendously late for those who started waiting for Copland.

  5. Re:Future by trash+eighty · · Score: 5, Funny

    yeah can't wait for my new 68040 ubermac ;)

  6. I have known this for months by Leimy · · Score: 5, Insightful

    Uhm... This is news? I am not shocked at all. In order to get the product out sooner rather than later, they stuck with the old ABI that was used for the Motorola 68k [probably wrong, but I have had no coffee yet]. Anyway, some people say that the performance loss as a result of this "corner cutting" may be up to 7 cycles per function call, which just means we should all write our code as inlines and macros :).

    Just kidding.... Anyway, it may or may not be easy to fix... the problem is that now that it's out there, changing the ABI [the C ABI!!! the way functions get called and parameters get passed] is going to break everything. Maybe they can fix it, but not without significant cost to 3rd party software... I could be wrong and not have thought this out well enough though.

  7. Yay! FUD! by Anonymous Coward · · Score: 5, Insightful

    And supposedly, they can't fix it now without breaking all existing applications.

    There is no reason that an operating system can't support multiple ABIs. That might mean new applications wouldn't work on older versions of the OS, but it certainly doesn't mean that they can't fix the "problem" without breaking current applications.

  8. Re:Wonder if they tried to keep it a secret? by MarkX · · Score: 5, Insightful

    Seeing as the system incorporates Darwin, which is open source, I don't really think they could keep this a secret. Anyone can download Darwin and see the implementation of the ABI. Also, they have been using GCC for some time, and the implementation of the ABI would also be exposed there.

    At the fundamental levels of OS X, below the GUI, there is very little that Apple can dissemble about. It is OSS. Anyone can download it and read the code.

    MarkX

  9. a likely manner in which this will be corrected... by constantnormal · · Score: 5, Interesting

    While I am in complete agreement that it was originally done this way in the interests of expediency, we can all see a point very soon where the instruction set will be in a (minor/major?) state of upheaval -- when they revamp OS X for 64-bit operation on the IBM 970 chipset.

    However, it's not quite as easy as rolling it into that architecture, as they will probably rely on the 32-bit PPC compatibility mode of the 970 to bring along a lot of the existing baggage, ruling out a wholesale conversion to another ABI. Which means they will either implement a foundation to migrate toward the new ABI, or introduce yet another ABI (probably 64-bit 970 only) that uses the appropriate model. Either way, it will be some years (if ever, as we can still code 68K apps using an API from decades ago that run under emulation on OS X) before we see an efficient ABI in widespread use.

    In any event, they will certainly retain a CISC-oriented ABI in the OS X stable of architectures, if only to be able to continue to wave the specter of an open source OS X on X86 in front of Microsoft, as sort of a "mutual assured destruction" weapon to prevent Microsoft from wiping them out, and possibly as a negotiating tool in keeping Microsoft coding for the Mac.

    But -- since Apple is pretty well (apparently) hamstrung on making great strides in hardware performance over the next year or so, maybe they will push the software changes as the next best way to get needed speed. It wouldn't be the first time Apple capriciously honked off developers by changing all the rules and rendering years of development obsolete.

    And the whole thing may be moot, as it appears that one can get equivalent performance improvement by compiling with gcc 3.

  10. Some tweaking req'd by BoomerSooner · · Score: 5, Insightful

    The tweaking is at the Kernel/OS level so applications can run without modification (in theory). My guess is some apps will need patches while others will be okay. A perfect example of this is apps that run under 10.0 and 10.1 but not 10.2.

    With the speed of current and future processors, delivering a stable OS now is preferable to delivering a tweaked OS three years late that runs the same things just a little faster.

  11. Possibly fixed with a 64-bit OS X? by kuwan · · Score: 5, Interesting

    It seems that Apple could easily correct this when they update OS X for a 64-bit chip (namely the PowerPC 970). Applications will need to be recompiled to be 64-bit anyway, so why not update the ABI in the process? It would certainly be incentive for developers to update their apps...

    Imagine:

    Apple:"Update your apps to 64-bit and see a 10% performance gain."

    (Of course most apps really won't need to be 64-bit, but this would be incentive for developers to update them and users to buy new machines.)

    As for Carbon, the article states that only Mach-O binaries use the CISC-style ABI; Carbon apps built as CFM binaries are not affected and use a PowerPC-style ABI. This could be a way to "prove" his theory that you could get a 10-12% performance increase: build two test apps, one in Cocoa and one in Carbon (CFM), and then compare them to see if there really is a 10-12% speed difference.
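    A test like that needs some timing code that stresses exactly the costs at issue: function calls and access to globals. A minimal sketch, with hypothetical names, assuming GCC or Clang (for the `noinline` attribute); the same file would be built once per configuration being compared and the timings set side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical global, declared volatile so the compiler cannot
   hoist the read out of the loop. */
volatile int counter_global = 1;

/* Kept out of line (GCC/Clang attribute) so every iteration pays for
   a real call, including whatever the ABI needs to address the global. */
__attribute__((noinline)) int read_global(void)
{
    return counter_global;
}

/* Calls read_global() n times, returns the sum, and stores the
   elapsed processor time in seconds in *secs. */
long bench_calls(long n, double *secs)
{
    assert(n >= 0 && secs != NULL);
    clock_t start = clock();
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += read_global();
    *secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

A loop this small is dominated by call overhead, so any difference it shows would likely overstate the effect on a real application.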

  12. Yeah, an open sourced "secret" by BoomerSooner · · Score: 4, Insightful

    The code is open source and the gcc 3 code is open source. It doesn't appear to me they are hiding anything.

  13. Re:I'm confused by nadador · · Score: 5, Informative

    It just so happens that a friend of mine has a copy of "PowerPC Microprocessor Family: Programming Environments for 32-bit Microprocessors" sitting on his desk, which I grabbed. Here is how PowerPC processors branch (from section 4.2.4.1 of said dead-tree document):

    1. Branch relative addressing mode - the immediate displacement operand is sign extended and added to the current instruction address to produce the branch target address. So, PC relative addressing. There is no need for a programmatically accessible program counter because this is all done by the branch execution unit. Single 32-bit instruction.

    2. Branch conditional to relative addressing mode - same as branch relative addressing, except that the branch is only executed if the proper condition codes are set. Single 32-bit instruction.

    3. Branch to absolute addressing - the operand address is sign extended and used as the branch target. As the name implies, this is absolute addressing. Only problem is, the operand address is only 24 bits wide in a 32-bit implementation, and with the two zero bits appended (word alignment required) it gives a 26-bit sign-extended address. So, if you absolute address anything, you can only reach 26 bits' worth of the address space: the lowest or highest 32 MB.

    4. Branch conditional to absolute - same as regular absolute addressing, except that you have to encode the condition fields, so the operand address is now only 14 bits, meaning that you can only absolutely address 16 bits of address space (the lowest or highest 32 KB) with the zero pad.

    5. Branch conditional to link register - if you load the link register, you can branch to a full 32-bit address. Of course, that clobbers the link register, so I would think this would be most helpful in returning from a function call, not going to it, since the link register holds the return address. And if you use it forward instead of returning, you first have to load the link register.

    6. Branch conditional to count register - same as link register branching above, using the count register instead.

    All of that said, the reason that the Mac OS ABI uses PC-relative addressing is that the only way to fully address a 32-bit address space is to do PC-relative addressing. According to this book, there is no double-width branch, e.g. a branch instruction which encodes an entire 32-bit absolute address in two 32-bit words (one word for the branch encoding and condition codes, one word for the whole 32-bit address).

    This leads me to believe that there is no way to do all absolute addressing on PowerPC unless you implement new instructions (which would take more time to get to the processor, and to decode) or limit yourself to 16 or 26 bits of the address space.

    So, the short version is that there is no way for the Mac OS ABI to do absolute addressing.
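    The architecture defines the LI field of b/ba/bl as 24 bits and the BD field of bc as 14 bits, each followed by two implied zero bits. A small decoder makes the resulting reach concrete. This is a sketch with hypothetical helper names; it extracts only the displacement and ignores the AA/LK bits and the BO/BI condition fields.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v. */
static int64_t sign_extend(uint32_t v, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    uint32_t m = 1u << (bits - 1);
    v &= (1u << bits) - 1;
    return (int64_t)(v ^ m) - (int64_t)m;
}

/* I-form (b/ba/bl): 24-bit LI plus two zero bits = 26-bit signed value. */
static int64_t iform_offset(uint32_t insn)
{
    return sign_extend(insn & 0x03FFFFFCu, 26);
}

/* B-form (bc): 14-bit BD plus two zero bits = 16-bit signed value. */
static int64_t bform_offset(uint32_t insn)
{
    return sign_extend(insn & 0x0000FFFCu, 16);
}

/* Can a byte displacement be encoded in a field of this many bits? */
static int fits(int64_t disp, unsigned bits)
{
    int64_t lim = (int64_t)1 << (bits - 1);
    return disp % 4 == 0 && disp >= -lim && disp < lim;
}
```

For example, `0x48000008` (b .+8) decodes to 8 and `0x4200FFF8` (bdnz .-8) decodes to -8. With AA set, the same value is used as an absolute address instead of being added to the PC, which is why an absolute branch can only reach the lowest or highest 32 MB.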

    --

    Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
  14. Re:I'm confused by nadador · · Score: 4, Interesting

    > It's not about branching. It's about data references using PC relative addressing. The PowerPC has no PC relative data addressing modes.

    Point taken. Section 4.2.3.1 of the same book is "Integer load and store address generation".

    1. Register indirect with immediate index addressing for integer loads and stores - In this case, you get a 16-bit index in the instruction added to the value in a general purpose register, which is used to compute the effective address.

    2. Register indirect with index addressing for integer loads and stores - this is the same as above, except that two registers are used and there is no encoded index.

    3. Register indirect addressing for integer loads and stores - use just one general purpose register as an address for a load or store.
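    The three modes listed above differ only in how the effective address is formed, and none of them involves the PC. Here is a hypothetical C model of that arithmetic, following the (rA|0) convention from IBM's documentation, where naming r0 as the base contributes zero rather than the register's contents. The names `cpu_t`, `ea_dform` and `ea_xform` are made up for illustration. Mode 3 is simply the D-form case with a displacement of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: gpr[] stands for r0..r31. */
typedef struct { uint32_t gpr[32]; } cpu_t;

/* (rA|0): naming r0 as the base register means "use 0". */
static uint32_t base(const cpu_t *c, unsigned ra)
{
    assert(ra < 32);
    return ra == 0 ? 0 : c->gpr[ra];
}

/* D-form, e.g. lwz rD,d(rA): EA = (rA|0) + sign-extended 16-bit d. */
static uint32_t ea_dform(const cpu_t *c, unsigned ra, int16_t d)
{
    return base(c, ra) + (uint32_t)(int32_t)d;
}

/* X-form, e.g. lwzx rD,rA,rB: EA = (rA|0) + (rB). */
static uint32_t ea_xform(const cpu_t *c, unsigned ra, unsigned rb)
{
    assert(rb < 32);
    return base(c, ra) + c->gpr[rb];
}
```

Because every mode needs a base register, position-independent code on PowerPC first has to get some known address into a register, whether that is the PC (via a branch-and-link) or a table pointer.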

    So, the point is that in every case, some form of relative addressing is used. In order to make relocatable code, i.e., code that can be linked happily with other binary objects, you have to have some sort of reference address, and PC-relative addressing is the only way to do this.

    Even though there is no PC-relative addressing mode, the only way to guarantee that the relative addresses used in different object files won't clash is to do PC-relative. The fact that this is not easy on the PowerPC doesn't make it any less necessary.

    --

    Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
  15. Some ways to get away from PIC code by norwoodites · · Score: 5, Informative

    1. Don't use externs or static variables.
    2. If you are going to use an extern variable in a tight loop, copy it into a local variable, use the local in the loop, and assign it back after the loop.
    3. Pass the option -mdynamic-no-pic to gcc if the source goes into the final program; it does not work in a bundle or a dynamic library (or framework).
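    Caching an extern in a local around a tight loop looks like the following. This is a sketch with hypothetical names. Whether it actually saves instructions depends on the compiler and flags, but the idea is that the loop body then touches only a local that can live in a register, so the global's address is formed once instead of on every iteration.

```c
#include <assert.h>

/* Hypothetical global; under a PIC ABI each access may need extra
   instructions to form its address. */
int total = 0;

/* Reads and writes the global on every iteration. */
void add_direct(const int *v, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        total += v[i];
}

/* Copies the global into a local once, works on the local, and
   stores the result back after the loop. */
void add_cached(const int *v, int n)
{
    assert(n >= 0);
    int t = total;
    for (int i = 0; i < n; i++)
        t += v[i];
    total = t;
}
```

The two versions are equivalent only when `v` does not point at `total`. That possible aliasing is also why the compiler cannot always make this transformation on its own.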

    The AIX ABI/PEF ABI uses a register called the TOC for PIC code, but its value is stored with the function reference, so you lose one register if the Darwin ABI goes over to the PEF ABI. You get one more register to play around with if you do not use extern or static variables.

  16. Of Babies and Bath Water. by JQuick · · Score: 5, Insightful
    Before jumping to conclusions and flinching lest we be struck by the falling sky, let's take a step back.

    First look at the most crucial benefits of the runtime environment. Mach supports an efficient and flexible framework for multiple memory objects. Objc leverages this by supporting the efficient mapping and unmapping of new bundles.

    You may think of a bundle as a set of related objects in a language like Java, but don't take that analogy too far. The concepts of delegation, and protocols, usually mean that different bundles of code have clean interfaces that do not require recompilation when one or the other changes. Sometimes even knowledge of each other's type is irrelevant.

    In any case, the best design for objc applications is a collection of separate UI definitions or nib files, and one or more libraries of code which are searched and loaded as needed at runtime.

    Statically linked code is more efficient for some tasks, but in the context of good objc design it does not fit very well. Text which is statically linked is more fragile; it must be recompiled more often. It can also take more time to initialize and load; with late binding and lazy loading, only the sections of text and definitions of objects actually called will be mapped in memory.

    Position-independent code is absolutely needed for this kind of flexibility at runtime. The gcc compiler grew up on CISC, on 32-bit or 16-bit architectures. Position-independent code had to be relative to something, and the most commonly useful location was always "You are here", the program counter.

    I'm not sure whether a less frequently changed relative address, such as a start-of-bundle address, makes more sense for gcc on ppc. In any event, however, I would certainly not be willing to categorize Apple's reliance on position-independent code as a bug. By default, use PIC.
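    The "You are here" idea reduces to simple address arithmetic: the linker knows only the distance between an instruction and a datum, and the code adds that distance to its own PC at run time. The sketch below is a hypothetical model, with made-up offsets and names, not a real loader. It shows that the computed address is correct whatever base the image is mapped at, which is exactly what lets a bundle be loaded anywhere without fixups.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical image layout: offsets from the start of the image. */
enum { CODE_OFF = 0x100, DATA_OFF = 0x4000 };

/* Link time: displacement from the referencing instruction to the
   datum. It does not depend on where the image will be loaded. */
static const int32_t DISP = DATA_OFF - CODE_OFF;

/* Run time: the referencing instruction's own address ("you are
   here") plus the fixed displacement gives the datum's address. */
static uintptr_t data_addr(uintptr_t pc)
{
    assert(pc % 4 == 0); /* PowerPC instructions are word aligned */
    return pc + (uintptr_t)(intptr_t)DISP;
}
```

On a CISC chip with a PC-relative addressing mode, this addition happens inside the load instruction. On PowerPC the code first has to obtain its PC in a general-purpose register.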

  17. Re:I'm confused by Anonymous Coward · · Score: 5, Informative

    > So, the point is that in every case, some form of relative addressing is used. In order to make relocatable code, ie code that can be linked happily with other binary objects, you have to have some sort of reference address, and PC-relative addressing is the only way to do this.

    This is wrong. The PowerPC ABI, as defined by IBM, uses r2 as a TOC (Table of Contents) pointer. The PC is never needed or used as all data space references are made relative to the TOC, not the PC. Apart from being faster, this has several other advantages, not the least of which is that one copy of code can have multiple data contexts without involving VM.

    int foo;
    int bar(void) { return foo; }

    with macho:
    _bar:
    mflr r0
    bl *+4
    mflr r2
    mtlr r0
    addis r3,r2,ha16(foo)
    lwz r3,lo16(foo)(r3)
    blr

    with IBM conventions:
    .bar:
    lwz r3,foo(rTOC)
    blr
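    The "one copy of code, multiple data contexts" advantage can be mimicked in plain C by making the table pointer explicit. In this hypothetical sketch, an ordinary parameter plays the role that the dedicated r2 register plays in IBM's convention. As a simplification, the struct holds `foo` directly, as the example above implies; the names `toc_t` and `bar` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TOC: the "globals" a piece of code can reach. */
typedef struct {
    int foo;
} toc_t;

/* One copy of bar()'s code. Which foo it reads depends only on the
   TOC pointer it is given, not on where the code itself lives. */
static int bar(const toc_t *toc)
{
    assert(toc != NULL);
    return toc->foo;
}
```

Calling `bar` with two different tables returns two different values from the same instructions. PC-relative code cannot do that, because its data is tied to a fixed distance from the code.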

  18. Re:I'm confused by clem.dickey · · Score: 5, Interesting

    (The parent needs to be modded up. He may be an AC, but his information is accurate.)

    One problem with TOC is that you are limited to 16K external addresses. Offset "foo" in the TOC example is 16 bits, and the low two are zero. With 64-bit addressing I suppose that drops to 8K externals.

    Another characteristic: calling a separately compiled function requires that you load a different TOC. The PC-relative scheme requires that you load one value, the PC; the TOC scheme requires that you load two new values, the new PC and the new TOC.

    On the plus side, TOC makes shared libraries easier to manage because external addresses are bound to a non-shared data area.
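    The 16K and 8K figures come from dividing the bytes reachable through a signed 16-bit displacement by the size of one TOC entry (4 bytes for a 32-bit address, 8 bytes for a 64-bit one):

```latex
\text{reach} = 2^{16}\ \text{bytes} = 65536\ \text{bytes}, \qquad
\frac{65536}{4} = 16384 = 16\text{K entries}, \qquad
\frac{65536}{8} = 8192 = 8\text{K entries}
```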

  19. Big deal. by 0x0d0a · · Score: 5, Insightful

    This is mindblowingly unimportant. Can *anyone* think of *any* company that has a perfect ABI? No, because processors evolve, and the ABI has to stay the same. When I write an x86 program on my Linux box, it pushes *all* the arguments onto the stack. Is that the best way to do things? No. Is it done anyway? Yup. Does anyone go into a tizzy about it? No.

    Seriously, the x86 Linux ABI is probably worse off...different (worse) byte alignment from Windows, the abovementioned everything-goes-on-the-stack....

  20. portability by Nomad37 · · Score: 5, Interesting
    It seems the real reason Apple did this is to maintain portability. Jobs has said multiple times that 'We like to have options' when asked about future chips in Macs. It's been backed up by the top brass at Apple too (e.g., an InfoWorld article with VPs).

    It seems that they can emulate a PC register in a RISC architecture, but could they (easily) do it the other way around? Perhaps this is the real reason why they kept the ABI the way it is: so they could easily port OS X to whatever platform they like...

    --
    Pessimism of the intellect, optimism of the will! - Antonio Gramsci.
  21. GCC 3 by Anonymous Coward · · Score: 4, Interesting

    Using gcc 3 with OS X Jaguar 10.2.1, I checked this out using gcc's option to produce assembly-language code (gcc -S). It turns out that a function call uses only a single additional instruction (branch to link register). Since Apple has compiled OS X Jaguar with gcc 3, only legacy shared libraries and pre-10.2 applications should be affected by the RISC/CISC problem. This is still a significant performance hit, but I would imagine that it is less than the 10 percent figure given earlier, and also easier to fix.