Mac OS X Built For CISC, Not RISC
WCityMike writes "One of the programmers at Unsanity, maker of haxies, recently posted a rather shocking revelation on the company's weblog. He says that Mac OS X's Mach-O runtime ABI (Application Binary Interface) comes from a NeXTStep design for 68K processors, and is not designed for the PowerPC architecture. Had they designed it for the latter, things would have been approximately 10-12 percent faster. And supposedly, they can't fix it now without breaking all existing applications." The developer mentions there are workarounds in the newest GCC, but only for newly compiled programs.
This is good news for long-time Mac fans. Back in the day ("the day" was 1994 or so, IIRC) we Mac users took seeking out the correct 68k or PPC binaries as a sign of our superiority to PC users. While Windozers happily downloaded software that would run on circa 1987 hardware, we enlightened ones could narrow our searches to programs specifically compiled for our platforms. We could even get "fat" binaries, and optionally remove the unneeded binary code using a small freeware app.
With OS X, I had hoped we would again have a situation where just using the Mac required that extra step of compatibility checking, setting us apart from the drooling masses of Gates-worshippers. Sadly, with the Classic compatibility layer, it did not come to pass. Hopefully this revelation will set things aright.
Karma: Good (despite my invention of the Karma: sig)
Maybe Apple Are Anticipating a Move to A CISC style processor in the future?
But tell me, if they could slide a PPC ABI in with the new journaling system update, couldn't they just get the performance hit and gain to cancel out? It'd be like a journaling system for free! How hard can it be? 10.2.5 maybe?
Etc, etc, ad nauseam, and so on and so forth.
Real World Tech talks about the 970 in depth... I wonder how the addition of 64-bit arch AND the 32-bit compat mode will affect things.
Like the Itanium, with its poor backwards compat performance? Or will it be speedy?
ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
that Apple knows what they're doing, and probably had a very good reason for doing what they did. I mean, it's not like they haven't had the past four years to change it or anything, but whatever.
I want 2D games back.
Here is the text:
Unsanity.org
October 21, 2002
Mach-O ABI
Here I go again, ranting about Mac OS X bowels. This time I want to talk about particular implementation details of the Mach-O runtime ABI (Application Binary Interface). Before you get too confused, there are two different things under the 'Mach-O' name:
* Mach-O ABI, which defines how every application in the system executes and calls functions (stack conventions, register usage, and more);
* Mach-O File Format, which defines a way for compiled executables to store different parts of them in the same file (compiled code, data, strings, etc).
The latter is not what I want to talk about today; the former is what puzzles me most. I admit I am just a "small programmer" with no relationship with the powers-that-be at Apple at all (this means no insider contacts who can explain the reasoning behind particular important design decisions to me), so the impressions, judgements and guesses expressed in this article may be slightly or totally off the mark. However, like many other developers who have dug deep into the implementation of such things, I can see obvious drawbacks and oddities in the Mach-O ABI, and this is what I am going to talk about.
Mach-O originates from NeXTstep, an operating system created at NeXT for its NeXTstation machines, and later expanded to x86 hardware. NeXTstations were originally based on Motorola 68k CPUs, just like old Macintoshes. Mac OS (classic), on the other hand, used an ABI for PowerPC which followed the ABI principles defined in a document by IBM/Moto for PPC processors. So as you all may already know, m68k and x86 are CISC architectures; PowerPC used in all new Macs is RISC.
To make a long story short, Mac OS X uses an ABI designed for CISC processors, mostly ignoring RISC design principles.
What do I mean by that? The Mach-O ABI we see now in Mac OS X is more or less a direct port of NeXT's Mach-O designed for m68k - it relies on the PC (program counter) register to perform various manipulations with data (for the geeks: PC-relative addressing). There's nothing wrong with that, as it's an effective and common practice, except for one little thing: there is no programmatically accessible PC register in RISC processors. That is not a show-stopper though - Mach-O for PowerPC just takes one of the 32 general purpose registers and turns it into a program counter-style register, to base all offset calculations off it. That works well, as you can see, as all Mac OS X applications (except for the ones compiled with Carbon/CFM) use the Mach-O ABI.
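To make the idea of basing all offset calculations off a register concrete, here is a minimal sketch in C. It is only a model: the byte buffer standing in for a loaded image, the `COUNTER_OFFSET` constant, and the `load_pic`/`store_pic` helpers are all hypothetical names invented for illustration, not anything from the Mach-O toolchain. The point it tries to show is that code which reaches its data as "base plus fixed offset" keeps working wherever the image ends up in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: an "image" is a byte buffer that may be loaded anywhere.
   COUNTER_OFFSET is an assumed link-time offset of one 32-bit global
   from the base address held in the program counter-style register. */
enum { COUNTER_OFFSET = 16 };

/* Position-independent style read: address = runtime base + fixed offset. */
static int32_t load_pic(const unsigned char *pic_base, size_t offset)
{
    int32_t value;
    memcpy(&value, pic_base + offset, sizeof value);
    return value;
}

/* Position-independent style write, using the same base + offset scheme. */
static void store_pic(unsigned char *pic_base, size_t offset, int32_t value)
{
    memcpy(pic_base + offset, &value, sizeof value);
}
```

If the buffer is copied to a different address, the same `load_pic` call with the new base still finds the value. The price is that every function needing global data must first establish the base, which is where the overhead discussed next would come from.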
That approach works well, except for one small thing: global/static data access adds about 7 cycles of overhead per function, and about triple that for cross-context calls (that is for a G4-class processor), compared to the old Mac OS Classic ABI (excuse me for the geek talk). The Mac OS Classic CFM ABI, in comparison, needed almost 0 cycles for static data access and about 5 for cross-context calls. To rephrase - applications in Mac OS X could be faster if the Mach-O ABI followed the principles set for the PowerPC chip, and not the ones created over a decade ago for CISC ones.
This brings us to the question, "how much faster would the applications be if the ABI was done right?". The answer is, according to some tests done by my friends at the #macdev IRC channel, the speed gain would be 10-30%, depending on each particular application (how often it calls functions). Realistically, the speed gain would be around 10 to 12 per cent (how I get these numbers is explained below).
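The article does not show the arithmetic behind these percentages, so the following is only a back-of-the-envelope sketch of how a fixed per-function cost turns into a percentage. The 7-cycle figure is the one quoted above; the helper name `overhead_fraction` and the 56-cycle average function body used in the usage note are made-up illustration values, not measurements.

```c
/* Fraction of total run time spent on ABI overhead, assuming every call
   pays `overhead_cycles` on top of `body_cycles` of useful work.
   Both inputs are assumptions supplied by the caller. */
static double overhead_fraction(double overhead_cycles, double body_cycles)
{
    return overhead_cycles / (overhead_cycles + body_cycles);
}
```

For example, `overhead_fraction(7.0, 56.0)` is 7/63, or about 11%. Shorter functions push the share up and longer ones pull it down, which would fit the remark that the gain depends on how often an application calls functions.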
So why did Apple use an outdated ABI for a modern operating system? Frankly, I don't know the reason. The best one I have heard is that it saved Apple a few months of Mac OS X development time, since they didn't have to do massive updates to the NeXT-derived tool chain.
There are signs of change though -- the recent update to GCC, the compiler shipped with OSX, allows it to perform so-called -mdynamic-no-pic optimization, which hard-codes the data addresses in the code, so the result is roughly equivalent to the CFM ABI used in Mac OS Classic -- so the GCC itself, compiled with that optimization, is 10% faster. Applications, to take advantage of that, need to be recompiled, so it doesn't affect 80% of the titles already shipped for Mac OS X. Then again, the optimization above only works for executables and not shared libraries.
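A small test file is one way to look at what the flag changes. The sketch below is an assumption-laden example: the file name `counter.c`, the `hit_count` variable and the `record_hit` function are invented for illustration, and the commands in the comment assume Apple's gcc 3 on a PowerPC Mac, where `-mdynamic-no-pic` is available.

```c
/* counter.c (hypothetical file name): a function whose cost is dominated
   by access to static data, which is the case the ABI overhead affects.

   On a Mac OS X PowerPC toolchain (assumed), the generated assembly
   could be compared with something like:
       gcc -O2 -S counter.c -o counter_pic.s
       gcc -O2 -S -mdynamic-no-pic counter.c -o counter_nopic.s
*/
static int hit_count;

/* Increment the static counter and return its new value. */
int record_hit(void)
{
    hit_count += 1;
    return hit_count;
}
```

Diffing the two `.s` files should show how the address of `hit_count` is formed in each case. As the article notes, the flag is only meant for code linked into the final executable, not for shared libraries.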
Either way, there is no way to change the ABI now, as it would break all of the existing applications - which is obviously not what Apple (or us) would want.
And after all, who cares about a 10% speed loss? You can always get a faster Mac, right?
Further reading:
Mach-O Runtime Architecture [developer.apple.com]
CFM-Based Runtime Architecture [developer.apple.com]
Thanks to #macdev regulars and an anonymous Apple engineer for helping me with this article.
Update 10/21: fixed a few phrases in the text to make it more clear; I've also been told OPENstep runs on RISC processors (non-PowerPC) - however, I have not investigated how the Mach-O ABI works there - quite possible it obeys the PowerPC guidelines, although I am pretty sure it does the same as on PowerPC.
Posted by slava at October 21, 2002 03:40 AM
Uhm... This is news? I am not shocked at all. In order to get the product out sooner rather than later they stuck with the old ABI that was used for Motorola 68k [probably wrong but I have had no coffee yet]. Anyway, some people say that the performance loss as a result of this "corner cutting" may be up to 7 cycles per function call, which just means we should all write our code as inlines and macros :).
Just kidding.... Anyway it may or may not be easy/hard to fix... the problem is that now that it's out there, changing the ABI [the C ABI !!! the way functions get called and parameters get passed] is going to break everything. Maybe they can fix it, but not without a significant cost to 3rd party software... I could be wrong and not have thought this out well enough though.
And supposedly, they can't fix it now without breaking all existing applications.
There is no reason that an operating system can't support multiple ABIs. That means that new applications wouldn't work on older versions of the OS, but it certainly doesn't mean that they can't fix the "problem" without breaking current applications.
Who says they need to "fix" it? Perhaps Motorola may be losing a big customer in the future... I've heard from more than one source now that big changes may be coming in Mac hardware... Absolutely all rumors of course, but this fact fits in nicely with what I've already heard...
Please consider making an automatic monthly recurring donation to the EFF
I wonder if Apple tried to keep this secret and this is the reason they kept so many of their APIs and underlying OSX undocumented? I think Apple is none too happy with Unsanity right now and might purposefully break their haxies with their next update, guess we'll wait and see. :)
While I am in complete agreement that it was originally done this way in the interests of expediency, we can all see a point very soon where the instruction set will be in a (minor/major?) state of upheaval -- when they revamp OS X for 64-bit operation on the IBM 970 chipset.
However, it's not quite as easy as rolling it into that architecture, as they will probably rely on the 32-bit PPC compatibility mode of the 970 to bring along a lot of the existing baggage, ruling out a wholesale conversion to another API. Which means they will either implement a foundation to migrate toward the new API, or invoke yet another API (probably 64-bit 970 only) that uses the appropriate model. Either way, it will be some years (if ever, as we can still code 68X apps using an API from decades ago that run under emulation on OS X) before we see an efficient API in widespread use.
In any event, they will certainly retain a CISC-oriented API in the OS X stable of architectures, if only to be able to continue to wave the specter of an open source OS X on X86 in front of Microsoft, as sort of a "mutual assured destruction" weapon to prevent Microsoft from wiping them out, and possibly as a negotiating tool in keeping Microsoft coding for the Mac.
But -- since Apple is pretty well (apparently) hamstrung on making great strides in hardware performance over the next year or so, maybe they will push the software changes as the next best way to get needed speed. It wouldn't be the first time Apple capriciously honked off developers by changing all the rules and rendering years of development obsolete.
And the whole thing may be moot, as it appears that one can get equivalent performance improvement by compiling with gcc3.
Keep in mind that gcc3 optimizations mentioned would only affect applications, *not* the dynamic shared libraries, which are a significant part of the OS (think /System/Library/Frameworks)
The tweaking is at the Kernel/OS level so applications can run without modification (in theory). My guess is some apps will need patches while others will be okay. A perfect example of this is apps that run under 10.0 and 10.1 but not 10.2.
With the speed of current and future processors, the delivery of a stable OS is preferable to a tweaked OS that arrives 3 years late and runs the same things just a little faster.
It seems that Apple could easily correct this when they update OS X for a 64-bit chip (namely the PowerPC 970). Applications will need to be recompiled to be 64-bit anyway, so why not update the ABI in the process? It would certainly be incentive for developers to update their apps...
Imagine:
Apple:"Update your apps to 64-bit and see a 10% performance gain."
(Of course most apps really won't need to be 64-bit, but this would be incentive for developers to update them and users to buy new machines.)
As for Carbon, it states in the article that only Mach-O binaries use the CISC-style ABI; Carbon is not affected and uses a PowerPC-style ABI. This could be a way to "prove" his theory that you could get a 10-12% performance increase. Build two test apps, one in Cocoa and one in Carbon, and then compare them to see if there really is a 10-12% speed difference.
infested with jello like fishes no melotron wishes
The code is open source and the gcc 3 code is open source. It doesn't appear to me they are hiding anything.
This leads me to believe that PowerPC does run of the mill PC-relative addressing, so that, for example, branches only take one instruction width, rather than having to store an entire 32-bit address in another location, etc. I'm confused by the purpose of the article, but I think the point was that PC-relative addressing forces the compiler to compute the address of a branch target in a function call to a fully-linked part of the binary. I don't think that address computation of that sort is responsible for a 10-15% performance hit, because wouldn't full time absolute addressing require more memory access on the already saturated MPX bus? If I'm incorrect in this, please tell me, and I will look for more information.
Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
Millions of dollars in burned money from VA-Linux, thousands of man-hours invested in slashcode, untold numbers of CPUs and hard drives sacrificed to the cause, and slashdot's editors/maintainers still can't be bothered to put a spellchecker into their story posting system.
Why, exactly, do they expect people to pay money for this again?
News for Nerds. Stuff that Matters? Like hell.
1. Don't use externs or static variables.
2. If you are going to use an extern variable in a tight loop, copy it into a local variable and assign it back after the loop.
3. Pass the option -mdynamic-no-pic to gcc if the source is in the final program, because it does not work in a bundle or a dynamic library (or framework).
The AIX ABI/PEF ABI uses a register called the TOC for PIC code but it is stored with the function reference so you lose one register if the Darwin ABI goes over to the PEF ABI. You get one more register to play around with if you do not use extern or static variables.
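The tight-loop advice in the list above seems to be aiming at the following pattern: copy the global into a local before the loop and write it back afterwards, so the loop body never has to form the global's address. This is a minimal sketch; the names `total`, `sum_direct` and `sum_hoisted` are hypothetical and exist only for illustration.

```c
#include <stddef.h>

long total;  /* hypothetical global, visible to other translation units */

/* Touches the global on every iteration. */
void sum_direct(const int *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
}

/* Reads the global once, works on a local, and stores it back once. */
void sum_hoisted(const int *values, size_t count)
{
    long local_total = total;
    for (size_t i = 0; i < count; i++) {
        local_total += values[i];
    }
    total = local_total;
}
```

Both functions leave the same value in `total`. An optimizing compiler may already perform this transformation by itself when it can prove nothing else touches the global inside the loop, so whether the manual version helps is something to measure rather than assume.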
First look at the most crucial benefits of the runtime environment. Mach supports an efficient and flexible framework for multiple memory objects. Objc leverages this by supporting the efficient mapping and unmapping of new bundles.
You may think of a bundle as a set of related objects in a language like Java, but don't take that analogy too far. The concepts of delegation, and protocols, usually mean that different bundles of code have clean interfaces that do not require recompilation when one or the other changes. Sometimes even knowledge of each other's type is irrelevant.
In any case, the best design for objc applications is a collection of separate UI definitions or nib files, and one or more libraries of code which are searched and loaded as needed at runtime.
Statically linked code is more efficient for some tasks, but in the context of good objc design it does not fit very well. Text which is statically linked is more fragile; it must be recompiled more often. It can also take more time to initialize and load; using late binding and lazy loading, only the sections of text and definitions of objects actually called will be mapped in memory.
Position independent code is absolutely needed for this kind of flexibility at runtime. The gcc compiler grew up on CISC, on 32-bit or 16-bit architectures. Position independent code had to be relative to something, and the most commonly useful location was always "You are here", the program counter.
I'm not sure whether a less frequently changed relative address such as a start of bundle address makes more sense for gcc on ppc. In any event, however, I would certainly not be willing to categorize Apple's reliance on position independent code as a bug. By default, use pic.
It's entirely possible that they are using the m68k ABI to allow really old Classic applications (pre PowerPC) to continue to run in Classic... I still run into the occasional program that was built for 680X0 machines, even though Apple switched to the PowerPC back in 1994.
Some of the educational software that a lot of schools use is still built with MS-DOS 5 and Apple System 7.5 in mind. Until you can get some of these developers to move to something a little more modern, you will still have a lot of excess baggage to carry in your OS. Perhaps that is why Apple is moving to systems that won't boot MacOS 9 in January 2003.
We're sorry, the phone number you have reached is imaginary. Please rotate your phone 90 degrees and try your call again
Since the most recent versions of NeXTStep were also released in Intel versions, perhaps this has more to do with code from the Intel x86 branch than the (truly legacy) 68k branch?
Perhaps...
Apple supposedly saved some time using Mach-O because they wouldn't have to rewrite their linker to use CFM/PEF. If we're going to hack Mach-O so it uses the standard PowerPC ABI why not just support PEF/CFM?!
There are many reasons for this. Not only are there tools already available to handle PEF/CFM, you wouldn't need the whole PEF interpreter w/shim libraries to run PEF Carbon apps!
>80 column hard wrapped e-mail is not a sign of intelligent
>life
This is mindblowingly unimportant. Can *anyone* think of *any* company that has a perfect ABI? No, because processors evolve, and the ABI has to stay the same. When I write an x86 program on my Linux box, it pushes *all* the arguments onto the stack. Is that the best way to do things? No. Is it done anyway? Yup. Does anyone go into a tizzy about it? No.
Seriously, the x86 Linux ABI is probably worse off...different (worse) byte alignment from Windows, the abovementioned everything-goes-on-the-stack....
May we never see th
It makes logical sense. OS X takes its roots (no pun intended) from UNIX, UNIX grew up on x86 architectures long before Apple started using it, therefore it only makes sense that the code would be mostly optimised for CISC.
T Money
World Domination with a plastic spoon since 1984
It seems that they can emulate a pc-register in a risc architecture, but could they (easily) do it the other way around? Perhaps this is the real reason why they kept the abi the way it is: so they could easily port os x to whatever platform they like...
Pessimism of the intellect, optimism of the will! - Antonio Gramsci.
Using gcc 3 with OS X Jaguar 10.2.1, I checked this out using gcc's option to produce assembly-language code (gcc -S). It turns out that a function call uses only a single additional instruction (branch to link register). Since Apple has compiled OS X Jaguar with gcc 3, only legacy shared libraries and pre-OS 10.2 applications should be affected by the RISC/CISC problem. This is still a significant performance hit, but I would imagine that it is less than the 10 percent figure given earlier, and also easier to fix.
Too wordy. You can't keep my interest. You lose me.
Did you have something important to say?
> It seems that Apple could easily correct this when they update OS X for a 64-bit chip
Why is that? Pray share your wisdom with us. Or is this just a street corner guesstimate?
> Applications will need to be recompiled to be 64-bit anyway
No they won't. That's the whole point.
> why not update the ABI in the process?
Why not? I'm not sure, but I do know one thing: there's a good reason the people at Apple are developers and you're not.
I remember reading about this on the OpenTransportDev list well over a year ago.
testes money is a complete idiot, don't get bent out of shape on him.
The needs of the Classic runtime have nothing to do with the native ABI. As for targetting System 7.5 in schools... when most of the computers in use are LCIIIs and earlier, you don't have a lot of choice.
You missed what I said. I said that UNIX had grown up on the x86 before Apple or NeXT had started using it. Therefore it only makes sense that the code you are using would come from x86 code rather than PC. I realize x86 was not the first UNIX architecture, but that was the most logical place for Apple to start drawing code from.
T Money
World Domination with a plastic spoon since 1984
PowerOpen would be a suitable ABI for any UNIX app that's POSIX compliant. Perhaps it could even be extended to work with cocoa apps?
(-1, Raw and Uncut is the only way to read)
Do you really think that "an open source OS X on X86" would be a threat to Microsoft? Why would anyone want to run such a creature? There wouldn't be any apps for it. Don't tell me that it's just a simple recompile to get existing apps running on the new architecture. Things are rarely that simple, and in the case of Carbon apps, I suspect they're considerably more complex.
So why would developers target the platform? Without a company to own it and market it, they would (rightly) conclude that it was dead in the water. Yeah, there are some companies targeting Linux with products right now, but they're generally focusing on server apps, not desktop stuff.