Mac OS X Built For CISC, Not RISC
WCityMike writes "One of the programmers at Unsanity, maker of haxies, recently posted a rather shocking revelation on the company's weblog. He says that Mac OS X's Mach-O runtime ABI (Application Binary Interface) comes from a NeXTStep design for 68K processors, and is not designed for the PowerPC architecture. Had it been designed for the latter, things would have been approximately 10-12 percent faster. And supposedly, they can't fix it now without breaking all existing applications." The developer mentions there are workarounds in the newest GCC, but only for newly compiled programs.
Real World Tech talks about the 970 in depth... I wonder how the addition of 64-bit arch AND the 32-bit compat mode will affect things.
Like the Itanium, with its poor backwards compat performance? Or will it be speedy?
ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
I wonder if Apple tried to keep this secret and this is the reason they kept so many of their APIs and underlying OSX undocumented? I think Apple is none too happy with Unsanity right now and might purposefully break their haxies with their next update, guess we'll wait and see. :)
While I am in complete agreement that it was originally done this way in the interests of expediency, we can all see a point very soon where the instruction set will be in a (minor/major?) state of upheaval -- when they revamp OS X for 64-bit operation on the IBM 970 chipset.
However, it's not quite as easy as rolling it into that architecture, as they will probably rely on the 32-bit PPC compatibility mode of the 970 to bring along a lot of the existing baggage, ruling out a wholesale conversion to another API. Which means they will either implement a foundation to migrate toward the new API, or invoke yet another API (probably 64-bit 970 only) that uses the appropriate model. Either way, it will be some years (if ever, as we can still code 68X apps using an API from decades ago that run under emulation on OS X) before we see an efficient API in widespread use.
In any event, they will certainly retain a CISC-oriented API in the OS X stable of architectures, if only to be able to continue to wave the specter of an open source OS X on X86 in front of Microsoft, as sort of a "mutual assured destruction" weapon to prevent Microsoft from wiping them out, and possibly as a negotiating tool in keeping Microsoft coding for the Mac.
But -- since Apple is pretty well (apparently) hamstrung on making great strides in hardware performance over the next year or so, maybe they will push the software changes as the next best way to get needed speed. It wouldn't be the first time Apple capriciously honked off developers by changing all the rules and rendering years of development obsolete.
And the whole thing may be moot, as it appears that one can get equivalent performance improvement by compiling with gcc3.
It seems that Apple could easily correct this when they update OS X for a 64-bit chip (namely the PowerPC 970). Applications will need to be recompiled to be 64-bit anyway, so why not update the ABI in the process? It would certainly be incentive for developers to update their apps...
Imagine:
Apple: "Update your apps to 64-bit and see a 10% performance gain."
(Of course most apps really won't need to be 64-bit, but this would be incentive for developers to update them and users to buy new machines.)
As for Carbon, it states in the article that only Mach-O binaries use the CISC-style ABI; Carbon is not affected and uses a PowerPC-style ABI. This could be a way to "prove" his theory that you could get a 10-12% performance increase. Build two test apps, one in Cocoa and one in Carbon, and then compare them to see if there really is a 10-12% speed difference.
infested with jello like fishes no melotron wishes
> It's not about branching. It's about data references using PC-relative addressing. The PowerPC has no PC-relative data addressing modes.
Point taken. Section 4.2.3.1 of the same book is "Integer load and store address generation".
1. Register indirect with immediate index addressing for integer loads and stores - In this case, you get a 16-bit index in the instruction added to the value in a general purpose register, which is used to compute the effective address.
2. Register indirect with index addressing for integer loads and stores - this is the same as above, except that two registers are used and there is no encoded index.
3. Register indirect addressing integer loads and stores - use just one general purpose register as an address for a load or store.
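The three modes listed above can be sketched in C as effective-address calculations. This is only an illustrative model, not code from the book: the `ppc_regs` struct and the function names are made up for this example, and it assumes 32-bit addresses. The one architectural detail it relies on is that an rA field of 0 means the constant 0 rather than the contents of r0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 32 general purpose registers (32-bit mode). */
typedef struct {
    uint32_t gpr[32];
} ppc_regs;

/* (rA|0): an rA field of 0 selects the constant 0, not register r0. */
static uint32_t base_or_zero(const ppc_regs *r, unsigned ra) {
    assert(ra < 32);
    return ra == 0 ? 0u : r->gpr[ra];
}

/* Mode 1: register indirect with immediate index, e.g. lwz rD, d(rA).
 * The 16-bit displacement d is sign-extended before the add. */
uint32_t ea_reg_imm(const ppc_regs *r, unsigned ra, int16_t d) {
    return base_or_zero(r, ra) + (uint32_t)(int32_t)d;
}

/* Mode 2: register indirect with index, e.g. lwzx rD, rA, rB.
 * Two registers are added; no displacement is encoded. */
uint32_t ea_reg_index(const ppc_regs *r, unsigned ra, unsigned rb) {
    assert(rb < 32);
    return base_or_zero(r, ra) + r->gpr[rb];
}

/* Mode 3: register indirect. The address is just (rA|0). */
uint32_t ea_reg(const ppc_regs *r, unsigned ra) {
    return base_or_zero(r, ra);
}
```

Note that none of the three takes the program counter as an input, which is the point the parent was making.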
So, the point is that in every case, some form of relative addressing is used. In order to make relocatable code, ie code that can be linked happily with other binary objects, you have to have some sort of reference address, and PC-relative addressing is the only way to do this.
Even though there is no PC-relative addressing mode, the only way to guarantee that the relative addresses used in different object files won't clash is to do PC-relative. The fact that this is not easy on the PowerPC doesn't make it any less necessary.
Outside of a dog, a book is a man's best friend. Inside a dog, its too dark to read.
It's entirely possible that they are using the m68k ABI to allow really old Classic applications (pre PowerPC) to continue to run in Classic... I still run into the occasional program that was built for 680X0 machines, even though Apple switched to the PowerPC back in 1994.
Some of the educational software that a lot of schools use is still built with MS-DOS 5 and Apple System 7.5 in mind. Until you can get some of these developers to move to something a little more modern, you will still have a lot of excess baggage to carry in your OS. Perhaps that is why Apple is moving to systems that won't boot MacOS 9 in January 2003.
We're sorry, the phone number you have reached is imaginary. Please rotate your phone 90 degrees and try your call again
(The parent needs to be modded up. He may be an AC, but his information is accurate.)
One problem with TOC is that you are limited to 16K external addresses. Offset "foo" in the TOC example is 16 bits, and the low two are zero. With 64-bit addressing I suppose that drops to 8K externals.
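The arithmetic behind those numbers, as a small sketch. It assumes one TOC slot per external address, 4-byte slots with 32-bit addressing and 8-byte slots with 64-bit addressing; `toc_capacity` is a name invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of TOC slots reachable with a displacement of disp_bits bits
 * when every slot is entry_bytes wide (assumed: one slot per external). */
uint32_t toc_capacity(unsigned disp_bits, unsigned entry_bytes) {
    assert(disp_bits > 0 && disp_bits < 32);
    assert(entry_bytes > 0);
    return (UINT32_C(1) << disp_bits) / entry_bytes;
}
```

With a 16-bit offset, `toc_capacity(16, 4)` gives 16384 slots (16K) and `toc_capacity(16, 8)` gives 8192 (8K).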
Another characteristic is that calling a separately compiled function requires that you load a different TOC. The PC-relative scheme requires that you load one value: the PC; the TOC scheme requires that you load two new values: the new PC and the new TOC.
On the plus side, TOC makes shared libraries easier to manage because external addresses are bound to a non-shared data area.
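A rough C model of the "two new values" point, for anyone who wants to see it spelled out. Everything here (`toc_t`, `func_desc`, `call_through`, `read_slot0`) is hypothetical naming for illustration, not a real ABI definition: a callable function is treated as a pair of code address and TOC address, and the TOC is passed explicitly where real hardware would keep it in a dedicated register.

```c
#include <assert.h>

/* Hypothetical per-module TOC: a small table of addresses of external data. */
typedef struct {
    int *slots[4];
} toc_t;

typedef int (*entry_fn)(const toc_t *toc);

/* A callable function is a pair: its code address and the TOC it expects. */
typedef struct {
    entry_fn entry;
    const toc_t *toc;
} func_desc;

/* Example callee: reads an external int through slot 0 of its own TOC. */
int read_slot0(const toc_t *toc) {
    return *toc->slots[0];
}

/* Cross-module call: both values come out of the descriptor before the jump. */
int call_through(const func_desc *fd) {
    assert(fd && fd->entry && fd->toc);
    return fd->entry(fd->toc);
}
```

The same code paired with two different TOCs reads two different externals, which is also why the scheme is friendly to shared libraries: the code stays shared while the table of bound addresses is per-module data.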
Anyone who has been paying much attention around the web should/would have been able to read this information elsewhere just like me :). There are MANY people who knew this problem existed well before the release of Jaguar.
:)
Perhaps because I look into things and try to tinker and understand them as well as do silly things like assembly language programming I picked up on this sooner than others. This isn't news to a lot of people I hang with...
Sorry if I came off sounding arrogant
If Apple would just use CFM/PEF natively we wouldn't need the PEF shim crap for PEF Carbon apps.
I mean where does Apple go from here? Do they create a 3rd spec so they have to upkeep two non-native formats? Will they have to create three sets of libraries or shims?
Not very forward-thinking Apple.
>80 column hard wrapped e-mail is not a sign of intelligent
>life
It seems that they can emulate a PC register on a RISC architecture, but could they (easily) do it the other way around? Perhaps this is the real reason why they kept the ABI the way it is: so they could easily port OS X to whatever platform they like...
Pessimism of the intellect, optimism of the will! - Antonio Gramsci.
Using gcc 3 with OS X Jaguar 10.2.1, I checked this out using gcc's option to produce assembly-language code (gcc -S). It turns out that a function call uses only a single additional instruction (branch to link register). Since Apple has compiled OS X Jaguar with gcc 3, only legacy shared libraries and pre-OS 10.2 applications should be affected by the RISC/CISC problem. This is still a significant performance hit, but I would imagine that it is less than the 10 percent figure given earlier, and also easier to fix.
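For anyone who wants to repeat the check, a tiny test file along these lines should be enough. This is not the file I used, just a minimal sketch: the file name `abi_probe.c` and the function names are made up. It has one global data reference and one ordinary function call, so both sequences show up in the generated assembly.

```c
#include <assert.h>

/* Global data: every access goes through the ABI's data-addressing sequence. */
int shared_counter = 0;

/* Touches the global, so the compiler must materialize its address. */
int bump(int step) {
    assert(step > 0);
    shared_counter += step;
    return shared_counter;
}

/* Contains plain function calls whose call sequence can be inspected. */
int bump_twice(int step) {
    bump(step);
    return bump(step);
}
```

Running `gcc -S abi_probe.c` writes the assembly to `abi_probe.s`; adding `-O2` shows what an optimized build does. Compare the instructions around the accesses to `shared_counter` and the calls to `bump` between compiler versions.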