How Snow Leopard Cut ObjC Launch Time In Half
MBCook writes "Greg Parker has an excellent technical article on his blog about the changes to the dynamic linker (dyld) for Objective-C that Snow Leopard uses to cut launch time in half and cut about 1/2 MB of memory per application. 'In theory, a shared library could be different every time your program is run. In practice, you get the same version of the shared libraries almost every time you run, and so does every other process on the system. The system takes advantage of this by building the dyld shared cache. The shared cache contains a copy of many system libraries, with most of dyld's linking and loading work done in advance. Every process can then share that shared cache, saving memory and launch time.' He also has a post on the new thread-local garbage collection that Snow Leopard uses for Objective-C."
Comparing this to Superfetch is ignorant beyond belief. Superfetch is part of the paging system on Windows and attempts to trigger page faults before the data is actually needed so that it's already cached when it is needed. This is quite a nice feature and one I am naturally prejudiced to like because my PhD was in this topic.
This is entirely different. Part of it is similar to the existing prebinding / prelinking stuff in Leopard / Linux, which generates the relocation tables in position-independent code. This is nothing like anything in Windows, because Windows doesn't use position-independent code for shared libraries (it uses a horribly ugly hack which performs better in the best case and much worse in the worst case). The article is a bit too light on details to understand exactly why the new version performs so much better.
The other half, however, is very clever. By caching the selector uniquing information, they are saving a lot of time when loading compilation units containing Objective-C code. Even better is the fact that, because these symbols are now not modified, they can be shared between processes without triggering copy-on-write faults. This isn't actually that hard to implement for the GNU runtime; just give the selector symbols mangled names and mark them as having common linkage (it's a bit harder on Darwin because Mach-O is weird), then you can use pointer comparison as a first step in the runtime and avoid the strcmp() call. Combining this with the prelinking support and you get the caching for free, which is very nice. I actually implemented this in Clang while writing this post, so expect to see it on non-Apple platforms soon too.
I am TheRaven on Soylent News