GNU C Library 2.17 Announced, Includes Support For 64-bit ARM
hypnosec writes "A new version of the GNU C Library (glibc) has been released, and with it comes support for the upcoming 64-bit ARM architecture, a.k.a. AArch64. Version 2.17 of glibc not only adds AArch64 support, it also comes with better support for cross-compilation and testing; optimized versions of memcpy, memset, and memcmp for System z10 and zEnterprise z196; and optimized versions of string functions, on top of quite a few other performance improvements, states the mailing list release announcement. Glibc 2.17 requires Linux kernel 2.6.16 or later."
Looks like the eglibc fork was a good thing for the project. Rather than having one maintainer who resists and fights an architecture for personal reasons, the project is now being proactive in integrating a new ARM architecture.
Now if we could only get away from having so many Android-only bionic-targeting blobs.
In fairness, this is complicated a lot by two issues:
1. Many of the optimizations that help things like memcpy, memcmp, etc. are utterly wrong and backwards in any loop that actually DOES SOMETHING in its body; they are only optimal in the degenerate case where everything but the load and store is loop overhead, so the best result comes from eliminating that overhead. And on some CPU models, such as most modern 32-bit x86 CPUs and some 64-bit ones, the optimal result is actually attained with a special instruction that's not usable in general for more complex loops (i.e. "rep movsb"). Factors like these make optimizing these specific functions in the compiler a task that's largely separate from general-case optimization, and when the main target libc is already providing the asm anyway, there's little demand/motivation to get the compiler to do something that won't even be used.
2. Distros want a binary library that can run optimally on all variants of a particular instruction set architecture. Relying on the compiler to optimize functions whose optimal variant is highly CPU-model-specific would only give a binary that runs optimally on one model, unless a lot of logic is added to the build system to rebuild the same source file with different optimizations. This is not prohibitively difficult, but it's also not easy, and it's not worthwhile when the compiler can't even deliver the desired optimization quality yet.
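To make point 2 a bit more concrete, here is a rough sketch in plain C of what "one binary, several variants, pick at run time" can look like. This is not how glibc actually does it; the names my_copy, copy_bytewise, copy_wordwise, copy_resolve and cpu_prefers_wide_copies are all made up for illustration, and the "CPU probe" is just a placeholder that looks at the pointer width. A real library would query the CPU model and would also need to make the first-call switch thread-safe, which this sketch does not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Variant A: plain byte loop. Almost all of the work is loop overhead. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Variant B: moves one machine word per iteration, then finishes the tail
 * byte by byte. The fixed-size memcpy calls are an alignment-safe way to
 * express a word load/store; compilers typically inline them. */
static void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uintptr_t)) {
        uintptr_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical probe (an assumption for this sketch): stands in for real
 * CPU-model detection by checking only the pointer width. */
static int cpu_prefers_wide_copies(void)
{
    return sizeof(uintptr_t) >= 8;
}

static void *copy_resolve(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver; after the first call it points
 * directly at the chosen variant. Not thread-safe as written. */
static copy_fn copy_impl = copy_resolve;

static void *copy_resolve(void *dst, const void *src, size_t n)
{
    copy_impl = cpu_prefers_wide_copies() ? copy_wordwise : copy_bytewise;
    return copy_impl(dst, src, n);
}

/* Public entry point: one symbol, variant selected at run time. */
void *my_copy(void *dst, const void *src, size_t n)
{
    return copy_impl(dst, src, n);
}
```

Callers only ever see my_copy(dst, src, n); which variant runs is decided on the first call. The cost is exactly the maintenance burden complained about here: every variant is separate code that has to be tested on the hardware it targets.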
Overall I agree that machine-specific asm in glibc (and elsewhere) is a disease that results in machine-specific bugs and maintenance hell, but when there are people demanding the performance and pushing benchmark-centric agendas, it's hard to fight it...