Linux x32 ABI Not Catching Wind
jones_supa writes "The x32 ABI for Linux allows the OS to take full advantage of an x86-64 CPU while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers. Though the x32 ABI limits the program to a virtual address space of 4GB, it also decreases the memory footprint of the program and in some cases can allow it to run faster. The ABI has been talked about since 2011 and there's been mainline support since 2012. x32 support within other programs has also trickled in. Despite this, there still seems to be no widespread interest. x32 support landed in Ubuntu 13.04, but no software packages were released. In 2012 we also saw some x32 support out of Gentoo and some Debian x32 packages. Besides the kernel support, we also saw last year the support for the x32 Linux ABI land in Glibc 2.16 and GDB 7.5. The only Linux x32 ABI news Phoronix had to report on in 2013 was of Google wanting mainline LLVM x32 support and other LLVM project x32 patches. The GCC 4.8.0 release this year also improved the situation for x32. Some people don't see the ABI as being worthwhile when it still requires 64-bit processors and the performance benefits aren't very convincing for all workloads to make maintaining an extra ABI worthwhile. Would you find the x32 ABI useful?"
no
The maintainer(s) find it interesting, and they're developing it on their own dime... so I don't get the hate in some of these first few posts. No one's forcing you to use it, or even to think about it when you're coding something else.
If it's useful to someone, that's all that matters.
#DeleteChrome
The company I work for compiles almost all programms with 32 bits on x86-64 CPUs. It's not only cheap RAM usage, it's also expensive cache which is wasted with 64 pointer and 64 bit int. Since 3 GB is much more than our programms are using, x86-64 would be foolish. I'm eager waiting for a x32 SuSE version.
For some workloads, it's ~40% faster vs amd64, and for some, even more than that vs i386. For a typical case, though, it's typical to see ~7% speed and ~35% memory boost over amd64.
As for memory being cheap, this might not matter on your home box where you use 2GB of 16GB you have installed, but vserver hosting tends to be memory-bound. And using bad old i386 means a severe speed loss due to ancient instructions and register shortage.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
It's not just about "having enough RAM". While that certainly is a factor, it's not the only one. As you suggest, pretty much everyone has enough RAM to run just about any normal application with 64-bit pointers.
But if you want speed, you also have to pay attention to things like cache lines. 64-bit pointers often means larger instructions are needed to be encoded to do the same work, larger instructions means more cache misses. This can be a large difference in performance.
we went 64 bit for a reason.
We went to x86-64 for three reasons: 64-bit integer registers, more integer registers, and 64-bit pointers. Some applications need only the first two of these three, which is why x32 is supposed to exist.
In answer to my question, no, it is not dirt cheap. For any size cache you will get fewer cache misses if your data structures are smaller than if they are larger. Until the cache is so big that everything fits in it, you always win if you can double what you can cram into it.
Of course its a tradeoff, because the new RAM will have less of its spare ECC bits used up.
Which is all nice and good except this implies your data structure was mostly pointers to begin with
And that's exactly the case of scripting languages, where every structure (say, a Python object) is a collection of pointers to methods and data.
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
With x32 you get: .so files).
- You get 16 registers instead of 8. This allows much more efficient code to be generated because you don't have to dump/reload automatic variables to the stack because the register pressure is reduced.
- You also get a crossover from the 64 bit ABI where the first 6 arguments are passed in registers instead of push/pop on the stack.
- If you need a 64 bit arithmetic op (e.g. long long), compiler will gen a single 64 instruction (vs. using multiple 32 ops).
- You also get the RIP relative addressing mode which works great when a lot of dynamic relocation of the program occurs (e.g.
You get all these things [and more] if you port your program to 64 bit. But, porting to 64 bit requires that you go through the entire code base and find all the places where you said: ...
int x = ptr1 - ptr2;
instead of:
long x = ptr1 - ptr2;
Or, you put a long into a struct that gets sent across a socket. You'd need to convert those to int's
Etc
Granted, these should be cleaned up with abstract typedef's, but porting a large legacy 32 bit codebase to 64 bit may not be worth it [at least in the short term]. A port to x32 is pretty much just a recompile. You get [most of] the performance improvement for little hassle.
It also solves the 2037 problem because time_t is now defined to be 64 bits, even in 32 bit mode. Likewise, in struct timeval, the tv_sec field is 64 bit
Like a good neighbor, fsck is there
He's right. If you mix x32 and amd64 binaries on the same system, then you need two copies of every shared library that they use to be mapped at the same time. And this means that every context switch between them is going to be pulling things into the i-cache that would already be present (assuming a physically-mapped cache, which is a pretty safe assumption these days) because the other process is using them.
This is why x32 doesn't make sense on a consumer platform like Ubuntu unless the entire system is compiled to use it, making the entire article a 'well, duh'. The real advantage of x32 is on custom deployments and embedded systems where you can build everything in x32 mode.
Oh, and on the subject of caches, x86 chips typically have 64 byte cache lines. If you make pointers 4 bytes instead of 8, then you can fit twice as many in a cache line, which is usually nice. It can be a problem for multithreaded applications though, because you may now end up with more contention in the cache coherency protocol.
I am TheRaven on Soylent News
Won't this require a 2nd copy of the shared libraries in memory, which will negate the benefit of a slightly smaller binary?