Slashdot Mirror


Linux x32 ABI Not Catching Wind

jones_supa writes "The x32 ABI for Linux allows the OS to take full advantage of an x86-64 CPU while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers. Though the x32 ABI limits the program to a virtual address space of 4GB, it also decreases the memory footprint of the program and in some cases can allow it to run faster. The ABI has been talked about since 2011 and there's been mainline support since 2012. x32 support within other programs has also trickled in. Despite this, there still seems to be no widespread interest. x32 support landed in Ubuntu 13.04, but no software packages were released. In 2012 we also saw some x32 support out of Gentoo and some Debian x32 packages. Besides the kernel support, we also saw last year the support for the x32 Linux ABI land in Glibc 2.16 and GDB 7.5. The only Linux x32 ABI news Phoronix had to report on in 2013 was of Google wanting mainline LLVM x32 support and other LLVM project x32 patches. The GCC 4.8.0 release this year also improved the situation for x32. Some people don't see the ABI as being worthwhile when it still requires 64-bit processors and the performance benefits aren't very convincing for all workloads to make maintaining an extra ABI worthwhile. Would you find the x32 ABI useful?"

47 of 262 comments

  1. no by Anonymous Coward · · Score: 4, Insightful

    no

    1. Re:no by mlts · · Score: 4, Insightful

      For general computing, iffish.

      For embedded computing where I am worried about every chunk of space, and I can deal with the 3-4 GB RAM limit, definitely.

      This is useful, and IMHO, should be considered the mainstream kernel, but it isn't something everyone would use daily.

    2. Re:no by Just Brew It! · · Score: 3, Informative

      For most embedded applications you're probably better off just running a 32-bit OS and calling it a day. Embedded is mostly on 32-bit ARM processors anyway.

    3. Re:no by GPLHost-Thomas · · Score: 4, Insightful

      Well, I do find it extremely useful. Especially in Debian & Ubuntu, where we have multi-arch support. For some specific workloads using interpreted languages, such as PHP and Perl, it cuts the memory footprint in half. If you have ever run Amavis and SpamAssassin, you certainly know what I mean: they take double the amount of RAM on 64 bits. Since most of our servers are running PHP, Amavis and SpamAssassin, this would be a huge benefit (from 800 MB to 400 MB as the minimum server footprint), while still being able to run the rest of the workloads using 64 bits: for example, Apache itself and MySQL, which aren't taking much RAM anyway compared to these anti-spam dogs.

    4. Re:no by Bengie · · Score: 2

      You need to run in 64bit mode if you want to take advantage of the many instructions that reduce cache evictions and increase IPC. If you want this benefit while keeping your pointer size to a minimum, then you need x32 mode, aka 64bit mode with truncated pointers. You can probably gain 10%-15% performance over true 32bit mode with few changes. A lot of that gain is hidden when using 64bit pointers because of the reduced data density for some workloads.

      x32 mode is great for anything that can take advantage of the new 64bit-specific instructions but does not need 64bit addressing. 32bit mode has a lot of weird backwards compatibility issues, so to keep things simple, they reserved some features for 64bit mode only, where they deprecated some of the most annoying aspects of 32bit.

  2. Eh? by fuzzyfuzzyfungus · · Score: 3, Insightful

    If I wanted to divide my nice big memory space into 32-bit address spaces, I'd dig my totally bitchin' PAE-enabled Pentium Pro rig out of the basement, assuming the rats haven't eaten it...

  3. Nice concept by Anonymous Coward · · Score: 3, Insightful

    I do not see many cases where this would be useful. If we have a 64-bit processor and a 64-bit operating system then it seems the only benefit to running a 32-bit binary is it uses a slightly smaller amount of memory. Chances are that is a very small difference in memory used. Maybe the program loads a little faster, but is it a measurable, consistent amount? For most practical use case scenarios it does not look like this technology would be useful enough to justify compiling a new package. Now, if the process worked with 64-bit binaries and could automatically (and safely) decrease pointer size on 64-bit binaries then it might be worth while. But I'm not going to re-build an application just for smaller pointers.

    1. Re:Nice concept by maswan · · Score: 2, Informative

      The main benefit is that it runs faster. 64-bit pointers take up twice the space in caches, and especially L1 cache is very space-limited. Loading and storing them also takes twice the bandwidth to main memory.

      So for code with lots of complex data types (as opposed to big arrays of floating point data), that still has to run fast, it makes sense. I imagine the Linux kernel developers' No. 1 benchmark, compiling the kernel, would run noticeably faster with gcc in x32.

      The downside is that you need a proper fully functional multi-arch system like is slowly getting adopted by Debian in order to handle multiple ABIs. And then you get into iffy things on if you want the faster /usr/bin/perl or one that can handle 6-gig lists efficiently...
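To put rough numbers on the cache argument above, here is a minimal sketch. The node layout (two child pointers plus a 4-byte key) and the 64-byte cache line are assumptions for illustration, not measurements from any real program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary-tree node: two child pointers plus a 4-byte key. */
struct tree_node {
    struct tree_node *left;
    struct tree_node *right;
    int key;
};

/* Size of such a node for a given pointer width, computed by hand so that
 * both ABIs can be compared on any machine. The compiler pads the struct
 * up to a multiple of its alignment, which here is the pointer size. */
size_t node_size(size_t ptr_bytes)
{
    assert(ptr_bytes == 4 || ptr_bytes == 8);
    const size_t key_bytes = 4;                 /* assumed size of int */
    size_t raw = 2 * ptr_bytes + key_bytes;
    return (raw + ptr_bytes - 1) / ptr_bytes * ptr_bytes;
}

/* How many whole nodes fit in one cache line of line_bytes bytes. */
size_t nodes_per_line(size_t ptr_bytes, size_t line_bytes)
{
    return line_bytes / node_size(ptr_bytes);
}
```

Under these assumptions a node is 12 bytes with 4-byte pointers and 24 bytes with 8-byte pointers, so a 64-byte line holds five nodes instead of two. Whether that translates into speed depends entirely on the access pattern.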

    2. Re:Nice concept by sribe · · Score: 2

      So for code with lots of complex data types (as opposed to big arrays of floating point data), that still has to run fast, it makes sense.

      Well, here's the problem. Code that is that performance-sensitive can often benefit a whole lot more from a better design that does not have so many pointers pointing to itty-bitty data bits. (For instance, instead of a binary tree, a B-tree with nodes that are at least a couple of cache lines, or maybe even a whole page, wide.) There are very very few problems that actually require that a significant portion of data memory be occupied by pointers. There are lots and lots of them where the most convenient data structure uses lots of pointers, but if you're going to optimize how much you can cram in cache at once, eliminating pointers is better than shrinking them. Also, in many cases (such as the example I mentioned earlier), chunking things instead of pointers to individual items can greatly improve locality of access. And finally, of course, the irony is an awful lot of problems that are so performance-sensitive need the high performance precisely because they're dealing with large amounts of data. So yeah, it could be useful--but the problems where it is really useful are probably extremely limited.
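The "eliminate pointers rather than shrink them" point can be sketched with simple arithmetic. This example uses an unrolled linked list (one next pointer per chunk of items) rather than the B-tree mentioned above, because the arithmetic is simpler; the item counts and sizes are made up for illustration and padding is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of a list node's bytes spent on its single "next" pointer,
 * when each node carries items_per_node items of item_bytes each.
 * Padding is ignored to keep the arithmetic visible. */
double pointer_overhead(size_t items_per_node, size_t item_bytes,
                        size_t ptr_bytes)
{
    assert(items_per_node > 0 && item_bytes > 0 && ptr_bytes > 0);
    double payload = (double)items_per_node * (double)item_bytes;
    return (double)ptr_bytes / ((double)ptr_bytes + payload);
}
```

With one 4-byte item per node, an 8-byte pointer is two thirds of the node and a 4-byte pointer is still half of it. With 14 items per node, the 8-byte pointer drops to one eighth of a 64-byte node, which is a bigger win than halving the pointer.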

      The downside is that you need a proper fully functional multi-arch system like is slowly getting adopted by Debian in order to handle multiple ABIs. And then you get into iffy things on if you want the faster /usr/bin/perl or one that can handle 6-gig lists efficiently...

      You also get into the problem that having two sets of libraries in use is not exactly good for cache pressure ;-)

    3. Re:Nice concept by Rockoon · · Score: 2

      64-bit pointers take up twice the space in caches, and especially L1 cache is very space-limited.

      L1 cache is typically 64KB, which is room for 8K 64-bit pointers or 16K 32-bit pointers. Now riddle me this.. if you are following thousands or more pointers, what are the chances that your access pattern is at all cache friendly?

      The chance is virtually zero.

      Of course, not all of the data is pointers, but that actually doesn't help the argument. The smaller the percentage of the cache that is pointers, the less important their size actually is, for after all when 0% are pointers then pointer size cannot have any performance impact.

      So the best case for your argument is when there are literally 8192 pointers sitting in the cache, where you would be able to instead fit 16384 pointers if they were 32-bit. But surely the act of following 16384 pointers in your access pattern is actually going to make the L1 cache 100% completely moot with a cache miss at literally every follow...

      --
      "His name was James Damore."
  4. Who cares if I'll use it? by 93 Escort Wagon · · Score: 4, Interesting

    The maintainer(s) find it interesting, and they're developing it on their own dime... so I don't get the hate in some of these first few posts. No one's forcing you to use it, or even to think about it when you're coding something else.

    If it's useful to someone, that's all that matters.

    --
    #DeleteChrome
  5. Re:Stupid by mjrauhal · · Score: 2

    x32 at least has some merit, unlike your grasp of the history of computing. (Just not very much and probably not worth the trouble; you can probably relate.)

  6. It's not only RAM by jandar · · Score: 4, Informative

    The company I work for compiles almost all programs as 32-bit on x86-64 CPUs. It's not only cheap RAM that is at stake, it's also expensive cache, which is wasted on 64-bit pointers and 64-bit longs. Since 3 GB is much more than our programs use, x86-64 would be foolish. I'm eagerly waiting for an x32 SuSE version.

    1. Re:It's not only RAM by Austerity Empowers · · Score: 2

      Your comment reminded me of what Larry Wall, inventor of the wrecking ball, said about Miley Cyrus:

      "Leeeeroooooy Jenkins!"

  7. Re:Stupid by s.petry · · Score: 2

    I would not go that far, since I'm sure a special case may exist, but that's exactly what it would be for. Hence 'no massive wide-scale adoption' and no 'applications written for this' is what should be an obvious outcome.

    If I'm custom Joe and see a workload that benefits from 32 vs. 64bit OS constraints I load a 32bit OS. The reason we went to larger memory however means those special cases are extremely rare today. They happen more because "we can't get new hardware" than by choice.

    --

    -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

  8. x32 is a premature optimization by bheading · · Score: 3, Interesting

    The idea makes sense in theory. Build binaries that are going to be smaller (32-bit binaries have smaller pointers compared with 64-bit) and faster (because the code is smaller, in theory cache should be used more efficiently and accesses to external memory should be reduced).

    But I suspect the problem is that the benefits simply don't outweigh the inconvenience of having to run with an entirely separate ABI. I doubt the average significant C program spends a lot of time doing direct addressing, and as such I suspect the size benefits of using 32-bit pointers are overstated.

  9. Re:Subject by mellon · · Score: 3, Insightful

    Memory? What about cache? Is cache dirt cheap?

  10. Re:Subject by KiloByte · · Score: 5, Interesting

    For some workloads, it's ~40% faster vs amd64, and for some, even more than that vs i386. In the typical case, though, you see a ~7% speed gain and ~35% memory savings over amd64.

    As for memory being cheap, this might not matter on your home box where you use 2GB of 16GB you have installed, but vserver hosting tends to be memory-bound. And using bad old i386 means a severe speed loss due to ancient instructions and register shortage.

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  11. Re:Subject by evan.teran · · Score: 4, Interesting

    It's not just about "having enough RAM". While that certainly is a factor, it's not the only one. As you suggest, pretty much everyone has enough RAM to run just about any normal application with 64-bit pointers.

    But if you want speed, you also have to pay attention to things like cache lines. 64-bit pointers often mean larger instruction encodings are needed to do the same work, and larger instructions mean more cache misses. This can make a large difference in performance.

  12. Try it by KiloByte · · Score: 2

    debootstrap --arch=x32 unstable /path/to/chroot http://ftp.debian-ports.org/debian/
    Requires an amd64 kernel compiled with CONFIG_X86_X32=y (every x32-capable kernel can also run 64 bit stuff).

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  13. Re:of course not by ShanghaiBill · · Score: 2

    Who would want this, some niche embedded guys?

    Not many NEGs are using 64 bit processors, and this ABI offers too little advantage to bother with. Most embedded systems run a single primary process. If that process fits in a 4GB address space (as is required to use this ABI), then the system would just use a native 32 bit ABI on a 32 bit CPU, not this 32 bit ABI on a more expensive 64 bit CPU.

  14. Re:Wont use Linux without it! by Anonymous Coward · · Score: 2, Insightful

    My dad drives a Ford and your dad drives a Chevy. Your dad sucks.

    Didn't we do this already? Like when we were twelve years old.

  15. What about shared libraries? by billcarson · · Score: 3, Insightful

    Wouldn't this require all common shared libraries (glib, mpi, etc.) to be recompiled for both x86-64 and x32? What am I missing here?

    1. Re:What about shared libraries? by mjrauhal · · Score: 3, Informative

      Yes it would. That's among the nontrivial maintenance costs.

    2. Re:What about shared libraries? by bogjobber · · Score: 2

      Nontrivial doesn't necessarily mean large. It just means significant enough that it needs to be accounted for. The actual cost will of course be dependent on the size and complexity of your codebase.

  16. More than one reason for x86-64 by tepples · · Score: 4, Interesting

    we went 64 bit for a reason.

    We went to x86-64 for three reasons: 64-bit integer registers, more integer registers, and 64-bit pointers. Some applications need only the first two of these three, which is why x32 is supposed to exist.
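For anyone who wants to check which of these combinations a given build actually uses, here is a small sketch based on compiler-predefined macros. It assumes GCC or Clang, which define `__x86_64__` for both 64-bit ABIs and additionally `__ILP32__` when targeting x32 (for example with GCC's `-mx32` option). The function names are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Name of the ABI this translation unit was compiled for, judged from
 * predefined macros. x32 is "x86-64 instructions, 32-bit pointers". */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "amd64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* Pointer width in bits for the current build. */
size_t pointer_bits(void)
{
    assert(CHAR_BIT == 8);
    return sizeof(void *) * CHAR_BIT;
}

/* Sanity check: amd64 must have 64-bit pointers, x32 and i386 32-bit. */
int abi_matches_pointer_width(void)
{
    const char *abi = abi_name();
    if (strcmp(abi, "amd64") == 0)
        return pointer_bits() == 64;
    if (strcmp(abi, "x32") == 0 || strcmp(abi, "i386") == 0)
        return pointer_bits() == 32;
    return 1; /* nothing to check on other architectures */
}
```

The same source built three ways (i386, x32, amd64) would report three different names but only two pointer widths, which is the whole point of the third ABI.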

  17. Re:Wont use Linux without it! by mjrauhal · · Score: 2

    I could get into specifics but I shan't, because what you're blathering about has zero relevance for x32. It's not a replacement-to-be for the usual amd64 ABI, nobody is going to break amd64 to make x32 run. It's mostly a specialist tool for specific workloads (aside from being a hacker's playground, as are many things). Whether thinking it's useful as such is misguided or not, you're more so.

  18. Re:Subject by mellon · · Score: 4, Insightful

    In answer to my question, no, it is not dirt cheap. For any size cache you will get fewer cache misses if your data structures are smaller than if they are larger. Until the cache is so big that everything fits in it, you always win if you can double what you can cram into it.

  19. Re:Subject by Reliable Windmill · · Score: 3, Informative

    You've not understood this correctly. x32 is an enhancement and optimization for executable files that do not require gigabytes of RAM, primarily regarding performance. It has nothing to do with the availability or lack of RAM in the system, or how much RAM costs to buy in the computer store.

    --
    Signature intentionally left blank.
  20. Re:Stupid by Reliable Windmill · · Score: 2

    You've just misunderstood it. It is in essence a performance enhancement, and you would benefit from it simply from selecting x32 target (instead of x86-64) when compiling.

    --
    Signature intentionally left blank.
  21. Re:Subject by LordLimecat · · Score: 4, Funny

    Of course it's a tradeoff, because the new RAM will have fewer of its spare ECC bits used up.

  22. Re:Subject by Anonymous Coward · · Score: 2, Insightful

    ECC memory is artificially expensive. Were ECC standard as it ought to be, it would only cost about 12.5% more. (1 bit for every byte) That is a pittance when considering the cost of the machine and the value of one's data and time. It is disgusting that Intel uses this basic reliability feature to segment their products.

  23. Re:Subject by dmbasso · · Score: 4, Informative

    Which is all nice and good except this implies your data structure was mostly pointers to begin with

    And that's exactly the case of scripting languages, where every structure (say, a Python object) is a collection of pointers to methods and data.

    --
    `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
  24. The return of memory models? by Just Brew It! · · Score: 3, Interesting

    This sure feels a lot like a throwback to the old 16-bit DOS days, where you had small/medium/large memory models depending on the size of your code and data address spaces. We've already got 32-bit mode for supporting pure 32-bit apps and 64-bit mode for pure 64-bit; supporting yet a third ABI is just going to result in more bloat as all the runtime libraries need to be duplicated for yet another combination of code/data pointer size.

    I hate to say this since I'm sure a lot of smart people put significant effort into this, but it seems like a solution in search of a problem. RAM is cheap, and the performance advantage of using 32-bit pointers is typically small.

  25. Re:ABI? by Just Brew It! · · Score: 2

    ABI = Application Binary Interface. Defines the pointer sizes and conventions for passing function arguments at the object code level (among other things). The ABI determines how the compiler generates object code for function call/entry/exit, and the width of pointer types.

    API defines the interfaces seen by the programmer.

  26. Re:Subject by macpacheco · · Score: 2

    That's right. Unfortunately it's called the market. The same boneheads who say x32 isn't worth it are the same boneheads who have no idea how important ECC is, or how hard it is to properly code everything while worrying about cache hits. Probably people who never wrote a single line of C or assembly code.

    But the Intel way of making the same physical hardware cost 50% more (with a simple on/off switch) will continue until ARM Cortex starts giving Intel some real competition (at least competing with the latest-gen Core i3).

    In the ARM world, you can still get a 10-year-old CPU design (and for a pittance) because there's no forced obsolescence like there is with Intel.

    Anxiously waiting for quad core Cortex A57 chromebooks in 2014 with 4GB of RAM. And a raspberry pi (or similar) with Cortex A53.

  27. First let's understand this x32 correctly. by macpacheco · · Score: 2

    It's possible to have a system with 16GB that uses only x32 (the kernel is still x86_64 under x32, so the kernel can see all 16GB), for instance running thousands of tasks using up to 4GB each just fine. Plus, the page cache is a kernel thing, so the I/O cache can always use all memory.

    On the other hand, there are workloads that run on a 4GB system but need x86_64 (mmapping huge files, for instance), and some boneheaded tasks reserve tons of never-used RAM: one could actually use 1GB of RAM but reserve 8GB. The fix there really should be putting the coder in jail, but I digress.

    But the vast majority of Linux workloads today, even those that use an 8GB system, would run just fine under x32. Like 95-98%.
    And nobody is even suggesting a mainstream Linux distro without an x86_64 userland. I'm suggesting all standard tools use x32, while keeping the x86_64 shared libraries and compilers, so that if you need to, you can run some apps with full 64-bit capability. Just use x32 by default.

    Plus it's a good way to remind lazy developers that no matter how cheap RAM is, you should be serious about being efficient (especially the KDE developers)!
    KDE functionality is great, but they really have no clue about efficiency (RAM and CPU).

  28. Re: Subject by Austerity Empowers · · Score: 2

    Colonel Panic to be precise. He reports directly to General P. Fault.

  29. Re:BSD by adri · · Score: 2

    No, it's not the same.

    The idea is that you use the 32 bit pointer model, with 32 bit indirect instructions, but you're doing it all using the x86-64 instruction set. Ie, the task is in 64 bit mode. The 64 bit mode includes primarily more registers, so you can write / compile to tighter code.

    The stuff you described is for running 32 bit binaries that use the i386/i486/i586 instruction set, complete with the limited set of temporary registers. x86-64 has many more registers to use.

    It's not just about cache lines. :)

  30. Re:Subject by Forever Wondering · · Score: 5, Informative

    With x32 you get:
    - You get 16 registers instead of 8. This allows much more efficient code to be generated because you don't have to dump/reload automatic variables to the stack because the register pressure is reduced.
    - You also get a crossover from the 64 bit ABI where the first 6 arguments are passed in registers instead of push/pop on the stack.
    - If you need a 64 bit arithmetic op (e.g. long long), the compiler will generate a single 64 bit instruction (vs. using multiple 32 bit ops).
    - You also get the RIP relative addressing mode which works great when a lot of dynamic relocation of the program occurs (e.g. .so files).

    You get all these things [and more] if you port your program to 64 bit. But, porting to 64 bit requires that you go through the entire code base and find all the places where you said:
        int x = ptr1 - ptr2;
    instead of:
        long x = ptr1 - ptr2;
    Or, you put a long into a struct that gets sent across a socket. You'd need to convert those to ints.
    Etc ...

    Granted, these should be cleaned up with abstract typedef's, but porting a large legacy 32 bit codebase to 64 bit may not be worth it [at least in the short term]. A port to x32 is pretty much just a recompile. You get [most of] the performance improvement for little hassle.
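A minimal sketch of the kind of cleanup described above, so the same source builds unchanged on i386, x32 and amd64. The `wire_header` struct and its fields are hypothetical, and byte order is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distance between two pointers into the same buffer. ptrdiff_t is the
 * type the language defines for this, so it is wide enough on every ABI,
 * unlike a hard-coded int or long. */
ptrdiff_t span(const char *begin, const char *end)
{
    assert(begin <= end);
    return end - begin;
}

/* Hypothetical header sent across a socket. Fixed-width types keep the
 * layout identical whatever the size of int, long or pointers. The
 * 64-bit field comes first so that no ABI needs to insert padding. */
struct wire_header {
    int64_t  timestamp;
    uint32_t length;
    uint32_t flags;
};

/* Fails the build instead of silently changing the wire format. */
static_assert(sizeof(struct wire_header) == 16,
              "wire_header must be 16 bytes on every ABI");
```

Code written this way does not care whether long is 4 or 8 bytes, which is what makes an x32 rebuild (or a full 64-bit port) mostly a recompile.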

    It also solves the 2038 problem because time_t is now defined to be 64 bits, even in 32 bit mode. Likewise, in struct timeval, the tv_sec field is 64 bit.
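The time_t point is easy to check on any given build. A signed 32-bit time_t runs out at 2147483647 seconds after the epoch, which falls on 19 January 2038 UTC. The helper names below are invented for this sketch.

```c
#include <stdint.h>
#include <time.h>

/* Non-zero if this build's time_t is at least 64 bits wide and can
 * therefore count past the signed 32-bit limit. */
int time_t_survives_2038(void)
{
    return sizeof(time_t) >= 8;
}

/* UTC calendar year of a Unix timestamp, or -1 if the value does not
 * fit in this build's time_t or cannot be converted. */
int utc_year(int64_t seconds)
{
    time_t t = (time_t)seconds;
    if ((int64_t)t != seconds)
        return -1;                  /* truncated by a narrow time_t */
    struct tm *tm = gmtime(&t);
    return tm ? tm->tm_year + 1900 : -1;
}
```

On a build with a 64-bit time_t, `utc_year((int64_t)INT32_MAX + 1)` still lands in 2038. On a build with a 32-bit time_t, the same call returns -1 because the value no longer fits.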

    --
    Like a good neighbor, fsck is there ...
  31. Re:It has some value for embedded systems by Pinhedd · · Score: 2

    wrong architecture.

    Cost sensitive embedded systems use ARM based microprocessors to which this is not applicable.

  32. Re:Wont use Linux without it! by armanox · · Score: 2

    I can recompile and run 20 year old SunOS apps no problem with OpenSolaris. Try that with Linux?

    Depends on what it's looking for, but in theory should work. 20 years? CLI or GUI based? Probably wants TCL/TK and/or Motif if it's GUI, make sure they're installed. I'm willing to try, if you have source code that old...

    Hairyfeet mentioned he tried linux and people kept calling back angry that their printer stopped working after an Ubuntu update.

    I did not even know it existed? I will keep Linux on a VM I suppose but only CentOS as Redhat likes to make somewhat ABIs that do not break after each freaking update!

    If you need stability then you should go with a stable OS. Fedora, OpenSuSE, and Ubuntu change too fast for enterprise use - which is what makes RHEL great.

    With that said, I don't seem to have issues running some older software I have lying around for Linux. Oracle Database 8 installed on RHEL 5 when I tried it last, and an old version of Code Forge IDE ran in new Fedora Linux (think I installed it last on FC 16; it was designed for Red Hat Linux 5.x/6.x, i.e. old Red Hat, not RHEL). Similar results with Matlab. The software isn't broken by kernel changes - the libraries needed do change (static linked vs dynamic linked makes a big difference in how long your software lasts), and stuff looking for a particular glib or libc seems to be the biggest offender in Linux, from what I've seen. Windows has seen some issues with that over the years (dropping DOS libraries, dropping Win 16 from 64 bit Windows, etc).

    Most UNIX operating systems do seem to maintain greater compatibility in userland, but I've had issues on IRIX with stuff built for 5.x not working on my systems (Octanes running 6.5.x) - but it's the same deal - dynamically linked programs not being able to find their libraries.

    --
    I'm starting to think GNU is the problem with "GNU/Linux" these days.
  33. Re:Subject by TheRaven64 · · Score: 4, Interesting

    He's right. If you mix x32 and amd64 binaries on the same system, then you need two copies of every shared library that they use to be mapped at the same time. And this means that every context switch between them is going to be pulling things into the i-cache that would already be present (assuming a physically-mapped cache, which is a pretty safe assumption these days) because the other process is using them.

    This is why x32 doesn't make sense on a consumer platform like Ubuntu unless the entire system is compiled to use it, making the entire article a 'well, duh'. The real advantage of x32 is on custom deployments and embedded systems where you can build everything in x32 mode.

    Oh, and on the subject of caches, x86 chips typically have 64 byte cache lines. If you make pointers 4 bytes instead of 8, then you can fit twice as many in a cache line, which is usually nice. It can be a problem for multithreaded applications though, because you may now end up with more contention in the cache coherency protocol.
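The contention point can be sketched as follows: data written by different threads is usually padded out to a full cache line so that no two items share one. This assumes the 64-byte line mentioned above and a C11 compiler; the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: typical x86 cache line size */

/* A per-thread counter padded to occupy a whole cache line, so that
 * writes from different threads never touch the same line. */
struct padded_counter {
    alignas(CACHE_LINE) unsigned long value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "padded_counter should fill exactly one cache line");

/* Non-zero if the two addresses fall within the same cache line. */
int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

Adjacent `padded_counter` elements in an array never share a line. Adjacent plain pointers do, and with 4-byte pointers sixteen of them share each line instead of eight, so more unrelated writers can end up fighting over the same line.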

    --
    I am TheRaven on Soylent News
  34. Re:Subject by TheRaven64 · · Score: 3, Informative

    The C standard does not guarantee that sizeof(long) is as big as sizeof(void*). The type that you want is intptr_t (or ptrdiff_t for differences between pointers). If you've gone through replacing everything with long, then good luck getting your code to run on win64 (where long is 4 bytes).
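A short sketch of the portable alternative: round-trip pointers through `uintptr_t` from `<stdint.h>` rather than through long, and test the long assumption explicitly instead of relying on it. The function names are invented for this example.

```c
#include <stdint.h>

/* Store a pointer in an integer and get it back. uintptr_t is the type
 * intended for this, whatever the sizes of int and long happen to be. */
uintptr_t pointer_to_integer(const void *p)
{
    return (uintptr_t)p;
}

void *integer_to_pointer(uintptr_t v)
{
    return (void *)v;
}

/* Non-zero if long happens to be wide enough to hold a pointer on this
 * platform. True on i386, x32 and amd64 Linux; false on win64, where
 * long is 4 bytes and pointers are 8. */
int long_can_hold_pointer(void)
{
    return sizeof(long) >= sizeof(void *);
}
```

Code that only ever uses `uintptr_t`, `intptr_t` and `ptrdiff_t` for these jobs never needs to call `long_can_hold_pointer()` at all.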

    --
    I am TheRaven on Soylent News
  35. Errm by countach · · Score: 4, Interesting

    Won't this require a 2nd copy of the shared libraries in memory, which will negate the benefit of a slightly smaller binary?

  36. Re:Subject by foobar bazbot · · Score: 2

    You're running a Python script and you care about L1/L2 cache efficiency??

    Your system is probably context switching between hundreds of MB.

    Amongst.

    Your system is context-switching amongst hundreds of MB.

    Frohe Weihnachten from das Grammer-SS!

  37. Re:Subject by Bengie · · Score: 2

    Yes and no. The larger your cache, the higher its latency. Can't get around this. L1 caches tend to be small to keep the execution units fed with typically 1 or 2 cycle latencies. L2 caches tend to be about 16x larger, but have about 10x the latency.

    L2 cache may have high latency, but it still has decent bandwidth. To help hide the latency, modern CPUs have automatic pre-fetching and also async low-priority pre-fetching instructions that allow the programmer to tell the CPU to attempt to load data from memory into L1 prior to needing it, and only if the CPU finds an open slot for memory access.
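As a rough illustration of those pre-fetch hints, here is a sketch using the `__builtin_prefetch` intrinsic from GCC and Clang. The look-ahead distance of 16 is an arbitrary assumption that would need tuning, and the gather-style loop is just one example of an access pattern the hardware prefetcher cannot easily predict.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16   /* elements to look ahead; a tuning guess */

/* Sums table[idx[i]] for i in [0, n). The indices jump around, so the
 * loop asks the CPU to start loading a future element while it works on
 * the current one. The hint is advisory and never changes the result. */
long gather_sum(const int *table, const size_t *idx, size_t n)
{
    assert((table != NULL && idx != NULL) || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_DISTANCE < n)
            /* arguments: address, 0 = read access, 1 = low temporal locality */
            __builtin_prefetch(&table[idx[i + PREFETCH_DISTANCE]], 0, 1);
#endif
        total += table[idx[i]];
    }
    return total;
}
```

Whether this helps at all depends on the table size and the CPU; for data that already fits in L1 the extra instruction is pure overhead.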

    After a certain size, "normal" cache is slower than main-memory. That's why we're starting to see integrated eDRAM, which is mostly just system memory built into the CPU or package. The other issue you need to be careful about is that each layer of cache adds cumulative fixed latency.

    The easiest way to hide high latency is to have lots of concurrent work going on. Hyper-threading banks a lot on this. When one virtual core is stalling on memory access, the other core can step in and make use of any free execution units on a per cycle basis. Because there are lots of units, there is usually an idle one somewhere that can be used. Ironically, having two virtual cores sharing the same resources means resources are split, primarily the L1 cache. While hyper-threading helps hide latency caused by memory access, it also increases the chance of an address getting evicted from L1.

    To help with this, Intel increased the size of their L1 cache on the more recent CPUs, but this also increased the latency from 1 cycle to 2 cycles. To help compensate, they increased the bandwidth of the L1 and allow larger loads. Twice the bandwidth, twice the size, but twice the latency. Single thread code takes a minor hit, but concurrent work stands to gain a decent amount.

    Increasing cache sizes is not as simple as it seems.