Linux x32 ABI Not Catching Wind

Posted on slashdot.org by Soulskill on Tuesday December 24, 2013 @11:20AM from the try-a-bigger-sail dept.

jones_supa writes "The x32 ABI for Linux allows the OS to take full advantage of an x86-64 CPU while using 32-bit pointers, thus avoiding the overhead of 64-bit pointers. Though the x32 ABI limits a program to a 4GB virtual address space, it also decreases the program's memory footprint and in some cases lets it run faster. The ABI has been discussed since 2011, and there has been mainline support since 2012. x32 support in other programs has also trickled in. Despite this, there still seems to be no widespread interest. x32 support landed in Ubuntu 13.04, but no software packages were released. In 2012 we also saw some x32 support out of Gentoo and some Debian x32 packages. Besides the kernel support, last year also saw support for the x32 Linux ABI land in Glibc 2.16 and GDB 7.5. The only Linux x32 ABI news Phoronix had to report in 2013 was Google wanting mainline LLVM x32 support, plus other LLVM project x32 patches. The GCC 4.8.0 release this year also improved the situation for x32. Some people don't see the ABI as worthwhile when it still requires 64-bit processors, and the performance benefits aren't convincing enough across workloads to justify maintaining an extra ABI. Would you find the x32 ABI useful?"

no by Anonymous Coward · 2013-12-24 11:24 · Score: 4, Insightful

no
1. Re:no by rudy_wayne · 2013-12-24 12:24 · Score: 1
  
  Catching Wind
  LOL
2. Re:no by mlts · 2013-12-24 13:27 · Score: 4, Insightful
  
  For general computing, iffy.
  For embedded computing where I am worried about every chunk of space, and I can deal with the 3-4 GB RAM limit, definitely.
  This is useful, and IMHO, should be considered the mainstream kernel, but it isn't something everyone would use daily.
3. Re:no by Just+Brew+It! · 2013-12-24 14:14 · Score: 3, Informative
  
  For most embedded applications you're probably better off just running a 32-bit OS and calling it a day. Embedded is mostly on 32-bit ARM processors anyway.
4. Re:no by GPLHost-Thomas · 2013-12-24 22:16 · Score: 4, Insightful
  
  Well, I do find it extremely useful, especially in Debian & Ubuntu, where we have multi-arch support. For some specific workloads using interpreted languages, such as PHP and Perl, it simply halves the memory footprint. If you have ever run Amavis and SpamAssassin, you know what I mean: they take double the amount of RAM on 64 bits. Since most of our servers run PHP, Amavis and SpamAssassin, this would be a huge benefit (from 800 MB to 400 MB as the minimum server footprint), while still letting us run the rest of the workload as 64-bit: for example, Apache itself and MySQL, which don't take much RAM anyway compared to these anti-spam dogs.
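  To illustrate why pointer-heavy runtimes benefit, here is a hypothetical C sketch of a node shaped like the linked structures interpreters allocate in bulk (it is not Perl's or PHP's actual layout). With 8-byte pointers the node takes 32 bytes after padding; with x32's 4-byte pointers it takes 16.
  
  ```c
  #include <stdio.h>
  
  /* Hypothetical interpreter-style node: three pointers and a reference count. */
  struct node {
      struct node *prev;
      struct node *next;
      void *payload;
      int refcount;
  };
  
  /* Bytes needed for n such nodes, ignoring allocator overhead. */
  static size_t node_bytes(size_t n)
  {
      return n * sizeof(struct node);
  }
  
  int main(void)
  {
      printf("sizeof(struct node) = %zu\n", sizeof(struct node));
      printf("1,000,000 nodes     = %zu bytes\n", node_bytes(1000000));
      return 0;
  }
  ```
  
  Real allocators add per-object headers and alignment, so the actual saving for a given workload has to be measured.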
5. Re:No by Technomancer · 2013-12-24 23:46 · Score: 1
  
  Well, x32 cannot completely replace x86_64, because I did not put 32GB of RAM in my workstation to run everything with only 4GB of user address space.
  And Firefox and Chrome can run out of 32-bit address space pretty quickly.
  So x32 is a useless gimmick. Maybe useful on low-memory systems like embedded Linux with 2 or 4GB of RAM.
6. Re:no by Carewolf · 2013-12-25 01:48 · Score: 1
  
  Who cares about most? If you need a fast embedded processor you need x86, and if you want fast you probably buy a modern embedded x86 that supports 64-bit. You don't need 64-bit addressing, but running x32 would give you improved performance at no extra cost. Of course it makes sense; you just won't see it in phones, because x86 chips are still too powerful and power-hungry for that market.
7. Re:no by Just+Brew+It! · 2013-12-25 02:16 · Score: 1
  
  Modern "embedded" x86 processors generally sacrifice a fair amount of performance to meet their target power/heat numbers. Once you make those concessions, the performance gap relative to ARM narrows considerably. Furthermore, the most computationally demanding tasks in the embedded space tend to be either graphics/video (which will use embedded GPU hardware) or amenable to running on a DSP (implemented using GPU compute or dedicated DSP hardware).
8. Re:No by wolrahnaes · 2013-12-25 05:26 · Score: 1
  
  Right, because disk space is at a premium. Oh wait, a terabyte of disk costs as much as a case of good beer.
  Also, in theory any x86 app should simply recompile as x32 with no trouble even if they're the kind that makes x64-incompatible assumptions, so unless you need to support closed binaries in x86 mode you can trade the x86 libraries for x32 and gain a bit of performance in certain tasks with no notable change in disk space.
  For the average Linux user, the only closed binaries they're likely to run are Flash and possibly a GPU driver. Both have x64 builds, so I'd feel comfortable saying that the majority of users could drop x86 support entirely if their distro were to start offering x32 builds.
  
  --
  I used to get high on life, but I developed a tolerance. Now I need something stronger.
9. Re:no by Megol · 2013-12-25 07:03 · Score: 1
  
  You are making the mistake of thinking embedded == low performance/low power. But that is completely orthogonal to what embedded means. Embedding is the use of a computer system for a specific purpose commonly with hardware and software optimized for that purpose. For a system optimized for reasonable cost and high performance it is very hard to beat Intel x86 processors. For a system optimized for extremely low power consumption ARM is an alternative.
10. Re:no by mebrahim · 2013-12-25 08:00 · Score: 1
  
  Just use i386. You can even run i386 userland over an amd64 kernel. I can't see the big benefit of x32 over this scheme.
  
  --
  Persian Project Management Software as a Service
11. Re:no by Bengie · 2013-12-25 08:33 · Score: 2
  
  You need to run in 64-bit mode if you want to take advantage of many instructions that reduce cache evictions and increase IPC. If you want that benefit while keeping your pointer size to a minimum, you need x32 mode, a.k.a. 64-bit mode with truncated pointers. You can probably gain 10%-15% performance over true 32-bit mode with few changes. A lot of that gain is masked when using 64-bit pointers because of the reduced data density for some workloads.
  
  x32 mode is great for anything that can take advantage of the new 64-bit-specific instructions but does not need 64-bit addressing. 32-bit mode has a lot of weird backwards-compatibility issues, so to keep things simple, some features were reserved for 64-bit mode only, where some of the most annoying aspects of 32-bit mode were deprecated.
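  A small C function that exercises two 64-bit-mode features discussed in this thread: integer arguments passed in registers and native 64-bit arithmetic. Comparing `gcc -O2 -S -m32` with `gcc -O2 -S -mx32` output is one way to see the difference. This assumes the x86-64 System V calling convention, which x32 shares (first six integer arguments in RDI, RSI, RDX, RCX, R8, R9), while i386 passes them on the stack; `sum6` is a hypothetical example.
  
  ```c
  #include <stdint.h>
  #include <stdio.h>
  
  /* Six 64-bit integer arguments: under amd64 and x32 they all travel in
     registers; under the i386 ABI they are passed on the stack, and each
     64-bit addition is typically split into an add/adc pair of 32-bit ops. */
  int64_t sum6(int64_t a, int64_t b, int64_t c, int64_t d, int64_t e, int64_t f)
  {
      return a + b + c + d + e + f;
  }
  
  int main(void)
  {
      /* A value above 2^32, so 32-bit-only arithmetic would not suffice. */
      int64_t big = INT64_C(5000000000);
      printf("%lld\n", (long long)sum6(big, 1, 2, 3, 4, 5));
      return 0;
  }
  ```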
Subject by Daimanta · 2013-12-24 11:24 · Score: 1, Insightful

With memory being dirt cheap I ask: Who cares?

--
Knowledge is power. Knowledge shared is power lost.
1. Re:Subject by mellon · 2013-12-24 11:48 · Score: 3, Insightful
  
  Memory? What about cache? Is cache dirt cheap?
2. Re:Subject by KiloByte · 2013-12-24 11:49 · Score: 5, Interesting
  
  For some workloads, it's ~40% faster vs amd64, and for some, even more than that vs i386. For a typical case, though, it's typical to see ~7% speed and ~35% memory boost over amd64.
  As for memory being cheap, this might not matter on your home box where you use 2GB of 16GB you have installed, but vserver hosting tends to be memory-bound. And using bad old i386 means a severe speed loss due to ancient instructions and register shortage.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
3. Re:Subject by evan.teran · 2013-12-24 11:49 · Score: 4, Interesting
  
  It's not just about "having enough RAM". While that certainly is a factor, it's not the only one. As you suggest, pretty much everyone has enough RAM to run just about any normal application with 64-bit pointers.
  But if you want speed, you also have to pay attention to things like cache lines. 64-bit pointers often mean larger instructions have to be encoded to do the same work, and larger instructions mean more cache misses. This can make a large difference in performance.
4. Re: Subject by Anonymous Coward · 2013-12-24 12:02 · Score: 1
  
  Cache requires cash.
5. Re:Subject by ProzacPatient · 2013-12-24 12:03 · Score: 1
  
  Desktop memory is cheap but ECC server memory can be very expensive
6. Re:Subject by haruchai · 2013-12-24 12:14 · Score: 1
  
  Damn straight. Just spent $1000 on sixteen used 4GB sticks of HP DDR3 ECC registered memory; that's considered a bargain. New sticks would be $120 each.
  
  --
  Pain is merely failure leaving the body
7. Re:Subject by TheRealMindChild · 2013-12-24 12:33 · Score: 1
  
  Sort of. It will be in the form of L4 or even a next layer, L5 cache. While this is still faster than grabbing system memory, we are approaching the point where it isn't.
  
  --
  
  "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
8. Re:Subject by mellon · 2013-12-24 12:34 · Score: 4, Insightful
  
  In answer to my question, no, it is not dirt cheap. For any size cache you will get fewer cache misses if your data structures are smaller than if they are larger. Until the cache is so big that everything fits in it, you always win if you can double what you can cram into it.
9. Re:Subject by Mike+Buddha · 2013-12-24 12:37 · Score: 1
  
  This is what Apple needs for its silly 64-bit mobile processors & OS.
  
  --
  by Mike Buddha -- Someday the mountain might get him, but the law never will.
10. Re:Subject by Reliable+Windmill · 2013-12-24 12:45 · Score: 3, Informative
  
  You've not understood this correctly. x32 is an enhancement and optimization for executable files that do not require gigabytes of RAM, primarily regarding performance. It has nothing to do with the availability or lack of RAM in the system, or how much RAM costs to buy in the computer store.
  
  --
  Signature intentionally left blank.
11. Re:Subject by LordLimecat · 2013-12-24 12:50 · Score: 4, Funny
  
  Of course it's a tradeoff, because the new RAM will have less of its spare ECC bits used up.
12. Re:Subject by Anonymous Coward · 2013-12-24 12:51 · Score: 2, Insightful
  
  ECC memory is artificially expensive. Were ECC standard as it ought to be, it would only cost about 12.5% more. (1 bit for every byte) That is a pittance when considering the cost of the machine and the value of one's data and time. It is disgusting that Intel uses this basic reliability feature to segment their products.
13. Re:Subject by ultranova · 2013-12-24 13:19 · Score: 1
  
  Until the cache is so big that everything fits in it, you always win if you can double what you can cram into it.
  
  Which is all nice and good except this implies your data structure was mostly pointers to begin with, so if you want to increase cache efficiency forget about pointer size and redesign them for better locality.
  I suspect this is the real reason why this ABI has not caught wind: anyone who cares has already taken steps that render it pointless.
  
  --
  Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
14. Re:Subject by Anonymous Coward · 2013-12-24 13:30 · Score: 1
  
  This needs to be modded up since these two points are exactly where the benefit lies and how non-trivial the benefit is!
  vs amd64: much lower memory bandwidth and much higher cache density
  vs i386: more registers
  Despite what people seem to think, most binaries will probably never need 64-bit addressing. After all, look at your current process list and how many of those are anywhere near a 4GiB virtual size?
  Memory might be cheap to buy but it sure as hell isn't cheap to access (especially when you have several cores fighting for it on a bus shared with a GPU and a display controller powering a high-res display).
15. Re:Subject by dmbasso · 2013-12-24 13:32 · Score: 4, Informative
  
  Which is all nice and good except this implies your data structure was mostly pointers to begin with
  And that's exactly the case of scripting languages, where every structure (say, a Python object) is a collection of pointers to methods and data.
  
  --
  `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
16. Re:Subject by haruchai · 2013-12-24 13:39 · Score: 1
  
  "Spare ECC bits" - what??
  
  --
  Pain is merely failure leaving the body
17. Re:Subject by macpacheco · 2013-12-24 15:28 · Score: 2
  
  That's right. Unfortunately it's called the market. The same boneheads who say x32 isn't worth it are the same boneheads who have no idea how important ECC is, or how hard it is to code everything properly with cache hits in mind. Probably people who never wrote a single line of C or assembly code.
  But the Intel way of making the same physical hardware cost 50% more (with a simple on/off switch) will continue until ARM Cortex starts giving Intel some real competition (at least competing with the latest-gen Core i3).
  In the ARM world, you can still get a 10-year-old CPU design (and for a pittance) because there's no forced obsolescence like Intel's.
  Anxiously waiting for quad core Cortex A57 chromebooks in 2014 with 4GB of RAM. And a raspberry pi (or similar) with Cortex A53.
18. Re:Subject by Anonymous Coward · 2013-12-24 15:28 · Score: 1
  
  Except you've trashed your cache by needing two copies of every standard library— unless you really never have any workloads that need to access more than 4GB of address space.
  The few workloads that are faster under x32 can be modified to be less insanely pointer heavy (e.g. using offsets) and then they're usually faster on both x86 and x86_64.
  The huge inflation of libraries and the loss of the capability to handle large address spaces just isn't justified by the very narrow benefits.
19. Re:Subject by Austerity+Empowers · 2013-12-24 15:38 · Score: 1
  
  The other applications running on your system who also want to use memory and are programmed by people who don't care about resource utilization. Or the other VM, or the 8 other VMs.
  Processor power, memory and disk space should be considered like earth's natural resources. They're in limited supply and should never be wasted no matter how available they may seem at the present.
20. Re:Subject by Austerity+Empowers · 2013-12-24 15:40 · Score: 1
  
  Until the cache is so big that everything fits in it
  A day that, by virtue of this argument, can never come!
21. Re: Subject by Austerity+Empowers · 2013-12-24 15:41 · Score: 2
  
  Colonel Panic to be precise. He reports directly to General P. Fault.
22. Re:Subject by Forever+Wondering · 2013-12-24 16:03 · Score: 5, Informative
  
  With x32 you get:
  - You get 16 registers instead of 8. This allows much more efficient code to be generated because you don't have to dump/reload automatic variables to the stack because the register pressure is reduced.
  - You also get a crossover from the 64 bit ABI where the first 6 arguments are passed in registers instead of push/pop on the stack.
  - If you need a 64-bit arithmetic op (e.g. long long), the compiler will generate a single 64-bit instruction (vs. using multiple 32-bit ops).
  - You also get the RIP-relative addressing mode, which works great when a lot of dynamic relocation of the program occurs (e.g. .so files).
  You get all these things [and more] if you port your program to 64 bit. But porting to 64 bit requires that you go through the entire code base and find all the places where you said:
  int x = ptr1 - ptr2;
  instead of:
  long x = ptr1 - ptr2;
  Or where you put a long into a struct that gets sent across a socket. You'd need to convert those to ints.
  Etc ...
  Granted, these should be cleaned up with abstract typedefs, but porting a large legacy 32-bit codebase to 64 bit may not be worth it [at least in the short term]. A port to x32 is pretty much just a recompile. You get [most of] the performance improvement for little hassle.
  It also solves the 2038 problem, because time_t is now defined to be 64 bits, even in 32-bit mode. Likewise, in struct timeval, the tv_sec field is 64 bits.
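  The arithmetic behind the time_t point: a signed 32-bit `time_t` runs out at 2^31 - 1 = 2,147,483,647 seconds after the Unix epoch, which is 2038-01-19 03:14:07 UTC. A minimal C sketch that prints the current `time_t` width and formats that limit with the standard `gmtime`/`strftime`; the helper name `format_int32_limit` is hypothetical.
  
  ```c
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <time.h>
  
  /* Format the last second representable in a signed 32-bit time_t.
     Returns 0 on success; buf receives "YYYY-MM-DD HH:MM:SS". */
  static int format_int32_limit(char *buf, size_t len)
  {
      time_t limit = (time_t)INT32_MAX;        /* 2147483647 seconds */
      struct tm *utc = gmtime(&limit);
      if (utc == NULL)
          return -1;
      return strftime(buf, len, "%Y-%m-%d %H:%M:%S", utc) ? 0 : -1;
  }
  
  int main(void)
  {
      char buf[32];
      memset(buf, 0, sizeof buf);
      printf("sizeof(time_t) = %zu\n", sizeof(time_t));
      if (format_int32_limit(buf, sizeof buf) == 0)
          printf("signed 32-bit limit: %s UTC\n", buf);
      return 0;
  }
  ```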
  
  --
  Like a good neighbor, fsck is there ...
23. Re:Subject by haruchai · 2013-12-24 17:13 · Score: 1
  
  These are RDIMMs, not UDIMMs.
  Besides it's the company's money and we can only buy from approved buyers or we don't get reimbursed.
  
  --
  Pain is merely failure leaving the body
24. Re:Subject by Tim12s · 2013-12-24 18:07 · Score: 1
  
  That seems a reasonable advantage. If it could take me from 60K tps to 100K tps per blade, it's a no-brainer. I doubt it's going to make office/home applications run any noticeably quicker, but with a blade centre of 16 blades, I'll want to get my money's worth before needing to expand.
25. Re:Subject by mwvdlee · 2013-12-24 20:13 · Score: 1
  
  ~35% memory boost is quite nice if you're running memory-bound multithreaded processes, each thread being relatively light on CPU but using lots of memory.
  I run a webserver where one of the batch jobs is exactly that. ~35% memory boost would be very close to ~35% increase in throughput.
  
  --
  Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
26. Re:Subject by mwvdlee · 2013-12-24 20:15 · Score: 1
  
  After all, look at your current process list and how many of those are anywhere near a 4GiB virtual size?
  I'm sure my processes would gladly use more than 4GB.
  But since the 64-bit laptop they run on is maxed out at 3GB, I doubt 32-bit address space would be the bottleneck.
  
  --
  Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
27. Re:Subject by oursland · 2013-12-24 20:43 · Score: 1
  
  As you suggest, pretty much everyone has enough RAM to run just about any normal application with 64-bit pointers.
  Most users will never need 64-bit pointers. Only applications which require more than 4 GB of addressable memory within that single program will ever use this. Examples of such applications include in-core scientific computing and very large media file editing.
28. Re:Subject by oursland · 2013-12-24 20:45 · Score: 1
  
  Except you've trashed your cache by needing two copies of every standard library
  You should learn how virtual memory and cache works before spouting this bullshit.
29. Re:Subject by KiloByte · 2013-12-24 21:01 · Score: 1
  
  You get 16 registers instead of 8.
  More like 14 instead of 6: you can't exactly put IP to much use.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
30. Re:Subject by kthreadd · 2013-12-24 21:27 · Score: 1
  
  The address space is not limited to the amount of physical memory you have. Take memory mapped files for example. You need enough address space if you map a large file, but the entire file doesn't have to be in physical memory.
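  A C sketch of the memory-mapped-file case, using POSIX `open`/`fstat`/`mmap`. Because `mmap()` takes its length as `size_t`, which is 32 bits on x32, a file larger than the address space cannot be mapped in one piece; `fits_in_address_space` is a hypothetical helper and only a necessary check, since other mappings also consume address space.
  
  ```c
  #define _POSIX_C_SOURCE 200809L
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>
  
  /* Can a file of this size even be described by mmap's size_t length? */
  static int fits_in_address_space(off_t file_size)
  {
      return file_size >= 0 && (uintmax_t)file_size <= (uintmax_t)SIZE_MAX;
  }
  
  /* Map a whole file read-only; returns NULL on failure. Caller munmaps. */
  static const unsigned char *map_whole_file(const char *path, size_t *len_out)
  {
      int fd = open(path, O_RDONLY);
      if (fd < 0)
          return NULL;
      struct stat st;
      if (fstat(fd, &st) != 0 || st.st_size == 0 || !fits_in_address_space(st.st_size)) {
          close(fd);
          return NULL;
      }
      void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
      close(fd);  /* the mapping stays valid after the descriptor is closed */
      if (p == MAP_FAILED)
          return NULL;
      *len_out = (size_t)st.st_size;
      return p;
  }
  
  int main(int argc, char **argv)
  {
      if (argc < 2) {
          fprintf(stderr, "usage: %s FILE\n", argv[0]);
          return 1;
      }
      size_t len = 0;
      const unsigned char *data = map_whole_file(argv[1], &len);
      if (data == NULL) {
          fprintf(stderr, "cannot map %s\n", argv[1]);
          return 1;
      }
      unsigned long sum = 0;
      for (size_t i = 0; i < len; i++)
          sum += data[i];  /* pages are faulted in on demand as they are touched */
      printf("%zu bytes, byte sum %lu\n", len, sum);
      munmap((void *)data, len);
      return 0;
  }
  ```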
31. Re:Subject by Anonymous Coward · 2013-12-24 21:35 · Score: 1
  
  You get all these things [and more] if you port your program to 64 bit. But, porting to 64 bit requires that you go through the entire code base and find all the places where you said:
  int x = ptr1 - ptr2;
  instead of:
  long x = ptr1 - ptr2;
  You mean instead of
  ptrdiff_t x = ptr1 - ptr2;
  Not that this makes your problem easier if you have used int in the past, but you should at least apply the correct fix.
32. Re:Subject by mwvdlee · 2013-12-24 21:36 · Score: 1
  
  But at that point you're using swap on a harddisk or similar, which is much, much slower.
  Unless you explicitly need >4GB support, I think the memory saving from x32 ABI outweighs the reduced address space for most real world applications.
  
  --
  Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
33. Re:Subject by TheRaven64 · 2013-12-24 22:04 · Score: 4, Interesting
  
  He's right. If you mix x32 and amd64 binaries on the same system, then you need two copies of every shared library that they use to be mapped at the same time. And this means that every context switch between them is going to be pulling things into the i-cache that would already be present (assuming a physically-mapped cache, which is a pretty safe assumption these days) because the other process is using them.
  This is why x32 doesn't make sense on a consumer platform like Ubuntu unless the entire system is compiled to use it, making the entire article a 'well, duh'. The real advantage of x32 is on custom deployments and embedded systems where you can build everything in x32 mode.
  Oh, and on the subject of caches, x86 chips typically have 64 byte cache lines. If you make pointers 4 bytes instead of 8, then you can fit twice as many in a cache line, which is usually nice. It can be a problem for multithreaded applications though, because you may now end up with more contention in the cache coherency protocol.
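  A C sketch of both cache effects described above, assuming the 64-byte line size stated in the comment (real line sizes should be checked per CPU): 8 pointers fit in a line on amd64 versus 16 on x32, and padding per-thread data out to a full line is one common way to keep denser packing from turning into coherency contention. The names here are illustrative.
  
  ```c
  #include <stdio.h>
  
  /* Assumption: 64-byte cache lines, typical of current x86 parts. */
  #define CACHE_LINE 64
  
  /* How many pointers fit in one cache line under the ABI being compiled for. */
  static size_t pointers_per_line(void)
  {
      return CACHE_LINE / sizeof(void *);
  }
  
  /* Per-thread counter aligned and sized to a full line, so two threads
     updating neighbouring counters never share a line (false sharing). */
  struct padded_counter {
      _Alignas(CACHE_LINE) long value;
  };
  
  int main(void)
  {
      printf("pointers per %d-byte line: %zu\n", CACHE_LINE, pointers_per_line());
      printf("sizeof(struct padded_counter) = %zu\n", sizeof(struct padded_counter));
      return 0;
  }
  ```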
  
  --
  I am TheRaven on Soylent News
34. Re:Subject by TheRaven64 · 2013-12-24 22:15 · Score: 3, Informative
  
  The C standard does not guarantee that sizeof(long) is as big as sizeof(void*). The type that you want is intptr_t (or ptrdiff_t for differences between pointers). If you've gone through replacing everything with long, then good luck getting your code to run on win64 (where long is 4 bytes).
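  A minimal C sketch of the portable types mentioned here: `ptrdiff_t` for pointer differences and `uintptr_t` for pointer-to-integer round trips. Neither choice depends on whether `long` is 4 or 8 bytes, so the same code is correct on i386, x32, amd64 and win64. The helper names are hypothetical.
  
  ```c
  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>
  
  /* Element distance between two pointers into the same array.
     ptrdiff_t is the type of pointer subtraction in C. */
  static ptrdiff_t element_distance(const int *first, const int *last)
  {
      return last - first;
  }
  
  /* Round-trip a pointer through an integer. uintptr_t is optional in the
     standard but provided on the platforms discussed here, and it is wide
     enough by definition, whereas long is only 4 bytes on win64. */
  static int roundtrip_ok(void *p)
  {
      uintptr_t bits = (uintptr_t)p;
      return (void *)bits == p;
  }
  
  int main(void)
  {
      int a[10];
      printf("distance = %td\n", element_distance(&a[2], &a[9]));
      printf("round trip ok: %d\n", roundtrip_ok(a));
      return 0;
  }
  ```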
  
  --
  I am TheRaven on Soylent News
35. Re:Subject by KiloByte · 2013-12-25 00:17 · Score: 1
  
  That's for selected contrived tests. For real-life cases, you get far less, like that 7% figure, or for quite a few test cases even 0%.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
36. Re:Subject by Anonymous Coward · 2013-12-25 01:42 · Score: 1
  
  Don't you mean WINDBLOWS?
37. Re:Subject by funky_vibes · 2013-12-25 02:32 · Score: 1
  
  Any code that assumes ptrsize == sizeof(int) is broken to begin with.
38. Re:Subject by ColdWetDog · 2013-12-25 05:13 · Score: 1
  
  Of course it's a tradeoff, because the new RAM will have less of its spare ECC bits used up.
  You need to study the concept of ECC a bit more. :) It's not some kind of reallocated bit, like a hard drive's reallocated sectors.
  Y'all have been seriously whooshed. That buzzing noise. It's not an FBI drone.
  
  --
  Faster! Faster! Faster would be better!
39. Re:Subject by Grishnakh · 2013-12-25 05:38 · Score: 1
  
  Python can be compiled.
40. Re:Subject by foobar+bazbot · 2013-12-25 06:05 · Score: 2
  
  You're running a Python script and you care about L1/L2 cache efficiency??
  Your system is probably context switching between hundreds of MB.
  Amongst.
  Your system is context-switching amongst hundreds of MB.
  Frohe Weihnachten from das Grammer-SS!
41. Re:Subject by Megol · 2013-12-25 07:10 · Score: 1
  
  No need for swapping => many algorithms can be made more efficient by spreading data into a sparse virtual address space. While the used memory can be much less than even 1GiB it can be spread over 2^64 address space (or 2^48 for current AMD64 processors). And one doesn't even need to use one of those algorithms to benefit from a large address space, there are other places where it helps.
42. Re:Subject by Megol · 2013-12-25 07:14 · Score: 1
  
  Why don't you just buy an AMD processor and get the ECC support much cheaper? Yes, Intel processors are faster (ATM at least), but AMD processors beat the crap out of almost everything else.
43. Re:Subject by Megol · 2013-12-25 07:25 · Score: 1
  
  I guess you mean (R)SP? The (R)IP (=instruction pointer?) isn't an addressable register in x86. Even ignoring RSP that still leaves 15 available registers and RSP can be used in single-entry user level code if needed.
44. Re:Subject by Bengie · 2013-12-25 08:43 · Score: 1
  
  Until the cache is so big that everything fits in it, you always win if you can double what you can cram into it.
  Which is all nice and good except this implies your data structure was mostly pointers to begin with, so if you want to increase cache efficiency forget about pointer size and redesign them for better locality.
  I suspect this is the real reason why this ABI has not caught wind: anyone who cares has already taken steps that render it pointless.
  Exactly. Their target audience learned to use a single array of structs with a single pointer instead of allocating thousands of individual objects and tracking their individual pointers. Then they can use whichever power-of-two offset width they want; 16-bit offsets if they want, which are even smaller than 32-bit pointers. A single large array should also have better page hits than a bunch of separate objects, and the allocator can easily see a single large allocation and back it with a 2MB page instead of 4KB pages.
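  A hypothetical C sketch of the approach described above: elements live in one contiguous array and link to each other with 16-bit indices rather than pointers, so each link is 2 bytes regardless of the ABI. The fixed capacity and the `NIL` sentinel are illustrative choices, not a recommendation.
  
  ```c
  #include <stdint.h>
  #include <stdio.h>
  
  #define POOL_CAP 1024u
  #define NIL UINT16_MAX  /* sentinel index meaning "no next element" */
  
  /* 4-byte value + 2-byte link, padded to 8 bytes on i386, x32 and amd64. */
  struct item {
      int32_t value;
      uint16_t next;
  };
  
  struct pool {
      struct item items[POOL_CAP];
      uint16_t used;  /* number of slots handed out so far */
      uint16_t head;  /* index of the first list element, or NIL */
  };
  
  static void pool_init(struct pool *p)
  {
      p->used = 0;
      p->head = NIL;
  }
  
  /* Prepend a value; returns -1 when the pool is full. */
  static int pool_push_front(struct pool *p, int32_t v)
  {
      if (p->used >= POOL_CAP)
          return -1;
      uint16_t i = p->used++;
      p->items[i].value = v;
      p->items[i].next = p->head;
      p->head = i;
      return 0;
  }
  
  /* Walk the list by index instead of by pointer. */
  static int64_t pool_sum(const struct pool *p)
  {
      int64_t s = 0;
      for (uint16_t i = p->head; i != NIL; i = p->items[i].next)
          s += p->items[i].value;
      return s;
  }
  
  int main(void)
  {
      static struct pool p;  /* static: zero-initialized and kept off the stack */
      pool_init(&p);
      for (int32_t v = 1; v <= 5; v++)
          if (pool_push_front(&p, v) != 0)
              return 1;
      printf("sum = %lld\n", (long long)pool_sum(&p));
      return 0;
  }
  ```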
45. Re:Subject by Bengie · 2013-12-25 09:08 · Score: 2
  
  Yes and no. The larger your cache, the higher its latency. Can't get around this. L1 caches tend to be small to keep the execution units fed with typically 1 or 2 cycle latencies. L2 caches tend to be about 16x larger, but have about 10x the latency.
  
  L2 cache may have high latency, but it still has decent bandwidth. To help hide the latency, modern CPUs have automatic pre-fetching and also async low-priority pre-fetching instructions that allow the programmer to tell the CPU to attempt to load data from memory into L1 before it is needed, and only if the CPU finds an open slot for memory access.
  
  After a certain size, "normal" cache is slower than main memory. That's why we're starting to see integrated eDRAM, which is mostly just system memory built into the CPU or package. The other issue you need to be careful about is that each layer of cache adds cumulative fixed latency.
  
  The easiest way to hide high latency is to have lots of concurrent work going on. Hyper-threading banks a lot on this. When one virtual core is stalling on memory access, the other virtual core can step in and make use of any free execution units on a per-cycle basis. Because there are lots of units, there is usually an idle one somewhere that can be used. Ironically, having two virtual cores sharing the same resources means resources are split, primarily the L1 cache. While hyper-threading helps hide latency caused by memory access, it also increases the chance of an address getting evicted from L1.
  
  To help with this, Intel increased the size of their L1 cache on the more recent CPUs, but this also increased the latency from 1 cycle to 2 cycles. To help compensate, they increased the bandwidth of the L1 and allowed larger loads. Twice the bandwidth, twice the size, but twice the latency. Single-threaded code takes a minor hit, but concurrent work stands to gain a decent amount.
  
  Increasing cache sizes is not as simple as it seems.
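  The software prefetch hints mentioned above are exposed in GCC and Clang as `__builtin_prefetch(addr, rw, locality)`. A minimal sketch of an indexed gather that requests the element it will need a fixed number of iterations ahead; the distance of 16 is an arbitrary assumption, and whether the hint helps at all has to be measured on the target CPU.
  
  ```c
  #include <stddef.h>
  #include <stdio.h>
  #include <stdlib.h>
  
  #define PREFETCH_DISTANCE 16  /* assumption: tune per machine */
  
  /* Sum values reached through an index array (a gather), prefetching the
     element needed PREFETCH_DISTANCE iterations from now. */
  static long long gather_sum(const int *values, const size_t *idx, size_t n)
  {
      long long sum = 0;
      for (size_t i = 0; i < n; i++) {
          if (i + PREFETCH_DISTANCE < n)
              __builtin_prefetch(&values[idx[i + PREFETCH_DISTANCE]],
                                 0 /* read */, 1 /* low temporal locality */);
          sum += values[idx[i]];
      }
      return sum;
  }
  
  int main(void)
  {
      const size_t n = (size_t)1 << 20;
      int *values = malloc(n * sizeof *values);
      size_t *idx = malloc(n * sizeof *idx);
      if (values == NULL || idx == NULL)
          return 1;
      for (size_t i = 0; i < n; i++) {
          values[i] = (int)(i & 0xff);
          /* 7919 is odd, hence coprime with the power-of-two n,
             so i -> i * 7919 mod n is a permutation of 0..n-1. */
          idx[i] = (i * 7919u) % n;
      }
      printf("sum = %lld\n", gather_sum(values, idx, n));
      free(values);
      free(idx);
      return 0;
  }
  ```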
46. Re:Subject by Bengie · 2013-12-25 09:19 · Score: 1
  
  You can get DDR3-1600 16GB Crucial for $200 brand new with a lifetime warranty for parts and labor. That's $50/4GB. If you're paying $120/4GB, it's because you don't know how to shop or you'll void the warranty on your overpriced POS server.
47. Re:Subject by Bengie · 2013-12-25 09:20 · Score: 1
  
  Sorry, DDR3-1600 16GB ECC Registered
48. Re:Subject by Bengie · 2013-12-25 09:22 · Score: 1
  
  Don't complain about memory prices when your company only buys from price-gouging companies. Like I posted above, the going rate of DDR3-1600 ECC RDIMMs is about $50/4GB.
49. Re:Subject by macpacheco · 2013-12-25 10:28 · Score: 1
  
  Not the issue. For instance, I'm using Raspberry Pis as text console terminals, and I'm typing this on an ARM Chromebook (Cortex A15 Exynos 5). It became my primary terminal because, for a 100% solid-state computer (no moving parts, I can move it around while it's on without ANY fear of breaking anything), it's dirt cheap.
  ECC is a nice feature, but I don't have customers demanding it, nor have I had any cases of corrupt production data on a system without ECC. It should be a standard feature, with zero cost overhead once it's standard. Oh, and it's not 12.5% extra cost. That's one parity bit for each 8 bits of data; ECC is 3 extra bits for every 8 bits (I'm not sure if it could be 3 extra bits for each 16 or even 32 bits), so it's more like 37.5% extra bits (assuming it works at an 8-bit level). It's been a long, long, long time since I studied ECC algorithms, but the 12.5% extra cost is parity for sure, not ECC. For instance, if it worked at a 16-bit level, it would be 18.75% overhead instead (3 extra bits for every 16 bits).
  This just goes to show most people criticizing x32 (definitely not all of them) don't have a clue what they're talking about; they seem to be either marketing people who work for companies that want to kill this, or people who have never learned assembly / low-level computer stuff in detail. This discussion would be so much more productive if those uneducated people either educated themselves before talking about x32 or just went away.
  Oh, you can't call yourself a great C/C++ programmer if you don't have a full and correct understanding of CPU cache hits and stack frames, which are the really important factors x32 affects. The hard RAM savings are the least important feature (except for embedded systems, but someone pointed out very correctly that the odds of an embedded system running an Intel-compatible CPU are one in a thousand or less). That's ARM / MIPS CPU land.
  The trend is quite the opposite, ARM is coming to the notebook / desktop world, and there's no stopping it (although it won't take over, it will be a slow movement).
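  For reference on the ECC overhead figures debated in this thread: the Hamming bound says single-error correction over m data bits needs the smallest r check bits with 2^r >= m + r + 1, and SECDED (single-error-correct, double-error-detect, the scheme ordinary ECC DIMMs use) adds one more bit. ECC DIMMs apply it to 64-bit words, giving 8 check bits per 64 data bits, which is where a 12.5% figure comes from. A short C sketch of that arithmetic; the function names are illustrative.
  
  ```c
  #include <stdio.h>
  
  /* Minimum check bits for single-error correction over m data bits:
     smallest r such that 2^r >= m + r + 1 (the Hamming bound). */
  static unsigned sec_check_bits(unsigned m)
  {
      unsigned r = 0;
      while ((1u << r) < m + r + 1)
          r++;
      return r;
  }
  
  /* SECDED adds one overall parity bit for double-error detection. */
  static unsigned secded_check_bits(unsigned m)
  {
      return sec_check_bits(m) + 1;
  }
  
  int main(void)
  {
      const unsigned widths[] = { 8, 16, 32, 64 };
      for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++) {
          unsigned m = widths[i];
          unsigned c = secded_check_bits(m);
          printf("%2u data bits: %u SECDED check bits (%.2f%% overhead)\n",
                 m, c, 100.0 * c / m);
      }
      return 0;
  }
  ```
  
  Wider words amortize the check bits better, which is why the per-word overhead falls from 5 bits per 8 to 8 bits per 64.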
50. Re:Subject by jabuzz · 2013-12-25 11:42 · Score: 1
  
  Anyone doing embedded on x86 and is worried about memory is doing it all wrong anyway. The embedded systems that are memory constrained are ARM and MIPS not x86.
  So yes embedded systems are memory constrained, but why on earth would an x86 system be memory constrained.
51. Re:Subject by haruchai · 2013-12-25 13:48 · Score: 1
  
  Big company bureaucracy - 13000 users, lots of compliance to meet, rules to follow.
  
  --
  Pain is merely failure leaving the body
52. Re:Subject by oursland · 2013-12-25 18:21 · Score: 1
  
  then you need two copies of every shared library that they use to be mapped at the same time.
  
  No, you don't. The libraries will be paged in on demand. If you never use more than 4 GB address space within a single program, the amd64 libraries will never be paged in. Typically, I-cache lines will be driven by LRU, so you'll have them filled on an on-demand basis as well.
53. Re:Subject by macpacheco · 2013-12-25 18:42 · Score: 1
  
  I both agree and disagree with you. Since you aren't explaining, I'll explain for you.
  If you're trying to say that it would be very bad to have many more physically different processors, than yes, agreed 100%.
  As a matter of fact, we should have even less CPU models. For instance, mobile / low power / desktop / server CPUs. STUPID
  The fair way to do it would be having core i5 small cache, core i5 large cache, core i7 small cache, core i7 large cache. The current core i7 users would use the small cache model (large by core i5 standards) and the servers would use either core i5 large cache or core i7 large cache. Get rid of the core i3 altogether.
  But the same CPU would have ECC standard, and would be able to be configured at boot time to limit CPU frequencies, limit GPU frequencies, limit voltages (low frequency + low external voltage = ultra low power), overclock, you name it.
  After Intel killed AMD's mojo with plays taken by Microsoft and IBM at their darkest times, ARM has a good chance of eating Intel's lunch from the bottom up. Actually ARM licensees, since ARM design CPUs, but their licensees actually build them.
  I also believe that AMD will make an interesting come back in the next few years.
  95% of my customer server's are core iX processors, they rarely are willing to pay 4x more for ECC, hardware I/O cache, and even then they end up with SATA drives anyways !
  BTW, I'm not saying Intel will do it out of good intentions, only severe competition will make them do it. It will reduce profits.
  BTW, as for the argument that Intel has the reserve juice to do what they did when AMD got close to them, I'm not so sure.
  Linux / Android / iOS mean Intel CPUs are no longer the gold standard for compatibility. Linux / gcc / binutils already support almost every CPU on the planet. Anything that runs on Intel, and for which we have the source, can be ported to ARM, MIPS and PowerPC (the only relevant CPU architectures left; all the others are dying, and even PowerPC is a bit of a stretch).
54. Re:Subject by macpacheco · 2013-12-25 18:49 · Score: 1
  
  You probably don't understand the ARM ecosystem...
  ARM only designs the CPUs, and licenses them.
  ARM licensees actually build the CPUs.
  So in the ARM world, there are at least 6 strong players. And 95+% of the costs / profits are in making the CPUs and the whole system, so competition is a GIVEN.
  Cortex-A15 has 5 licensees.
  Cortex-A57 has 6 licensees.
  And those are the high end CPUs. The middle end / low end have even more licensees.
  In a way, the ARM model is fantastic for the consumers, much better than the Intel / AMD duopoly will ever be.
55. Re:Subject by hobarrera · 2013-12-26 18:06 · Score: 1
  
  Water is cheap too. Should I just leave the tap running all day long?
  Yes, memory is cheap, but that philosophy of not caring because it's cheap is what has led to the incredibly bloated programs of today. What huge features does MSO 2010 have that justify over 10x the space and memory used compared to MSO '95? It's prettier looking and has a few extra features, but that's essentially it. And that's just a normal, random example, too.
of course not by rhubarb42 · 2013-12-24 11:26 · Score: 1

no. time will fairly quickly diminish the value as 64 bit cpus get faster.
1. Re:of course not by ShanghaiBill · 2013-12-24 11:57 · Score: 2
  
  Who would want this, some niche embedded guys?
  Not many NEGs are using 64 bit processors, and this ABI offers too little advantage to bother with. Most embedded systems run a single primary process. If that process fits in a 4GB address space (as is required to use this ABI), then the system would just use a native 32 bit ABI on a 32 bit CPU, not this 32 bit ABI on a more expensive 64 bit CPU.
2. Re:of course not by petermgreen · 2013-12-25 07:52 · Score: 1
  
  I thought the interest in this was coming from vendors of budget VM/cloud hosting. When you are stacking massive numbers of VMs on a host, RAM can become the limiting factor on how many VMs you can fit, but at the same time it would be nice to actually take advantage of those x64 cores. And since you are running lots of different things rather than lots of copies of one big thing, fixing the software to be less pointer-heavy is not really a practical option.
  BUT having to have two copies of all the libraries in each VM would kill the advantages, so to really take advantage of this they first need a good-quality port of a full Linux userland, and then need to fight through the politics of actually getting it included in at least one major distro (an unofficial port of sid isn't going to cut it for persuading people that it's a viable and maintainable option for their servers).
  One complication with x32 is that, in terms of conditional defines, it looks much like x86-64. This means that a lot of software will try, and fail, to use x86-64 assembler and will therefore need fixing (either porting the assembler or disabling it, preferably the former).
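  A minimal sketch of the kind of fix this implies, assuming GCC's predefined macros: under -mx32 GCC still defines __x86_64__, but it also defines __ILP32__ and does not define __LP64__, so an x86-64 assembly path should be gated on both. The function swap_ulong is hypothetical; its inline asm is the 64-bit BSWAP form, which only makes sense when unsigned long is 64 bits wide.

```c
/* Hypothetical helper: reverse the bytes of an unsigned long.
   The inline-asm path uses 64-bit BSWAP, which needs an 8-byte operand.
   Testing __x86_64__ alone would also select it under x32, where
   unsigned long is 4 bytes and "bswapq" on a 32-bit register would
   typically fail to assemble. __LP64__ is defined for x86-64 but not x32. */
unsigned long swap_ulong(unsigned long x)
{
#if defined(__x86_64__) && defined(__LP64__)
    __asm__("bswapq %0" : "+r"(x));
    return x;
#else
    /* Portable fallback for x32, i386 and everything else. */
    unsigned long r = 0;
    for (unsigned i = 0; i < sizeof x; i++) {
        r = (r << 8) | (x & 0xFFu); /* move the low byte of x into r */
        x >>= 8;
    }
    return r;
#endif
}
```

  The same guard pattern applies to any hand-written x86-64 assembly that assumes 8-byte pointers or longs.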
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
3. Re:of course not by Bengie · 2013-12-25 09:26 · Score: 1
  
  It's entirely a data locality issue and L1 caches aren't getting much larger.
Eh? by fuzzyfuzzyfungus · 2013-12-24 11:27 · Score: 3, Insightful

If I wanted to divide my nice big memory space into 32-bit address spaces, I'd dig my totally bitchin' PAE-enabled Pentium Pro rig out of the basement, assuming the rats haven't eaten it...
1. Re:Eh? by Phydeaux314 · 2013-12-24 12:34 · Score: 1
  
  http://ask.slashdot.org/story/09/02/12/2115242/how-to-keep-rats-from-eating-my-cables
  
  --
  Never underestimate the stupidity inherent in all human beings.
2. Re:Eh? by hey! · 2013-12-24 14:53 · Score: 1
  
  I assumed "it" referred to his basement.
  
  --
  Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
3. Re:Eh? by Pinhedd · 2013-12-24 16:03 · Score: 1
  
  I used to have pet rats. Rats will eat anything
4. Re:Eh? by oursland · 2013-12-24 20:48 · Score: 1
  
  That's a pretty ignorant stance. Your PPro didn't have the expressive abilities of the newer registers, instructions, and enhanced SIMD systems, let alone the performance improvements.
5. Re:Eh? by fuzzyfuzzyfungus · 2013-12-25 03:08 · Score: 1
  
  Oh, the PPro is worse in basically every conceivable way, with 32-bit address spaces being just one of its sins compared to later designs. My point was that (barring possible niche applications, like the next few years when phones people care about still have less than 4GB of RAM/memory mapped peripherals, or Very Large workloads that break down nicely and are large enough that the RAM savings actually pay for enough programmers to make up for the hassle and still leave some extra cash) taking a perfectly good AMD64/EM64T chip connected to a huge chunk of cheap RAM and voluntarily pretending that it's 1995 and you can't afford one of them fancy RISC workstations just seems a little sick.
  
  I do recognize that there are specific areas that could benefit (though apparently not enough to show much life on the kernel development side); but for virtually everything, people choose to throw a bit of extra silicon rather than more expensive humans at the problem almost every time.
6. Re:Eh? by foobar+bazbot · 2013-12-25 06:26 · Score: 1
  
  This bit:
  
  when phones people care about still have less than 4GB of RAM/memory mapped peripherals,
  suggests you may not understand the issue. x32 lets you have lots of RAM, but each process only has a 4GB address space. So if you have 8GB of RAM, you can have a half-dozen processes occupying 1-2GB each.
  Maybe you're aware of this and just discount the importance. After all, Android seems to be designed to ensure every app is periodically killed and has to be reloaded from disk -- I'm not sure if this is a proactive defense against memory leaks, an effort to ensure programmers don't neglect proper state-saving just because their device has a lot of RAM, or what -- so the obvious benefit of leaving multiple apps resident so you can swap quickly doesn't materialise on that platform. But it could make sense for > 4GB with OSes that aren't Android, or with a hypothetical future version of Android that's more gentle with the task-killing.
7. Re:Eh? by TheOneFreeman · 2013-12-25 07:16 · Score: 1
  
  Even rats want Pentium Inside?
8. Re:Eh? by oursland · 2013-12-25 18:27 · Score: 1
  
  This is not about saving RAM, but about improving cache utilization. Halving the size of pointers roughly doubles how many of them fit in the existing cache. As caching is THE most performance-improving optimization available on CPUs, this can have a profound effect on run times. x32 can still support systems with many, many GBs, but each individual process space is limited to 4 GB.
9. Re:Eh? by fuzzyfuzzyfungus · 2013-12-25 23:32 · Score: 1
  
  I apologize if my intent wasn't clear, but I drew the 4GB distinction not because I assumed that x32 would limit the system as a whole to 4GB, but because that's the point where limiting a process to 4GB becomes an actual tradeoff:
  
  If you don't even have 4GB of RAM, your application isn't going to have a chance to use more than 4GB, because it doesn't exist, so being 'limited' to a 32-bit address space is mostly irrelevant, since you'll bump into physical limits first (it might unacceptably reduce the amount of entropy that various ASLR schemes can bring to bear, since the layout randomization is happening within a vastly smaller memory space; but I definitely don't understand the strengths and weaknesses of that stuff well enough to say whether that is real, almost wholly theoretical, or irrelevant).
  
  If you do have more than 4GB, accepting a 4GB process limit becomes a real tradeoff, since you won't hit a physical limit first. For many processes it won't be a difficult choice; the world is rotten with little helper processes and things that should never consume more than a few MB under any conditions. But for some borderline cases it could be an issue that you have to think about, test, etc.
You got it. by Qwertie · 2013-12-24 11:32 · Score: 1

Some people don't see the ABI as being worthwhile when it still requires 64-bit processors
There's your answer. If I'm writing a program that won't need over 2GB, the decision is obvious: target x86. How many developers even know about x32? Of those, how many need what it offers? That little fraction will be the number of users.
1. Re:You got it. by loufoque · 2013-12-24 11:54 · Score: 1
  
  This way you'll be able to make it magically much faster when building it for x32 or amd64.
2. Re:You got it. by Chalnoth · 2013-12-24 12:50 · Score: 1
  
  True. But for the vast majority of applications, that greater number of registers only translates into a small performance increase. I can potentially see x32 being useful for a rather small amount of heavily hand-optimized code (e.g. a massively optimized math or physics library), but for the vast majority of applications this performance benefit will be tiny.
  To me, the real problem for the adoption of x32 is that so few programs on PC's need to worry that much about optimization. When it does become worthwhile for them to worry about optimization, there are likely to be many things that are more worthwhile to tackle for improving performance (e.g. algorithmic inefficiencies, using excessive I/O).
3. Re:You got it. by VortexCortex · 2013-12-24 15:07 · Score: 1
  
  Some people don't see the ABI as being worthwhile when it still requires 64-bit processors
  There's your answer. If I'm writing a program that won't need over 2GB, the decision is obvious: target x86. How many developers even know about x32? Of those, how many need what it offers? That little fraction will be the number of users.
  Wait, what are you talking about? "target x86" Wat? Are you writing code in Assembly? How do you target C or higher-level code for x86 vs x86-64, or ARM for that matter?
  Ooooh, wait, you're one of those proprietary Linux software developers? Protip: 1's and 0's are in infinite supply, so Economics 101 says they have zero price regardless of cost to create. What's scarce is your ability to create new configurations of bits -- new source code -- not the bits. Just like a mechanic, home builder, burger joint, or any other labor market you do the work once, you get paid once for your work and then you do more work to get more money. If you base a business on artificial scarcity you're going to have a bad time, mkay?
4. Re:You got it. by Carewolf · 2013-12-25 01:58 · Score: 1
  
  The extra registers "only" make x32 10-15% faster than x86-32. The main speed-up comes from requiring AMD64, which ensures SSE2 is present, so SSE math can be used instead of x87 math. Of course, you could get the same benefit by getting rid of i686 as your baseline and requiring SSE2. Going from x87 to SSE is 50-100% faster even without any vectorization.
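  A small sketch for checking which floating-point model a build actually uses, assuming GCC. The standard FLT_EVAL_METHOD from <float.h> is 0 when float and double are evaluated in their own precision (as with SSE2 scalar math, which should be the default for x86-64 and x32) and 2 when everything is evaluated as long double (as with x87 math, the usual i386 default). __SSE2_MATH__ is a GCC-specific macro; the function names are hypothetical.

```c
#include <float.h>

/* FLT_EVAL_METHOD (C99): 0 = evaluate in each operand's own type,
   1 = float promoted to double, 2 = everything in long double
   (x87-style), -1 = indeterminable. */
int fp_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* GCC defines __SSE2_MATH__ when scalar floating point is done with SSE2
   (for example the x86-64 default, or -m32 -msse2 -mfpmath=sse). */
int fp_uses_sse2(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}
```

  Compiling the same file with gcc -m32 and with gcc -m64 (or -mx32) and printing both values should show the x87 vs SSE2 difference the comment describes.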
Nice concept by Anonymous Coward · 2013-12-24 11:34 · Score: 3, Insightful

I do not see many cases where this would be useful. If we have a 64-bit processor and a 64-bit operating system, then it seems the only benefit to running a 32-bit binary is that it uses a slightly smaller amount of memory. Chances are that is a very small difference in memory used. Maybe the program loads a little faster, but is it a measurable, consistent amount? For most practical use case scenarios it does not look like this technology would be useful enough to justify compiling a new package. Now, if the process worked with 64-bit binaries and could automatically (and safely) decrease pointer size on 64-bit binaries, then it might be worthwhile. But I'm not going to rebuild an application just for smaller pointers.
1. Re:Nice concept by mjrauhal · 2013-12-24 11:52 · Score: 1
  
  You misunderstand the desired impact. "Loads a little faster" doesn't really enter into it. It's rather that system memory is _slow_, and you have to cram a lot of stuff into CPU cache for things to work quickly. That's where the smaller pointers help, with some workloads. Especially if you're doing a lot of pointer-heavy, data-structure-heavy computing, where you often compile your own stuff to run anyway.
  Still not saying it's necessarily worth the maintenance hassle, but let's understand the issues first.
2. Re:Nice concept by maswan · 2013-12-24 11:54 · Score: 2, Informative
  
  The main benefit is that it runs faster. 64-bit pointers take up twice the space in caches, and especially L1 cache is very space-limited. Loading and storing them also takes twice the bandwidth to main memory.
  So for code with lots of complex data types (as opposed to big arrays of floating point data), that still has to run fast, it makes sense. I imagine the Linux kernel developers' No. 1 benchmark, compiling the kernel, would run noticeably faster with gcc in x32.
  The downside is that you need a proper fully functional multi-arch system like is slowly getting adopted by Debian in order to handle multiple ABIs. And then you get into iffy things on if you want the faster /usr/bin/perl or one that can handle 6-gig lists efficiently...
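  To put a number on the cache argument, here is a sketch that models a binary-tree node with fixed-width integers standing in for pointers. That substitution is an assumption made so the result does not depend on which ABI compiles the file: the LP64 layout mimics x86-64 with 8-byte pointers, the ILP32 one mimics x32 with 4-byte pointers. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a binary-tree node on x86-64: two 8-byte child pointers plus
   a 4-byte key, padded to the 8-byte alignment x86-64 would apply. */
struct node_lp64 {
    uint64_t left, right;
    int32_t key;
    int32_t pad;
};

/* The same node with x32-sized 4-byte pointers: no padding needed. */
struct node_ilp32 {
    uint32_t left, right;
    int32_t key;
};

/* Number of whole nodes that fit in a block of `bytes` bytes. */
size_t nodes_that_fit(size_t bytes, size_t node_size)
{
    return bytes / node_size;
}
```

  With these sizes (24 vs 12 bytes), a 64-byte cache line holds two whole x86-64-style nodes versus five x32-style ones, and 32 KiB holds 1365 versus 2730. How much that matters in practice depends on what fraction of the working set is pointers.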
3. Re:Nice concept by loufoque · 2013-12-24 11:56 · Score: 1
  
  Any application that does heavy numerical computation should not be affected much by the ABI, if at all. All function calls are inlined inside the critical loop.
4. Re:Nice concept by cnettel · 2013-12-24 12:17 · Score: 1
  
  Any application that does heavy numerical computation should not be affected much by the ABI, if at all. All function calls are inlined inside the critical loop.
  The ABI here also defines the size of all pointers. All pointers are 32-bit here. Any purely compute intensive application will not be affected much, but something including some complexity in data structures, with pointers, could possibly benefit a lot. On the other hand, if all your code does is traversing trees, you should seriously consider allocating them in one bunch and using internal indices (of smaller integer type) rather than native pointers anyway.
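  A minimal sketch of that suggestion, with hypothetical names: every node lives in one array, children are 32-bit indices rather than native pointers, and index 0 is reserved to mean "no child". The node is 12 bytes whether the program is built for x86-64, x32 or i386.

```c
#include <stdint.h>
#include <stdlib.h>

#define NIL 0u /* index 0 is reserved to mean "no child" */

struct tnode {
    uint32_t left, right; /* indices into itree.nodes, not pointers */
    int32_t key;
};

struct itree {
    struct tnode *nodes; /* one allocation holds every node */
    uint32_t count;      /* slots in use, including reserved slot 0 */
    uint32_t cap;        /* total slots, including slot 0 */
    uint32_t root;
};

/* Allocate room for `cap` nodes; returns 0 on success, -1 on failure. */
int itree_init(struct itree *t, uint32_t cap)
{
    if (cap == UINT32_MAX)
        return -1;
    t->nodes = calloc((size_t)cap + 1, sizeof *t->nodes);
    if (t->nodes == NULL)
        return -1;
    t->count = 1;
    t->cap = cap + 1;
    t->root = NIL;
    return 0;
}

/* Insert `key` (duplicates go right); returns -1 if the pool is full. */
int itree_insert(struct itree *t, int32_t key)
{
    if (t->count == t->cap)
        return -1;
    uint32_t n = t->count++;
    t->nodes[n] = (struct tnode){ NIL, NIL, key };
    uint32_t *link = &t->root; /* walk down to the empty child slot */
    while (*link != NIL)
        link = key < t->nodes[*link].key ? &t->nodes[*link].left
                                         : &t->nodes[*link].right;
    *link = n;
    return 0;
}

/* Returns 1 if `key` is present, 0 otherwise. */
int itree_contains(const struct itree *t, int32_t key)
{
    for (uint32_t i = t->root; i != NIL;) {
        if (key == t->nodes[i].key)
            return 1;
        i = key < t->nodes[i].key ? t->nodes[i].left : t->nodes[i].right;
    }
    return 0;
}

void itree_free(struct itree *t)
{
    free(t->nodes);
    t->nodes = NULL;
}
```

  A fixed-capacity pool keeps the sketch short; a real version would grow the array (indices stay valid across realloc, unlike pointers).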
5. Re:Nice concept by sribe · 2013-12-24 12:29 · Score: 2
  
  So for code with lots of complex data types (as opposed to big arrays of floating point data), that still has to run fast, it makes sense.
  Well, here's the problem. Code that is that performance-sensitive can often benefit a whole lot more from a better design that does not have so many pointers pointing to itty-bitty data bits. (For instance, instead of a binary tree, a B-tree with nodes that are at least a couple of cache lines, or maybe even a whole page, wide.) There are very very few problems that actually require that a significant portion of data memory be occupied by pointers. There are lots and lots of them where the most convenient data structure uses lots of pointers, but if you're going to optimize how much you can cram in cache at once, eliminating pointers is better than shrinking them. Also, in many cases (such as the example I mentioned earlier), chunking things instead of pointers to individual items can greatly improve locality of access. And finally, of course, the irony is an awful lot of problems that are so performance-sensitive need the high performance precisely because they're dealing with large amounts of data. So yeah, it could be useful--but the problems where it is really useful are probably extremely limited.
  
  The downside is that you need a proper fully functional multi-arch system like is slowly getting adopted by Debian in order to handle multiple ABIs. And then you get into iffy things on if you want the faster /usr/bin/perl or one that can handle 6-gig lists efficiently...
  You also get into the problem that having two sets of libraries in use is not exactly good for cache pressure ;-)
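  A sketch of the cache-line-wide node idea, with a hypothetical layout: a key count, 7 keys and 8 child indices (32-bit array indices rather than pointers) add up to exactly 64 bytes, a common x86 cache-line size, and a C11 _Static_assert keeps it that way.

```c
#include <stdint.h>

/* One B-tree node per 64-byte cache line: 4 + 7*4 + 8*4 = 64 bytes. */
struct bnode {
    uint32_t nkeys;    /* keys in use, at most 7 */
    int32_t keys[7];   /* sorted ascending */
    uint32_t child[8]; /* indices into a node array; unused in leaves */
};

_Static_assert(sizeof(struct bnode) == 64, "bnode should fill one cache line");

/* Index of the child subtree to follow when searching for `key`:
   the number of keys in this node that are <= key. */
unsigned bnode_slot(const struct bnode *n, int32_t key)
{
    unsigned i = 0;
    while (i < n->nkeys && n->keys[i] <= key)
        i++;
    return i;
}
```

  Because the layout uses indices, the node is the same size under x86-64 and x32, so the pointer-size question does not disturb the cache-line fit.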
6. Re:Nice concept by loufoque · 2013-12-24 12:42 · Score: 1
  
  Number crunching rarely involves any pointers in the critical parts; the only exception I can think of is sparse matrices, which are actually usually done with fixed-size indexes rather than pointers.
  Game engines, however, probably have a lot of trees of pointers for their scene graph, so they could be affected. But if they're well-optimized, they're designed so that each level fits exactly inside a cache line, and changing the size of the pointers will mess that up.
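  For reference, a minimal sketch of the compressed sparse row (CSR) layout behind the sparse-matrix remark, using 32-bit indices: the per-element data are just a value and a column index, so the ABI's pointer size barely enters the inner loop. The names are hypothetical.

```c
#include <stdint.h>

/* CSR matrix: row_start has nrows + 1 entries, and the nonzeros of
   row r are val[k] / col[k] for k in [row_start[r], row_start[r+1]). */
struct csr {
    uint32_t nrows;
    const uint32_t *row_start;
    const uint32_t *col; /* 32-bit column indices, not pointers */
    const double *val;
};

/* y = A * x */
void csr_matvec(const struct csr *a, const double *x, double *y)
{
    for (uint32_t r = 0; r < a->nrows; r++) {
        double sum = 0.0;
        for (uint32_t k = a->row_start[r]; k < a->row_start[r + 1]; k++)
            sum += a->val[k] * x[a->col[k]];
        y[r] = sum;
    }
}
```

  For example, the 2x3 matrix [[1, 0, 2], [0, 3, 0]] is row_start = {0, 2, 3}, col = {0, 2, 1}, val = {1, 2, 3}.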
7. Re:Nice concept by LWATCDR · 2013-12-24 12:54 · Score: 1
  
  Simple.
  It is just as fast.
  Takes less drive space.
  Uses less memory.
  As for rebuilding apps, it should be just a simple compile, and yes, while memory is cheap, it is not always available even today. What about x86 tablets on Atom? I mean, does ls really need to be 64-bit? What about more?
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
8. Re:Nice concept by Rockoon · 2013-12-24 13:02 · Score: 2
  
  64-bit pointers take up twice the space in caches, and especially L1 cache is very space-limited.
  
  L1 cache is typically 64KB, which is room for 8K 64-bit pointers or 16K 32-bit pointers. Now riddle me this.. if you are following thousands or more pointers, what are the chances that your access pattern is at all cache friendly?
  
  The chance is virtually zero.
  
  Of course, not all of the data is pointers, but that actually doesn't help the argument. The smaller the percentage of the cache that is pointers, the less important their size actually is, since, after all, when 0% are pointers, pointer size cannot have any performance impact.
  
  So the best case for your argument is when there are literally 8192 pointers sitting in the cache, where you would be able to instead fit 16384 pointers if they were 32-bit. But surely the act of following 16384 pointers in your access pattern is actually going to make the L1 cache 100% completely moot with a cache miss at literally every follow...
  
  --
  "His name was James Damore."
9. Re:Nice concept by Megol · 2013-12-25 07:33 · Score: 1
  
  So where does the performance increases for some code using the x32 model comes from? Surely it couldn't be from increased data cache hit ratios? (It is - use performance counters for verification)
It has some value for embedded systems by Anonymous Coward · 2013-12-24 11:37 · Score: 1

It has value for embedded cost-sensitive systems, of which there are many.
If it came out a few years earlier, it would have been more prevalent.
1. Re: It has some value for embedded systems by jmauro · 2013-12-24 12:03 · Score: 1
  
  I think the embedded systems that need this would be better off just getting a faster 32-bit processor.
2. Re:It has some value for embedded systems by Pinhedd · 2013-12-24 16:05 · Score: 2
  
  wrong architecture.
  Cost sensitive embedded systems use ARM based microprocessors to which this is not applicable.
Who cares if I'll use it? by 93+Escort+Wagon · 2013-12-24 11:37 · Score: 4, Interesting

The maintainer(s) find it interesting, and they're developing it on their own dime... so I don't get the hate in some of these first few posts. No one's forcing you to use it, or even to think about it when you're coding something else.
If it's useful to someone, that's all that matters.

--
#DeleteChrome
1. Re:Who cares if I'll use it? by turkeydance · 2013-12-24 12:04 · Score: 1
  
  "not catching wind"? now, there's an open to interpretation analogy.
2. Re:Who cares if I'll use it? by Phydeaux314 · 2013-12-24 12:37 · Score: 1
  
  Yeah, you can't use sailing analogies here. Cars only.
  
  --
  Never underestimate the stupidity inherent in all human beings.
Re:Stupid by mjrauhal · 2013-12-24 11:39 · Score: 2

x32 at least has some merit, unlike your grasp of the history of computing. (Just not very much and probably not worth the trouble; you can probably relate.)
It's not only RAM by jandar · 2013-12-24 11:41 · Score: 4, Informative

The company I work for compiles almost all programs as 32-bit on x86-64 CPUs. It's not only about cheap RAM usage; it's also the expensive cache, which is wasted by 64-bit pointers and 64-bit integers. Since 3 GB is much more than our programs use, x86-64 would be foolish. I'm eagerly waiting for an x32 SuSE version.
1. Re:It's not only RAM by Austerity+Empowers · 2013-12-24 15:51 · Score: 2
  
  Your comment reminded me of what Larry Wall, inventor of the wrecking ball, said about Miley Cyrus:
  "Leeeeroooooy Jenkins!"
2. Re:It's not only RAM by Ecuador · 2013-12-24 16:48 · Score: 1
  
  I don't get it. x86-64 doubles the general purpose and SSE registers over x86. This alone makes a (usually quite big) difference even for programs that don't use 64bit arithmetic. The point of the x32 ABI as I understand it is to keep that advantage without having 64bit pointers.
  But you just compile with 32bits losing all the advantages of x86-64?
  
  --
  Violence is the last refuge of the incompetent. Polar Scope Align for iOS
3. Re:It's not only RAM by jandar · 2013-12-24 22:00 · Score: 1
  
  But you just compile with 32bits losing all the advantages of x86-64?
  Yes. Our data-structures are very pointer-heavy and thus not cache-friendly. The most critical hardware part of our servers is the cache.
4. Re:It's not only RAM by Guy+Harris · 2013-12-25 07:08 · Score: 1
  but.. I've heard* that it is generally better to compile in 64-bit mode, because the 32-bit part of the CPU is "legacy" and potentially less efficient than the 64-bit operations.
  I hate watching videos, but I looked at the slides from the presentation, and slide 16 has, as one of its points, "Prefer 64-bit code, 32-bit data", which doesn't sound very consistent with "oh, the 32-bit stuff is legacy".
  On x86, some reasons to prefer 64-bit code are that you have twice as many registers (although instructions that use the 8 new registers are one byte longer, as they need an additional instruction prefix to add additional register specifier bits), and that, as you're allowed to break binary compatibility when going 64-bit, the calling sequence was changed to support passing parameters in registers. Those have nothing to do with the 32-bit stuff being potentially-slower "legacy".
  In fact, if you're talking about processors that don't have "Itanium" in the name, it's not clear what "the 32-bit part of the CPU" is:
  
  for most RISC architectures, and for {System/3x0}/{z/Architecture}, the 64-bit version of the instruction set is just a widened flavor of the 32-bit version and the same data paths, registers, and instruction decoder can be used in 32-bit and 64-bit mode;
  
  for x86, "the 32-bit part of the CPU" would be the lower half of 8 of the GPRs, the program counter, and the ALU(s), plus the parts of the instruction decoder that treat the byte values used in 64-bit mode as the extra-register (REX) prefix as INC/DEC instructions with embedded register numbers, so the data paths, half of the registers, and almost all of the instruction decoder are used in both modes;
  
  for ARM, "the 32-bit part of the CPU" would be the lower half of 16 of the GPRs, the program counter, and the ALU(s), and the part of the instruction decoder that handles AArch32 instructions, so a lot of that is used in both modes;
  so I'm not sure why "the 32-bit part of the CPU" would be "legacy" and possibly slower.
Re:Stupid by s.petry · 2013-12-24 11:42 · Score: 2

I would not go that far since I'm sure a special case may exist, but that's exactly what it would be for. Hence the 'no massive wide scale adoption' or 'applications written for this' becomes an (what should be) obvious outcome.
If I'm custom Joe and see a workload that benefits from 32 vs. 64bit OS constraints I load a 32bit OS. The reason we went to larger memory however means those special cases are extremely rare today. They happen more because "we can't get new hardware" than by choice.

--
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
x32 is a premature optimization by bheading · 2013-12-24 11:43 · Score: 3, Interesting

The idea makes sense in theory. Build binaries that are going to be smaller (32-bit binaries have smaller pointers compared with 64-bit) and faster (because the code is smaller, in theory cache should be used more efficiently and accesses to external memory should be reduced).
But I suspect the problem is that the benefits simply outweigh the inconvenience of having to run with an entirely separate ABI. I doubt the average significant C program spends a lot of time doing direct addressing, and as such I suspect the size benefits of using 32-bit pointers are overstated.
1. Re:x32 is a premature optimization by mysidia · 2013-12-24 11:52 · Score: 1
  
  But I suspect the problem is that the benefits simply outweigh the inconvenience of having to run with an entirely separate ABI.
  Well; if the benefits outweigh the inconvenience --- then it seems x32 should be catching on more than it is.
  Personally I think it is a bad idea because of the 4GB program virtual address space limit; which applications will be frequently exceeding, especially the server applications that would otherwise benefit the most from optimization.
2. Re:x32 is a premature optimization by bheading · 2013-12-24 13:41 · Score: 1
  
  Oops, I meant the other way round. The inconvenience outweighs the benefit.
3. Re:x32 is a premature optimization by VortexCortex · 2013-12-24 15:15 · Score: 1
  
  and faster (because the code is smaller, in theory cache should be used more efficiently
  
  Your skill is Not enough. When you blow registers onto the stack, the code crawls. x86-64 has more registers. Code compiled for it is far faster than x86 because of the extra registers. How big is the L1 cache on your CPU? Is your binary MEGABYTES in size? If your code is jumping all over the digital universe generating cache misses, then you're purposefully doing something more idiotic than this universe should care about.
4. Re:x32 is a premature optimization by tlhIngan · 2013-12-24 18:33 · Score: 1
  
  Personally I think it is a bad idea because of the 4GB program virtual address space limit; which applications will be frequently exceeding, especially the server applications that would otherwise benefit the most from optimization.
  
  You're making an assumption that the 4GB limit is prohibitive. For some applications, it could be - databases and scientific processing, and definitely games. But there are plenty of other applications that won't really benefit from the enlarged address space - would a word processor benefit? A 1GB word processor document is a fair amount of text - probably close to the point where it's best to split the document up for easier manageability.
  Or various streaming processing algorithms - like DSP which can benefit from the added registers, and really, don't consume much RAM at all (since the data streams in and out).
  Sure, one reason to go 64-bit is to bust the 4GB limit. Another reason is to get speed advantages because x64 offers more registers and other stuff.
  Heck, iOS probably uses something like that to get the insane speedups because the ARM AArch64 is a MUCH faster architecture, yet every iOS device shipping now only has 1GB of RAM. Here 64-bit is for speed, not memory.
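  As an illustration of the streaming case, a sketch (hypothetical names, window length chosen arbitrarily) of a moving-average filter whose memory use is fixed at WINDOW samples however long the input stream runs, so a 4GB address space never comes into play.

```c
#include <stddef.h>

#define WINDOW 4 /* hypothetical window length */

/* Moving-average filter over the last WINDOW samples. */
struct movavg {
    double buf[WINDOW]; /* ring buffer of recent samples */
    double sum;         /* running sum of the samples in buf */
    size_t pos;         /* next slot to overwrite */
    size_t filled;      /* samples seen so far, capped at WINDOW */
};

void movavg_reset(struct movavg *f)
{
    for (size_t i = 0; i < WINDOW; i++)
        f->buf[i] = 0.0;
    f->sum = 0.0;
    f->pos = 0;
    f->filled = 0;
}

/* Feed one sample in, get the current average out. The running sum can
   accumulate rounding error over very long streams of non-integer data. */
double movavg_push(struct movavg *f, double x)
{
    if (f->filled == WINDOW)
        f->sum -= f->buf[f->pos]; /* drop the oldest sample */
    else
        f->filled++;
    f->buf[f->pos] = x;
    f->sum += x;
    f->pos = (f->pos + 1) % WINDOW;
    return f->sum / (double)f->filled;
}
```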
5. Re:x32 is a premature optimization by Narishma · 2013-12-24 23:55 · Score: 1
  
  Very few games benefit from more than 4GB of RAM. Most do just fine with 2GB or less.
  
  --
  Mada mada dane.
6. Re:x32 is a premature optimization by funky_vibes · 2013-12-25 02:44 · Score: 1
  
  Only people who run binary distros think it's an "inconvenience" to have separate ABIs.
  64bit pointers are a ridiculous waste of memory if you have less than 4G RAM, which is most embedded systems. It increases part cost since you needlessly have more flash to store your binaries.
7. Re:x32 is a premature optimization by mysidia · 2013-12-25 04:23 · Score: 1
  
  But there are plenty of other applications that won't really benefit from the enlarged address space - would a word processor benefit?
  Basically most server-side applications will benefit and use large memory address spaces already; especially .NET and Java-based applications; Mail servers, DB Servers, storage servers of various kinds.
  Microsoft Word? Definitely.
  But Word processors, and most desktop applications these days are web-based -- with a greater and greater portion "moving to the cloud" and becoming web-only day by day. Web browsers need large memory address space to cache rendering information about increasingly complex web pages; many users also open a number of simultaneous windows (or tabs) and expect fast instantaneous switching between them --- requiring yet even more cache; 4GB of RAM cache is a paltry amount by today's standards.
8. Re:x32 is a premature optimization by mysidia · 2013-12-25 04:27 · Score: 1
  
  Very few games benefit from more than 4GB of RAM. Most do just fine with 2GB or less.
  Most games these days are Flash games; or Javascript/HTML5. Those that are native are largely GPU-bound.
  Flash games and such don't benefit from x32 or shorter pointers, either.
  Games are not written for CPU performance. You are not optimizing for games, by using shorter pointers.
9. Re:x32 is a premature optimization by Guy+Harris · 2013-12-25 07:31 · Score: 1
  
  x86-64 has more registers.
  ...than IA-32 or whatever you want to call it; it does not have more registers than x32.
Maybe by cold+fjord · 2013-12-24 11:50 · Score: 1

It depends on the delta. There are still many 32bit problems out there, and there are plenty of cases where having extra performance helps. If you have enough of the right size problems you could even reduce the number of systems that you would need.
It looks like it could allow packing a single system tighter with less wasted resources.
Reducing the footprint of individual programs could also have some benefits from system performance / management, especially in tight resource situations.
One minor drawback is that you would need to structure your user execution and runtime environment to account for the additional executable format.
Pulling some of the architectural advantages of the 64bit architecture (number of registers, etc.) into 32bit land should be gravy. A lot of that will depend on exactly how they behave in 32bit mode.

--
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Try it by KiloByte · 2013-12-24 11:56 · Score: 2

debootstrap --arch=x32 unstable /path/to/chroot http://ftp.debian-ports.org/debian/
Requires an amd64 kernel compiled with CONFIG_X86_X32=y (every x32-capable kernel can also run 64 bit stuff).
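Once the chroot is set up, one quick way to confirm that the toolchain really produces x32 binaries is to look at type sizes and predefined macros. This is a sketch assuming GCC (or a compatible compiler) invoked with -mx32; the function names are hypothetical. An x32 build should report 32-bit pointers together with __x86_64__ and __ILP32__, which is also why each x32 process is limited to a 4GB virtual address space even on a machine with more RAM.

```c
#include <limits.h>
#include <stdint.h>

/* Width of a data pointer in bits: 32 under x32 and i386, 64 under x86-64. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Virtual address space implied by the pointer width, in GiB, or 0 when
   pointers are 64 bits wide (the limit then comes from the CPU and the
   kernel, not from the pointer type). */
uint64_t address_space_gib(void)
{
    unsigned bits = pointer_bits();
    return bits < 64 ? (UINT64_C(1) << bits) >> 30 : 0;
}

/* Which ABI the compiler targeted, judged from GCC's predefined macros. */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}
```

Calling these from a small main and printing the results should give "x32", 32 and 4 in an x32 build.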

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Great for smart phones by MobyDisk · 2013-12-24 12:00 · Score: 1

This could have a home on smart phones. A smaller memory footprint is *key* on smartphone apps.
ARM by tepples · 2013-12-24 12:01 · Score: 1

I thought "embedded cost-sensitive systems" would be using ARM CPUs, not Intel or AMD x86-64 CPUs with 32-bit pointers.
Seems reasonable. by gallondr00nk · 2013-12-24 12:03 · Score: 1

There's plenty of applications around still without a 64 bit binary. From what I understand this layer just allows 32 bit programs to utilize some performance enhancing features of 64 bit architecture. It seems a genuinely good idea.
1. Re:Seems reasonable. by cnettel · 2013-12-24 12:21 · Score: 1
  
  There's plenty of applications around still without a 64 bit binary. From what I understand this layer just allows 32 bit programs to utilize some performance enhancing features of 64 bit architecture. It seems a genuinely good idea.
  It allows 32-bit programs, which are *recompiled*, to benefit from those features. You still need the source and x32 builds of all dependencies. However, I guess there could sometimes be codebases that assume 32-bit pointers but make no other hard assumptions about x86 ABI behavior. Those codebases could not be recompiled for x64 as-is, but might port to x32 more easily.
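To make the pointer-size point concrete, here is a small Python model of the classic C shortcut of stashing a pointer in an int and casting it back. The helper is hypothetical, and it assumes GCC-like behavior: converting the pointer to a 32-bit int keeps the low 32 bits, and widening back through intptr_t sign-extends. Under x32 every address fits in 32 bits, so the round trip is lossless; under amd64 it is not.

```python
MASK32 = 0xFFFFFFFF

def roundtrip_through_int(addr: int, pointer_bits: int) -> int:
    """Model `void *q = (void *)(intptr_t)(int)(intptr_t)p;` (hypothetical helper).

    Assumes the cast to a 32-bit int keeps the low 32 bits (two's complement)
    and that converting back through intptr_t sign-extends, as GCC does."""
    low = addr & MASK32
    as_int = low - (1 << 32) if low & 0x80000000 else low   # value held by the int
    return as_int & ((1 << pointer_bits) - 1)               # bit pattern of the new pointer
```

With pointer_bits=32 (x32) the address comes back unchanged; with pointer_bits=64 (amd64) the high bits are either lost or replaced by sign extension.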
2. Re:Seems reasonable. by Whip · 2013-12-25 14:21 · Score: 1
  
  So, I'm not sure about how the current linux implementations work, but when Solaris went 64-bit, they added an optimization where when you run an *unchanged* 32-bit executable, the libc would recognize that it's on a better processor, and use the improved features of the processor in places that it could, for performance. So, for example, if you called memcpy(), it would use 64-bit load/store instructions (and registers) to copy, giving you twice (or more) the performance for those calls, with no changes to your old code -- you don't even need the source code available!
  Is this at all what is intended to be possible with the x32 implementations on linux, now? That would be an additional advantage that I haven't seen mentioned yet.
3. Re:Seems reasonable. by cnettel · 2013-12-26 23:10 · Score: 1
  
  Not really. For an x32 binary, you would have x32 libc and use all the fancy features. For an x86 binary running on an AMD64 processor, you are still stuck in "compatibility mode" on the processor, even when you enter libc, which means you can only use actual x86 instructions (with the smaller register file etc). It is my impression that on-the-fly switching between long and compatibility mode within the same process would still incur a cost that's comparable to (at the very least) a kernel mode transition, so the benefits would only exist for very few operations. Large memcpys wouldn't be among them, since the x86 vector instructions are actually quite fine for that purpose.
Too little, too late by TeknoHog · 2013-12-24 12:14 · Score: 1

x32 would have been nice as the first transition away from x86-32, but memory needs keep increasing, and we are far too used to full 64-bit spaces. In fact, it feels like we're finally over with the 32-64 bit transition, and people no longer worry about different kinds of x86 when buying new hardware. So introducing this alternative is a needless complication. As others have pointed out, it's too special a niche to warrant its own ABI.

--
Escher was the first MC and Giger invented the HR department.
1. Re:Too little, too late by Reliable+Windmill · 2013-12-24 12:34 · Score: 1
  
  It's not a complication, it's an enhancement. A majority of software does not need a 64-bit address space and can thus be streamlined while still getting the benefits of doing fast 64-bit integer math, among other things. Obviously you just select the target when compiling and that's that, it's like enabling an optimization, so what are you talking about?
  
  --
  Signature intentionally left blank.
2. Re:Too little, too late by TeknoHog · 2013-12-24 23:35 · Score: 1
  
  Obviously you just select the target when compiling and that's that, it's like enabling an optimization, so what are you talking about?
  If it's that easy, then I'm all for it :)
  IMHO, having different sets of libraries for the different ABIs is a kind of hassle -- we have it now for running i386 binaries on x86-64, and it's not pretty if we need to add a third set. Thus the argument about cache efficiency is moot, as explained in another post.
  Then again, as long as there are people interested in developing and using it, I'm not complaining, even if I wouldn't use it myself. x32 should be great for something like a DVR where the software selection is more limited and you need all the efficiency you can get, so you only need one set of libraries.
  
  --
  Escher was the first MC and Giger invented the HR department.
Re:Wont use Linux without it! by Anonymous Coward · 2013-12-24 12:18 · Score: 2, Insightful

My dad drives a Ford and your dad drives a Chevy. Your dad sucks.
Didn't we do this already? Like when we were twelve years old.
Is kernel still 64bit? by ThePhilips · 2013-12-24 12:20 · Score: 1

General question about x32 ABI: can the OS still use more than 4GB RAM w/o penalties? IOW, is kernel still 64bit? Only userspace is x32? Or x32 and pure 64-bit can run alongside?
Anyway. Most performance-sensitive programs went 64-bit anyway, since RAM is cheap and there are a bunch of faster but memory-hogging algorithms.

--
All hope abandon ye who enter here.
1. Re:Is kernel still 64bit? by mjrauhal · 2013-12-24 12:36 · Score: 1
  
  The kernel needs to be an amd64 one for x32 to work, at least as things stand now. The most common situation would _probably_ be an amd64 system with some specialist x32 software doing performance intensive stuff. (Or possibly a hobbyist system running an all-x32 userspace for the hack value.)
  Yeah, working with big data is unlikely to benefit, and data _is_ generally getting bigger.
2. Re:Is kernel still 64bit? by Reliable+Windmill · 2013-12-24 12:38 · Score: 1
  
  Of course the OS is still 64-bit in that regard, it's just the address space of that particular application which is reduced to 32-bit to streamline it. The majority of all executable files do not require several gigabytes of RAM, hence it makes sense to streamline their address space.
  
  --
  Signature intentionally left blank.
3. Re:Is kernel still 64bit? by ThePhilips · 2013-12-24 12:59 · Score: 1
  
  The majority of all executable files do not require several gigabytes of RAM, hence it makes sense to streamline their address space.
  
  I know that. Many commercial *NIX systems are doing it. Though... Having a 32-bit "cat" doesn't really change anything.
  That's why I mentioned the memory-hungry algorithms. Many applications use them these days. Needless to say, Java these days is started almost exclusively with "-d64".
  The market for a 4GB address space is really small, because modern programming practices generally disregard resources, RAM in particular. (The number of CPUs being the most disregarded resource.)
  
  --
  All hope abandon ye who enter here.
4. Re:Is kernel still 64bit? by VortexCortex · 2013-12-24 16:51 · Score: 1
  
  I do some alternative OS development. When I setup a program to run there are 3 different 64bit modes (programming models) for me to select to run the program under: ILP64, LLP64, and LP64. In ILP64 you get 64 bit ints, longs, long longs, and pointers. In LLP64 you get 32bit longs and ints, and 64bit long longs and pointers. In LP64 you get 32bit ints, 64 bit longs, long longs and pointers. Note: All these pointers are 64 bit (but the hardware may have less bits than this, the OS will query it, code must have 64 bit pointers). There are also 16bit and 32bit programming models. In 16bit mode you can still access 32bit instructions (if the hardware has it), but are (typically) limited to 640KiB RAM (unless in unrealmode) since you have 20 effective bits of segmented addressing, In 32bit mode you can still access 64bit instructions, but are limited to 4GiB RAM (32bit pointers).
  So, yeah, the OS can still use 64bit pointers, while running code as 32 bit mode (w/ 32bit pointers). Accessing 64 bit operations in 32bit mode code can't run on 32 bit systems if it uses any 64 bit registers / opcode. However "x32" is x86-64 64bit mode, but only uses 32 bit pointers and data fields thus limiting programs to 4GiB memory.
  IIRC Windows is LLP64 and Linux distros are typically LP64. I use LLP64 in my hobby OS kernel. All my executable code is compiled to VM opcode and must be enrolled with the OS before it can be run. Enrolling generates a separate signed private copy (and registers media types / public interfaces for IPC), and on install (if the code is trusted) it will be translated into native machine code (otherwise interpreted, esp. if in development or self-modifiable). A program can also be switched to a different programming model, so the same application can run in x86 or x86-64 mode. This does sometimes cause a problem with programs that expect memory-mapped saved data to have the same bit width between executions, but screw those idiotic programs: Storage is cross platform or bust -- OS offers a state serializer API, but foreign code doesn't know about / use it. The benefit is that the OS has far more control over the machine code, and all program "binaries" run on x86/x86-64 or ARM, since they're all cross platform bytecode -- There's no per-execution compilation overhead since it compiles on install (or model switch). This way is better because a traditional compiler is not prescient and will not produce a binary that will magically perform best in every environment forever, amen.
  I have something different than "x32". I can run in 32bit mode with 64bit extensions, so the code can be smaller if it doesn't use many 64bit registers or 64 bit pointers. I can also run 64bit mode code with 32bit pointers (like x32), while still allowing 64 bit pointers. This way a single program can have "fast" pointers in "near" (under 4GiB) space while also having "slow" pointers using the full 64bit "far" space. 32bit ints simply update the lower bits of 64 bit pointers, and these have the high 32bits "segment extended" automatically by filling the pointer register with the program's base 64bit address. Pointers can be upcast to 64 bit pointers (esp. for IPC), but a downcast can throw a segfault. 64bit mode pointers are "far" by default, so a program must elect to have "near" pointers by declaring them as such. Programs can use more than 4GiB, but can have at most 4GiB of "near" RAM allocated at once -- It's up to the programmer how to handle an "out of near memory" error -- They could allocate from far memory instead, but this complicates the program; The use case should typically be to use all near or all far pointers. The prime reason my take on "x32" needs to upcast from 32bit pointers to 64bit pointers is to call functions on other enrolled processes' interfaces via 64bit IPC.
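The near/far scheme described above can be modeled in a few lines. This is a hypothetical sketch, not the poster's OS code: it assumes the program's base address is 4 GiB aligned, so widening a near pointer is just an OR with the base, and narrowing succeeds only when the far pointer's high 32 bits match the base. NearPointerError is an invented name standing in for the segfault.

```python
class NearPointerError(Exception):
    """Stands in for the fault raised when a far pointer cannot be narrowed."""

class NearSpace:
    """Hypothetical model of 32-bit 'near' pointers relative to a 64-bit base."""

    def __init__(self, base: int):
        if base & 0xFFFFFFFF:
            raise ValueError("base must be 4 GiB aligned in this model")
        self.base = base

    def widen(self, near: int) -> int:
        """Upcast: the high 32 bits are filled from the program's base address."""
        return self.base | (near & 0xFFFFFFFF)

    def narrow(self, far: int) -> int:
        """Downcast: only valid if `far` lies inside the 4 GiB near window."""
        if far >> 32 != self.base >> 32:
            raise NearPointerError(hex(far))
        return far & 0xFFFFFFFF
```

Upcasting can never fail in this model, while downcasting has to be checked, which matches the asymmetry described in the comment.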
  However, we're talking about a brain-dead Linux kernel so you can just keep quibbling about which platform dependent binary encoding is best to statically compile your "platform independent" source code into since that kernel isn't an OS (has no interpretor, compiler, debugger, file system, or editor built in).
5. Re:Is kernel still 64bit? by unixisc · 2013-12-24 23:16 · Score: 1
  
  Of course the OS is still 64-bit in that regard, it's just the address space of that particular application which is reduced to 32-bit to streamline it. The majority of all executable files do not require several gigabytes of RAM, hence it makes sense to streamline their address space.
  If the address space alone is reduced to 32-bit, is there a reason why it should be limited to 2GB, as opposed to 4GB? For native 32-bit-only applications (on Pentium-generation CPUs), I understand why they are capped at 4GB, but in this case, where the underlying kernel is x64 and only user space is x32 to allow better memory utilization, why wouldn't user space be able to use the entire 32-bit range?
6. Re:Is kernel still 64bit? by Guy+Harris · 2013-12-25 07:39 · Score: 1
  
  General question about x32 ABI: can the OS still use more than 4GB RAM w/o penalties?
  Yes.
  
  IOW, is kernel still 64bit?
  Yes.
  
  Only userspace is x32?
  Userland can be IA-32 (or whatever you want to call the old 32-bit-only instruction set, with 8 registers), x32, or x86-64.
  
  Or x32 and pure 64-bit can run alongside?
  Yes, just as IA-32 and x86-64 code can run (in separate processes, typically; dunno if anybody's done thunking to let 32-bit code call 64-bit code or vice versa) alongside each other.
7. Re:Is kernel still 64bit? by petermgreen · 2013-12-25 08:23 · Score: 1
  
  Given that linux already allows 32-bit x86 processes on a 64-bit kernel to use the full 32-bit address space I don't see why they wouldn't allow x32 processes to do the same.
  Windows limits 32-bit processes on a 64-bit kernel to 2GB by default as a precaution against sloppy pointer handling*, but this can be turned off by marking the program as large address aware.
  32-bit kernels generally limit user processes to significantly less than 4GB because it is more efficient to have kernel and user memory in different parts of the same address space than to perform an address space switch every time the kernel needs to read data from user mode memory. Having said that, there did at one stage exist "4G/4G patches" for linux to allow user processes to use nearly all the address space at the cost of making kernel access to user memory slower.
  * Specifically, consider what happens if you convert two pointers to signed integers and then compare them.
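The footnote's failure mode is easy to reproduce. A minimal Python sketch, assuming two's-complement 32-bit ints (helper names are made up): two addresses that straddle the 2 GiB mark compare correctly as unsigned addresses but come out in the wrong order once reinterpreted as signed integers, which is the kind of bug the default 2GB limit guards against.

```python
def as_int32(addr: int) -> int:
    """Reinterpret the low 32 bits of an address as a signed C int."""
    addr &= 0xFFFFFFFF
    return addr - (1 << 32) if addr & 0x80000000 else addr

def sloppy_less(p: int, q: int) -> bool:
    """Pointers compared after a cast to signed int (the buggy pattern)."""
    return as_int32(p) < as_int32(q)

def pointer_less(p: int, q: int) -> bool:
    """Addresses compared as unsigned values, as pointer comparison does."""
    return p < q
```

Below 0x80000000 both comparisons agree; for p = 0x7FFFFFF0 and q = 0x80000010, pointer_less is True but sloppy_less is False.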
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
What about shared libraries? by billcarson · 2013-12-24 12:22 · Score: 3, Insightful

Wouldn't this require all common shared libraries (glib, mpi, etc.) to be recompiled for both x86-64 and x32? What am I missing here?
1. Re:What about shared libraries? by mjrauhal · 2013-12-24 12:32 · Score: 3, Informative
  
  Yes it would. That's among the nontrivial maintenance costs.
2. Re:What about shared libraries? by Arker · 2013-12-24 16:11 · Score: 1
  
  Funny thing I notice in articles of this sort. There are always comments saying it's dumb because there is no point in optimising software for performance because hardware is so cheap. And there are comments like yours, complaining that having to do a recompile to achieve it is too big a burden.
  Do you see the tension between the thoughts? Because if hardware is so cheap that it is more reasonable to tell the user to upgrade his computer, rather than optimise your software, then does it not follow that same line of thought that it will usually be no burden at all for a developer to compile a fresh binary?
  It's very little burden to me. I type make on one computer and use the other computer until its done. And I am a scrounger with relatively little computing power - there are probably a lot of preteens that have more/better hardware than I. So it hardly seems like it should be 'nontrivial' for someone who is actually running a business doing that very thing.
  
  --
  =-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Friends don't let friends enable ecmascript.
3. Re:What about shared libraries? by bogjobber · 2013-12-24 19:59 · Score: 2
  
  Nontrivial doesn't necessarily mean large. It just means significant enough that it needs to be accounted for. The actual cost will of course be dependent on the size and complexity of your codebase.
4. Re:What about shared libraries? by petermgreen · 2013-12-25 08:31 · Score: 1
  
  The problem is not the successful compiles; hardware is cheap enough, as you say.
  The problem is that when you build with a new and exotic set of options (and sometimes even when you don't) you WILL run into software that fails to build. You have to work out why, then come up with a fix, then either carry that fix locally forever, forward-porting it with each new version, or convince upstream that your fix is worthwhile to accept. Even if upstream do accept your fix you may find that they break it again later.
  x32 is especially fun because a lot of software sees it as "x86 like" or "x64 like" and proceeds to try and fail to use inline assembler.
  When you are building one or two apps this isn't too much of a problem, when you are trying to rebuild something the size of a major desktop/server linux distro it can become decidedly nontrivial
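A common version of that trap lives in the preprocessor. For x32, GCC and Clang define __x86_64__ just as they do for amd64, and additionally define __ILP32__ (you can list the predefined macros with gcc -mx32 -dM -E - </dev/null on a toolchain with x32 support). Code that keys 64-bit pointer assumptions off __x86_64__ alone therefore picks the wrong path. The sketch below models that decision over a set of macro names; the helpers are hypothetical, and in real C you would test __ILP32__ or __SIZEOF_POINTER__ directly.

```python
def pointer_bits(macros: set) -> int:
    """Pointer width implied by GCC/Clang predefined macros (x86 targets only)."""
    if "__x86_64__" in macros:
        # amd64 and x32 both define __x86_64__; x32 additionally defines __ILP32__.
        return 32 if "__ILP32__" in macros else 64
    if "__i386__" in macros:
        return 32
    raise ValueError("not an x86 target")

def naive_pointer_bits(macros: set) -> int:
    """The common shortcut, which misdetects x32 as a 64-bit-pointer target."""
    return 64 if "__x86_64__" in macros else 32
```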
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
More than one reason for x86-64 by tepples · 2013-12-24 12:23 · Score: 4, Interesting

we went 64 bit for a reason.
We went to x86-64 for three reasons: 64-bit integer registers, more integer registers, and 64-bit pointers. Some applications need only the first two of these three, which is why x32 is supposed to exist.
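The first of those reasons shows up directly at the instruction level: on i386 a 64-bit addition has to be split into an add on the low 32-bit halves followed by an adc (add with carry) on the high halves, while amd64 and x32 do it with a single add on a 64-bit register. A Python model of the i386 sequence (the function name is illustrative):

```python
MASK32 = 0xFFFFFFFF

def add64_i386_style(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using 32-bit halves, mirroring the
    `add` (low halves) + `adc` (high halves plus carry) pair i386 code needs."""
    lo = (a & MASK32) + (b & MASK32)
    carry = lo >> 32                                          # carry flag set by `add`
    hi = ((a >> 32) & MASK32) + ((b >> 32) & MASK32) + carry  # `adc`
    return ((hi & MASK32) << 32) | (lo & MASK32)              # wraps modulo 2**64
```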
1. Re:More than one reason for x86-64 by Bengie · 2013-12-25 09:46 · Score: 1
  
  Actual question, how does "mandatory SSE2" play into open source where the compiler can detect SSE2 and compile for it? If you can properly detect and target SSE2 at compile time, assumptions aren't needed.
Re:Wont use Linux without it! by mjrauhal · 2013-12-24 12:29 · Score: 2

I could get into specifics but I shan't, because what you're blathering about has zero relevance for x32. It's not a replacement-to-be for the usual amd64 ABI, nobody is going to break amd64 to make x32 run. It's mostly a specialist tool for specific workloads (aside from being a hacker's playground, as are many things). Whether thinking it's useful as such is misguided or not, you're more so.
Re:Sometimes efficiency and performance come first by Rockoon · 2013-12-24 12:42 · Score: 1

Having smaller data structures is much better for the small 64-byte cache lines of modern CPUs.
If your data structure includes pointers that you actually use, then you are randomly accessing memory anyways. If you aren't using those pointers, then I suggest 0-sized pointers which are compatible with x64.
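For a rough sense of the cache-line argument quoted above, here is how many binary-tree nodes (struct node { int key; struct node *left, *right; }) fit in one 64-byte line with 4-byte (x32) versus 8-byte (amd64) pointers. It assumes natural alignment for int and pointers, which holds for these field types on both ABIs, and ignores allocator overhead.

```python
CACHE_LINE = 64  # bytes, typical for current x86 CPUs

def struct_size(field_sizes):
    """sizeof() of a C struct of scalar fields, assuming each field is aligned
    to its own size and the whole struct to its largest field."""
    offset, max_align = 0, 1
    for size in field_sizes:
        offset = -(-offset // size) * size   # round up to the field's alignment
        offset += size
        max_align = max(max_align, size)
    return -(-offset // max_align) * max_align  # trailing padding

def tree_node_size(pointer_size):
    # struct node { int key; struct node *left, *right; };
    return struct_size([4, pointer_size, pointer_size])

for name, ptr in (("x32", 4), ("amd64", 8)):
    size = tree_node_size(ptr)
    print(f"{name}: {size}-byte nodes, {CACHE_LINE // size} per cache line")
# x32: 12-byte nodes, 5 per cache line
# amd64: 24-byte nodes, 2 per cache line
```

Note the 4 bytes of padding after key on amd64: halving the pointers more than halves the node here.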

--
"His name was James Damore."
Re:Stupid by Reliable+Windmill · 2013-12-24 12:47 · Score: 2

You've just misunderstood it. It is in essence a performance enhancement, and you would benefit from it simply from selecting x32 target (instead of x86-64) when compiling.

--
Signature intentionally left blank.
Re: hunger games part 34: breaking wind by Anonymous Coward · 2013-12-24 12:53 · Score: 1

It's like the title to a cheesy novel.
Why isn't it done dynamically? by Anonymous Coward · 2013-12-24 12:53 · Score: 1

It's no big surprise that takeup is low when developers are forced to make a conscious choice between x32 ABI and full 64-bit operation for their entire program. It's the wrong approach.
A far better approach would have been to enhance the 64-bit ABI to allow 32-bit pointers to be used wherever the compiler can guarantee that pointer operations will remain within the 32-bit range. There is no shortage of such situations even in pointer-flexible C, and it's even easier to find such small-range use in more tightly constrained languages. It's even possible to start off a pointer as x32 and then promote it to 64-bit on casts or wherever it is no longer possible to track where it's pointing --- that would make 32-bit pointers usable part of the time in almost all programs.
Done that way, there would be no complaints of lack of x32 adoption. Everyone would be using it and benefiting from it to the greatest extent possible in their programs, without losing access to the full 64-bit space.
The either-or choice of 32-bit or 64-bit ABIs was a mistake.
1. Re:Why isn't it done dynamically? by SpaceLifeForm · 2013-12-25 05:55 · Score: 1
  
  Agree with your main points. However, I would not call how we have arrived here a mistake. It is just the result of doing the wholesale conversion from 32 bit to 64 bit without considering performance impact, especially at the cache level. Premature optimization is the root of all evil, no?
  
  --
  You are being MICROattacked, from various angles, in a SOFT manner.
The main use cases are vertically integrated by BusterB · 2013-12-24 13:05 · Score: 1

Think Atom processors running Android, or High-performance computing applications. Neither of these require a huge external ecosystem, but if you get a 30-40% boost in some workload, they are worth it. It's my understanding that small-cache Atoms benefit from this more than huge Xeons.
1. Re:The main use cases are vertically integrated by Guy+Harris · 2013-12-25 07:13 · Score: 1
  
  Think Atom processors running Android, or High-performance computing applications.
  As long as you're not doing high-performance computing on more than 3-4GB of data. mmap() and MapViewOfFile() aren't all that cheap, so if you are working on more data, and can fit it in main memory, you'd probably prefer Full Frontal x86-64 to x32.
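What that looks like in practice: with only a 4 GiB address space, a program walking a larger file has to map it a window at a time, paying a map/unmap pair per window. A minimal sketch using Python's mmap module as a stand-in for the C calls (the function name and the 64 MiB default window are arbitrary choices) that checksums a file this way. The mapping offset must be a multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import os
import zlib

def crc32_windowed(path: str, window: int = 64 * 1024 * 1024) -> int:
    """CRC-32 of a file, mapping at most `window` bytes of it at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    window = max(gran, window // gran * gran)  # mmap offsets must be multiples of this
    size = os.path.getsize(path)
    crc = 0
    with open(path, "rb") as f:
        for offset in range(0, size, window):
            length = min(window, size - offset)
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as view:
                crc = zlib.crc32(view, crc)    # only this window is mapped right now
    return crc
```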
No by Technomancer · 2013-12-24 13:23 · Score: 1

And I don't want another set of libraries in my system in addition to 64 bit and 32 bit emulation.
Re:Supplant 32-bit ABI by 0123456 · 2013-12-24 13:36 · Score: 1, Interesting

Eventually, I assume that all binaries which don't need 64-bit addressing (which will probably always be more than 90% of them) will switch to this ABI since having access to the extended register set without the overhead of all the bus bandwidth and cache space lost to store lots of zeroes is a HUGE win with zero cost.
Uh, no.
Really, no.
It's just not going to happen.
90+% of applications are not CPU-intensive, so they don't give a crap. 90% of the other applications that are CPU-intensive would benefit far more from removing pointer accesses than from making the pointers half the size. Only the remaining 1% are going to go through the hassle of dicking around with a complete second set of libraries on their system just so they can halve the size of their pointers.
There's simply no benefit at all from compiling the vast majority of desktop x86 applications in anything other than x86-64. Which is why no sane x86 distro is even going to consider using this kludge.
The return of memory models? by Just+Brew+It! · 2013-12-24 14:29 · Score: 3, Interesting

This sure feels a lot like a throwback to the old 16-bit DOS days, where you had small/medium/large memory models depending on the size of your code and data address spaces. We've already got 32-bit mode for supporting pure 32-bit apps and 64-bit mode for pure 64-bit; supporting yet a third ABI is just going to result in more bloat as all the runtime libraries need to be duplicated for yet another combination of code/data pointer size.
I hate to say this since I'm sure a lot of smart people put significant effort into this, but it seems like a solution in search of a problem. RAM is cheap, and the performance advantage of using 32-bit pointers is typically small.
Re:ABI? by Just+Brew+It! · 2013-12-24 14:34 · Score: 2

ABI = Application Binary Interface. Defines the pointer sizes and conventions for passing function arguments at the object code level (among other things). The ABI determines how the compiler generates object code for function call/entry/exit, and the width of pointer types.
API defines the interfaces seen by the programmer.
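The pointer-width part of an ABI is usually summarized as its data model. The table below encodes the common ones as a Python dict (sizes in bytes, from the usual ILP32/LP64/LLP64/ILP64 definitions). x32 shares the ILP32 data model with i386 while keeping the x86-64 registers and calling convention. The helper asks which integer types are wide enough to hold a pointer.

```python
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "long long": 8, "pointer": 4},  # i386 and x32
    "LP64":  {"int": 4, "long": 8, "long long": 8, "pointer": 8},  # amd64 Linux
    "LLP64": {"int": 4, "long": 4, "long long": 8, "pointer": 8},  # 64-bit Windows
    "ILP64": {"int": 8, "long": 8, "long long": 8, "pointer": 8},  # rare
}

def pointer_sized_ints(model: str) -> list:
    """Integer types that can hold a pointer without truncation under `model`."""
    sizes = DATA_MODELS[model]
    return [t for t in ("int", "long", "long long") if sizes[t] >= sizes["pointer"]]

for model in DATA_MODELS:
    print(model, pointer_sized_ints(model))
```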
BSD by manu0601 · 2013-12-24 14:54 · Score: 1

I understand it is the same beast as the COMPAT_NETBSD32 option that has been available in NetBSD for 15 years now. It works amazingly well: one can throw a 64 bit kernel on a 32 bit userland and it just works, except for a few binaries that rely on ioctl(2) on some special device to cooperate with the kernel.
NetBSD even had a COMPAT_LINUX32 option for 7 years, which enables running a 32 bit Linux binary on a 64 bit NetBSD kernel. Of course the Linux ABI is a fast moving target, and one often misses the latest system call that a given Linux binary requires, but it is funny to see that Linux feature was supported on non Linux OS first.
1. Re:BSD by adri · 2013-12-24 15:53 · Score: 2
  
  No, it's not the same.
  The idea is that you use the 32 bit pointer model, with 32 bit indirect instructions, but you're doing it all using the x86-64 instruction set. Ie, the task is in 64 bit mode. The 64 bit mode includes primarily more registers, so you can write / compile to tighter code.
  The stuff you described is for running 32 bit binaries that use the i386/i486/i586 instruction set, complete with the limited set of temporary registers. x86-64 has many more registers to use.
  It's not just about cache lines. :)
Re:Supplant 32-bit ABI by AReilly · 2013-12-24 15:18 · Score: 1

This.
With a slight caveat: that last one percent probably includes the DOM inside a browser page, which looks sufficiently like an irreducible thicket of tiny objects and still wants all the speed it can get, which is why Google is pushing x32 for Chrome plugins. Maybe it helps a bit for JavaScript compilation too.
At least if your x32 is (a) sandboxed in a browser process and (b) generated by a JIT then the library duplication badness should be negligible and the result mostly invisible to the user.
For my own code, I wouldn't touch it with a bargepole. Storing pointers in memory? Madness...
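One common way to avoid storing full-width pointers at all, whatever the ABI, is to link records by 32-bit array indices instead. A minimal Python sketch of a singly linked list kept in two parallel array("i") buffers (the class name is made up; in C the same idea would use an int32_t or uint32_t index into an array of structs):

```python
from array import array

class IndexList:
    """Singly linked list whose links are C-int array indices, not pointers."""
    NIL = -1

    def __init__(self):
        self.values = array("i")
        self.next = array("i")   # typecode "i" is a C int: 4 bytes on mainstream platforms
        self.head = self.NIL

    def push_front(self, value: int) -> None:
        self.values.append(value)
        self.next.append(self.head)      # new node links to the old head
        self.head = len(self.values) - 1

    def __iter__(self):
        i = self.head
        while i != self.NIL:
            yield self.values[i]
            i = self.next[i]
```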

--
-- Andrew
First let's understand this x32 correctly. by macpacheco · 2013-12-24 15:38 · Score: 2

It's possible for a system with 16GB to run nothing but x32 userland (the kernel is still x86_64 under x32, so the kernel can see all 16GB), for instance running thousands of tasks using up to 4GB each just fine. Plus, the page cache is a kernel thing, so the I/O cache can always use all memory.
On the other hand, there are workloads that run on a 4GB system but need x86_64 (mmapping of huge files, for instance), and some boneheaded tasks reserve tons of never-used RAM: a task could actually use 1GB of RAM but reserve 8GB. The real fix there would be putting the coder in jail, but I digress.
But the vast majority of linux workloads today that use even an 8GB system would run just fine under x32. Like 95-98%.
And nobody is even suggesting a mainstream linux distro without x86_64 userland. I'm suggesting all standard tools use x32, while keeping the x86_64 shared libraries and compilers, so if you need to you could run some apps with full 64bit capability. Just use x32 by default.
Plus it's a good way to remind lazy developers that no matter how cheap RAM is, you should be serious about being efficient (especially the KDE developers)!
KDE functionality is great, but they really have no clue about efficiency (RAM and CPU).
1. Re:First let's understand this x32 correctly. by Mr+Z · 2013-12-24 15:57 · Score: 1
  
  I have a warm place in my heart for x32 for the reasons you mention: 95% - 98% of code is perfectly happy with 32-bit pointers. (I would posit 99%+ for many folks, actually. The kernel and a few big apps benefit from 64-bit, and the rest fit well in 4GB. I put "web browser" in the "big app" list, but it is only one app even if it's one of the biggest cycle-eaters.)
  Now that the ABI has matured and presumably they've had some time to work the kinks out, I'd love to see them post some up-to-date benchmarks. The previous benchmarks were actually somewhat disappointing. I expected a bit more noticeable speedup.
  
  --
  Program Intellivision!
2. Re:First let's understand this x32 correctly. by macpacheco · 2013-12-24 19:18 · Score: 1
  
  humm, I'm running firefox / chrome with 3GB total system RAM just fine
  dozens of tabs, flash, java, you name it
  many pages with hundreds of jpegs open
  the maximum virtual memory space for those jobs doesn't even get to 1GB
  I'm a MySQL/pgsql/Progress DBA and the only case I've seen that would require x86_64 is a customer with 6 Progress databases, where a single local client attaches to all 6 dbs, requiring over 4GB of address space. All other cases don't come even close; every JVM I've ever seen maxed out at 1.3GB
  again, most people posting on the x32 subject don't even know what exactly is a mmap mapping, a shared memory segment, memory allocated from sbrk and other methods, stacks allocated dynamically, and jump to the conclusion a browser must require over 4GB of address space
  unless you're running some exotic scientific application, some database with a huge cache, something custom made, you'll have to work realllllyyyyyy hard to need more than even 2GB of memory address space
  The reason benchmarks aren't being run isn't kinks in x32 itself; it's that most real-world apps can't simply be recompiled for x32 (a little bit of assembly code will mess it up, for instance; most virtual machines have some assembly code for raw memory access, to read/write C-style structures for instance), and those maintainers don't want to get it done
  Those are the really telling benchmarks (Java apps, Python apps, PHP apps, Perl apps). I'm willing to bet some serious money those will have a 10+% speedup, and they're the typical CPU hogs, especially poorly written Java apps whose source we have no control over. Those virtual machines use pointers for everything, since almost everything is malloc'ed (each variable, for instance).
  Databases also use and abuse pointers and benefit greatly from extra registers. So an x32 mysql / pgsql would be great (I'm not even considering Progress ever releasing an x32 version, they're way too conservative).
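The kind of measurement described above is easy to repeat on Linux, where the kernel reports a process's peak virtual address space as VmPeak (in kB) in /proc/<pid>/status. A rough Python sketch that parses it and compares it against a 4 GiB budget; the helper names are illustrative, and 4 GiB is only an upper bound on what an x32 process can actually use.

```python
FOUR_GIB_IN_KB = 4 * 1024 * 1024  # /proc reports sizes in kB (units of 1024 bytes)

def vm_kb(status_text: str, field: str = "VmPeak") -> int:
    """Extract a Vm* field, in kB, from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])   # e.g. "VmPeak:\t 3145728 kB"
    raise KeyError(field)

def fits_in_4gib(status_text: str) -> bool:
    """Rough check: did the process's peak address space stay under 4 GiB?"""
    return vm_kb(status_text) < FOUR_GIB_IN_KB

def read_status(pid="self") -> str:
    """Linux only: read /proc/<pid>/status for a running process."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

On Linux, fits_in_4gib(read_status(1234)) would check process 1234 (a placeholder PID).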
3. Re:First let's understand this x32 correctly. by Mr+Z · 2013-12-25 03:42 · Score: 1
  
  I think we're in violent agreement. The only reason I included Firefox in the list is that I've seen it top 4GB on my own system. Maybe it's a x86-64 related memory leak, since all the memory measurements I hear people touting for the 32-bit version are far lower. Or maybe it's just full of pointers, which wouldn't be too surprising, really. :-)
  
  again, most people posting on the x32 subject don't even know what exactly is a mmap mapping, a shared memory segment, memory allocated from sbrk and other methods, stacks allocated dynamically, and jump to the conclusion a browser must require over 4GB of address space
  
  Well, I only included Firefox because, on my computer right now, it has mapped over 3GB, and has 2.1GB resident. That still fits within the 3GB/1GB split of 32-bit Linux. I have seen it go as high as 6GB, but its usual steady state for me is around 4GB. I never close any tabs. You're right, though, that a web browser shouldn't require that much RAM under normal circumstances. FWIW, I am quite familiar with mmap, shared memory, sbrk, huge pages (including using mmap to map files on a hugetlbfs to get larger pages and improve my TLB performance), etc. I didn't include Firefox out of ignorance.
  So, I reiterate: I think we're in violent agreement that x32 looks interesting and relevant, and the vast majority of applications don't need 64-bit, and many would benefit from smaller pointers.
  A few applications I've written (large heuristic solvers, for example) benefit from 10GB - 15GB RAM. And from what I hear, some EDA apps they use at work could use 200+GB. And, of course, there's the ever present large databases, as you mentioned. But that's pretty specialized as compared to everything else.
  
  --
  Program Intellivision!
Too specific by Kagetsuki · 2013-12-24 17:38 · Score: 1

So for me the answer is no. The whole thing reminds me of doing ARM assembler with Thumb code mixed in. If you have a very specific usage for it then yes, it would certainly be useful - but it's going to be up to the people who need it to actually use and improve it. Everyone else has no need to care and the average developer shouldn't *need* to care or even be aware of it.
Re:Wont use Linux without it! by armanox · 2013-12-24 18:40 · Score: 2

I can recompile and run 20 year old SunOS apps no problem with OpenSolaris. Try that with Linux?
Depends on what it's looking for, but in theory it should work. 20 years? CLI or GUI based? Probably wants Tcl/Tk and/or Motif if it's GUI; make sure they're installed. I'm willing to try, if you have source code that old...

Hairyfeet mentioned he tried linux and people kept calling back angry that their printer stopped working after an Ubuntu update.
I did not even know it existed? I will keep Linux on a VM I suppose but only CentOS as Redhat likes to make somewhat ABIs that do not break after each freaking update!
If you need stability then you should go with a stable OS. Fedora, OpenSuSE, and Ubuntu change too fast for enterprise use - which is what makes RHEL great.
With that said, I don't seem to have issues running some older software I have lying around for Linux. Oracle Database 8 installed on RHEL 5 when I last tried it, and an old version of Code Forge IDE ran on a new Fedora Linux (I think I last installed it on FC 16; it was designed for Red Hat Linux 5.x/6.x, the old Red Hat, not RHEL). Similar results with Matlab. The software isn't broken by kernel changes; it's the libraries it needs that change. Whether a program is statically or dynamically linked makes a big difference in how long it lasts, and from what I've seen, stuff looking for a particular glib or libc is the biggest offender in Linux. Windows has seen some issues with that over the years too (dropping DOS libraries, dropping Win16 from 64-bit Windows, etc.).
Most UNIX operating systems do seem to maintain greater compatibility in userland, but I've had issues on IRIX with stuff built for 5.x not working on my systems (Octanes running 6.5.x). It's the same deal, though: dynamically linked programs not being able to find their libraries.

--
I'm starting to think GNU is the problem with "GNU/Linux" these days.
Probably not as much as 40% faster... by Anonymous Coward · 2013-12-24 20:16 · Score: 1

Ulrich Drepper has some x32 benchmarks on his blog. The biggest difference is for the cacheline benchmark where x32 is the same speed as x86_32 but x86_64 is 75% of the speed.
In all the other benchmarks the difference is much smaller and there are some surprising results such as x32 being slower than x86_32 (nbench (FOURIER)) and slower by a large amount compared to x86_64 (PARSEC blackscholes SIMD)...
Re:Wont use Linux without it! by makomk · 2013-12-24 23:06 · Score: 1

Running 20-year-old Linux binaries is certainly possible too. I think one or two of the kernel devs do it from time to time, but it requires a kernel option that's not always enabled, plus old versions of the libraries.
Scientific computing by Fruit · 2013-12-24 23:39 · Score: 1

Desktops aren't the intended audience for x32. This stuff is for very specific scientific compute jobs that are pointer intensive (e.g., graphs). You won't see GNOME/KDE/whatever packages for this architecture.
The popularity of this arch won't manifest itself in general purpose software packages: most computation will be in one-off custom programs that are never released. That doesn't mean this architecture isn't popular, it just means you're using the wrong metric.
Errm by countach · 2013-12-25 01:10 · Score: 4, Interesting

Won't this require a 2nd copy of the shared libraries in memory, which will negate the benefit of a slightly smaller binary?
Re:Sometimes efficiency and performance come first by Megol · 2013-12-25 07:20 · Score: 1

How about using them but not storing them as pointers? One possibility is 32-bit indices into a known structure; another is splitting a 32-bit word into a structure index and an index into that structure. E.g. if there are at most 256 structures, the 32-bit value is read as an 8-bit structure index (looked up in a separate array) and a 24-bit index into the structure. (That's a trick question BTW; to a first approximation, no one optimizes their code :( )
Who actually compiles every package on the system? by tepples · 2013-12-25 13:16 · Score: 1

how does "mandatory SSE2" play into open source where the compiler can detect SSE2 and compile for it?
Apart from Gentoo and FreeBSD users, who use a "ports"-style system, I'd guess it's more common to install binary packages from the distribution's repository than to compile source packages. Besides, not all applications can even be open source, especially games and video-on-demand players.
Another article on x32 by Shewmaker · 2014-01-02 15:48 · Score: 1

For those interested in x32, I wrote an article for Linux Weekly News last May. "x32 ABI support by distributions" may have some information on x32 you might not have been aware of.
In short, I found it easy to use the experimental x32 architecture for Debian, and there are certain scientific apps out there that might get significant benefit from it. Web application accelerators like Varnish might also have something to gain by using x32.

--
"For the Snark was a Boojum, you see." -From the Hunting of the Snark: An Agony in Eight Fits, by Lewis Carroll