Slashdot Mirror


Are 64-bit Binaries Slower than 32-bit Binaries?

JigSaw writes "The modern dogma is that 32-bit applications are faster, and that 64-bit imposes a performance penalty. Tony Bourke decided to run a few of tests on his SPARC to see if indeed 64-bit binaries ran slower than 32-bit binaries, and what the actual performance disparity would ultimately be."

19 of 444 comments (clear)

  1. Re: OSNews by duffbeer703 · · Score: 5, Funny

    In case anyone hasn't realized it yet, this article proves that OSNews is the most retarded website on the planet.

    The typical story is titled like "A comprehensive review of the Atari ST". The contents are typically something like... "I found an old Atari ST, but my cdrom wouldn't fit in the 5.25" disk drive and mozilla wouldn't compile. So the Atari sucks"

    I benchmarked a skilled Chinese abacus user against a C-programmer implementing an accounting system. The chinese dude figured out that 1+1=2 before the C-programmer loaded his editor, so the abacus is faster.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  2. It all depends... by paul248 · · Score: 5, Funny

    It all depends on how many of those 64 bits are 1's. 1's are a lot heavier than 0's, so too many of them will slow your program down a lot. If you compare a 32-bit program with all 1's, it will run significantly slower than a 64-bit program with only a few 1's. It's simple, really.

    1. Re:It all depends... by Uncle+Gropey · · Score: 5, Funny

      It's not that the 1's are heavier, it's that they tend to snag in the system bus and take longer to travel than the smoother 0's.

  3. gcc? by PineGreen · · Score: 5, Interesting

    Now, gcc is known to produce shit code on sparcs. I am not saying 64 is always better, but to be hones, the stuff should at least have been compiled with Sun CC, possibly with -fast and -fast64 flags...

  4. How mature are the compilers? by Anonymous Coward · · Score: 5, Interesting

    The surmise that ALL 64 bit binaries are slower than 32 is incorrect...

    At this stage of development for the various 64-bit architectures, there is very likely a LOT of room for improvement in the compilers and other related development tools and giblets. Sorry, but I don't consider gcc to be necessarily the bleeding edge in terms of performance on anything. It makes for an interesting benchmarking tool because it's usable on many, many architectures, but in terms of its (current) ability to create binaries that run at optimum performance, no.

    I worked on DEC Alphas for many years, and there was continuing progress in their compiler performance during that time. And, frankly, it took a long time, and it probably will for IA64 and others. I'm sure some Sun SPARC-64 users or developers can provide some insight on that architecture as well. It's just the nature of the beast.

  5. Opteron is faster in 64 bit by citanon · · Score: 5, Informative

    But that's only because it has two extra execution units for 64 bit code. 64 bit software is not inherently faster. Most people here would know this, but I just thought I might preemptively clear up any confusion.

    1. Re:Opteron is faster in 64 bit by fifirebel · · Score: 5, Informative

      Also because in 64-bit mode, the Opteron has access to more registers. The IA-32 architecture is so register-limited that throwing more registers at any task makes a huge difference.

  6. And 32 bit is slower than 16 bit by gvc · · Score: 5, Interesting

    I recall being very disappointed when my new VAX 11/750 running BSD 4.1 was much slower than my PDP 11/45 running BSD 2.8. All the applications I tested: cc, yacc, etc. were faster on the 16-bit PDP than the 32-bit VAX.

    I kept the VAX anyway.

  7. If 32bit is faster than 64... by CatGrep · · Score: 5, Funny

    Then 16bit binaries should be even faster then 32.

    And why stop there?

    8bits should really scream.

    I can see it now: 2GHz 6502 processors, retro computing. The 70's are back.

  8. Something is wrong. by DarkHelmet · · Score: 5, Interesting
    Maybe it's me, but how the hell is OpenSSL slower in 64 bit?

    It makes absolutely no sense. Operations concerning large integers were MADE for 64 bit.

    Hell, if they made a 1024 bit processor, it'd be something like OpenSSL that would actually see the benefit of having datatypes that bit.

    Something is wrong, horribly wrong with these benchmarks. Either OpenSSL doesn't have proper support for 64 bit data types, this fellow compiled something wrong, or some massive retard published benchmarks for the wrong platform in the wrong place.

    Or maybe I'm just on crack.

    --
    /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
  9. Forward thinking by Wellmont · · Score: 5, Interesting

    Well considering that manufacturers have been working like crazy to produce both 64 bit hardware and software applications, one could see that there is still some stuff to be done in the field.
    What most of the posts are considering and the test itself are "concluding" is that it has to be slower over all and even in the end when 64 bit computing finally reaches it's true breadth. However when the bottlenecks of the pipeline (in this case the cache) and the remaining problems are removed you can actually move that 64 bit block in the same time it takes to move a 32 bit block.
    Producing to 32bit pipes takes up more space then creating a 64bit pipe in the end, no matter which way you look at it and no matter what kind of applications or processes its used for.
    However the big thing that could change this theory is Hyper Compressed Carbon chips, that should replace silicon chips within a decade. (that's fairly conservative estimate.

  10. Re:Benchmarks by Frymaster · · Score: 5, Funny
    There are 3 types lies. Lies. Damned Lies. ...and benchmarks.

    i've got some specint stats that show that damned lies are up to 30% faster.

  11. of course, they are by ajagci · · Score: 5, Informative

    Both 32bit and 64bit binaries running on the same processor get the same data paths and the same amount of cache on many processors. But, for one thing, 64bit binaries use up more cache memory for both code and data. So, yes, if you run 32bit binaries on a 64bit processor with a 32bit mode, then the 32bit binaries will generally run faster. But the reason why they run well and all the data paths are wide is because the thing is a 64bit processor in the first place--that's really what "64bit" means.

    64bit may help with speed only if software is written to take advantage of 64bit processing. But the main reason to use 64bit processing is for the larger address space and larger amount of memory you can address, not for speed. 4Gbytes of address space is simply too tight for many applications and software design started to suffer many years ago from those limitations. Among other things, on 32bit processors, memory mapped files have become almost useless for just the applications where they should be most useful: applications involving very large files.

  12. Re:Moving more data by dfung · · Score: 5, Informative

    Oh, now I'll *cough* a little too.

    Modern processors (which actually stretches back at least 10 years) really want to run out of cache as much as possible, both for instruction and data access. And they've never wanted to do it more than now when in the x86 world, the processor core and L1 cache are operating at 3200MHz vs. 400MHz for the RAM.

    One thing that has to happen is that you make a bet on locality of execution (again both for instructions and data) and burst load a section of memory into the caches (L2 and L1, and sometimes even L3). In implementation terms, it takes some time to charge up the address bus, so you increase bandwidth and execution speed by charging up address n, but doing a quick read of n+1, n+2, n+3, and more on the latest CPUs. You only have to wiggle the two low-order address lines for the extra reads, so you don't pay the pre-charge penalty that you would for access randomly in memory.

    That's good if you're right about locality and bad if you're wrong. That's what predictive branching in the processor and compiler optimizations are all about - tailoring execution to stay in cache as much as possible.

    On a 64-bit processor, those burst moves really are twice as big and they really do take longer (the memory technology isn't radically different between 32- and 64-bit architectures, although right now it would be odd to see a cost-cutting memory system on a 64-bit machine). If all the accesses of the burst are actually used in execution, then both systems will show similar performance (the 64-bit will have better performance on things like vector opcodes, but for regular stuff, 1 cycle is 1 cycle). If only half of the bursted data is used, then the higher overhead of the burst will penalize the 64-bit processor.

    If you're running a character based benchmark (I've never looked at gzip, but it seems like it must be char based), then it's going to be hard for the 64-bit app and environment to be a win until you figure out some optimization that utilizes the technology. If your benchmark was doing matrix ops on 64-bit ints, then you'll probably find that that Opteron, Itanium, or UltraSparc will be pretty hard to touch.

    A hammer isn't the right tool for every job as much as you'd like it to be. I actually think that the cited article was a reasonable practical test of performance, but extrapolating from that would be like commenting on pounding nails with a saw - it's just a somewhat irrelevant measure.

    I guess I'm violently agreeing with renehollan's comment about speed bumps - apps that can benefit from an architectural change are as important as more concrete details such as compiler optimizations.

  13. There's always a trade-off by KalvinB · · Score: 5, Insightful

    between precision and speed.

    It's not surprising that 64-bit processors are rated much slower than 32-bit ones. The fastest 64-bit AMD is rated 2.0ghz while the fastest AMD 32-bit is 2.2ghz.

    If you use a shovel you can move it very fast to dig a hole. If you use a backhoe you're going to move much slower but remove more dirt at a time.

    Using modern technology to build a 386 chip would result in one of the highest clock speeds ever but it would be practically useless. Using 386 era technology to build a 64 bit chip would be possible but it'd be massive and horribly slow.

    I'm still debating whether or not to go with 64-bit for my next system. I'd rather not spend $700 on a new system so I can have a better graphics card and then have to spend several hundred more shortly after to replace the CPU and MB again. But then again, 64-bit prices are still quite high and I'd probably be able to be productive on 32-bit for several more years before 32-bit goes away.

    Ben

  14. Re: OSNews by Guuge · · Score: 5, Insightful

    News flash: 64-bit apps are, usually, slightly slower than 32-bit ones. Duh. Any developer who's been around 64-bit environments for more than a few weeks knows this. It's not like there's some subtle magic going on here; bigger pointers means more data to schlep around.

    That is the sort of "obvious" conventional wisdom that the article is questioning. In fact, 64-bit architecture means a lot more than pointer size, and merely counting bits is no way to estimate performance.

  15. Slower? It depends. by BobaFett · · Score: 5, Informative

    Depends mainly on what data the test is using. If it's floating-point heavy, and uses double, then it always was 64-bit. On 64-bit hardware it'll gain the full-width data path and will be able to load/store 64-bit floating-point numbers faster, all things being equal. If it uses ints (not longs), it is and will stay 32-bit, there will be no difference unless the hardware is capable of loading two 32-bit numbers at once, effectively splitting the memory bus in two (HP-PA RISC can do it, his old Sun cannot, newest Suns can, I don't know if Opterons can). Finally, if the test uses data types which convert from 32 to 64 bits it will become slower, but only if it does enough math on these types. The later is important, since every half-complicated program uses pointers, explicitly or implicitly, but not every program does enough pointer arithmetics compared to other operations to make a difference. However, if it does, then it'll copy pointers in and out of main memory all the time, and you can fit half as many 64-bit pointers into the cache.
    That's where the slowdown comes (plus some possible library issues, early 64-bit HP and Sun system libraries were very slow for some operations).
    If your process resident memory size is the same in 64 and 32-bit mode, you should not see any slowdown. If you do, it's an issue with the library of the compiler (even though the compiler in this case is the same, the code generator is not, and there may be some low-level optimizations it does differently). If resident size of 64-bit application is larger, you are likely to see slowdown, and the more memory-bound the program is the larger it'll be.

  16. Re: OSNews by be-fan · · Score: 5, Informative

    On SPARC, there are no 64-bit-only optimizations. The only reason to use 64-bit math is either if you need 64-bit integers, or use 64-bit pointers. Since none of the benchmarks can use either (the MySQL benchmark could, but the machine only had 256MB of RAM).

    --
    A deep unwavering belief is a sure sign you're missing something...
  17. Re:OSNews = UnNews? by fitten · · Score: 5, Informative

    Don't know how they could be exactly the same *except* for the word size. In order to process the two different word sizes, there will have to be differences in circuitry (ALU is wider, so are lots of things like the buffers between pipeline stages and such).

    One of the issues that people forget is that a 64-bit processor may be able to retire a set number of 64-bit, say, integer additions per clock cycle (NOTE: retiring an operation per clock cycle does NOT mean that the operation takes one clock cycle to perform). Well, the odds are that it will also retire the same number of 32-bit integer additions per clock cycle. It may take 5 clock cycles to do either sized addition even. So, what do you have that is different? Well, on the SPARC, most simple operations are going to be similar in execution time. Regarless of the number of register windows that the particular architecture supports (which may come into play in some codes), you still basically have 32 registers for use in your computational kernel. The only real difference between many 32-bit and 64-bit versions of the code will be the amount of data that has to be moved around.

    Where the 64-bit will help is when the 32-bit code has to synthesize 64-bit operations or has to do things like work on bit streams (not word/byte streams exactly) and can work on 64-bits at a time rather than doing really the same thing on 32-bits two times as much (128 bytes can be traversed in 32 32-bit operations or 16 64-bit operations - half the number of reads/operations).

    All of this is pretty well understood by those who have dealt with these type systems before. However, the relative newcomer Opteron has an additional twist. In 64-bit mode, there are twice as many registers that can be used compared to 32-bit mode. This may (read: will) cause some codes to be done faster simply because more data can be stored in registers rather than memory, even L1 cache is a bit slower than a register.