Are 64-bit Binaries Slower than 32-bit Binaries?
JigSaw writes "The modern dogma is that 32-bit applications are faster, and that 64-bit imposes a performance penalty. Tony Bourke decided to run a few of tests on his SPARC to see if indeed 64-bit binaries ran slower than 32-bit binaries, and what the actual performance disparity would ultimately be."
I read this piece yesterday. Here's a tip for those of you who may currently or need to work on building an x86 to x86_64bit cross-compiler under the Linux operating system.
:-*), has something called the CrossTool at http://kegel.com/crosstool that should be of major help to anyone working with 64-bit Linux systems.
;-)
One of my tight friends, Dan Kegel (cute pic of him here, oh and he works for Google, so he's super-smart and rich!
You may even be able to list it as COTS on your project even though it's free as in beer. In any case, I've tried it, it's sweet, you should try it, it works great for what it does, just like most *nix apps. I prefer having one small tool do something really well than one large software package do a bunch of things really crappily.
Anyway, stop by Dan's page and say hi. Tell him I sent ya
Background: 28/M/Bi-Sexual; Owner of a Linux company; MBA Harvard 2003; B.S. Comp Sci MIT 2000
From reading the article, the answer is: Sometimes, depending.
In his explanation, he said something of the order of "if you want speed, use the 32-bit version of the binaries, because otherwise the computer physically has to move twice as much data around for each operation it does." Only if you want the extra memory mapping capability of a 64-bit binary, he said, would you need to use the 64-bit version.
I suppose in summary, though, it depends on exactly what your binary does.
Now, gcc is known to produce shit code on sparcs. I am not saying 64 is always better, but to be hones, the stuff should at least have been compiled with Sun CC, possibly with -fast and -fast64 flags...
The surmise that ALL 64 bit binaries are slower than 32 is incorrect...
At this stage of development for the various 64-bit architectures, there is very likely a LOT of room for improvement in the compilers and other related development tools and giblets. Sorry, but I don't consider gcc to be necessarily the bleeding edge in terms of performance on anything. It makes for an interesting benchmarking tool because it's usable on many, many architectures, but in terms of its (current) ability to create binaries that run at optimum performance, no.
I worked on DEC Alphas for many years, and there was continuing progress in their compiler performance during that time. And, frankly, it took a long time, and it probably will for IA64 and others. I'm sure some Sun SPARC-64 users or developers can provide some insight on that architecture as well. It's just the nature of the beast.
I recall being very disappointed when my new VAX 11/750 running BSD 4.1 was much slower than my PDP 11/45 running BSD 2.8. All the applications I tested: cc, yacc, etc. were faster on the 16-bit PDP than the 32-bit VAX.
I kept the VAX anyway.
My understanding is that when you switch an Athlon64 or Opteron into 64bit mode, that you suddenly get access to more general purpose registers than the x86 normally has. So the compiler can generate more efficient code in 64bit mode, making use of the extra registers and so forth. I don't know if this makes a difference in real world apps or not though.
The guy seemed to have his conclusion written before he started... Or at least that's how it seemed to me. When he was doing the SSL test, he said that the results were ONLY about 10% slower on the 64 bit version. Now I might be far too much of a graphics programmer.... but I would consider 10% to be a rather significant slowdown.
The other thing that bothered me of course was when he said that the file sizes were only 50% bigger in some cases... sure, code is never all that big, but... still...
The main product I work on, which was designed in a freaking vacuum, is so tightly tied to wintel that I've had to spend the greater part of a year gutting int and making it portable. Kind of. We currently use 1.5 gig of for the database cache. If we go any higher, we run out of memory. /3gb switch, but we kept having very odd things happen.
We tried win2k3 and the
This database could very easily reach 500 gig, but anything above 150 gig and performance goes in the toilet.
My solution...
Get a low-to-midrange Sun box that can handle 16+g and has a good disk subsystem. But that's not a current option. Like I said, this thing was designed in a vacuum. The in-memory data-structures are the network data structures. That are all packed on 1-byte boundaries. Can you say SIGBUS? A Conversion layer probably wouldn't be that hard, if it weren't build as ONE FREAKING LAYER!
Sorry, I had to rant. Anyway, a single 64 bit box would enable us to replace several IA32 servers. For large databases, 64bits is a blessing.
Matt
First, anyone with half a brain already knows what his "scientific" results prove. Second, anyone with two thirds of a brain has already performed similar (but probably better) tests and come to the same conclusion.
And third, OpenSSL uses assembly code hand-crafted for the CPU when built for the 32-bit environment (solaris-sparcv9-gcc) and compiles C when built for the 64-bit environment (solaris64-sparcv9-gcc). Great comparison, guy.
Apples, meet Oranges (or Wintels).
Mark
It makes absolutely no sense. Operations concerning large integers were MADE for 64 bit.
Hell, if they made a 1024 bit processor, it'd be something like OpenSSL that would actually see the benefit of having datatypes that bit.
Something is wrong, horribly wrong with these benchmarks. Either OpenSSL doesn't have proper support for 64 bit data types, this fellow compiled something wrong, or some massive retard published benchmarks for the wrong platform in the wrong place.
Or maybe I'm just on crack.
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
We've got an Itanic box at work that has WinXP 64bit edition on it so we can build & test some 64bit Windows binaries.
It's the slowest box in the place! Open a terminal (oops, command shell, or whatever they call it on Windoze) and do a 'dir' - it scrolls so slowly that it feels like I'm way back in the old days when I was running a DOS emulator on my Atari ST box.
Pretty much everything is _much_ slower on that box. It's amazingly bad and I've tried to think of reasons for this: Was XP 64bit built with debugging options turned on when they compiled it? But even if that were the case it wouldn't account for all of it - I'd only expect that to slow things down maybe up to 20%, not by almost an order of magnitude.
Well considering that manufacturers have been working like crazy to produce both 64 bit hardware and software applications, one could see that there is still some stuff to be done in the field.
What most of the posts are considering and the test itself are "concluding" is that it has to be slower over all and even in the end when 64 bit computing finally reaches it's true breadth. However when the bottlenecks of the pipeline (in this case the cache) and the remaining problems are removed you can actually move that 64 bit block in the same time it takes to move a 32 bit block.
Producing to 32bit pipes takes up more space then creating a 64bit pipe in the end, no matter which way you look at it and no matter what kind of applications or processes its used for.
However the big thing that could change this theory is Hyper Compressed Carbon chips, that should replace silicon chips within a decade. (that's fairly conservative estimate.
From the article:
[...] you'll likely end up in a position where you need to know your way around a Makefile.
Well duh. What a surprise: compiling for a different platform might requires Makefile tweaking.
Am I the only one to think that was a dummy article wasting a spot for much more interesting articles about 64 bit computing?
A message from the system administrator: 'I've upped my priority. Now up yours.'
but more to the point, why would you advertise your sexuality in a technical post? I mean, if a straight person (and believe me, there is no such thing as "bi", you're either straight, queer, or in denial about being queer) were to post "I love vagina" in every post, you'd rightfully make fun of him.
But we're supposed to care that you consider a man's ass a sexual input? Stop looking for so much attention and lets talk computers.
My understanding of low level languages may not be comprehensive, however I am aware that for (lets use the simplest example I am familiar with) MIPS there are a number of registers for the storing of data that will be 'saved' and returned to the caller function, these registers are commonly known as $S0 - $S7, these registers have to be saved in the subroutine in order not to loose the volatile information stored therein.
... ..
.. ..
.. ...
for example:
sw $T0, 0($S0)
having more registers would allow you to bypass this step of writing the data to the mem address of $T0, you could use one of the new registers that are not volatile and store it there, thus removing the need for perhaps 5 instructions at a time on each return from a subroutine alone.
rather than the Store Word instruction (SW) you could just:
addi $T1, $ZERO, $S0
which would not be lost in the return to the calling function.
further to this, and i'm not sure that the intel x86 performs the same way, when you wish to load a
large number, i think in MIPS its >8bit into a register (16 bit register size) you have to infact perform TWO operations to load ONE number.
basically you load the first (largest significant bits) first
number = xFFFF FFF0
LUI $T0, xFFFF #load to the upper half of the
# register, because the address
# space only allows for 8 bit size
ADDI $T0, cFFF0 # add the second portion of the
# number.
on the basis that x86 shares some of these things, then 64 bit must be faster GIVEN even ground with compilers and so forth, these are assumed (EVEN THO THAT IS NOT THE CASE) because otherwise its all pissing in the wind.
if this has errors, forgive me, this is not my area of specialty by a long stretch.
-Archfile
But what's the fsking point?
News flash: 64-bit apps are, usually, slightly slower than 32-bit ones. Duh. Any developer who's been around 64-bit environments for more than a few weeks knows this. It's not like there's some subtle magic going on here; bigger pointers means more data to schlep around.
I think your parent's complaint is that is sort of like a cursory analysis indicating that triangular wheels aren't quite so good as round ones. If you really needed to be told this, you aren't in the audience that the article sounds like it's trying to address.
Certainly, many applications need 64 bits to operate. That doesn't mean it's the best tool for all jobs. The tone of the article sounds like it's exploring some big question that nobody's thought about before, and that's just silly.
If it makes you feel better, programs from 1995 tend to run a lot faster on modern hardware. Gzip a kernel on a 66MHz Pentium and then on 2GHz Opteron and you'll see what I mean.
The article mentions tweaking the LD_LIBRARY_PATH...
I was told a long time ago by a number of people I considered to be Solaris gurus -- not to mention in a number of books, Sun docs, etc. -- that the LD_LIBRARY_PATH variable was not only heading towards total deprecation, but introduced a system-wide security issue.
In its stead, we were supposed to use the "crle" command to set our library paths.
On all of my boxes I use crle and not LD_LIBRARY_PATH and everything works as expected.
Any pro developers or Solaris technical folks that can comment on this?
I put NetBSD on most of my Sparc hardware. Because then I can run and build from the same exact source tree of packages as I use on my Intel boxes. And run a kernel built from exactly the same source.
Which brings up a point: both NetBSD/Sparc and NetBSD/Sparc64 will run on an Ultra 1, which is a 64 bit machine. Why doesn't somebody install each NetBSD port on two seperate Ultra 1 machines. Then the benchmark comparision can be between the normal apps that build on both systems, running in parallel on two identical systems. Its exactly the same codebase except for the 32 or 64 bittedness.
---
When we get solid state hard drives and if they're reliable and fast as regular ram then ram will be gone and the SSD will take over. So in essence your machine may just allocate itself a huge chunk of the drive as it's own memory space..
Imagine a machine that can grab 16g for it's memory usage and your video card having a huge chunk for itself also. Along with your terrabits of information space if things pan out well enough.
Ummm... I beg to differ on the reasons...
Most 64/32bit hybrid machines probably just split the arithmatic/logic units in half (just takes a single wire to connect them anyhow). Having an extra ALU around will surely push more 32bit numbers through the pipe. It's not going be as fast as a 64bit optimized application would gain from having the combined operations should it need them though.
I'm beginning to wonder these days how much CPU speed even matters though. We have larger applications that can't fit in cache, page switching from RAM that isn't anywhere near the speeds of the CPU, and hard drives that are only 40MB/s on a good day/sector, with latency averaging around 5-6ms. CPU is the least of my worries. As long as the hard disk is being utilized properly, you'll probably not see significant differences between processor speeds. I'm a firm believer that people think that 500MHz is slow because the hard drive in the machine was too slow. Unless you are running photoshop, SETI, Raytracing, etc., you probably wouldn't notice if I replaced your 3GHz processor with a 1GHz.
Karma Clown
How exactly did you get 2 x 32bit processors running 64bit code?
Normal people worry me!
By what method is a processor judged to be 16,32 or 64 bits?...
The 6502 had 8bit data, but 16 bit address bus, and was considered an '8 bit'
68000 had 16 bits data, 32 bits address - this was a 16 bit
So, why can't we just increase the address bus size of processors, to 64,while keeping the databus size at 32bits. have some new 64 bit addressing modes. The processor can still be 32 bit data, so the size of programs can stay the same....Store program code in the first 4Gigs of memory, (zero page!) , and the pointers within that code can remain 32bits, but have the option of using bigger 64bit pointers to point at data in the remaining 2^63 bytes. This should give best of both worlds of 32vs64 bit.
It shouldn't take longer to read 64 bits than 32 on a 64 bit architecture. Theoretically, a 32 bit machine will read 32 bits on a number of clock cycles, and a 64 bit machine should read 64 bits on the same ammount of clock cycles. This doesn't necessarily mean faster execution times on 64 bit though. A lot of it depends on the compiler, and the OS.
Also, if you just rebuild a an application that was designed around 32 bit in 64 bit mode, you probably aren't going to notice an improvement (if any at all).
I noticed the article used GCC, which probably hasn't caught on with 64 bit yet.. I'm pretty sure SUN has their own compiler, which probably would produce better 64 bit code than GCC on a SUN box.
If the compiler sucks, then it would suck equally for 32-bit and 64-bit binaries! They use the same code generator!
A deep unwavering belief is a sure sign you're missing something...
This is modded Insightful?
You've completely missed the entire point of the test. This has nothing to do with your next purchase decision -- it's purely designed to test whether or not the common claim that using 64-bit values decreases performance due to memory latency is true. This test makes no claims whatsoever that it has anything to do with whether or not you should be using a 64-bit setup. RTFA.
The "obsolete architecture" is one of the few where 64-bit and 32-bit operations have no inherent performance advantage on the processor, unlike the Opteron and Itanium processors where 64-bit mode has several advantages over 32-bit mode (extra registers or not being emulated). This makes it a perfect testbed for evaluating this claim. The speed of the processor has absolutely no relevance to the question at hand (with the exception of testing memory access starvation on system with a greater CPU to bus clock difference).
It's a shame you're too wrapped up in a "buy, buy, buy" mindset to consider the value of curiosity and of testing commonly held beliefs.
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").