Are 64-bit Binaries Slower than 32-bit Binaries?
JigSaw writes "The modern dogma is that 32-bit applications are faster, and that 64-bit imposes a performance penalty. Tony Bourke decided to run a few tests on his SPARC to see if indeed 64-bit binaries ran slower than 32-bit binaries, and what the actual performance disparity would ultimately be."
In case anyone hasn't realized it yet, this article proves that OSNews is the most retarded website on the planet.
The typical story is titled like "A comprehensive review of the Atari ST". The contents are typically something like... "I found an old Atari ST, but my cdrom wouldn't fit in the 5.25" disk drive and mozilla wouldn't compile. So the Atari sucks"
I benchmarked a skilled Chinese abacus user against a C-programmer implementing an accounting system. The Chinese dude figured out that 1+1=2 before the C-programmer loaded his editor, so the abacus is faster.
Conformity is the jailer of freedom and enemy of growth. -JFK
I read this piece yesterday. Here's a tip for those of you who currently work on, or may need to work on, building an x86 to x86_64 cross-compiler under the Linux operating system.
One of my tight friends, Dan Kegel (cute pic of him here, oh and he works for Google, so he's super-smart and rich! :-*), has something called the CrossTool at http://kegel.com/crosstool that should be of major help to anyone working with 64-bit Linux systems. ;-)
You may even be able to list it as COTS on your project even though it's free as in beer. In any case, I've tried it, it's sweet, you should try it, it works great for what it does, just like most *nix apps. I prefer having one small tool do something really well than one large software package do a bunch of things really crappily.
Anyway, stop by Dan's page and say hi. Tell him I sent ya
Background: 28/M/Bi-Sexual; Owner of a Linux company; MBA Harvard 2003; B.S. Comp Sci MIT 2000
It all depends on how many of those 64 bits are 1's. 1's are a lot heavier than 0's, so too many of them will slow your program down a lot. If you compare a 32-bit program with all 1's, it will run significantly slower than a 64-bit program with only a few 1's. It's simple, really.
From reading the article, the answer is: Sometimes, depending.
From the article:
I create a very simple C file, which I call hello.c:
#include <stdio.h>

int main(void)
{
    printf("Hello!\n");
    return 0;
}
Watch out... SCO owns this bit of code too...
--ken
I can only assume that this is only going to be limited to SPARC...I mean, we've already seen the major differences between Itanium and Opteron dealing with 32 bit apps, right? Or is this a different question, since Opteron gets to run 32bit effectively "native"? And, at this point, when running 32 bit apps on a 64 bit chip, just what can "native" mean anyway?
Given a choice between free speech and free beer, most people will take the beer.
I beg to differ. OSNews rocks.
The Jeffrey -- since 1979
In his explanation, he said something of the order of "if you want speed, use the 32-bit version of the binaries, because otherwise the computer physically has to move twice as much data around for each operation it does." Only if you want the extra memory mapping capability of a 64-bit binary, he said, would you need to use the 64-bit version.
I suppose in summary, though, it depends on exactly what your binary does.
Aren't there certain optimizations and, in general, better coding for most 32 bit applications (on the lowest level of the code) because people have used it for so long? Couldn't it just be that we need to refine coding for 64 bit processors?
Most "tech gurus" I've talked to at my university about the benefits of 64bit processing say that it is in part due to the increase of the number of registers (allowing you to use more at the same time, shortening the number of cycles needed). Could time allow us to write more efficient kernels, etc for 64 bit processors?
So either the code isn't good enough, or perhaps there's another physical limitation (longer pipelines, etc) on the chip itself? Correct me if I'm wrong.
Now, gcc is known to produce shit code on sparcs. I am not saying 64 is always better, but to be honest, the stuff should at least have been compiled with Sun CC, possibly with -fast and -fast64 flags...
Well neither of you have provided any actual evidence proving they rock.. or sock... o.O -tromps off to OSNews to check out their benchmarks- I shall be back ^_^
Yes they are, but only by about 10-20%.
Makes me wonder what tricks AMD has managed to pull out of their hat to increase 64 bit performance by 20-30%...
It's been a long time.
The surmise that ALL 64 bit binaries are slower than 32 is incorrect...
At this stage of development for the various 64-bit architectures, there is very likely a LOT of room for improvement in the compilers and other related development tools and giblets. Sorry, but I don't consider gcc to be necessarily the bleeding edge in terms of performance on anything. It makes for an interesting benchmarking tool because it's usable on many, many architectures, but in terms of its (current) ability to create binaries that run at optimum performance, no.
I worked on DEC Alphas for many years, and there was continuing progress in their compiler performance during that time. And, frankly, it took a long time, and it probably will for IA64 and others. I'm sure some Sun SPARC-64 users or developers can provide some insight on that architecture as well. It's just the nature of the beast.
But that's only because it has two extra execution units for 64 bit code. 64 bit software is not inherently faster. Most people here would know this, but I just thought I might preemptively clear up any confusion.
I recall being very disappointed when my new VAX 11/750 running BSD 4.1 was much slower than my PDP 11/45 running BSD 2.8. All the applications I tested: cc, yacc, etc. were faster on the 16-bit PDP than the 32-bit VAX.
I kept the VAX anyway.
My understanding is that when you switch an Athlon64 or Opteron into 64bit mode, that you suddenly get access to more general purpose registers than the x86 normally has. So the compiler can generate more efficient code in 64bit mode, making use of the extra registers and so forth. I don't know if this makes a difference in real world apps or not though.
The guy seemed to have his conclusion written before he started... Or at least that's how it seemed to me. When he was doing the SSL test, he said that the results were ONLY about 10% slower on the 64 bit version. Now I might be far too much of a graphics programmer.... but I would consider 10% to be a rather significant slowdown.
The other thing that bothered me of course was when he said that the file sizes were only 50% bigger in some cases... sure, code is never all that big, but... still...
Read the fucking article. He didn't use hello.c for the benchmarking.
Fuck I hate slashdot idiots who complain about people who don't THINK before they post.
Practice what you preach, asshole.
Your "analysis" may be valid, but it's really not applicable. The title of the story is, "Are 64-bit Binaries Really Slower than 32-bit Binaries?" The author takes a 64-bit machine, compiles a few programs, and tests the resulting binaries to see which is faster. I'd say that the review is aptly titled and an interesting point to think on. Certainly he didn't compile every open source program known to mankind, as it sounds like he missed some pet app of yours. OpenSSL might be kind of arbitrary, but gzip and MySQL seem like reasonable apps to test. Like the last page says (you *did* RTFA, right?), if you don't like his review, go write your own and get it published.
Good luck, you'll need it.
--- I do not moderate.
Then 16bit binaries should be even faster than 32.
And why stop there?
8bits should really scream.
I can see it now: 2GHz 6502 processors, retro computing. The 70's are back.
This article sounds completely stupid. Someone didn't know that pulling 64-bits across the bus (reading/writing) can take longer than 32-bits? Never thought of the caches?
Just read the GCC Proceedings, there's explanations and benchmarks of the why/how/when of x86-64 in 32 vs 64-bit mode, both speed of execution and image size.
Belief is the currency of delusion.
The main product I work on, which was designed in a freaking vacuum, is so tightly tied to wintel that I've had to spend the greater part of a year gutting it and making it portable. Kind of. We currently use 1.5 gig for the database cache. If we go any higher, we run out of memory.
We tried win2k3 and the /3gb switch, but we kept having very odd things happen.
This database could very easily reach 500 gig, but anything above 150 gig and performance goes in the toilet.
My solution...
Get a low-to-midrange Sun box that can handle 16+g and has a good disk subsystem. But that's not a current option. Like I said, this thing was designed in a vacuum. The in-memory data structures are the network data structures, which are all packed on 1-byte boundaries. Can you say SIGBUS? A conversion layer probably wouldn't be that hard, if it weren't built as ONE FREAKING LAYER!
Sorry, I had to rant. Anyway, a single 64 bit box would enable us to replace several IA32 servers. For large databases, 64bits is a blessing.
Matt
The point of a 64-bit architecture boils down to two things really, memory and data size/precision.
An architecture with 32-bits of address space can directly address 2^32 or approximately 4 billion bytes of memory. There are many applications where that just isn't enough. More importantly, an architecture whose registers are 32-bits wide is far less efficient when it comes to dealing with values that require more than 32 bits to express. Many floating point values use 64 bits and being able to directly manipulate these in a single register is a lot more efficient than doing voodoo to combine two 32-bit registers.
So, if you have a problem where you're dealing with astronomical quantities of very large (or precise) values, then a 64-bit implementation is going to make a very big difference. If you're running a text editor and surfing the web then having a wider address bus and wider registers isn't going to do squat for you. Now that doesn't mean that there may not be other, somewhat unrelated, architectural improvements found in a 64-bit architecture that a 32-bit system is lacking. Those can make a big difference as well, but then you're talking about the overall efficiency of the design, which is a far less specific issue than whether 64-bits is better/worse than 32.
Lee
Muslim community leaders warn of backlash from tomorrow morning's terrorist attack.
As it needs to be said for any benchmarking story:
There are 3 types of lies. Lies. Damned Lies.
...and benchmarks.
First, anyone with half a brain already knows what his "scientific" results prove. Second, anyone with two thirds of a brain has already performed similar (but probably better) tests and come to the same conclusion.
And third, OpenSSL uses assembly code hand-crafted for the CPU when built for the 32-bit environment (solaris-sparcv9-gcc) and compiles C when built for the 64-bit environment (solaris64-sparcv9-gcc). Great comparison, guy.
Apples, meet Oranges (or Wintels).
Mark
It makes absolutely no sense. Operations concerning large integers were MADE for 64 bit.
Hell, if they made a 1024 bit processor, it'd be something like OpenSSL that would actually see the benefit of having datatypes that big.
Something is wrong, horribly wrong with these benchmarks. Either OpenSSL doesn't have proper support for 64 bit data types, this fellow compiled something wrong, or some massive retard published benchmarks for the wrong platform in the wrong place.
Or maybe I'm just on crack.
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
... once me and a friend get a Sun Ultra (dunno which one yet) in a few days. Hopefully, we can set up something that requires a lot of number crunching. Perhaps a 64-bit SETI@home, I dunno.
The cool thing for programmers is that with the right macros and functions, one can use a single 64-bit integer as (get this!) 2 32-bit integers. This will come in handy for games and such where two sets of common numbers are accessed frequently.
We've got an Itanic box at work that has WinXP 64bit edition on it so we can build & test some 64bit Windows binaries.
It's the slowest box in the place! Open a terminal (oops, command shell, or whatever they call it on Windoze) and do a 'dir' - it scrolls so slowly that it feels like I'm way back in the old days when I was running a DOS emulator on my Atari ST box.
Pretty much everything is _much_ slower on that box. It's amazingly bad and I've tried to think of reasons for this: Was XP 64bit built with debugging options turned on when they compiled it? But even if that were the case it wouldn't account for all of it - I'd only expect that to slow things down maybe up to 20%, not by almost an order of magnitude.
please, read the rest of the article, he just uses that as an example to show that the arguments he was passing to the compiler really were having an effect on the output, although I don't see why he had to do that considering what he does afterwards, that part was not the benchmark...
Actio personalis moritur cum persona. (Dead men don't sue)
Short answer: No.
Medium answer: If you're not a programmer, yes. Expect about the same speed, but maybe slightly less.
Long answer: Direct comparisons like this are in no way valid because the code is identical. It's the same algorithm running at the same clockspeed. Your compiler can't program. Think about this: There's only so much space taken up by a logical operation. The question:
"is this bit set to one? if yes, do this.. if no, do that"
..does not get any faster just because of the size of the register the single bit is contained in. It's still bound by the clockspeed. Programmers can rewrite algorithms to do certain things in parallel, but it's probably not worth it unless it's a big memory operation, multimedia app, game or graphics package. For those it will be much better.
Which is why Intel is more concerned with clockspeed than number of bits.
score 0 insightful? this deserves a +5 at least!
Do you even lift?
These aren't the 'roids you're looking for.
No one had tested it before to my knowledge, so predicting the outcome was impossible.
yes, right. we predict only on things we've seen someone else do in the past.
you've got the right idea, mate...
That has to be the most stupidest compiler test I've ever read.
And it's not the number of bits that is important: it's the size of them.
A message from the system administrator: 'I've upped my priority. Now up yours.'
The info on the Slashdot page should read more like an abstract or executive summary of the article. What we have here reads much more like an advertisement for an article.
Yeah, I could and should RTFA, but I object to posts on the front page of Slashdot being "teasers" for other people's news sites. The info, please.
When you're referring to Metal Oxide Semiconductor Field Effect Transistors, most CMOS circuits use the Ohmic region when at steady state. The saturation region is when it draws current, doh!
Well considering that manufacturers have been working like crazy to produce both 64 bit hardware and software applications, one could see that there is still some stuff to be done in the field.
What most of the posts are considering and the test itself is "concluding" is that it has to be slower over all and even in the end when 64 bit computing finally reaches its true breadth. However when the bottlenecks of the pipeline (in this case the cache) and the remaining problems are removed you can actually move that 64 bit block in the same time it takes to move a 32 bit block.
Producing two 32bit pipes takes up more space than creating a 64bit pipe in the end, no matter which way you look at it and no matter what kind of applications or processes it's used for.
However the big thing that could change this theory is Hyper Compressed Carbon chips, that should replace silicon chips within a decade (that's a fairly conservative estimate).
For the most part, there's little need for the extra bits, so you are just wasting computer time processing unnecessary bits.
Maybe you should all concentrate on making things more efficient, rather than relying on faster processors to make your crappy bloatware look fast.
I don't care if you are from the GPL camp or even Microsoft, everything out there from both camps is bloatware!
Computers in 2004 should actually be faster than computers from 1995. From all I've seen, because of the constant bloatware, this is not even the case, and may actually be the opposite.
From the article:
[...] you'll likely end up in a position where you need to know your way around a Makefile.
Well duh. What a surprise: compiling for a different platform might require Makefile tweaking.
Am I the only one to think that was a dummy article wasting a spot for much more interesting articles about 64 bit computing?
A message from the system administrator: 'I've upped my priority. Now up yours.'
Why are we comparing mature 32-bit software with 64-bit software in its infancy?
but more to the point, why would you advertise your sexuality in a technical post? I mean, if a straight person (and believe me, there is no such thing as "bi", you're either straight, queer, or in denial about being queer) were to post "I love vagina" in every post, you'd rightfully make fun of him.
But we're supposed to care that you consider a man's ass a sexual input? Stop looking for so much attention and lets talk computers.
My understanding of low level languages may not be comprehensive, however I am aware that for (let's use the simplest example I am familiar with) MIPS there are a number of registers for the storing of data that will be 'saved' and returned to the caller function, these registers are commonly known as $S0 - $S7, these registers have to be saved in the subroutine in order not to lose the volatile information stored therein.
... ..
.. ..
.. ...
for example:
sw $T0, 0($S0)
having more registers would allow you to bypass this step of writing the data to the mem address of $T0, you could use one of the new registers that are not volatile and store it there, thus removing the need for perhaps 5 instructions at a time on each return from a subroutine alone.
rather than the Store Word instruction (SW) you could just:
add $T1, $ZERO, $S0
which would not be lost in the return to the calling function.
further to this, and i'm not sure that the intel x86 performs the same way, when you wish to load a large number, i think in MIPS it's anything wider than 16 bits into a register (32 bit register size), you have to in fact perform TWO operations to load ONE number.
basically you load the most significant bits first
number = 0xFFFFFFF0
LUI $T0, 0xFFFF      # load the upper half of the
                     # register, because the immediate
                     # field only allows 16 bits
ORI $T0, $T0, 0xFFF0 # OR in the lower half of the
                     # number (ADDI would sign-extend it).
on the basis that x86 shares some of these things, then 64 bit must be faster GIVEN even ground with compilers and so forth, these are assumed (EVEN THO THAT IS NOT THE CASE) because otherwise its all pissing in the wind.
if this has errors, forgive me, this is not my area of specialty by a long stretch.
-Archfile
Your "analysis" may be valid, but it's really not applicable. The title of the story is, "Are 64-bit Binaries Really Slower than 32-bit Binaries?" The author takes a 64-bit machine, compiles a few programs, and tests the resulting binaries to see which is faster.
How can you be certain that this isn't simply comparing the efficiency of the compilers - and not the resulting binaries???
I'd rather be a conservative nutjob than a liberal with no nuts and no job.
But what's the fsking point?
News flash: 64-bit apps are, usually, slightly slower than 32-bit ones. Duh. Any developer who's been around 64-bit environments for more than a few weeks knows this. It's not like there's some subtle magic going on here; bigger pointers means more data to schlep around.
I think your parent's complaint is that this is sort of like a cursory analysis indicating that triangular wheels aren't quite so good as round ones. If you really needed to be told this, you aren't in the audience that the article sounds like it's trying to address.
Certainly, many applications need 64 bits to operate. That doesn't mean it's the best tool for all jobs. The tone of the article sounds like it's exploring some big question that nobody's thought about before, and that's just silly.
Are you kidding? This guy is a genius. Not only did he actually figure out that the UltraSPARC-II processor is 64-bit, but he can actually use the file and time utilities! Most of the "linux admin" types I know who buy old Sparcs for the novelty factor end up putting linux on them anyway..."This Solaris stuff is too hard".
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
64-bit binaries run slower than 32? That's certainly the dogma in the x86 world, where 64-bit is in its infancy. That was the belief about Solaris/Sparc and the HP/AIX equivalents FIVE YEARS AGO maybe.
Running benchmarks of 32 vs. 64 bit binaries in a 64 bit Sparc/Solaris environment has shown little or no difference for us, on many occasions. If the author had used Sun's compiler instead of the substantially less-than-optimal gcc, I expect that his 20% average difference would have disappeared.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
Because he used the same compiler, in 32-bit and 64-bit mode???
A deep unwavering belief is a sure sign you're missing something...
OK, I'll do that. The compiler does make a difference. I'm just thinking that we don't take the whole picture into account enough here on slashdot.
C|N>K
In 1981 I worked at a place that built multiprocessor business computers with an interpreted language (I take the blame for "CADOL III") using 8-bit computers.
Word came down from management/marketing that we "have to go 16-bit", never mind that we demonstrated we had a faster system built out of 8 bit stuff, marketing had to be able to sell 16 bit systems to run our language that was based on an 8 bit stack model.
64 bit makes sense if you're rampaging through memory 64 bits at a time. But nearly 10K for a "hello world!" program? Oh, help. This was like 600 bytes on a PDP-11 in C, much less in assembly.
I've seen things like Quark go from a 2 meg distro to a 300MB one going from 16 bit to 32 bit.
A 64 bit MS world terrifies me.
I benched MySQL4 on a dual Athlon-MP system and it ran about 32% faster in 64-bit mode. Try it yourself is all I can say.
It was a sweet upgrade as I had been using the server in 32-bit mode the first couple of months having it.
It all depends on the specific architecture you're dealing with. neither is inherently faster. 64-bits just requires more transistors to deal with.
Repeal the DMCA!
If the app dealt with numbers that need 64 bits to natively represent, and dealt with them primarily (like some sort of numerics program), then a 64 bit binary will probably win.
Both 32bit and 64bit binaries running on the same processor get the same data paths and the same amount of cache on many processors. But, for one thing, 64bit binaries use up more cache memory for both code and data. So, yes, if you run 32bit binaries on a 64bit processor with a 32bit mode, then the 32bit binaries will generally run faster. But the reason why they run well and all the data paths are wide is because the thing is a 64bit processor in the first place--that's really what "64bit" means.
64bit may help with speed only if software is written to take advantage of 64bit processing. But the main reason to use 64bit processing is for the larger address space and larger amount of memory you can address, not for speed. 4Gbytes of address space is simply too tight for many applications and software design started to suffer many years ago from those limitations. Among other things, on 32bit processors, memory mapped files have become almost useless for just the applications where they should be most useful: applications involving very large files.
Which proves nothing. Since they are different platforms, the code might not be as optimized in both of them.
between precision and speed.
It's not surprising that 64-bit processors are rated much slower than 32-bit ones. The fastest 64-bit AMD is rated 2.0ghz while the fastest AMD 32-bit is 2.2ghz.
If you use a shovel you can move it very fast to dig a hole. If you use a backhoe you're going to move much slower but remove more dirt at a time.
Using modern technology to build a 386 chip would result in one of the highest clock speeds ever but it would be practically useless. Using 386 era technology to build a 64 bit chip would be possible but it'd be massive and horribly slow.
I'm still debating whether or not to go with 64-bit for my next system. I'd rather not spend $700 on a new system so I can have a better graphics card and then have to spend several hundred more shortly after to replace the CPU and MB again. But then again, 64-bit prices are still quite high and I'd probably be able to be productive on 32-bit for several more years before 32-bit goes away.
Ben
GCC uses the same code generator for both Sparc32 and Sparc64.
A deep unwavering belief is a sure sign you're missing something...
I read the article and it's another short article that is basic and cursory, making some kind of grandiose conclusion from some tests that are laughable. Typical of OSNews.
And no I'm not going to write an article for you to enjoy.
The article mentions tweaking the LD_LIBRARY_PATH...
I was told a long time ago by a number of people I considered to be Solaris gurus -- not to mention in a number of books, Sun docs, etc. -- that the LD_LIBRARY_PATH variable was not only heading towards total deprecation, but introduced a system-wide security issue.
In its stead, we were supposed to use the "crle" command to set our library paths.
On all of my boxes I use crle and not LD_LIBRARY_PATH and everything works as expected.
Any pro developers or Solaris technical folks that can comment on this?
I put NetBSD on most of my Sparc hardware. Because then I can run and build from the same exact source tree of packages as I use on my Intel boxes. And run a kernel built from exactly the same source.
Which brings up a point: both NetBSD/Sparc and NetBSD/Sparc64 will run on an Ultra 1, which is a 64 bit machine. Why doesn't somebody install each NetBSD port on two separate Ultra 1 machines? Then the benchmark comparison can be between the normal apps that build on both systems, running in parallel on two identical systems. It's exactly the same codebase except for the 32 or 64 bittedness.
---
BTW, -fbranch-probabilities looks interesting (gcc 3.2)
C|N>K
Yeah, I know what you mean...I thought Solaris was too hard as well.
News flash: 64-bit apps are, usually, slightly slower than 32-bit ones. Duh. Any developer who's been around 64-bit environments for more than a few weeks knows this. It's not like there's some subtle magic going on here; bigger pointers means more data to schlep around.
That is the sort of "obvious" conventional wisdom that the article is questioning. In fact, 64-bit architecture means a lot more than pointer size, and merely counting bits is no way to estimate performance.
When we get solid state hard drives and if they're as reliable and fast as regular ram then ram will be gone and the SSD will take over. So in essence your machine may just allocate itself a huge chunk of the drive as its own memory space..
Imagine a machine that can grab 16g for its memory usage and your video card having a huge chunk for itself also. Along with your terabits of information space if things pan out well enough.
Shouldn't a fair benchmark take advantage of 64-bit-only optimizations?
Somewhat related to what you said, the UltraSparc-II processor was 64 bits *and* the OS was still 32 bits (Solaris 2.5.1). It was only really starting in the world of Solaris 7 that people were given the option to compile 64bit code for a 64bit OS. And they ran into a very small performance hit on the application. (Well known, at least, among system administrators for Solaris boxes.)
Of course, this may be obvious to you, so I point this out for the benefits of others.
BTW... no significant speed loss has been seen on the 64bit Solaris versions vs the 32bit versions. (And they'll let you downselect a 32bit install if that's what you want, even on 64 bit hardware. So at the OS level, it doesn't seem to make much difference.)
Sure send an email to "postmaster@fbi.gov" and ask to host your child porn in their server farm.
You'll be amazed at how quickly they provide you with support. In fact, if you're lucky they'll send some of their field sales reps straight to your house to work out a deal.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
Nice try, but no, the article is indeed retarded.
They've at best proved a supposition about a single architecture/process/compiler family. They have not proved a general case. Did they test on amd64? Alpha? Mips? No? Then why are they making unwarranted generalizations? Ah, they're retarded.
The fact is as true as it was then: some applications are going to run faster just because 32-bit compilers are more 'mature'. Once the newer method becomes mainstream, you will see either the same speed, or a gain in speed.
Needless to say, the guy in the other post who stated an analogy with an abacus had it right- something small is obviously going to execute faster. We aren't switching to 64-bit processors so we can run
10 print "64-bit is k3wl"
20 goto 10
The more complex applications of the future generation, as well as the ability to move large amounts of data from memory to cpu, is what is driving the move.
Manipulate the moderator system! Mod someone as "overrated" today.
Adding more/more complex features to a cpu rarely speed it up by itself, however, it might allow the next generation of CPU to scale beyond the current generation.
Both in terms of direct CPU performance and for the software that runs on it.
This has happened a bunch of times during history. Remember the introduction of MMUs for instance? Definitely slows down the software running on the machine, but without an MMU we all know that it was virtually impossible to do stable multitasking.
1/2 GB of memory is basically the standard these days with XP.
A lot of people are buying home computers with 1 GB or more.
Dell in Japan (where I live) has a special offer these days on a Latitude D600 with 1GB of ram. That is, they expect to sell this thing in quantities.
I think a fair amount of PC users will hit the 4GB limit within a few years. Personally, I already swear about having just 1GB in my desktop at times when I have a handful of images from my slide scanner open in photoshop + the obvious browsers/mail programs and maybe an office program or 2 open.
Introducing 64bit does not make today's HW any faster than their counterparts, but it will make it possible to continue making machines better, faster and capable of handling increasingly more complex tasks.
If AMD market their Athlon 64 3200 as being 10% faster than a generic 32bit Athlon 3200, where is the speed difference coming from? This article seems to imply the only advantage to having 64 bits is being able to sport a greater quantity of RAM.
Depends mainly on what data the test is using. If it's floating-point heavy, and uses double, then it always was 64-bit. On 64-bit hardware it'll gain the full-width data path and will be able to load/store 64-bit floating-point numbers faster, all things being equal. If it uses ints (not longs), it is and will stay 32-bit, there will be no difference unless the hardware is capable of loading two 32-bit numbers at once, effectively splitting the memory bus in two (HP-PA RISC can do it, his old Sun cannot, newest Suns can, I don't know if Opterons can). Finally, if the test uses data types which convert from 32 to 64 bits it will become slower, but only if it does enough math on these types. The later is important, since every half-complicated program uses pointers, explicitly or implicitly, but not every program does enough pointer arithmetics compared to other operations to make a difference. However, if it does, then it'll copy pointers in and out of main memory all the time, and you can fit half as many 64-bit pointers into the cache.
That's where the slowdown comes (plus some possible library issues, early 64-bit HP and Sun system libraries were very slow for some operations).
If your process resident memory size is the same in 64 and 32-bit mode, you should not see any slowdown. If you do, it's an issue with the library or the compiler (even though the compiler in this case is the same, the code generator is not, and there may be some low-level optimizations it does differently). If the resident size of the 64-bit application is larger, you are likely to see a slowdown, and the more memory-bound the program is, the larger it'll be.
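To put rough numbers on the resident-size argument, here is a minimal C sketch. The node32 and node64 structs are hypothetical stand-ins, not anything from the benchmarked programs: they model the same list node with a 4-byte and an 8-byte link field, using fixed-width integers so the comparison does not depend on how the sketch itself is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node as a 32-bit build would lay it out:
   a 4-byte link plus a 4-byte payload. */
struct node32 {
    uint32_t next;   /* stands in for a 32-bit pointer */
    int32_t  value;
};

/* The same node with a 64-bit link. */
struct node64 {
    uint64_t next;   /* stands in for a 64-bit pointer */
    int32_t  value;  /* typically followed by padding for alignment */
};

/* Bytes of memory needed for n nodes of the given size. */
static size_t list_bytes(size_t n, size_t node_bytes)
{
    assert(n == 0 || node_bytes <= SIZE_MAX / n);  /* guard against overflow */
    return n * node_bytes;
}
```

Comparing list_bytes(n, sizeof(struct node32)) with list_bytes(n, sizeof(struct node64)) for the same n gives a feel for how much a pointer-heavy structure grows. A program whose data is mostly ints or doubles would not grow this way.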
Ha! Shows what you know! Atari-STs came with a built-in 3.5" drive - not a 5.25" so I say nya! to your feeble attempt at computer critic criticism.
PERMIT TOMORROW!?!?!?
seriously guys, this is a joke. Take a computer architecture class, and then shut the fuck up.
Why does slashdot have such a bad rep? Cause of dumb shit like this making it to the front page. Good Job at playing scientist guys, really, let me be the first to congratulate you.
lar lar
I figured I would post a comment about AMD and their 64 bit chip benchmarks. Then I realised I was already beaten to it by about eleventy billion other people. Guess I should at least do a FIND through the comments before posting in future!
And now we see evidence of why people with a clue leave /. They go away, wide-eyed and wondering, and discover how many useful sites there are on the 'net!
Goodbye, Ninwa, and good luck! You shall be missed!
*grin*
On SPARC, there are no 64-bit-only optimizations. The only reason to use 64-bit math is if you either need 64-bit integers or use 64-bit pointers. None of the benchmarks can use either (the MySQL benchmark could, but the machine only had 256MB of RAM).
A deep unwavering belief is a sure sign you're missing something...
Not to mention 8bit wonders such as Boulder Dash, Kokotoni Wilf ...
the tests were all run on a 64-bit machine. The argument is not so much about whether 32-bit or 64-bit binaries run faster, but which is the faster architecture. I'm pretty sure we don't have any apples-to-apples test platforms for that one, though
8 whole bits??!!! geez, things sure have gone to pot since we got away from the perfectly serviceable 4-bit Intel 4004 with 12 bit addressing. Who the heck needs more than 4kb of addressable memory? Cruft mongers, that's who. And look at the transistor count bloat, 2,300 in the 4004 to how many freaking millions in these 32+ bit hogs.
How exactly did you get 2 x 32bit processors running 64bit code?
Normal people worry me!
By what method is a processor judged to be 16,32 or 64 bits?...
The 6502 had 8bit data, but 16 bit address bus, and was considered an '8 bit'
68000 had 16 bits data, 32 bits address - this was a 16 bit
So, why can't we just increase the address bus size of processors to 64 while keeping the data bus size at 32 bits, and have some new 64-bit addressing modes? The processor can still be 32-bit data, so the size of programs can stay the same. Store program code in the first 4 gigs of memory (zero page!), and the pointers within that code can remain 32 bits, but have the option of using bigger 64-bit pointers to point at data in the remaining 2^63 bytes. This should give the best of both worlds of 32 vs 64 bit.
Address Windowing Extensions (AWE) really are a good solution for your problem.
If you're doing Win32, but really want 64-bit, then consider Win64. There are several OEMs providing it.
If your response is "can't afford it", then your .5 Terabyte database project is probably underfunded and likely to fail.
How could you possibly run a 64 bit binary on a 32 bit cpu?
Score:
Matt 1 - J.Lo 0
Hehehe
Whilst variable length instructions were frowned upon by all those RISC folk (they liked all instructions to be the same size), given that current memory buses are so much slower than the CPU, all the complexity of popular instructions being shorter isn't really much of a disadvantage, it's a bit like instruction compression to improve bandwidth.
Also smaller code fits in cache better.
Solaris isn't any harder. It's just closed source and there isn't anywhere near as much free software available for it. There certainly aren't as many 'guide for the clueless' websites as there are for Linux, needless to say. That can sometimes be a positive thing. To run free software packages, you can try to coerce the Zoularis thing and build software from the NetBSD pkgsrc tree on it, I guess. The interface between 'free software' and Solaris just has a lot more rough edges, in my experience, than running a Free OS on it from the start. I run Solaris on my SS10sx, because there's no free-software X Server for it that supports 24 bit color on its dual cgfourteen framebuffer, but other than the ability to 'boast' about running Solaris at home, there's not much other reason to run it. I guess that's a status thing, or something.
---
Port it to AMD64?
Get a 64 bit O/S and run your stuff in 32 bit so it gets most of its 4GB and the O/S still has its own space for caching etc.
OK, I'll do that. The compiler does make a difference. I'm just thinking that we don't take the whole picture into account enough here on slashdot.
You're right about everything, except trying to imply, if that's what you were doing, that he used the 'wrong' compiler. In order to test execution speed of 32-bit vs 64-bit binaries, you need to use the same compiler to build the binaries.
See, it gets complicated when you use different compilers. Yes, GCC is likely to build better-optimized binaries for 32-bit. Yes, GCC has a reputation for not optimizing binaries very well in the first place. But if he didn't use the same compiler for both binaries, the results would have been seriously skewed in answering the question. The results would have called into question why he used different compilers, whether or not the different compilers were equal, and so forth.
To answer the question, he needed a compiler that could build both types of binaries to the same level of optimization, no matter how shitty. He wasn't trying to build the fastest binaries on earth, he was trying to build binaries that could be compared to one another in execution speed, using the same source code, and a compiler that would produce the same shitty executable.
That's all. :)
Like what I said? You might like my music
Is there a rule-of-thumb figure for 64 bit overhead? I.e., if I build my application (C source, 2 GB memory) in 64 bit mode, what can I expect to pay in CPU time? My guess is that the loss will be real but trivial; my applications, like most, are bound by disk etc.
I expect the 64 bit OS will be the determining factor, i.e. does it cache disk better etc.
This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
While the article is a curious comparison of a small set of programs on SPARC{32,64}, I think it is far too limited with only one architecture. A better stab at this issue would include AMD64 vs x86, as well as MIPS{32,64}. The AMD64 ABI under Linux shines especially well in this kind of comparison, as the improvements do a good job of offsetting the extra cost associated with 64 bit pointers and longs.
We were finding the damn things in the ventilators for weeks afterward.
And the brethren went away edified.
You've been trolled. Twice in a row.
Three times, now. Except this time I'll post it as A.C. because it's not worthy of anybody reading it. Like your comment, this one will stay at 0 until the thread is archived, when it'll disappear. The first two times I think I said at least a few things of interest to readers here.
The 64 million dollar question is:
Why does slashdot link to OSNews articles? Is it some perverted NeoCalvinistic response to feelings of inadequacy and post-adolescent guilt?
Cruft mongers . . . that's a beautiful term.
Still, the word size of the processor is not a major factor in how fast a CPU is. Finding faster ways to process instructions, caches, and how fast you run the CPUs make more of a difference. I am probably leaving out a lot of other major factors. Oh well.
The article is a bit interesting although it seems very amateurish. Just my personal opinion.
In fact the same logic means that with all else being equal an 8 bit processor is slightly faster than a 16 bit processor and a 16 bit processor is slightly faster than a 32 bit processor. But of course all else is never equal so things are usually the other way around.
Has anyone heard of a set (family?) of processors that were exactly the same EXCEPT for the processor's word size?
Losing faith in humanity one person at a time.
I guess you didn't have the "pleasure" of using near, far and huge pointers in DOS compilers. In your model, every library function would have to have two versions - one that takes 32 bit pointers and one that takes 64 bit.
Uniform and simple is good...
Regardless of whether 32-bit binaries are faster than 64-bit binaries, we'll have to move over as we deal with more and more data.
It doesn't matter that 32-bit is faster: programs are going to get larger, required RAM is going to grow, and whether 64-bit is better right now is irrelevant because in the evolution of computers, it doesn't matter.
Why would we have moved from 16-bit to 32-bit in the first place? And 8 to 16 before that. It's just the evolution of computers as we deal with more and more data.
It's irrelevant and pointless to spend time discussing the speed differences now between 64-bit and 32-bit.
Most architectures have separate (64-bit or wider) floating point registers. (For example, IA-32 has 80-bit FP registers.) They never have to use their general purpose (integer) registers for FP values. So a 64-bit architecture does nothing for FP. It's only important for manipulating 64-bit integer values. You may say "no one will ever need to count beyond 4294967295" but (a) someone does and (b) pointers are integers, and 64-bit pointers are one of the great advantages of a 64-bit architecture. Previously you needed (as you say) voodoo to combine two 32-bit registers and odds are the architecture didn't really have any support for addressing memory that way. Now with a 64-bit architecture you can stick it in a register and do normal operations with it.
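For anyone wondering what the "voodoo" looks like, here is a minimal C sketch of a 64-bit add done with two 32-bit halves, next to the same add when the value fits in one register. The u64_pair type and the function names are made up for illustration; a real compiler does this for you with an add/add-with-carry pair.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, the way a CPU with only
   32-bit integer registers has to keep it. */
struct u64_pair {
    uint32_t lo;
    uint32_t hi;
};

/* Add the low words, detect the carry out of the low word,
   then add the high words plus that carry. */
static struct u64_pair add_pair(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    r.lo = a.lo + b.lo;                /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);    /* 1 if the low add overflowed */
    assert(carry == 0u || carry == 1u);
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* The same addition when a 64-bit integer fits in one register. */
static uint64_t add_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

The two-halves version is several dependent operations where the native one is a single add, which is the case where a 64-bit integer unit is a clear win.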
Gates' Law: Every 18 months, the speed of software halves.
No, it's not a test of whether 32 or 64 bit is faster. It's a test of an obsolete architecture whose fastest younger siblings are still outperformed by IBM, Intel, and AMD.
The results tell you nothing about whether you should seriously consider 64 bit, nor where you should actually be using a 64 bit setup.
Maybe someone can post the performance results for Doom running on a new AMD 64 bit box with a top-end ATI or NVidia card. It'd be about as relevant as the performance of a SPARC5 is to making a purchase decision.
I do not fail; I succeed at finding out what does not work.
If solid state memory as fast as you described ever came out, then there wouldn't be a distinction between memory and storage. In a perfect world they would be one and the same; check out brix-os.sf.net for more info.
Real programmers can write assembly code in any language. -- Larry Wall
At a TV station I used to work at, we used to send people on searches for "Liquid Video". Pretty much the same results! It's amazing the people that get hired at TV stations. Mr. Blinker-Fluid would be a genius compared to some in my industry of choice.
:)
At the station I'm at now, they send PA's to ask the engineers for the "ChromaKey for the Genlock".
It's mandatory to wash your hands before returning to the land of Dairy Queen.
In following your logic, a 128k Duron will run out of cache way before a 2mb Xeon? Making the Duron a better CPU? =)
From excellent karma to terible karma with a single +5 funny post...
It'd be a pain the ass trying to play Karateka at those speeds. :)
nuf said
In reality, that's just a problem with how we currently handle memory addressing. Harddrives, for instance, are up to about 320GBs, and your 32bit PC can handle it.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
A CPU with a lot of slow transistors
is worse than a CPU with a few quick transistors, so short paths are better.
The page-translation in long mode is very slow!!!
In Opteron: 4-level for 64 bits VS 2-level for 32 bits, 512*512*512*512-4KiB vs 1024*1024-4KiB, so, legacy mode is quicker than long mode.
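The level counts above can be sketched in C like this. It assumes 4 KiB pages, 9 index bits per level in long mode and 10 in legacy 32-bit paging (without PAE); the function names are made up for illustration and this is not how any particular OS writes it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* 4 KiB pages */

/* Index into the table used at a given level of the walk.
   Level 0 is the last-level table (closest to the page itself).
   bits_per_level is 9 for long mode (512 entries per table) and
   10 for legacy 32-bit paging (1024 entries per table). */
static unsigned table_index(uint64_t vaddr, unsigned level,
                            unsigned bits_per_level)
{
    assert(bits_per_level > 0 && bits_per_level < 32);
    unsigned shift = PAGE_SHIFT + level * bits_per_level;
    assert(shift < 64);
    return (unsigned)((vaddr >> shift) & ((1u << bits_per_level) - 1u));
}

/* Number of virtual-address bits translated by a walk of `levels` tables. */
static unsigned covered_bits(unsigned levels, unsigned bits_per_level)
{
    return PAGE_SHIFT + levels * bits_per_level;
}
```

So covered_bits(4, 9) is 48 and covered_bits(2, 10) is 32: four dependent table lookups instead of two. Keep in mind the walk only happens on a TLB miss, so how much this costs depends on how often the program misses.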
And, the cache penalization is a little high:
With 1 MiB of L2 cache, an array of 10'000'000 longs is a bit slower than an array of 10'000'000 ints.
And for building like-LEG0-machines, is better with AthlonXP than with the expensive Opteron.
open4free
Why go through all the trouble to make it 64-bit anyway? Other than sex appeal, what other reasons are there for 64-bit?
Alrighty, if someone can show me some Hot Babes who think running 64-bits gives me more sex appeal then I'll run out and buy a 64-bit system right now. I'll take a half dozen. Hell, make them 128-bit and don't bother me with benchmarks!
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
They get that performance increase on legacy code that doesn't even know about the extra registers by putting the memory controller on-chip, instead of on the northbridge of the motherboard. This is something that RISC CPUs have been doing for a decade or so. Rumour has it that the Opteron is architecturally very similar to the Alpha.
Stick Men
You are correct, although the issues are more subtle than your examples (not hard).
A benchmark is useless without interpretation. The people at OSNews have failed to give us any technical background information on the SparcV chip (penalties running in 64-bit as well as benefits), a proper breakdown of the type of math done by the example programs, as well as analyses of bottlenecks in the benchmarks (MySQL, for instance, is possibly I/O limited).
They've given us raw numbers, with no thought behind them. This is what makes a bad article.
It's all in benchmark. It doesn't matter what you benchmark, only what you benchmark with ;)
;)
But there are several points
1. The results for openssl are no good because openssl for sparc32 has critical parts written in asm, while for sparc64 it is generic C.
2. The results would be much better if you did it with Sun's cc, which is much better optimised for both sparc32 and sparc64.
3. The results, even if they were accurate, are good only for sparc32 vs sparc64. Basically, sparc64 is the same processor as sparc32, only wider
I don't know what's the case for ppc32 vs ppc64, but when you look at x86 vs x86-64 (or amd64 as some prefer to call it) you have to take into account much larger number of registers, both GP and SIMD.
As a matter of fact, x86 is such a lousy architecture that it really doesn't have GP registers -- every register in x86 processor has its purpose, other than the rest. It looks better in case of FP and SIMD operations, but it's ints that most of the programs deal with. Just compile your average C code to asm and look how much of it deals with swapping data between registers.
(well, full symmetry of registers for pure FP, non-SIMD operations was true until P4, when Intel decided to penalize the use of FP register stack and started to ``charge'' you for ``FP stack swap'' commands, which were ``free'' before, and are still free on amd processors)
x86-64 on the other hand in 64bit mode has twice more registers with full symmetry between them, as well as even more SIMD registers. And more execution units accessible only in 64bit mode.
But, from these chaotic notes you can already see that writing a good comparison of different processors takes a little bit more than ``hey, I've some thoughts that I think are important and want to share''. And the hard work starts with a proper title for the story -- in this case it should be ``Are sparc64 binaries slower than sparc32 binaries?''.
Robert
Bastard Operator From 193.219.28.162
When we get solid state hard drives and if they're reliable and fast as regular ram then ram will be gone and the SSD will take over.
I remember predicting that SSDs would take over "next year" every year from 1980-1985. Then I gave up smoking.
Sent from my ASR33 using ASCII
If the 32-bit app fits nicely in cache, but the 64-bit app doesn't, then the 32-bit app could be faster -- for certain problem sets.
"Freedom means freedom for everybody" -- Dick Cheney
Which brings up a point: both NetBSD/Sparc and NetBSD/Sparc64 will run on an Ultra 1, which is a 64 bit machine. Why doesn't somebody install each NetBSD port on two separate Ultra 1 machines? Then the benchmark comparison can be between the normal apps that build on both systems, running in parallel on two identical systems. It's exactly the same codebase except for the 32 or 64 bittedness.
Hey there's a good idea. Is NetBSD absolutely exactly the same between the sparc and sparc64 flavours, minus the compile options of 32 vs 64bit?
I use OpenBSD. I would be willing to install OpenBSD/SPARC64 onto my Ultra 10 run some benchmarks, then install OpenBSD/SPARC and run the same benchmarks, then compare.
Anyone know if OpenBSD SPARC64 and SPARC are close enough in code difference to make this worthwhile?
I recently put OpenBSD 3.4 -stable/SPARC64 onto my 333MHz Ultra 10 and compared it with OpenBSD 3.4 -stable on my old 300MHz G3 iBook. The iBook was about 3 times faster than the Sun in ubench, and benchmarks of md5, sha1 and ssl.
Making the OpenBSD -stable release on the G3 (192MB RAM and slow 6GB notebook HDD) took about 6 hours whereas on the Ultra 10 (128MB in one bank) it took about 8-9 hours.
Disk transfer rate is kinda crappy too. I have two 20Gb Seagate IDE drives, one in the Ultra and one in my Thunderbird 700. The Thunderbird gives me a transfer rate of 28Mb/s whereas the Ultra gives 12Mb/s (they are exactly the same model of drive). I realise there might be a lot involved in limiting the IDE performance on this particular Ultra 10, but I would have thought that both systems could saturate the bandwidth available from these old'ish drives. I've heard that the Ultra 5/10's IDE controller sucks, perhaps there is truth to that. Someone claimed that replacing with SCSI is like having a whole different (faster) machine.
You have now intrigued me into trying OpenBSD/SPARC. I am, however, loath to remove Solaris 9, now that I finally got it installed again. It takes *ages* to install, even from DVD-ROM. It seems that after every thing it installs, it pauses for a set 90 or sometimes 30 seconds. Meaning I have two options: sit there and click to continue, or go away and wait even longer.
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
You have now intrigued me into trying OpenBSD/SPARC.
Actually, forget it. I've just realised, OpenBSD/SPARC probably won't have a kernel that supports the devices in a typical SPARC64 system.
For all I know, OpenBSD/SPARC64 is compiled 32-bit. I shall have to investigate this. I can't believe this Ultra 10 has been flogged by my little G3.
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
Besides, you couldn't possibly run a 6502 at 2GHz with today's technology. The chip is not pipelined, so there is way too much logic to complete in each 500ps clock cycle.
Even if you made a new 6502-compatible design that runs at 2GHz, it only had one 8-bit general-purpose register, so to do any useful math would require a lot of arithmetic instructions and a hell of a lot of spills. And where would these spills go? The stack is only 256 bytes.
Want to multiply two 32-bit numbers? Be prepared to do 16 individual multiply operations. And guess what? 6502 has no multiply instruction, so each of those 16 multiplies requires a series of shifts and adds. Plus, the result of each multiply is 16 bits long, so it won't fit in your one GPR. You either need to keep part of the result in memory or in one of the index registers, and either of those is painful. I remember tuning an 8 x 8 -> 16-bit multiply and getting it down to about 200 clock cycles, and that was the best I could do. So, you can expect your 32 x 32-bit multiply to take something like 3000 clock cycles and involve dozens or hundreds of loads and stores.
The moral of the story is, almost every architectural difference between the 6502 and modern CPUs exists for one reason: speed. Still think a 2GHz 6502 would be a screamer?
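For the curious, the multiply argument looks roughly like this in C. This is only a sketch of the technique (shift-and-add for 8 x 8, then schoolbook partial products for 32 x 32), not the tuned 6502 routine mentioned above; the function names are made up, and real 6502 code would also have to juggle the partial results through memory.

```c
#include <assert.h>
#include <stdint.h>

/* 8 x 8 -> 16-bit multiply by shift-and-add, the way a CPU with no
   multiply instruction has to do it: one conditional add per bit. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;
    for (int bit = 0; bit < 8; ++bit) {
        if (b & 1u)
            product = (uint16_t)(product + addend);
        addend = (uint16_t)(addend << 1);
        b >>= 1;
    }
    return product;
}

/* 32 x 32 -> 64-bit multiply built from 4 x 4 = 16 byte-sized
   partial products, each shifted into place and accumulated. */
static uint64_t mul32x32(uint32_t a, uint32_t b, unsigned *partials)
{
    uint64_t result = 0;
    unsigned count = 0;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            uint8_t ai = (uint8_t)(a >> (8 * i));
            uint8_t bj = (uint8_t)(b >> (8 * j));
            result += (uint64_t)mul8x8(ai, bj) << (8 * (i + j));
            ++count;
        }
    }
    assert(count == 16);
    if (partials)
        *partials = count;
    return result;
}
```

Here the accumulation is a single 64-bit add per partial product; on an 8-bit machine each of those adds is itself a multi-byte carry chain, which is where the thousands of cycles come from.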
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
IANAM, is there a mathematical term for the shape of a Pringle?
IAAM, it would be called a saddle.
I've got a fever and the only prescription is more COBOL.
It's absurd because there is no generic "64 bit" or "32 bit" binary... whether they are faster or not is up to individual architectures and implementations.
On a sparc capable of running 64 and 32 bit binaries, sure, it's a valid test, but irrelevant anywhere else. On an Athlon 64, the opposite might be true, or they might be the same.
It should be titled "Are sparc 64 bit binaries under solaris faster or slower than equivalent 32 bit binaries?"
The biggest fault I can see with this test depends upon sizeof(int) -
I don't know about Sun, but in some other environments in which a 32 bit and a 64 bit model exist, the compiler will always treat an int as 32 bits, so as not to cause structures to change size. Hell, even on the Alpha, which was NEVER a 32 bit platform, gcc would normally have:
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
Now, consider the following code:
for (int i = 0; i < 100; ++i)
{
    frobnicate(i);
}
IF the compiler treats an int as 4 bytes, and IF the compiler has also been informed that the CPU is a 64 bit CPU, then the compiler may be doing dumb stuff like trying to force the size of "i" to be 4 bytes, by masking it or other foolish things.
So, the question I would have is, did the author run a test to ensure that the compiler was really making ints and unsigneds 64 bits or not?
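A check like that is only a few lines of C. This is a minimal sketch with made-up helper names; it reports whatever data model the compiler was told to build for, and says nothing about what the article's author actually ran.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of a type in bits for the current compilation target. */
#define BITS_OF(type) ((unsigned)(sizeof(type) * CHAR_BIT))

/* Print the widths of the basic types for whatever data model this
   file was compiled for (for example ILP32 or LP64). */
static void print_type_widths(void)
{
    assert(sizeof(int) <= sizeof(long));  /* expected on mainstream ABIs */
    printf("short  : %u bits\n", BITS_OF(short));
    printf("int    : %u bits\n", BITS_OF(int));
    printf("long   : %u bits\n", BITS_OF(long));
    printf("void * : %u bits\n", BITS_OF(void *));
}

/* 1 if a loop counter declared as int is narrower than a pointer,
   which is the situation the comment above is worried about. */
static int int_narrower_than_pointer(void)
{
    return sizeof(int) < sizeof(void *);
}
```

Calling print_type_widths() from main in a 32-bit build and again in a 64-bit build (with GCC that is usually -m32 and -m64, on targets that support both) shows which types actually changed size. On the common LP64 model, int stays at 32 bits and only long and pointers grow.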
www.eFax.com are spammers
68000 had 16 bits data, 32 bits address - this was a 16 bit
the 68000 has 16 bits data, 24 bits address and 32 bits internal; hundreds of usenet flamewars couldn't determine which one to use.
I'd go with 32 bit because that's how many it's got from the programmer's perspective.
Actually I remember developing software on a system with non-volatile main memory. It was a DEC PDP-11, back in the late 70's. It's called 'core' memory. It's slow, certainly by today's standards.
Knowing OSNews, though, it wouldn't be an Atari ST at all...
would need it for the most part.
Close to reality physics in a game engine
requires some pretty large numbers. 64bit will
help realize this.
Now if we only had REAL video systems instead
of this playground videocard shit they keep feeding us
(At outrageous prices no less)
We would be doing some serious app development.
Back when the initial tests were done. .45 WAS better than .9mm in handguns.
The bullets they used were both hardball.
64bit hasn't gone through the ridiculous end-user
rack yet. Ammo changed, so will 64-bit.
how is this offtopic ?
he claimed the article was trash and I disagreed.
And what if the compiler sucks/has no optimizations for 64-bit binaries?
/me wonders if the "AntisLash" is there purposefully, or just a coincidence...
What kind of a shitty processor is this 333 MHz UltraSparc IIi? 20 ops/s is RIDICULOUS! Were you to increase clock frequency by one order of magnitude you would still be left with 200 ops/s, which is still ridiculous: a modern 1.5 GHz Madison processor does 6,000 ops/s.
If this the best UltraSparc can do, Sun might just as well shut down its fabs.
DG Unix was more usable than Solaris to me. Sun just doesn't play well with others. All of your prior *nix knowledge doesn't apply and all of your favorite commands don't work or are renamed. I can't wait until they fail.
Duh. Any developer who's been around 64-bit environments for more than a few weeks knows this. It's not like there's some subtle magic going on here; bigger pointers means more data to schlep around.
Apparently not. Look at all of the 64 bit hype running around! Apparently people do think that 64 bit will magically be faster. This is NOT confined to naive desktop users. I see it all the time amongst scientists who believe that their computational application in FORTRAN will become faster just by running at 64 bits.
I've had a 64-bit desktop for years now, ever since we got our shiny new DEC Alphas back in 199x. We were very happy to finally have native 64-bit floating point so we could use double precision and finally get a correct answer quickly for matrix multiplies where the matrix was bigger than 16x16. 32-bit floating point is just plain useless due to roundoff error and 64-bit is quite a lot better.
On expensive systems, you routinely use 96-bit extended precision or even 128-bit if you actually have matrices that exceed 4096x4096 elements, because even with 64-bit, you lose precision.
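The round-off effect is easy to see even without a matrix. Here is a minimal C sketch that uses a plain running sum as a stand-in for the inner-product accumulation in a matrix multiply; the function names and the choice of 0.1 as the addend are just for illustration.

```c
#include <assert.h>

/* Add 0.1 to an accumulator n times in single precision. */
static float sum_float(int n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += 0.1f;
    return sum;
}

/* The same accumulation in double precision. */
static double sum_double(int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += 0.1;
    return sum;
}

/* Absolute error of a computed sum against the exact value n / 10. */
static double abs_error(double computed, int n)
{
    double diff = computed - n / 10.0;
    return diff < 0 ? -diff : diff;
}
```

Comparing abs_error(sum_float(n), n) with abs_error(sum_double(n), n) for a large n, say a million, shows the single-precision total drifting much further from the exact answer. Note this is about the width of the floating-point type, which is a separate question from whether the integer registers and pointers are 32 or 64 bits.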
So here's to the (new, haha) 64-bit desktops. We'll see a use for the extended precision immediately, since we scientists have seen the need for it for the 10 years it's been around. It may even be useful for gamers...
P4 is a 386 chip. So is Athlon.
To take advantage of the higher number of transistors available nowadays, you alter the architecture. That is all Intel and AMD have done.
If you made a 386 "transistor for transistor" I don't believe it could break 300MHz.
Dr Superlove 300ml. I use my powers for awesome
I'm not an expert, but I would think, yeah, you get twice the number of wires.
So that would mean twice the data per clock tick.
But don't forget the apps are going to need the extra 32 bits (32+32=64 :) of data for pointers. etc. etc. etc.
I can't help chuckling to myself when I hear all about these "new" 64 bit units. If I remember correctly, "chip people" could have made 64 bit processors when hardware manufacturers could barely keep up with 8 bit units.
If you want to assess the general advantages (or not) of 64-bit computing, then you do the following:
1. Choose functions that take advantage of 64-bit, like floating-point calculations.
2. Choose applications that have been optimized for 64-bit.
3. Compile with a compiler that will optimize for 64-bit.
4. Test on more than one 64-bit platform.
But this author:
1. Chose functions that do _not_ involve a lot of 64-bit operations.
2. Chose functions that have _not_ been optimized for 64-bits.
3. Chose a compiler that is _not_ optimized for 64-bits.
4. Tested on only _one_ 64-bit platform.
And then he concluded that 64-bit computing sucks.
Gee, what a surprise.
Some of the fastest supercomputers and server farms in the world today are using 64-bit platforms. But they're being used for applications that benefit from 64-bit computing, like rendering the graphics in Hollywood movies.
The author talks about the weaknesses of his benchmarks at the end of the article, and I think he's probably being sincere.
At the same time, I am well aware of the fact that Microsoft has been trying to FUD the idea of 64-bit computing. Microsoft knows that Windows is not ready for 64-bit (it may have been released, but the performance sucks), and that anyone who chooses 64-bit hardware these days will end up running Linux or Unix.
With 64 bits you need to make the critical path of the processor wider (the ALUs, the register file etc...). That makes your circuits slower. You have the lost opportunity cost of frequency -- you could have made the processor run at a higher frequency with a 32-bit datapath. For those of you with a background in computer arithmetic, think about computing the carry-bit in an adder.
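For those without that background, here is a minimal C sketch of the carry problem, modelled as a plain ripple-carry adder. That is a simplifying assumption: real ALUs use lookahead-style adders whose delay grows far more slowly than linearly with width, but a wider adder still has a longer worst-case path. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Add two `width`-bit numbers one bit at a time, as a ripple-carry
   adder would, and report the longest run of consecutive bit
   positions that produced a carry-out. The sum is truncated to
   `width` bits. */
static uint64_t ripple_add(uint64_t a, uint64_t b, unsigned width,
                           unsigned *longest_run)
{
    assert(width >= 1 && width <= 64);
    uint64_t sum = 0;
    unsigned carry = 0, run = 0, longest = 0;
    for (unsigned i = 0; i < width; ++i) {
        unsigned abit = (unsigned)((a >> i) & 1u);
        unsigned bbit = (unsigned)((b >> i) & 1u);
        unsigned s = abit ^ bbit ^ carry;
        carry = (abit & bbit) | (carry & (abit ^ bbit));
        sum |= (uint64_t)s << i;
        run = carry ? run + 1 : 0;
        if (run > longest)
            longest = run;
    }
    if (longest_run)
        *longest_run = longest;
    return sum;
}
```

Adding 1 to an all-ones value is the worst case: the carry has to travel through every bit position, so the run is 32 for a 32-bit add and 64 for a 64-bit add.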
64 bit addressing means that your pointers now take up twice as much space as your old 32 bit pointers. This exerts more pressure on your memory system and makes your L1 and L2 caches appear smaller. It also makes your memory bandwidth appear narrower since you need to ship more data across the bus between the processor and DRAM. You can't say something like - oh the 64 bit processor's bus is twice as wide because you could have made the 32 bit processor's bus twice as wide. In other words, the width of the bus is orthogonal to the instruction set supporting 64 bits. Similar arguments hold for caches.
The benefit of x86-64 is that it provides more registers. This means that there are fewer loads and stores which are used to spill and fill stack variables. This reduces the number of instructions required. It doesn't necessarily speed up the processor because stack variables would live in the store-forwarding buffer and would bypass the cache lookup, but you do have fewer instructions which means less pressure on your instruction cache and lower instruction-fetch bandwidth requirements.
There is a benefit for applications that naturally use 64-bit data types. This is a minor effect because many apps do not need the full 64 bit integer arithmetic. This benefit is also offset by the opportunity cost of frequency described above.
The other big benefit is that you can now address a 64-bit virtual address space which is convenient for accessing large data structures. This is the main reason why processor architects want to move to 64 bits.
Yes.
OSNews... Useful?
;)
He'll be back.
and merely counting bits is no way to estimate performance.
If you only have room for 16k of data in your L1 cache and all your size_t, pointers, and in most cases longs too take twice as much memory, at worst it is like you have only 8k of cache now compared to the 32bit version! At best it is going to make no difference, but at worst it is like your system now has only half the cache and half the memory bandwidth. Seems to me that by counting bits you can estimate your performance will be between 100% and 50% of the 32bit version, all other things equal.
A notable exception would be when you need a 64bit value and are forced to emulate that.
If they do NEED the 64 bit precision, then yes... Obviously, it's going to be faster because they'll have to do about half the cycles to complete the same amount of work.
It's not going to be faster because 64bit computers are some sort of magical beast. That's stupidity.
Of course, most everyone doesn't need a 64bit computer right now. Most motherboards can't even address 2GB of RAM, let alone 4, and the smallish servers that DO use more than 4GB generally don't use *that* much more, if they even need it.
If 64 bit computers were relegated to huge server farms and supercomputers for a few more years, I wouldn't be disappointed.
Other than sex appeal, what other reasons are there for 64-bit?
This is a man who is NEVER going to get laid.
-1 Uncomfortable Truth
What this tested was 32-bit applications recompiled in 64-bit mode. Which means that the CPU is constantly executing extra code to truncate 64-bit numbers down to 32-bits, which is one-half the reason it is slower and larger. The other half the reason it is larger is because the default size for constants etc. is now twice as big (as well as the space taken for anything actually declared "int", and not wrapped in some compatibility-layer "int32" typedef). And the other half the reason it is slower is because larger code takes longer to load into the instruction cache, and thrashes the cache more.
So the lesson here is, "Don't blindly recompile your 32-bit applications in 64-bit mode, they'll get slower." I have some apps here that make heavy use of the "long long" type that I'd love to try on a 64-bit machine someday. But since that machine in 32-bit mode will probably handle long-longs quickly, 32-bit *may* still be a win.
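The truncation point can be sketched in C as follows. It is only an illustration of what the language requires and of how that might be spelled out on a machine with nothing but 64-bit integer operations; the function names are made up, and whether a particular compiler and instruction set really emit extra instructions for it is an assumption you would have to check in the generated assembly (some 64-bit instruction sets provide 32-bit forms of their arithmetic instructions).

```c
#include <assert.h>
#include <stdint.h>

/* What the language requires: 32-bit unsigned arithmetic wraps at 2^32. */
static uint32_t add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* The same operation spelled out for a machine that only has full-width
   64-bit adds: do the add at 64 bits, then mask back down to 32. */
static uint32_t add_u32_in_64bit_regs(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    assert(wide <= 0x1FFFFFFFEull);          /* sum needs at most 33 bits */
    return (uint32_t)(wide & 0xFFFFFFFFu);   /* explicit truncation */
}

/* Sign extension: widening a signed 32-bit value to 64 bits, as happens
   when an int is used to index through a 64-bit pointer. */
static int64_t widen_i32(int32_t v)
{
    return (int64_t)v;
}
```

Both add functions return the same result for every input; the second just makes the masking step visible. The widen_i32 case is the other half of the cost: every negative int has to have its upper 32 bits filled in before it can take part in 64-bit address arithmetic.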
I just want to point out that you can put 1 gig of RAM in an Ultra 5, not the 512 that the author claims. There are 4 slots for 2 banks of 2 DIMMs. You can put 4 256MB DIMMs in there. Sun part X7032A.
We are now in a transitional period, when 64-bit CPUs exist and can process 32-bit code "natively" (by setting a bit in a CPU control register, without explicit affixes to each opcode or operand specification saying that the operands are 32 bits wide). Yet at the same time, in a 32-bit executable we can still do 64-bit math at the native speed of the 64-bit processor by using operand-size affixes.
In such an environment, expect 32-bit applications compiled into 64-bit executables to be slower, since C programmers (due to C's weakness as a systems programming language) tend to make "int32" typedefs that are processed more slowly in 32-bit mode due to affixes and possibly the need for sign-extension/truncation.
If you have a native 64-bit program (such as a cryptography program that makes extensive use of "int64" ("long long" in 32-bit gcc)) then it may show a speed improvement when built as a 64-bit executable.
Down the road, expect post-transition CPUs that can only process "int32"s with a slight speed hit.
Benchmarking 16-bit versus 32-bit on today's top end processors would be interesting, though it is important to distinguish between "16-bit applications" and "32-bit applications compiled in 16-bit mode".
If the compiler sucks, then it would suck equally for 32-bit and 64-bit binaries! They use the same code generator!
A deep unwavering belief is a sure sign you're missing something...
That can sometimes be a positive thing
Not necessarily. I remember when I was starting out I'd get really confused because different guides would have conflicting (or at least inconsistent) advice.
OpenBSD's sparc64 port is 64-bit but the last time I tried it there was still a lot of work to do. It worked for the most part (it was just a pf box) but a lot of little problems caused me to drop it in favor of Debian/sparc64 again.
And it was a PITA to install since there's no tftp bootable images like there are for Linux.
"And for building like-LEG0-machines, is better with AthlonXP than with the expensive Opteron."
I don't think you can call the Athlon 64 3000+ "expensive"...
Regards, Anon. Coward
Agreed. There is a very valid place for 64 bit. Of course, the ones I was referring to explicitly just wanted to speed up their current application (which was happily running on 32 bit machines, using single floats).
Many current motherboards only support up to 2 x 512 MiB DDR400 or 2 x 1024 MiB DDR400 using 2 DIMM slots. :( I don't know why!!!
But if there are 3 or more DIMM slots, then it drops down to DDR266.
open4free
The x86 is register starved, severely so. The AMD64 architecture in 64-bit mode adds a bunch more GP registers for computational purposes. It's enough to give a boost of up to 30% right there. There are a couple of other things within the architecture which add their own contributions.
All in all, AMD64 is more of an exception than the norm. Normally 64-bit code should be expected to be as fast as 32-bit code at best and only slightly slower at worst. When people talk about the stuff being dramatically slower, they're referring to the increase in memory bandwidth- and they don't take into account that memory is set up in a manner that is not byte-wide. It's typically 32-bits wide and possibly 64-bits wide on some designs. This would translate into a small to moderate performance hit in some designs and none in others.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
apples -- to -- apples
Why not save space by using 0.5 instead of 1?
Depends on the size of the bus as to whether or not it'd take longer for 64-bits instead of 32-bits.
If your bus is 64-bits, it won't take any longer than the 32-bits would on a 32-bit machine.
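The bus-width point is just a ceiling division. A tiny sketch, ignoring burst transfers, cache lines and everything else a real memory system does:

```python
def bus_transfers(value_bits, bus_bits):
    """Number of bus transfers needed to move one value of the given width."""
    return -(-value_bits // bus_bits)  # ceiling division on integers


for value_bits, bus_bits in ((32, 32), (64, 64), (64, 32)):
    print(value_bits, "bit value on", bus_bits, "bit bus:",
          bus_transfers(value_bits, bus_bits), "transfer(s)")
```

A 64-bit value on a 64-bit bus costs the same single transfer as a 32-bit value on a 32-bit bus; only the 64-bit value on a 32-bit bus takes two.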
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
If 64bit is slower than 32bit, then...
32bit is slower than 16bit, therefore...
16bit is slower than 8bit, so...
8bit is slower than 4bit, which means...
4bit is slower than 2bit, in which case...
2bit is slower than 1bit, which concludes that...
1bit is faster than 64bit!!!
OpenSSL might be kind of arbitrary
Actually, I'd say it's a good choice. After all, it does involve mathematical operations on large numbers. Many websites use it on their servers for sales, passwords, and account access, as well as other security concerns.
The computing power demands of one SSL connection might not be much, but when you get into hundreds of connections, this gets to be a major strain on hardware. If going 64 bit reduces the number of cycles needed to process a thread, it can reap major benefits.
This seems to be a better choice than gzip and MySQL, as gzip is often associated with fetching things from off-board storage (network/HD/something else slow), and MySQL is often memory limited as you're messing with massive amounts of data. So you're left with doing complicated things to a data set that will fit in main memory. I think that what you're doing, and what kind of data you're working on, is a large factor in the results (if the data's 32bit or smaller, the 32bit app might take the lead, whereas if the data's larger than 32 bit, the 64bit app takes the lead).
I don't read AC A human right
This is modded Insightful?
You've completely missed the entire point of the test. This has nothing to do with your next purchase decision -- it's purely designed to test whether or not the common claim that using 64-bit values decreases performance due to memory latency is true. This test makes no claims whatsoever that it has anything to do with whether or not you should be using a 64-bit setup. RTFA.
The "obsolete architecture" is one of the few where 64-bit and 32-bit operations have no inherent performance advantage on the processor, unlike the Opteron and Itanium processors where 64-bit mode has several advantages over 32-bit mode (extra registers or not being emulated). This makes it a perfect testbed for evaluating this claim. The speed of the processor has absolutely no relevance to the question at hand (with the exception of testing memory access starvation on a system with a greater CPU to bus clock difference).
It's a shame you're too wrapped up in a "buy, buy, buy" mindset to consider the value of curiosity and of testing commonly held beliefs.
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
Well I'm not sure if it's any sort of relevant benchmark, but running the OGR benchmark of dnetc on the Ultra10 I used to use at work showed it could crunch gnodes *almost* as quickly as an AMD K6 500. I was surprised at that since it actually felt slower to use (the AMD setup was running Linux, BTW)...
Code, Hardware, stuff like that.
:::It's irrelevant and pointless to spend time discussing the speed differences now between 64-bit and 32-bit.:::
Unless you need to make a decision on which machines to deploy in the near future...
It does matter, just not to most people, and probably not even to the people who are most rabidly touting 64-bit computing **(cough)**Athlon64fanboys**(cough)**.
---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
Funny, isn't that the idea behind IBM AS(OS)/400 ?
Not exactly a novel idea.
Oh wait, now it's redundant, even though no one else disagreed with him. WTF are the moderators smoking?
Is this dude sponsored by Intel? Why would someone go through this kind of effort to find out a 64bit app is a tiny bit slower than the 32bit version? Maybe some people were shocked to find Opteron in 64bit mode was a lot faster than when running in 32bit mode. Why would someone create FUD about 64bit being slower than 32bit when Opteron currently is pulling _all_ bricks out of Intel's backyard??
remember this quote?
"Windows [n.]
A thirty-two bit extension and GUI shell to a sixteen bit patch to an eight bit operating system originally coded for a four bit microprocessor and sold by a two-bit company that can't stand one bit of competition."
(Anonymous USEnet post)
Here's another one
"Itanium [n.]
a.k.a. Itanic. An incompatible sixty-four bit extension to a thirty-two bit Pentium 4 CPU created by a company whose previous CPU was called Pentium 5 and presumably also cannot count upwards in performance."
Robert
How do I feel? Like I'm stuck in a bad Christopher Lambert movie.
wow, that's REALLY bad!!
Hey, I've still got my Atari ST, and it's cool. They most certainly do not suck.
Amigas do suck though.
.-.--
I disagree with the analysis.
Specifically, I tested my own sparcv9 OpenSSH on sparcv9 OpenSSL compiled with GCC-3.2.1. I got different results, and I have a different conclusion. My needs are to scp 8GB database dumps from one host to several others. I was interested in two factors. First, if run time was significantly different, that would be a compelling factor. That factor accounts for economy in scarce maintenance-window time, during which I can safely saturate the network. Second, I was interested in the amount of system resources consumed during the process.
My results were different from the OSNews guy, but then again I am not intimidated by gcc make bootstrap. My sparcv9 OpenSSL libcrypto was carefully configured and compiled to use as much of the sparcv9 assembly code as the OpenSSL project provided. Both of my hosts were Sun Fire V880s with 4x750MHz Ultrasparc-III CPUs running 64 bit Solaris 8. SSH was configured to use the Blowfish symmetric cipher because it is not as CPU expensive as 3DES or AES in software. The disk volumes of the source and sink were EMC Symmetrix volumes over 1Gbps FC SAN using Emulex 900x host adapters. Over several trials, neither 64 nor 32 bit SSH could claim a run time advantage. However, the user-CPU time (number of cpu cycles spent executing SSH code as opposed to time spent waiting for IO service times or kernel/syscalls to complete) for the Sparcv9 SSH was slightly more than half of the same statistic for the Sparcv7 binary.
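The distinction drawn above between run time and user-CPU time can be reproduced with a small harness. This is a sketch only: the two workloads are stand-ins invented for illustration (a hash loop and a sleep), not the SSH transfer described in the comment.

```python
import hashlib
import os
import time


def measure(work):
    """Run work() and report wall-clock, user-CPU and system-CPU seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    work()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "wall": wall_after - wall_before,
        "user": cpu_after.user - cpu_before.user,
        "system": cpu_after.system - cpu_before.system,
    }


def cpu_bound():
    # Stand-in for cipher work: hash about 12 MiB of zeros.
    digest = hashlib.sha256()
    block = b"\0" * 65536
    for _ in range(200):
        digest.update(block)


def io_wait():
    # Stand-in for waiting on disk or network: no CPU used while asleep.
    time.sleep(0.2)


print("cpu_bound:", measure(cpu_bound))
print("io_wait:  ", measure(io_wait))
```

When a job is limited by I/O, two builds can show the same wall-clock time while differing a lot in user-CPU time, which is the pattern the comment reports.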
Admittedly, there is a difference in my benchmark test cases aside from instruction or data word length. Sparcv7 has no integer multiply or integer divide instructions, so it must rely on long-division and repeated adding to achieve the results of those operations. I needed a sparcv7 binary because I still have one old sparcv7 box in service that needs SSH. Because of the cache hit ratio when doing that kind of math, and the fact that Blowfish does not rely heavily on those operations, I doubt that these instructions would account for the difference in CPU cycles required to process this 8GB file.
Confirming this is difficult because the sparcv8 ABI was a 32 to 64bit crossover architecture in which the OS would interpret and optimize some operations if they were running on a 64 bit Solaris (2.7 or 8) kernel. Thus, a 32 bit sparcv7 binary running on sparcv9 hardware and a sparcv9 kernel may actually be partially executed in 64 bit mode, and be efficient at it. If there are any sparc ABI guru comments to enlighten us on the specifics, those comments would be appreciated. I believe a more rigorous test would verify that the kernel running the 32 bit userland software was a 32 bit kernel.
I conclude that the large data word size allowed SSH to encrypt the 8GB file in nearly 50% fewer CPU cycles, and therefore is faster for this type of operation. Aside from that, note that RSA/DSA operations on the sparcv7 binaries are SIGNIFICANTLY slower, probably due to the multiplication and division required by those algorithms. I'm sorry I wasn't doing this benchmark for publication or I would have saved my data/results.
--- Nothing clever here: move along now...
It depends on what you need to do. On a 64 bit processor you MUST sling around 64 bits per clock cycle. If you need to, like in encryption applications or in a well written database engine, then it will be faster. If you only need 32 bits, then you are slinging around 32 bits of junk on every tick and will see no benefit. Simply recompiling an application which uses 32 bit aligned structures as 64 bit will cause a performance decrease. But redesigning those structures to be 64 bit aligned will cause a performance increase. Another point of confusion is that most 64 bit processors introduce some form of VLIW (Very Long Instruction Word). VLIW lets the compiler, rather than the processor, sort out the dependencies between instructions and decide how to load the pipelines. The effectiveness of this is compiler dependent and will improve as the compiler technology grows to fit the new processors. VLIW is not inherently a feature of 64 bit processors, but the 64 bit processors are better able to handle the long instructions,