What Improvements Will 64-Bit Processors Bring?
RyanG asks: "Everyone always looks at numbers (MHz, RAM, HD) when they're considering buying a new computer. Recently, more users have been eyeing bits, as in 64-bit processors, namely the Itanium and to a lesser extent the G5. A lot of people remember the performance increases that were seen when moving from 16 to 32-bit processors, and some people seem to think similar performance increases will be realized when moving from 32 to 64-bit processors. From what I've read this isn't going to be the case, given that 64-bit precision isn't needed in all but a few cases and that moving around that extra data can actually hurt the performance of 64-bit processors compared to 32-bit processors. Anyone care to comment?"
This was question #11 on my Architecture midterm today...
The jump from 32 to 64 bits isn't about speed or precision; it's about the amount of usable address space on a given architecture. For whatever reason (call it functionality, call it bloat, whatever), the amount of address space that programs require is going up by 0.5 to 1 bit per year. Have you noticed that a lot of people are starting to complain that their PCs are maxed out at 4GB, especially with heavyweight apps like DB servers, simulation programs or MS Word? Or that there's been a lot of work on Linux and NT to let programs use more of the 4GB on the box? Guess what? The 80386, Intel's first 32-bit x86 processor, came out 16 years ago.
So the jump now is mostly to allow us to keep growing for at least another 32 years. Most processor manufacturers tried to get the migration started early: the SPARC, MIPS and Power(PC) chips have all supported 64-bit operation for some time now. The Alpha was originally designed as a 64-bit processor 10 years ago. Intel and AMD are actually rather late to the game.
It's been said that the only thing that killed DEC's PDP-11 was its small (16-bit) address space: users were very happy with it, but when they needed more room for their programs, the PDP-11 just couldn't be expanded to handle them. This is probably why DEC started migrating everyone to the Alpha 10 years ago. The original release of the Alpha only used a 34-bit address path (so it could access 16GB of RAM; the remaining address bits are reserved). If you want the details, check out chapter 5 of Computer Architecture: A Quantitative Approach by Hennessy & Patterson.
-"Zow"
Mainly it seems people are talking about the register width, precision, and of course address space.
;;
Keep in mind the first Itaniums have a 64-bit virtual address space, while the physical space is limited to 52 bits, I think.
The Hammer series processors are really just an x86 extension; they offer nowhere near the capabilities of Intel's fresh start with IA64.
Here are some of the features of the IA64:
-> Heavy use of ILP (Instruction Level Parallelism): the compiler, rather than the hardware, marks which instructions are independent so several can issue in the same cycle.
-> Predication: fewer branches, and hence fewer stalls. Conditional code is controlled by a predicate register instead of a jump. Look at this C code:
if (!eax) ebx=VALUEB; else ebx=VALUEA;
Now the i386 code:
testl %eax,%eax
jz 1f
movl $VALUEA,%ebx
jmp 2f
1: movl $VALUEB,%ebx
2:
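Now the IA64 code (eax in r5, ebx in r4; r0 always reads as zero):
cmp.ne p2,p3 = r5,r0 ;;   /* p2 = (r5 != 0), p3 = !p2 */
(p2) movl r4 = VALUEA
(p3) movl r4 = VALUEB
/* last two instructions run in parallel; only the one whose predicate is true takes effect */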
Whereas the i386 code jumps all over the place and can stall the CPU on a mispredicted branch, the IA64 code has no branch at all: the predicate registers p2 and p3 decide which of the two moves actually takes effect.
-> Huge register sets
r0-r127 are the 64-bit general registers; compare this to x86's eax, ebx, ecx, edx, esi, edi, ebp (and esp).
p0-p63 are the 1-bit predicate registers.
There are also 128 82-bit floating-point registers, f0-f127.
-> Speculation
Normally you can't reschedule a load to run before a store, because the two addresses might overlap:
*ptr=b;
some_code_that_does_not_touch_b_c_ptr_ptr2();
c=*ptr2;
Previously, you couldn't move c=*ptr2 ahead of *ptr=b, because ptr and ptr2 might point to the same memory.
Now you can. The load is performed early anyway using an "advanced load" instruction (ld.a), which records its address in an internal table (the ALAT, Advanced Load Address Table); then the store executes, and _if_ that store overwrote the loaded location, a check instruction (ld.c or chk.a) performs the load again. Hopefully this shouldn't happen often. The mechanism is also more flexible and powerful than this; this was just a simple example.
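-> Remappable registers: the registers can be remapped, somewhat like memory is paged, so that when a new function is called, the registers it uses don't necessarily have to be pushed and popped on the stack. Registers r32-r127 form a register stack; each function allocates its own frame (with the alloc instruction), and the Register Stack Engine spills older frames to memory only when the physical registers run out.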
-> "Modulo" loop scheduling (software pipelining)
The next iteration of a loop can begin before the previous one has finished; the remappable registers "rotate" to give each iteration a fresh set of the virtual registers, so overlapping iterations don't clobber each other's values.
-> An interesting way of handling paging
It reduces TLB flushes on task switches by tagging translations with an ID specific to each process (IA64 calls these region IDs), so a task switch doesn't have to throw away the other processes' entries. I'm not fully sure of the details, since I've never looked at IA64 system programming.
Sorry if I sound like an Intel brochure, but it really is quite amazing if you're coming from an x86 programming background: IA64 is a lot more than doubling data unit sizes. If you're familiar with assembly programming, I suggest reading the IA64 manual at developer.intel.com.