What Improvements Will 64-Bit Processors Bring?
RyanG asks: "Everyone always looks at numbers (MHz, RAM, HD) when they're considering buying a new computer. Recently, more users have been eyeing bits, as in 64-bit processors, namely the Itanium and, to a lesser extent, the G5. A lot of people remember the performance increases seen when moving from 16- to 32-bit processors, and some people seem to think similar performance increases will be realized when moving from 32- to 64-bit processors. From what I've read, this isn't going to be the case, given that 64-bit precision isn't needed in all but a few cases and that moving around the extra data can actually hurt the performance of 64-bit processors compared to 32-bit processors. Anyone care to comment?"
The ClawHammer and SledgeHammer processors are going to be 64-bit processors as well; don't forget to include the competition!
501 Not Implemented
Part of the PlayStation 2's bus is 2560 bits wide. Must be one dang fast machine...
The computer industry really needs some new standardized, Best Buy-friendly means of identifying real processing speed, not just for the processor but for the entire system (as this is what matters to actual people). And it needs to be something that Sun can't fudge the results of.
Of course, nobody actually wants this. It's what Scott Adams refers to as a confusopoly. They can all stay in business by making sure nobody understands how fast their products are.
Let's not stir that bag of worms...
I've been dabbling in 64-bit computing since my business bought me a Silicon Graphics Indigo2 workstation with a MIPS R10000 CPU. Granted, the machine could only address 768 MB of RAM (a decent amount for '95, but modern workstations like the Octane can address 8 GB or more), but it was quite neat to work with 64-bit pointers, larger datasets, and huge files. For me, the jump to 64-bit computing simply allowed me to work with big numbers without having to resort to tricks and other hacks. For others, it meant 6 GB datasets in RAM, with each and every element addressable by clean code. These days 64-bit computing is an assumed must on large-scale systems (where 64 GB to 1024 GB of RAM [on the Origin 3000] in a single system [not a cluster] can be considered normal). But on the already-fast, one- or two-CPU desktop computer, it's not much more than a marketing blurb. And for most users it's not really needed, at least until 2+ GB of RAM becomes the norm. 32-bit computing already allows for some pretty darned big pointers and addressing.
If you ever find out how to calculate this one magic number, you'll be quite famous. Don't you think all the benchmarking exists for a reason? There is no single number that measures the "fastness" of a PC. That's why every benchmark worth looking at reports multiple numbers for each candidate (not that this stops everybody from taking one of them and saying "This one is the fast-value").
On older machines, this was either an absolute hard limit (64K, period) or kludged in some way (Apple //c had a special bit to bankswitch in one or the other 64K memories, but both couldn't be used at the same time.)
The IBM PC had its segment registers and so could address 1MB, but it was far from transparent. There was no way to declare, say, a 200K array of strings. The programmer's data structures had to be tied quite closely to the peculiarities of the architecture. We spent as much time working around the limitations as we did writing useful code.
When 32-bit computing came along, bam! What a change! Want to declare a 20MB data structure? Go for it! In terms of artificial restrictions, there just weren't any practical limits to run into, or around, day in and day out.
The reason 64 bits won't be as revolutionary as 32 bits is that, for the most part, 32 bits is still good enough. Even with the bloated software we have these days, 4 billion is still plenty when it comes down to most things. Take time_t; that's still not going to overflow for another 30 years. 4 billion is a lot.
A 64-bit CPU working with 32-bit data is slightly inefficient, but don't worry too much about a slowdown from that: newer CPUs will tend to be inherently faster over time, which should more than make up for it.
So, basically, you heard right. I think.
--
I don't want to rule the world... I just want to be in charge of mayonnaise.
You're right of course.
But surely we could agree on some standard - a set of benchmarks evaluating the machine's performance at representative tasks.
And you're right, the information is out there already - but because there's no friendly standards body the information seldom makes it to consumers.
If suppliers or OEMs wanted to they could provide more information... And they'd stick with the standard until they find out they didn't win.
Case in point: Sun is pulling out of TPC-C (all hands to battle stations!)
About a year ago, my engineering firm replaced almost its entire computing network with Win2K-based PCs and servers, most running Pentium III CPUs at 933 MHz.
Why does this matter? Because in the years before the PCs, I had *four* different 64-bit workstations on my desk. Over a period of about six years I had a Sun Ultra 2 (64-bit UltraSPARC II), SGI Indigo2 (64-bit MIPS R10K), and two AlphaStations (64-bit Alpha 21164 and 21264).
How are things now, with our 32-bit PCs? Faster, for the most part. 64-bit computing is about the size of numbers that can be used. With 64 bits, you can address more RAM, address more disk space, and put more in each address. It really has nothing to do with performance (unless you're doing a task with big numbers that is a slow kluge under a 32-bit CPU).
We haven't had too many growing pains. The servers were a nightmare for a while, but that was mostly due to software. Thankfully our visualization room is still running off an old Onyx2 (I don't know of any easy way to run four full-screen projectors off a single PC and still have good graphics performance). Several of our senior engineers still use their old dual-CPU Octanes, too. We're saving money buying new PCs instead of new UNIX workstations. Thankfully, most PC components are now capable enough for our tasks and software, but really, everything about PC hardware is *crap*, even the so-called "professional" components. But given the low prices, we're able to buy one of everything and find out what works.
*shrug*
64 bits gets you more than just bigger numbers. You can also scan, process, or copy data in bigger chunks: functions such as memcpy, strcmp, and strlen can be sped up significantly by working on 64 bits at a time.
Care about electronic freedom? Consider donating to the EFF!
This makes question #11 on my Architecture midterm today...
The jump from 32 to 64 bits isn't about speed or precision; it's about the amount of usable address space on a given architecture. For whatever reason (call it functionality, call it bloat, whatever), the amount of address space that programs require is going up by 0.5 to 1 bit per year. Have you noticed that a lot of people are starting to complain that their PCs are maxed out at 4GB, especially for heavyweight apps like DB servers, simulation programs, or MSWord? Or that there's been a lot of work on Linux and NT to let the user access more of the 4GB on the box? Guess what? The 80386 came out 16 years ago.
So the jump now is mostly to let us keep growing for another 32 years. Most processor manufacturers tried to get the migration started early: the SPARC, MIPS, and Power(PC) chips have all supported 64-bit operation for some time now. The Alpha was originally designed as a 64-bit processor 10 years ago. Intel and AMD are actually rather late to the game.
It's been said that the only thing that killed DEC's PDP-11 was its small (16-bit) address space: users were very happy with it, but when they needed more room for their programs, the PDP-11 just couldn't be expanded to handle them. This is probably why DEC started migrating everyone to the Alpha 10 years ago. The original release of the Alpha only used a 34-bit address path (so it could access 16GB of RAM; the rest is reserved). If you want the details, check out chapter 5 of Computer Architecture: A Quantitative Approach by Hennessy & Patterson.
-"Zow"
Think about what the average home/office user is doing on the computer and how much processing power it really takes to make that cursor blink. The simple fact is that for a typical office suite and web browser, current technology is overkill. Some people like to play audio, video, or games on their computers and that takes some more processing power, but it's nothing that pushes the limits of modern hardware (you gamers who say you can tell the difference between 100 and 125 FPS are lying... that's 1.5 to 2 times your monitor's refresh rate).
People are going to get the hot new toys because they're hot new toys and then be really disappointed when everything they've been doing doesn't get any better.
Somebody somewhere might develop the killer app that makes a 64-bit processor make sense for home and desktop users, and I can think of a few things that have the potential to take off like that, but until then the new hardware will basically be a "my dick is bigger than yours" type of thing. I honestly hope that killer app comes sooner rather than later because whatever it is, it'll be killer.
I like to play children's songs in minor keys.
"We're all sons of bitches now." --J. Robert Oppenheimer
The move from 32 to 64 bits isn't so much about performance as it is about ...size, I guess. The ability to hold huge databases and datasets in a flatly addressable space. The ability to do maths with larger and/or more accurate numbers. That kind of thing.
As I recall, a 286 was slightly faster than a 386 at the same clock speed. The 486 was the first x86 that was actually designed to go fast. The big deal about the 386 was that it did memory management properly, and had the multitasking abilities, and to do that it needed a large addressable flat memory space (hence 32-bit pointers). The 32-bit registers were that size mainly so they could hold pointers and offsets and things. (Yes, I'm simplifying, I know.)
64-bit CPUs will be faster at a few things, like copying memory and crunching RC5, but most performance benefits in future CPUs will have nothing to do with word-size (clock speeds, cache sizes, clever pipelines, etc.).
At work I've been using iMovie heavily on a nice dual-processor G4 for the past few weeks and have been amazed at the difference between that box and my G3 iMac. Most consumers won't go crazy for video, but it is something more and more people are doing.
Mainly it seems people are talking about the register width, precision, and of course address space.
Keep in mind the first Itaniums have a 64-bit virtual address space, while the physical address space is limited to 52 bits, I think.
The Hammer series processors are really just an x86 extension. They offer nowhere near the capabilities of Intel's fresh start with IA64.
Here are some of the features of the IA64:
-> Heavy use of ILP (Instruction Level Parallelism) - speaks for itself.
-> Predication: fewer branches, and hence fewer stalls. Conditional handling is done through a controlling predicate rather than a jump. Look at this C code:
if (!eax) ebx=VALUEB; else ebx=VALUEA;
Now the i386 code:
testl %eax,%eax
jz 1f
movl $VALUEA,%ebx
jmp 2f
1: movl $VALUEB,%ebx
2:
Now the IA64 code:
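cmp.ne p2,p3 = r5,r0 ;;
(p2) movl r4 = VALUEA
(p3) movl r4 = VALUEB ;;
/* r5 plays eax and r4 plays ebx; p2 = (r5 != 0), p3 = !p2. The two predicated moves are independent and can issue in parallel. */
Whereas the i386 code jumps all over the place and can stall the CPU, the IA64 code uses the predicate registers p2 and p3 to decide which of the two moves takes effect.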
->Huge register sets
r0-r127 are the 64-bit general registers; compare this to x86's eax, ebx, ecx, edx, esi, edi, and ebp.
p0-p63 are the predicate registers.
There are also 128 82-bit floating-point registers, f0-f127.
->Speculation
Normally you can't reschedule a load to run before a store, because the addresses might overlap:
*ptr=b;
some_code_that_does_not_touch_b_c_ptr_ptr2();
c=*ptr2;
Previously, you couldn't move c=*ptr2 above the store *ptr=b, because ptr might point to the same memory as ptr2.
Now you can. The load is issued early anyway as an "advanced load" (ld.a), which records its address in an internal table (the ALAT, Advanced Load Address Table); then the store executes, and _if_ that store writes to the loaded address, the load is performed again (the check is done with ld.c or chk.a). Hopefully this shouldn't happen often. And it's more flexible and powerful than this; this was just a simple example.
-> Remappable registers (the register stack, r32-r127): registers can be remapped, somewhat the way memory is paged, so that calling a new function doesn't necessarily require pushing and popping registers on the memory stack.
->"Modulo" loop scheduling
Starting the next iteration of a loop before the previous one has finished; the remappable registers "rotate" to give each iteration a fresh set of virtual registers.
->An interesting way of handling paging
This reduces TLB flushes on task switches by tagging translation entries with an ID unique to each process. I'm not fully sure of the details, since I've never looked at IA64 system programming.
Sorry I'm sounding like an Intel brochure, but it really is quite amazing if you're coming from an x86 programming background: IA64 is a lot more than doubling data unit sizes. If you're familiar with assembly programming, I suggest reading the IA64 manual at developer.intel.com.
There are *so* many factors that affect performance, it's scary... there are no clear answers in this field... It's so complicated that, as part of the coursework for my BS in CS, I had to take a course on performance evaluation!
Case in point: my Celeron 500 boots Win2K about 30 seconds faster than my Athlon 1700.
... Of course, after they're booted, the Athlon rocks the socks off the Celeron. The answer: the Celeron has a faster hard drive than the Athlon. But can you buy a computer based on hard drive speed? Could a Best Buy salesman even *tell* you the speed of a machine's hard drive? :)
Free Techno/Jazz/DNB/MI Music by guys obsessed with monkeys!
...32 more bits.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
The fact that it's difficult to assess performance (and that computer resellers are generally incompetent) is the strongest argument for standardized benchmarking.
Consumers need a body that can give them a number for how quickly a computer will perform a certain kind of task. I know it's not going to be terribly precise (or accurate for a specific task), but it's going to be better than the knowledge a consumer currently has (which is pretty much "How many MegaHurts does it have?").
That said, I don't think such a standard will be set, for the reasons in the original post.
I am unable to walk and chew gum simultaneously. And when I try to think too hard, I fart.
malloc(-1) != NULL
(that's assuming, of course, that size_t is still 32 bits wide, so that -1 becomes 2^32 - 1)
HIV Crosses Species Barrier... into Muppets
The main advance will be throwing off twenty years of cruft.
As one example, AFAIK the PIV still uses 20-bit addressing, which is mapped to 32-bit by the MMU. This really isn't a big deal in terms of performance, but it is an example of the way the x86 architecture has grown into a rigged-up hack on a hack on a hack.
Another big advantage is (again AFAIK) 128 64-bit general-purpose registers. If used in a smart way, these could save boatloads of expensive (in terms of speed) memory operations.
I'm really going out on a limb here, but I think it means a larger instruction set without having to resort to multi-word instructions. Love to hear comments on this from someone who actually knows what he's talking about!
-Peter
Some advantages nobody's touched on yet:
1) Easier implementation of large filesystems. 2^64*512 bytes of disk space per filesystem should be good enough for quite a few increments of Moore's Law. Ditto for files larger than 4 gigs.
2) 64-bit processors take better to certain implementations of NUMA. SGI's implementation of NUMA gives each processor a range of memory that is local to that processor. If you had a 64-processor NUMA cluster, you'd have 64 megs local to each processor with 32-bit processors. You could have a few gigs per processor with 64-bit addressing.
3) With 64-bit processors, it's easier to map a whole file into memory at once, without needing to map it in individual chunks. Over the near term, you could map your entire disk drive into memory space.
4) There are cases (e.g. bit packing) that don't take well to vectorized MMX/SSE/etc. processing but do take well to 64-bit registers.
5) The ability to segment your memory space without creating annoying limitations. For example, you can reserve the lower 8,388,608 terabytes of address space for the user and the upper 8,388,608 terabytes for the kernel, as opposed to Windows 2000, which leaves 2 gigs for the user and 2 gigs for the kernel (3 gigs for the user is possible if you are running a higher-end version).
6) The ability to cache a data structure in the RAM attached to a given machine instead of buying solid state disk drives or other such things.
Gentoo Sucks
Check out specbench.org; it's pretty much the industry standard for performance benchmarking. They test with many types of programs: databases, compilers, etc. The two main categories are floating-point and integer tests.
Yes but every time I try to see it your way, I get a headache.
If I remember correctly, even the 68000 processor series used single-word instructions.
The 680x0 processor series (used in the Sega Genesis, the SNK Neo-Geo, pre-PPC Macs, and Palm OS devices) used a 16-bit instruction word followed by 0 to 4 words of address or data, depending on the addressing mode. You must be thinking of Thumb (a scaled-down version of the ARM instruction set, used in the Game Boy Advance) or KIPS (a scaled-down version of the MIPS instruction set, used in many undergraduate computer engineering projects).
Will I retire or break 10K?
One of the features of 64-bit processing that I've been eyeing for the last 5-6 years is memory-mapped IO. Instead of manually reading files into memory, you can tell the OS to map a huge multi-gigabyte file into an address space and then access that address space as if the file were already in memory. The OS can then cache and do really cool optimizations in the background, and the program doesn't have to worry about reading and writing blocks of data one chunk at a time (though, as I understand it, Linux has optimized this too with copy-on-write). When a value changes in that address space, the OS takes care of writing it back to the file.
:).
Of course, 32-bit processors can do memory-mapped IO, but nowhere close to the scale that 64-bit processors can. Practical limits may constrain a 32-bit memory-mapped IO implementation to less than 2GB of address space, though PAE might be able to increase that slightly (not totally sure, but it makes sense if it does).
Expect possibly more efficient databases that let the OS optimize disk access even more.
That's a 2560-bit wide data *bus*; the integer size of the Emotion Engine is 64-bit, I believe.