What Improvements Will 64-Bit Processors Bring?
RyanG asks: "Everyone always looks at numbers (MHz, RAM, HD) when they're considering buying a new computer. Recently, more users have been eyeing bits, as in 64-bit processors, namely the Itanium and to a lesser extent the G5. A lot of people remember the performance increases that were seen when moving from 16- to 32-bit processors, and some people seem to think similar performance increases will be realized when moving from 32- to 64-bit processors. From what I've read this isn't going to be the case, given that 64-bit precision isn't needed in all but a few cases and that moving around that extra data can actually hurt the performance of 64-bit processors compared to 32-bit processors. Anyone care to comment?"
I've been dabbling in 64-bit computing since my business bought me a Silicon Graphics Indigo2 workstation with a MIPS R10000 CPU. Granted, the machine could only address 768 MB of RAM (a decent amount for '95... but modern workstations, like the Octane, can address 8 GB or more), but it was quite neat to work with 64-bit pointers, larger datasets, and huge files. For me, the jump to 64-bit computing simply allowed me to work with big numbers without having to resort to tricks and other hacks. For others, it was the introduction of 6 GB datasets in RAM, with each and every element addressable by clean code. These days 64-bit computing is an assumed must on large-scale systems (where 64 GB to 1024 GB of RAM [on an Origin 3000] in a single system [not a cluster] can be considered normal). But on the already-fast, one- or two-CPU desktop computer, it's not much more than a marketing blurb. And for most people it's not even really needed, at least until 2+ GB of RAM becomes the norm. 32-bit computing already allows for some pretty darned big pointers and addressing.
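To put rough numbers on the 32-bit vs. 64-bit difference, here's a minimal C sketch. The function names are made up for illustration; nothing here is SGI- or IRIX-specific. It shows the 4 GiB ceiling of a flat 32-bit address space (which is why a 6 GB dataset can't be addressed directly) and a product that silently wraps in 32 bits but fits comfortably in 64.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes reachable through a flat pointer `bits` wide. Requires bits < 64
   so the shift doesn't overflow a uint64_t (2^64 itself is 16 EiB). */
uint64_t address_space_bytes(unsigned bits) {
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* Multiply in 32 bits: the result wraps modulo 2^32 on overflow. */
uint32_t mul_u32(uint32_t a, uint32_t b) {
    return a * b;
}

/* Multiply in 64 bits: exact for any pair of 32-bit inputs. */
uint64_t mul_u64(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}
```

For example, `mul_u32(100000, 100000)` yields 1410065408 (10^10 mod 2^32), while `mul_u64(100000, 100000)` yields the exact 10000000000, and `address_space_bytes(32)` is 4294967296.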
Of course, you don't need a 64-bit processor for that: SIMD extensions like AltiVec and SSE2 already do better by working in 128-bit chunks.
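A minimal sketch of the "128-bit chunks" idea, written with the GCC/Clang `vector_size` extension rather than AltiVec or SSE2 intrinsics, so it compiles on any target those compilers support. The type and function names are hypothetical. Each `u64x2` holds two 64-bit lanes, so one vector add handles two 64-bit values; on x86 with SSE2 the compiler can emit a single `paddq` for it, and on targets without a suitable vector unit it falls back to scalar code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two 64-bit lanes in one 128-bit vector (GCC/Clang extension). */
typedef uint64_t u64x2 __attribute__((vector_size(16)));

/* dst[i] = a[i] + b[i] for n 64-bit values, two lanes per vector add. */
void add_u64(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        u64x2 va, vb, vs;
        /* memcpy avoids assuming the arrays are 16-byte aligned */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vs = va + vb;                   /* one 128-bit add, both lanes at once */
        memcpy(dst + i, &vs, sizeof vs);
    }
    for (; i < n; i++)                  /* scalar tail when n is odd */
        dst[i] = a[i] + b[i];
}
```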
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Very well said. However, I did want to make one point.
(you gamers who say you can tell the difference between 100 and 125 FPS are lying... that's 1.5 to 2 times your monitor's refresh rate)
I typically can't stand gamers, but I do understand their desire for framerate. The typical PC game has no mechanism for holding a sustained framerate, nor are things like texture preloading handled with any sort of elegance. Most PC games use weak code and brute force to produce any acceptable output. As such, the machine with the highest average framerate is the machine that's the least likely to get into a situation where the framerate will drop below, say, 30 FPS... perhaps in a complex scene or hitting a spot of particularly bad code.
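The average-versus-worst-case point is easy to see with a few frame times. This C sketch uses made-up numbers, not measurements: a run whose frames take 10, 10, 10 and 50 ms averages a respectable 50 FPS, yet the one slow frame is an instantaneous 20 FPS, below the 30 FPS line. (For scale, 100 FPS is 10 ms per frame and 125 FPS is 8 ms.) The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Milliseconds per frame at a given frame rate. */
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Average FPS over a run: frames rendered divided by total elapsed time. */
double average_fps(const double *frame_ms, size_t n) {
    assert(n > 0);
    double total_ms = 0.0;
    for (size_t i = 0; i < n; i++)
        total_ms += frame_ms[i];
    return (double)n * 1000.0 / total_ms;
}

/* Worst instantaneous FPS: set by the single slowest frame. */
double worst_fps(const double *frame_ms, size_t n) {
    assert(n > 0);
    double slowest_ms = frame_ms[0];
    for (size_t i = 1; i < n; i++)
        if (frame_ms[i] > slowest_ms)
            slowest_ms = frame_ms[i];
    return 1000.0 / slowest_ms;
}
```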
That said, the nitpicking "this 3D accelerator is better because it's 10.2% faster" blurbs are mostly BS.
Then there's the other end of the spectrum. The company I work for has an SGI-powered RealityCenter for engineering review and presentations. The 30-foot-wide screen is curved and lit by three Barco projectors. It's normally driven either as a 3840x1024 super-wide display using all three projectors, each fed by its own graphics pipe, or, for more complex scenes, with all three pipes working in parallel to drive a single projector at 1280x1024. Most of our software is developed in house with the help of the SGI IRIS Performer and MultiGen-Paradigm Vega libraries. Aside from a few exceptions, the whole setup runs at a locked 60 Hz (60 FPS graphics and projector refresh).
For those who like tech specs, the machine behind the curtain is a Silicon Graphics Onyx2 installed in early 1999. It has 24 MIPS R10000 CPUs running at 250 MHz, each with 8 MB of L2 cache, plus 48 GB of RAM and 1.8 TB of disk via four channels of gigabit Fibre Channel. The graphics pipes are three InfiniteReality2 subsystems, each with four Raster Managers (64 MB of dedicated texture RAM plus 320 MB of generic graphics RAM per pipe). There's a DPLEX module on each pipe to allow all three to work in parallel when needed.
If the bean counters approve, we should have a totally new Onyx 3000 system installed by June 2002. After all, our current setup is about 3 years old... ancient in computer terms. Thankfully the projectors, lighting controls, and indeed most of the room (seating, conference table loft, etc.) will be reused.
Heavy use of ILP is nothing new. Every modern processor does it, but most do it implicitly, through superscalar, out-of-order and speculative execution. IA-64 (and EPIC in general) needs the compiler to explicitly encode the ILP in the instruction stream. This is similar to VLIW, and just as difficult. Predication is useful, but note that it's not as easy as you might think to do this in the compiler. Naive use of "if-conversion" and beyond is actually as likely to slow down your code as to speed it up; there are people working on compiler algorithms to sort this out.
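Predication itself is an instruction-set feature (IA-64 guards instructions with predicate registers), so C can't express it directly. As a rough source-level analogue, here's a hypothetical clamp-and-sum loop written with a branch and then if-converted: both outcomes are computed every iteration and a mask selects one. This removes the branch, but the if-converted version always pays for both paths, which is one reason naive if-conversion can make code slower rather than faster. What a given compiler actually emits for either version may differ.

```c
#include <stddef.h>

/* Branchy version: one conditional branch per element. */
long sum_clamped_branchy(const int *v, size_t n, int limit) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit)
            sum += limit;
        else
            sum += v[i];
    }
    return sum;
}

/* If-converted version: the comparison becomes a mask (all ones or all
   zeros) that plays the role of a predicate, and both candidate values
   are combined without a branch. */
long sum_clamped_ifconv(const int *v, size_t n, int limit) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(v[i] > limit);           /* -1 if true, 0 if false */
        sum += (limit & mask) | (v[i] & ~mask);
    }
    return sum;
}
```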
The modulo scheduling support you mention is fine too, and it does help the compiler, but again it's not as big a win for general-purpose codes as you might imagine. There are people working on it, but I'm not holding my breath for it to be solved.
This is not a snub on the very smart people working on it; people like Dan Lavery, who did a dissertation on modulo scheduling of general-purpose codes, are smart and working hard... but it's a DAMN hard problem.
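For readers who haven't met modulo scheduling (software pipelining): the idea is to overlap the stages of successive loop iterations. This hand-written C sketch shows a two-stage version of a hypothetical scaling loop, where the load for iteration i+1 is issued alongside the multiply and store for iteration i, with a prologue and epilogue to fill and drain the pipeline. A real modulo scheduler does this automatically, with more stages, against the target's actual latencies (IA-64 adds rotating registers to help); this only illustrates the shape of the transformation, and it assumes `dst` and `src` don't overlap.

```c
#include <stddef.h>

/* Plain loop: load, multiply and store all belong to one iteration. */
void scale_simple(float *dst, const float *src, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Two-stage software pipeline. Stage 1 loads src[i+1]; stage 2 multiplies
   and stores the value loaded in the previous iteration. Assumes dst and
   src do not overlap. */
void scale_pipelined(float *dst, const float *src, size_t n, float k) {
    if (n == 0)
        return;
    float cur = src[0];                 /* prologue: first load */
    for (size_t i = 0; i + 1 < n; i++) {
        float next = src[i + 1];        /* stage 1 for iteration i+1 */
        dst[i] = cur * k;               /* stage 2 for iteration i */
        cur = next;
    }
    dst[n - 1] = cur * k;               /* epilogue: final multiply/store */
}
```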
Modern software tends to be irregular, integer-oriented and typically not subject to whole-program optimization. This makes the compiler's and the processor's jobs even harder. Remember, most codes don't have that much ILP even dynamically (although dynamic ILP processors do have limited windows), and finding it in a static compiler is harder still.
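A quick illustration of why irregular code offers so little ILP to find, using a hypothetical list type: summing an array lets the loads from different iterations proceed independently, because every address is computable up front, while summing a linked list forms a serial chain in which each load needs the result of the previous one. No amount of scheduling, static or dynamic, can overlap those list loads with each other, since the next address isn't known until the previous load completes.

```c
#include <stddef.h>

struct node {
    long value;
    struct node *next;
};

/* Array: the address of v[i] depends only on i, so the loads of different
   iterations are independent of each other. */
long sum_array(const long *v, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Linked list: the address of the next load is the result of the current
   one (p->next), so the loads form a serial dependency chain. */
long sum_list(const struct node *p) {
    long sum = 0;
    for (; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```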
Looking at the SPEC scores underscores my point (let's not argue right now about how good a set of benchmarks SPEC CPU2000 is): FP performance is fairly good (though POWER4 beats it with far less power consumption, a smaller die and fewer transistors, and POWER4 has 2 processor cores on the die!), but the int scores are DREADFUL.
Right now IA-64 is a hot, complex, slow beast that depends on a "sufficiently smart compiler". I work on compilers, and let me tell you there is no such thing :-)
Sorry for the rambling... I'm tired...
njd@kamayan.net
Some advantages nobody's touched on yet:
1) Easier implementation of large filesystems. 2^64*512 bytes of disk space per filesystem should be good enough for quite a few increments of Moore's Law. Ditto for files larger than 4 GB.
2) 64-bit processors take better to certain implementations of NUMA. SGI's implementation of NUMA gives each processor a range of memory that is local to that processor. If you had a 64-processor NUMA cluster, 32-bit addressing would leave you 64 megs local to each processor (4 GB / 64). With 64-bit addressing you could have a few gigs per processor.
3) Along the same lines, 64-bit processors make it easier to map a whole file into memory without needing to map it in individual chunks. In the near term, you could map your entire disk drive into the address space.
4) There are cases (e.g. bit packing) that don't take too well to vectorized MMX/SSE/etc. processing but do take well to 64-bit registers.
5) The ability to segment your memory space without creating annoying limitations. For example, you can reserve the lower 8,388,608 terabytes of address space for the user and the upper 8,388,608 terabytes for the kernel, as opposed to Windows 2000, which leaves 2 gigs for the user and 2 gigs for the kernel, with the possibility of 3 gigs for the user if you are running a higher-end version.
6) The ability to cache a data structure in the RAM attached to a given machine instead of buying solid state disk drives or other such things.
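For reference, the figures above work out as 2^64 × 512 bytes = 2^73 bytes (8 ZiB) per filesystem in item 1, 2^32 bytes / 64 = 64 MiB per processor in item 2, and 2^63 bytes = 2^23 TiB = 8,388,608 TiB per half of the address space in item 5. As for item 4, here's a minimal C sketch of bit packing with a made-up record layout (the field names and widths are assumptions, not any real format): three fields share one 64-bit word, and the 24-bit offset field straddles bit 32, so on a 32-bit machine extracting it would mean combining two registers, while a 64-bit register gets it with one shift and one mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one 64-bit word:
   bits  0-19: id     (20 bits)
   bits 20-43: offset (24 bits, crosses the 32-bit boundary)
   bits 44-63: length (20 bits) */
#define ID_BITS  20
#define OFF_BITS 24
#define LEN_BITS 20

uint64_t pack_record(uint32_t id, uint32_t offset, uint32_t length) {
    assert(id < (1u << ID_BITS));
    assert(offset < (1u << OFF_BITS));
    assert(length < (1u << LEN_BITS));
    return (uint64_t)id
         | ((uint64_t)offset << ID_BITS)
         | ((uint64_t)length << (ID_BITS + OFF_BITS));
}

uint32_t record_id(uint64_t w) {
    return (uint32_t)(w & ((1u << ID_BITS) - 1));
}

uint32_t record_offset(uint64_t w) {
    return (uint32_t)((w >> ID_BITS) & ((1u << OFF_BITS) - 1));
}

uint32_t record_length(uint64_t w) {
    return (uint32_t)(w >> (ID_BITS + OFF_BITS));
}
```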
Gentoo Sucks
One of the features of 64-bit processing that I've been eyeing for the last 5-6 years is memory-mapped IO. Instead of manually reading files into memory, you can tell the OS to map a huge multi-gigabyte file into an address space and then access that address space as if the file were already in memory. The OS can then cache and do really cool optimizations in the background, and the program doesn't have to worry about reading and writing blocks of data one chunk at a time (though, as I understand it, Linux has optimized this too with copy-on-write). When a value changes in that address space, the OS takes care of writing it back to the file.
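Here's a minimal POSIX C sketch of that approach (the function name and the byte-summing task are just for illustration): map a whole file read-only with `mmap()` and walk it like an in-memory array, letting the OS page the data in on demand instead of `read()`-ing it chunk by chunk. In a 32-bit process the single mapping has to fit in the free virtual address space, which is exactly the limit 64-bit addressing lifts.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an entire file read-only and sum its bytes. Returns -1 on error. */
long long sum_file_bytes(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    /* One mapping for the whole file; pages are read in as they're touched. */
    const unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                                  MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];

    munmap((void *)p, (size_t)st.st_size);
    return sum;
}
```

To get the write-back behaviour described above, you'd open the file read-write and map it with `PROT_READ | PROT_WRITE` and `MAP_SHARED`; `msync()` forces changes out to the file on demand.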
Of course, 32-bit processors can do memory-mapped IO, but not at anywhere close to the scale that 64-bit processors can. Practical limits may constrain a 32-bit memory-mapped IO implementation to less than 2 GB of address space. PAE might seem like a way around that, but as far as I know it only widens physical addressing (to 36 bits); each process still gets a 32-bit virtual address space, so the window you can map at once doesn't grow.
Expect possibly more efficient databases that let the OS optimize disk access even further.