The Battle in 64-bit Land, 2003 and Beyond
An anonymous reader writes "Paul DeMone has an excellent article up at Real World Technologies on the future of 64bit computing. Find out where MIPS, HP, Intel, AMD, Sun, Fujitsu, and IBM are headed."
The article is very detailed on many points, but doesn't seem to have much mention of environmental aspects like heat dissipation. I can remember when this was a big issue with every new CPU, but lately it seems to have been swept under the rug. What's changed?
I'm certainly interested in the speed of CPUs, but heat production in the embedded space happens to be a bigger issue for me.
Could I interest anyone in some toast?
Lousy I/O?
A few weeks ago, I was looking into buying a $35,000 Sun system. I needed a machine with better memory bandwidth than a PC could offer. The machine in question interleaved its memory 8 ways, if you had all of the processors!
Then, I noticed that each bank ran at 75 MHz. Boy, was I shocked. That means that all 8 banks together run at the equivalent of 600 MHz. The new Granite Bay chipsets, with dual DDR 333, give you the equivalent of 666 MHz.
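The comparison above is simple multiplication, and a minimal sketch makes the assumption explicit. The function name `equivalent_mhz` is made up for illustration, and the "equivalent MHz" metric is the poster's own: it counts transfers per second and says nothing about how wide each bank or channel is.

```python
def equivalent_mhz(channels, clock_mhz):
    """Aggregate transfer rate when `channels` banks or channels work in parallel.

    This is the rough metric used in the post above. It ignores bus width,
    latency, and interleaving overhead, so treat it as a ballpark only.
    """
    return channels * clock_mhz


# Figures taken from the post: 8 interleaved banks at 75 MHz,
# versus dual-channel DDR 333 on a Granite Bay chipset.
sun_rate = equivalent_mhz(8, 75)
pc_rate = equivalent_mhz(2, 333)

print("Sun, 8-way interleave:", sun_rate, "MHz equivalent")
print("PC, dual DDR 333:", pc_rate, "MHz equivalent")
```

A fair comparison would also multiply by the width of each bank or channel in bytes, which the post does not give for the Sun.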
Both systems use PCI to connect to the outside world. The PC has a 533 MHz front-side bus, and an AGP port. I can't think of anywhere that the Sun would have had any better I/O.
Now, when you get into 8-way systems, the I/O between processors is better on the "high end" machines. But before you can come up with more I/O than a modern PC, you have to spend about 6 figures. In other words, two ORDERS OF MAGNITUDE HIGHER!
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
Double the Intel, Double the Fun
Double the bits
Double the cost
Double the slowness
Double the space requirement
Double the weight
Double the power consumption
Double the heat
Double the failures per unit time
This is one of the effects of an extremely entrenched 32-bit monoculture. A more heterogeneous computing universe would evolve more quickly, because there'd be more individual improvements and cross-pollination between areas. Instead of this we have overall hegemony, with self-contained "niche" areas whose contributions aren't typically shared. I'm not really sure whether open-source OSes fully address this problem, but it seems they might help.
Posted AC so as not to seem like a KW...
I presume Digital...Compaq...whoever.. killed it for purely political reasons? Or are there some technical reasons I don't get?
In and of itself, a 64 bit processor with a 64 bit operating system really doesn't mean better performance. You've really got to have applications which leverage that kind of platform. And there aren't many. On my SPARC servers (which all have 64 bit CPUs), going from a 32 bit OS to a 64 bit OS showed no real improvement or degradation in performance across a wide variety of applications. For most people, going 64 bits means nothing.
The main selling point for SPARC, which most people who aren't dealing with Sun don't understand, is not the CPU itself or the speed of a uniprocessor box.
It is the total package. (Admittedly, the lower part of that is the uniprocessor performance.) On the upside, Sun has some very compelling benefits. Almost all major commercial UNIX programs are developed for SPARC, often as the primary development platform. The binary compatibility is awesome. The binary that I compiled on my workstation (with 5-year-old technology that is several CPU generations behind) will continue to run on the most modern hardware. There's no recompiling for different/newer architectures (unless you're looking to gain a specific advantage of a new processor and your compiler can do it). And probably one of the best features is an awesome scalability story. If your code does threads, or uses more than one processor at a time, you can scale from a 1 CPU to a 100+ CPU configuration. No special programming to worry about clusters or to take advantage of new hardware. Additionally, because the hardware is (mostly) single vendor, you gain a great deal of reliability over platforms which have an incredible amount of diversity (wintel). Okay. That's a double-edged sword, admittedly.
That said, it is too bad that Sun just can't keep up in the uniprocessor world. But it has quite a number of real-world advantages beyond performance which keep it afloat, which may surprise people.
There is a good way to decide if the pc outperforms the sun. Write some test programs that stress the systems in the same way as your intended application. I doubt that the pc will win, but if your needs are specific, who knows.
It seems Intel's got a great floating point beast in the Itanium. But is this really that hard to do from a technical stand point?
For example, the Power4 can issue 4 instructions plus 1 branch instruction per cycle. If IBM was targeting rendering simulations (BTW with OpenGL2.0 your VPU/GPU will do this instead of your CPU! There is already a plugin for Maya that lets your ATI 9700 do the final rendering instead of your CPU!) or science work, couldn't they simply add additional floating point pipelines to handle 4 instructions per cycle?
It doesn't seem that hard to create a CPU to score well on SpecFP. Just give it lots of bandwidth and FP execution resources. Things like branching and OOOE don't really matter like they do for SpecINT. I know it's not that simple, but it seems that a company would find it easier to win SpecFP than SpecINT.
By moving to a 64-bit computer, the address space becomes astronomical - it is 4 billion times larger than the 32-bit addressing space. In the last twenty years, the average amount of memory in a computer has gone from about 512k to 512 megs - it's increased by about a thousand times. At that growth rate, a 64-bit address space would easily last through our lifetimes.
Err... 20 years aren't exactly a "lifetime". What about 3 times that? Whoah, billion times the memory. Also, recall that about 60 years ago, memory was counted in bits.
And there is always possibility for a breakthrough.
I think your prediction is a bit... exaggerated.
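Both sides of this exchange can be checked with a few lines of arithmetic. This is a rough sketch only: it assumes the quoted figures (512k to 512 megs over twenty years) and assumes that growth rate simply continues, which is exactly the point in dispute. The function name `years_until_full` is made up for illustration.

```python
def years_until_full(current_bytes, address_bits, doublings, period_years):
    """Years until memory of `current_bytes` fills a 2**address_bits space,
    assuming it keeps doubling `doublings` times every `period_years` years.
    `current_bytes` is assumed to be a power of two."""
    current_bits = current_bytes.bit_length() - 1
    return (address_bits - current_bits) * period_years / doublings


KIB, MIB = 2 ** 10, 2 ** 20

# 64-bit vs 32-bit address space: 2**32, a bit over 4 billion.
space_ratio = 2 ** 64 // 2 ** 32

# 512k -> 512 megs is a factor of 1024, i.e. 10 doublings in 20 years.
growth = (512 * MIB) // (512 * KIB)
doublings = growth.bit_length() - 1

# Sixty years at the same rate: 30 doublings, about a billion times.
sixty_year_factor = 2 ** (doublings * 3)

years = years_until_full(512 * MIB, 64, doublings, 20)
print("address space ratio:", space_ratio)
print("growth over 60 years:", sixty_year_factor)
print("years until 64 bits are full:", years)
```

Under these assumptions, a billion-fold increase over sixty years takes 512 megs to about 2^59 bytes, which still fits in a 64-bit address space; the extrapolation runs out at roughly seventy years. Whether that counts as "a lifetime" is left to the thread.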
The instruction set and register layout are irrelevant. All modern X86 CPUs translate the instruction stream on-the-fly to an internal RISC-like architecture with multiple parallel execution units. Using register renaming, all modern X86 CPUs have dozens of general-purpose physical registers that can be simultaneously mapped onto the legacy logical registers.
There is no need to expose the internals of any particular CPU generation to the software because the details change with each new design. The CPU's on-the-fly recoding knows how to optimize for the details of its particular internal implementation better than a C compiler. (Exposing the implementation details to the compiler is one reason why I think that the whole Itanium concept is a bad idea in the long run.)
The floating point performance is a function of the target market. If a CPU manufacturer was so inclined, they could create an X86 with world-record FPU performance. It's just not needed for the majority of places where X86's get used today.
sounds like somebody I heard who once said that 64k of memory would be enough memory for any application a person would ever want to use...
for the record, there are a lot of people in the world of biology, radiology, and bioinformatics who would love to have a 128 or 256 or 1024 bit computer. applications like nMRI could then address the individual hydrogen atoms they excite... astronomers could address all of the stars, planets, and meteorites in the sky... historians could address all of the people who have lived in the past and will live in the future... etc. etc. lots of interesting, non-gaming applications become possible with the advent of high-bit processors... (just going to show that Isaac Asimov was way ahead of his time...)
You are correct. This week I was building out a couple compaq^H^H^H^H^H^H hp proliants - 1 64 bit 133mhz pci-x slot, and 2 64 bit 100mhz hotswappable slots. This was on a mere 2 xeon cpu box. The enormous size of the x86 market will push its IO capabilities more quickly than sun can do on its own. With integrated firewire and usb 2.0, soho motherboards need to have great north/south bridge and cpu interconnects.
ostiguy
One thing the article hasn't been updated to mention is that Intel have changed the Itanium roadmap. They will be introducing a dual core processor in 2005 (Montecito); this is no longer a rumour. Intel are playing catchup here: IBM and Sun are already much further along this path. Intel do however have the resources to throw into development to do this successfully, and the gains they have made from Itanium-1 to Itanium-2 suggest that catching up is not beyond them.
I wonder how much of the battle for domination in the server market will be decided by economics rather than technology. I suspect that if Intel can kill off AMD (how long can AMD sustain their current losses?) then they could use their dominance in the desktop market to subsidise the development of Itanium and really drive it into the server market, killing off the strugglers like Sun by seriously undercutting them with price/performance. In the long term I think only IBM stands in Intel's way.
One of the charts I think (the bi-modal one) is SPECfp2k on the Y axis and SPECint2K on the X axis.
It is such a shame to see good CPU architectures die, and crap live on.
The Motorola 68K family were a joy to work with - lots of registers, and a very orthogonal instruction set - you could use any A register for pointers, any D register for data - none of this "ECX is for loops, EDI for destination pointer, ESI for source pointer" crap of the x86.
It's dead now, save for use as a microcontroller.
The Alpha was an ass-kicking, name-taking monster. While I never seriously programmed on it, it was 64 bits long before anybody else knew how to spell it - it had well established software and compiler technology. It is STILL one of the leaders.
But for all intents and purposes, it's dead, Jim. Yet Itanic, with an unproven design concept, is flourishing (sorry, having worked with DSPs that implemented the VLIW idea, I have doubts about the real-world performance of VLIW in a multitasking environment).
As Billy Joel said, "Only the good die young...."
www.eFax.com are spammers
While I agree with the general idea of implementation hiding, if your programming model is significantly different from your implementation model, you are going to need extra work or pipeline stages for your instruction decoding/register mapping, which will negatively affect your CPU performance.
Not always. You can feed runtime profiling information back into your compiler to optimize branching code and other constructs better than hardware can. This is great for whole classes of applications (such as scientific or engineering simulations). In those cases the C (or Fortran) Compiler definitely knows more about how to optimize for a particular processor than the processor would because it has more resources at its disposal to do so.
On the other hand it's less clear to me how much of a benefit the compiler has in general-purpose computing applications such as databases, MS Office, or windowing system functions where the input is much more random. In those cases, the processor's recent runtime statistics may prove better at branch prediction and instruction re-ordering than a compiler's prediction on global statistics. I would think each approach has different areas of strength (such as the difference between a global optimizer versus a peephole optimizer).
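To make the global-versus-recent distinction concrete, here is a toy simulation. It is a sketch under loud assumptions: a "compiler" that fixes one prediction per branch from whole-run profile data, and a "processor" modelled as a single 2-bit saturating counter. Real predictors and real profile-guided compilers are far more elaborate, and the branch streams below are invented for illustration.

```python
def static_accuracy(outcomes, predict_taken):
    """Hit rate of one fixed prediction applied to every execution of a branch."""
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


def profiled_static_accuracy(outcomes):
    """Pick the majority direction from the whole run (global statistics),
    then score that single fixed prediction against the same run."""
    majority_taken = 2 * sum(outcomes) >= len(outcomes)
    return static_accuracy(outcomes, majority_taken)


def two_bit_accuracy(outcomes):
    """Hit rate of a 2-bit saturating counter (recent statistics).
    States 0-3; the branch is predicted taken when the state is 2 or 3."""
    state = 2
    hits = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            hits += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return hits / len(outcomes)


# Hypothetical branch streams (True = taken).
steady = [i % 20 != 19 for i in range(2000)]   # taken 95% of the time, evenly
phased = [True] * 1000 + [False] * 1000        # behaviour flips halfway through

for name, stream in (("steady", steady), ("phased", phased)):
    print(name, profiled_static_accuracy(stream), two_bit_accuracy(stream))
```

On the steady stream the two approaches score the same, since the bias never changes. On the phased stream the global statistics are an even split, so the fixed prediction is right only half the time, while the counter mispredicts only around the phase change.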
I wonder, would it be possible to combine the two approaches while keeping most of the EPIC/VLIW goals of low runtime instruction decoding/ordering overhead? For instance, having some way to mark certain branch instructions as having more random statistics over time and requiring the CPU to gather runtime branching statistics to improve branch prediction.
Let people read, and let them dance; these two amusements will never do any harm to the world. - Voltaire
... When will Big Blue buy Sun?
(or is it just too much fun turning the hose on them...?)
Finally someone tells it like it is! Computer architects have known for a LONG time (e.g., 10 years) that MIPS and SPARC were horrible architectures (designed by people who clearly misunderstood the whole RISC concept) and that Alpha was a fantastic architecture that got the 801-idea spot on. As IBM Fellow and Turing Award winner John Cocke pointed out, the whole idea was FAST instructions that were simple enough for compilers to generate and optimize. It had nothing at all to do with the number of instruction types or their complexity. Not only was Alpha the first 64-bit architecture, but it's the only one that has legitimately scaled over a 10+ year period. While it is a tragedy to see the Alpha die due to incompetent marketing, it is gratifying to finally see an informed article that gives credit where credit is due. Long live the Alpha!
The only thing that the author fails to note is HP's responsibility for the wretched Itanium 1. The first IA64 architecture was designed by HP and Intel in collaboration, and HP was the one who pushed the idiotic EPIC idea.
Unfortunately, none of the current crop of 64 bit processors deliver: the cost of true 64 bit systems (those capable of actually using more than 4 Gbytes of memory) generally starts somewhere upwards of $10000, and for that you do not get anywhere near 10 times the performance of a $1000 PC.
The main reason right now to get a 64 bit system at current prices is because the applications just cannot be shoehorned into a mere 2-4 Gbytes. If AMD can change that equation and deliver comparable bang-for-the-buck to current PCs, with 64 bit addressing being icing on the cake, they have a winner. None of the other players seem to be capable of doing that--they have tried and failed miserably so far.
What about IBM and Power4? What OS (AIX?) and applications run on that platform?
Besides the in-house IBM OSes (OS/400, or whatever it's called, pOS I think..., and AIX), remember that the PowerPC architecture is a subset of the POWER architecture, therefore a subset of POWER4. At my work we have a test IBM 6xx series box with SuSE PowerPC Linux on it (not going to be used besides testing, since there isn't really enough third party software for PowerPC Linux yet). With IBM software already being ported to x86 Linux, I'm sure they'll have a large amount of software soon. I'm assuming NetBSD would run as well without a hitch.
The IO on a server is rarely going to run through an AGP port
No kidding. I just put that to show one of many I/O paths.
The V880 has several PCI busses for all of its PCI slots
I've got a PC with four different PCI busses. You've got to have some pretty serious data to saturate even the "measly" four PCI busses.
Some of the PCI slots are 66MHz 64 bit wide PCI slots
Old news. Those have been around on PC's for a good number of years now. I've got a three-year old, semi-retired PC that has a couple.
How can you possibly saturate that 533MHz FSB on the PC?
Um.... just about any memory-intensive application? Try using an RDBMS where you're potentially scanning through gigabytes of data. (even though that would generally indicate a lack of suitable indexes...)
Try loading up your PC with FCAL adapters, hooking them to smart disk arrays with gigs of write-through cache and see how much IO you can get.
Why would I need an FCAL adapter? I could stick in a couple of U320 SCSI adapters, or any of the hardware RAID controllers, and get just as much through. In either event, the 64/66 PCI bus is going to be the bottleneck, giving you a *sustained* average transfer of around 360-400 MB/s. You can do just as much on a PC. And putting the gigs of RAM into the controller probably isn't as efficient as putting the gigs of RAM into system memory, and letting the OS cache it. If you put it on the controller, your max bandwidth is still limited by the PCI bus. If it's in the system, then a read (or write) that is cached has all the bandwidth of the system memory, which is going to be a lot faster than the PCI bus.
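For reference, the theoretical peak of a parallel PCI bus is just width times clock, which puts the sustained figure quoted above in context. A minimal sketch follows; the function name `pci_peak_mb_per_s` is made up for illustration, the clock is rounded to 66 MHz, and MB here means 10^6 bytes.

```python
def pci_peak_mb_per_s(width_bits, clock_mhz):
    """Theoretical peak of a parallel PCI bus: one transfer of
    `width_bits` per clock, with no arbitration or protocol overhead."""
    return width_bits // 8 * clock_mhz


peak_64_66 = pci_peak_mb_per_s(64, 66)   # the 64/66 slots discussed above
peak_32_33 = pci_peak_mb_per_s(32, 33)   # plain desktop PCI, for comparison

# The sustained range quoted in the post, as a fraction of the 64/66 peak.
sustained_low, sustained_high = 360, 400
efficiency = (sustained_low / peak_64_66, sustained_high / peak_64_66)

print("64-bit/66 MHz peak:", peak_64_66, "MB/s")
print("32-bit/33 MHz peak:", peak_32_33, "MB/s")
print("sustained share of peak: %.0f%% to %.0f%%"
      % (efficiency[0] * 100, efficiency[1] * 100))
```

The same ceiling applies to any adapter sitting in that slot, which is the argument being made: cache on the controller cannot be read faster than the bus it hangs off.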
After all of that, let's look at the cost: A few thousand bucks for the PC I'm talking about (including a nice hardware RAID array), vs. $100,000 to $130,000 for a decked-out V880.
Yep, the V880 is a lot of iron. There aren't any PC's that will keep up with it. I'd expect nothing less, since the V880 you're talking about is at least a FULL ORDER OF MAGNITUDE MORE EXPENSIVE, for crying out loud. If the task is at all parallelizable, then guess what: For the same money as you'd spend on the Sun, you could deck the room out with enough clustered PC's to run circles around the V880. That's why all of the largest supercomputers now are... clusters of PC's. Of course, not all applications are parallelizable, hence the market for Suns, Alphas, IBM's, etc.
There are, of course, other benefits to owning a high-end Sun than just the processing power. Being able to have a motherboard burn out without taking down the system is one of them. Just be prepared to open the wallet and bend over when you buy one. My employer's largest competitor plonked down a couple of *million* dollars on Sun servers. I convinced my employer to plonk down about $25,000 on PC-based servers. While they were still in business, our site could dish out far more database-driven traffic than they could, and had better uptimes to boot. Of course, since they burned up all of their cash early and went out of business while we went on to become profitable, that's sort of a moot point.
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.