Posted by
michael
on from the double-the-pleasure-double-the-fun dept.
msolnik writes: "Over at RealWorldTech they've published an article on the future of 64-bit performance. This article covers the different technology from Sparc to Hammer. It's a great read if you are looking for information on up-and-coming products from Intel, AMD, Sun, and Compaq."
Shrinkage
by
CaptainAlbert
·
· Score: 5, Informative
Impressive though 64-bit processors might be, I'm not convinced that the performance improvement is going to be as big as people are expecting.
Remember that the components in any digital system - and I'm not just talking about your windoze desktop PC, but servers, mainframes and embedded systems too - have to talk to each other in order to do anything remotely useful. Last time I looked, most PCI devices didn't utilise the provision for 64-bit data bus operation.
There's a perfectly good reason for this, of course... in order to attach a chip to a circuit board, you need an array of pins (or solder balls) that are macroscopic, so they can be soldered and handled without too much risk of accidental damage. Additionally, PCB tracks can only go so small (and so close together) before you get undesirable electrical effects and, again, boards that can't be worked with in a production environment.
The "more bits" phenomenon has been sustained by improvements in VLSI and the advent of true System-on-a-chip design, but this too has its limits. If you compare a P4 motherboard with, say, a 386 mobo circa 1995, you'll see the chip count is drastically reduced. But fewer interconnected components means less repairability, upgradability, and interoperability. My old 486 had a VLB EIDE hard disk controller, which I swapped in after the last one failed. If my controller failed today, I couldn't do that; I'd either need to buy a new mobo or start replacing chips on the old one (which is just as expensive).
Don't get me wrong - I'm all for progress! And I expect we'll see more and more 64/128-bit chips springing up inside custom devices (e.g. 3D cards, routers) where the local interconnect can be made as fat as necessary. But the PC will remain shackled by slow frontside busses for a while yet, I reckon.
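To put rough numbers on the bus-width point above, here is a minimal sketch of theoretical peak bandwidth for a parallel bus. It assumes one transfer per clock and no protocol overhead, so real throughput would be lower; the function name and the 33.33 MHz clock used for the comparison are illustrative choices, not figures from the comment.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak in MB/s, assuming one transfer per clock and no overhead."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# Same (assumed) clock, two data-bus widths.
pci_32 = peak_bandwidth_mb_s(32, 33.33)
pci_64 = peak_bandwidth_mb_s(64, 33.33)

print(f"32-bit bus: {pci_32:.0f} MB/s peak")
print(f"64-bit bus: {pci_64:.0f} MB/s peak")
```

Doubling the width doubles the peak only if the device on the other end actually drives all 64 data lines, which is the commenter's objection.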
Re:Intel learning from their mistakes
by
nusuth
·
· Score: 4, Informative
IA64 is a new and incompatible instruction set; Intel is not adding anything to their x86 ISA.
Hammer does not have a 3MB L3, but it has an integrated memory controller, which would drastically reduce the latency of cache misses.
Assuming AMD will go for a bigger than 32 KB L1 cache, and will not succeed in making cache hits as fast as McKinley's (speculation based on current offerings), the picture is a bit complicated:
Watch it: Hammer and McKinley each ask for an instruction or piece of data. If both hit, McKinley wins. A more probable scenario is that McKinley misses and Hammer hits, a clear win for Hammer. A still more probable scenario is that both miss. If the data is in L2, McKinley is faster: it has a lower miss penalty and can fetch from L2 faster. But it is more probable that the data is in Hammer's cache and not in McKinley's, which would benefit Hammer. If L2 misses too but McKinley scores an L3 hit, McKinley wins. If it suffers an L3 miss, it has to pay both the L3 miss latency and the memory latency, whereas Hammer pays no L3 miss latency and its memory latency is probably much lower. So with huge data processed in not-so-tight loops Hammer wins hands down, while for medium-sized data that fits into L3 McKinley wins hands down.
Although McKinley is a server product and Hammer is not (or so it is said), an integrated memory controller benefits Hammer in multiway systems so much that it may as well be positioned as a server product. No more asking the chipset to fetch a piece of data and waiting until the chipset has served other processors' requests; just go and grab it!
Finally, some of the Hammer line will have L3 caches, and the Hammer line will have a higher clock rate than McKinley. If AMD can deliver what they have promised, they have a clear winner overall. But I'm still a bit skeptical.
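The hit/miss walk-through above is essentially an average-memory-access-time calculation, which can be sketched as below. Every latency and hit rate here is a made-up placeholder chosen only to show the shape of the trade-off; none of them are measured or published figures for McKinley or Hammer, and the two config names are hypothetical.

```python
def average_access_time(levels, memory_latency):
    """Expected cycles per access for a cache hierarchy.

    levels: list of (hit_latency_cycles, hit_rate) tuples, L1 first.
    memory_latency: cycles to reach main memory after the last level misses.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as the current level
    for hit_latency, hit_rate in levels:
        expected += reach * hit_latency  # everything reaching a level pays its lookup
        reach *= 1.0 - hit_rate          # only the misses continue outward
    return expected + reach * memory_latency


# Hypothetical chip with a small fast L1, an L2, a big L3, and memory behind a chipset.
small_fast_l1_with_l3 = average_access_time(
    [(1, 0.90), (5, 0.80), (12, 0.50)], memory_latency=200
)
# Hypothetical chip with a bigger, slower L1, an L2, no L3, and an on-die memory controller.
big_l1_integrated_mc = average_access_time(
    [(3, 0.95), (10, 0.80)], memory_latency=120
)

print(small_fast_l1_with_l3, big_l1_integrated_mc)
```

Lowering the L3 hit rate in the first config (a working set too big for L3) or raising it (a working set that fits) moves the comparison in the directions the comment describes.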
--
Gentlemen, you can't fight in here, this is the War Room!
Re:Now we can wait for software support...
by
dunstan
·
· Score: 4, Informative
Even with a reference application (Oracle 8.1.6) on a reference OS (Solaris 8), the patch levels for the 64-bit version were 3 revs behind those for the 32-bit version when I last looked. What bothered me was that the bug I'd run into was fixed in the 32-bit version but still there in the 64-bit version. Guess which version I ran.
Dunstan
--
The last scintilla of doubt just rode out of town
Re:AMD's gonna win
by
tap
·
· Score: 4, Informative
SCSI's disconnect ability looks good in theory, but in practice it's not such a great advantage. With SCSI you can attach up to 15 devices to a single channel, and effectively access them all at the same time. With IDE you can attach up to two devices to a single channel, and only access one at a time. Sounds like SCSI is lots better, but only if you have a single IDE/SCSI channel and more than one drive. If you put each IDE drive on a separate channel, and you can buy IDE controllers with 8 channels, then there really is no advantage to SCSI's disconnect/reconnect ability.
Re:AMD is deceiving you
by
SQL+Error
·
· Score: 5, Informative
Bullshit.
AMD has stated *explicitly* that the Hammer is an evolutionary rather than revolutionary design. They've said all along that it is an Athlon with 64-bit extensions and some minor tweaks (SSE2, extended pipeline). They haven't deceived anyone.
Now, as to the relative performance of the two architectures (x86-64 vs. IA-64): the Athlon XP 1900+ achieves a SpecInt2000 score of 701 (peak) while the 800MHz Itanium manages... 314. On floating point the Itanium does rather better: 645 vs. 634 for the Athlon. (The current leader is the IBM Power4, which gets 814 SpecInt and 1169 SpecFP.)
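For anyone who wants the gaps as ratios, a quick sketch using only the scores quoted above (the dictionary layout and helper name are just for illustration):

```python
# SPEC CPU2000 scores exactly as quoted in the comment above.
scores = {
    "Athlon XP 1900+": {"int": 701, "fp": 634},
    "Itanium 800MHz": {"int": 314, "fp": 645},
    "IBM Power4": {"int": 814, "fp": 1169},
}


def ratio(chip_a: str, chip_b: str, kind: str) -> float:
    """How many times chip_a's score is of chip_b's for 'int' or 'fp'."""
    return scores[chip_a][kind] / scores[chip_b][kind]


print(f"Athlon vs Itanium, integer: {ratio('Athlon XP 1900+', 'Itanium 800MHz', 'int'):.2f}x")
print(f"Itanium vs Athlon, FP:      {ratio('Itanium 800MHz', 'Athlon XP 1900+', 'fp'):.2f}x")
```

On these numbers the Athlon is a bit over twice as fast on integer, while the Itanium's floating-point lead is under two percent.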
Having 128 64-bit registers is good, but remember that the Athlon and Hammer have far more physical registers than are presented in the programming model, and automatically map them according to the requirements of instructions in the pipeline. And the predicates and wide issue of the Itanium are balanced against the ability of the Athlon to *automatically* issue instructions speculatively and re-order the instruction queue to improve ILP.
And on the subject of manipulating multiple values with a single instruction: ever heard of MMX? 3DNow? SSE? Athlon has all of these, and Hammer will add SSE2. What do you think these are for?
As to the value of 64-bit addressing: I've programmed for machines (Suns and Compaq Alphas) with as much as 64GB of memory. While you *can* address that much with a 32-bit CPU, it means that you have to constantly re-map your view of memory, which is a royal pain. Moving to 64-bit addressing makes the problem disappear. And with current memory prices, even small commodity servers could make good use of more than 4GB of memory.
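The re-mapping pain can be shown with a little arithmetic. This is a simplified sketch that treats memory as whole 4 GiB windows selected by the upper address bits; real 32-bit remapping schemes work with much smaller windows, and the names here are illustrative only.

```python
WINDOW_BITS = 32
WINDOW_SIZE = 1 << WINDOW_BITS  # 4 GiB: everything a 32-bit pointer can reach


def split_address(addr: int) -> tuple[int, int]:
    """Split a flat address into (window index, 32-bit offset inside that window)."""
    return addr >> WINDOW_BITS, addr & (WINDOW_SIZE - 1)


total_memory = 64 * 2**30  # the 64GB machine from the comment
windows_needed = total_memory // WINDOW_SIZE

print(f"{windows_needed} separate 4 GiB views to cover 64 GiB")
print(split_address(5 * WINDOW_SIZE + 123))
```

With 64-bit pointers the window index simply becomes the upper half of the pointer, so the program never has to switch views.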
And 64-bit integer registers are good for a lot of things, and while you can certainly use 64-bit integers on a 32-bit CPU, making them faster won't hurt.
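As a rough illustration of what "using 64-bit integers on a 32-bit CPU" costs, here is a sketch of a 64-bit add built from 32-bit halves with an explicit carry. It only models the arithmetic in Python; the function name is made up, and a real compiler would typically emit an add followed by an add-with-carry instruction.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Add two 64-bit values given as (low, high) 32-bit halves; wraps at 2**64."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                 # 1 if the low halves overflowed 32 bits
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # second add has to wait for the carry
    return lo, hi


# 0x00000000FFFFFFFF + 1 carries into the high half.
print(add64_with_32bit_ops(MASK32, 0, 1, 0))
```

A 64-bit register does the same work in a single add, which is the "making them faster won't hurt" part.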
So, Athlon currently has a huge performance advantage over Itanium on integer apps, and a huge price/performance advantage (with comparable absolute performance) on FP apps. AMD's aim with Hammer is to extend Athlon cheaply and effectively into the 64-bit realm.
Intel's aim with Itanium appears to be to crush all competition; unfortunately, they've placed a *huge* bet on improvements in compiler technology that just hasn't paid off yet, resulting in a high-end chip that lags behind not just the high-end RISC chips like Alpha and Power, but low-cost desktop chips. To achieve commercial success, the Itanium needs integer performance somewhere in the vicinity of its competitors', but it currently trails the pack by a huge margin. Even SGI do better, and they all but shut down their CPU design efforts years ago.
Maybe McKinley will be the answer - but it doesn't look like it, given that the promised speeds have dropped to 1GHz. IA-64 is an interesting architecture which may even have a future, but so far it just don't fly.