Posted by
michael
on from the double-the-pleasure-double-the-fun dept.
msolnik writes: "Over at RealWorldTech they've published an article on the future of 64-bit performance. This article covers the different technologies, from SPARC to Hammer. It's a great read if you are looking for information on up-and-coming products from Intel, AMD, Sun, and Compaq."
AMD's going down a slightly different track: AMD
is the only one trying to put 64-bit on the
desktop. Now for us Linux freaks, SuSE Linux
and NetBSD will be fine for a 64-bit desktop,
but if AMD wants to lock up some of the market
into x86-64, they really need a mainstream OS.
Unfortunately that means Windows, and "if
we build it they will come" doesn't necessarily
work if there is no competition. Still, in the
meantime, Clawhammer will be a damn fine 32-bit
chip as well, and Sledgehammer will bring
high-end servers right down to mid-range prices.
link to Full article
by
mESSDan
·
· Score: 5, Interesting
is here: That way you only have to wait a longass time for it to load once, instead of a longass time for each of the 5 or 6 pages.
--
-- Dan
Compaq and Alphacide
by
Anonymous Coward
·
· Score: 4, Interesting
There is an interesting discussion over in comp.arch on Usenet about
Compaq, Alpha, and the Itanium. The thread is called
Alphacide. Interesting stuff. It appears
that Compaq drank the Kool-Aid.
By the way, Pricewatch is quoting about $3K for the low-end Itaniums running at about 700 MHz.
No thanks.
Shrinkage
by
CaptainAlbert
·
· Score: 5, Informative
Impressive though 64-bit processors might be, I'm not convinced that the performance improvement is going to be as big as people are expecting.
Remember that the components in any digital system - and I'm not just talking about your windoze desktop PC, but servers, mainframes and embedded systems too - have to talk to each other in order to do anything remotely useful. Last time I looked, most PCI devices didn't utilise the provision for 64-bit data bus operation.
There's a perfectly good reason for this, of course... in order to attach a chip to a circuit board, you need an array of pins (or solder balls) that are macroscopic, so they can be soldered and handled without too much risk of accidental damage. Additionally, PCB tracks can only go so small (and so close together) without undesirable electrical effects and again, an inability to work with it in a production environment.
The "more bits" phenomenon has been sustained by improvements in VLSI and the advent of true System-on-a-chip design, but this too has its limits. If you compare a P4 motherboard with, say, a 386 mobo circa 1995, you'll see the chip count is drastically reduced. But fewer interconnected components means less repairability, upgradability, and interoperability. My old 486 had a VLB EIDE hard disk controller, which I swapped in after the last one failed. If my controller failed today, I couldn't do that; I'd either need to buy a new mobo or start replacing chips on the old one (which is just as expensive).
Don't get me wrong - I'm all for progress! And I expect we'll see more and more 64/128-bit chips springing up inside custom devices (e.g. 3D cards, routers) where the local interconnect can be made as fat as necessary. But the PC will remain shackled by slow frontside busses for a while yet, I reckon.
Re:Shrinkage
by
CaptainAlbert
·
· Score: 5, Interesting
> Perhaps your 486 MB was the first of its kind,
> but modern motherboards with integrated devices
> have the ability to disable them so that they can be
> replaced by cards in slots.
True, but that presupposes the existence of spare slots;-)
I hear what you're saying about trashable chips, but I think the real phenomenon is the "trashable board". Think about it - if your mobo dies and your warranty has run out, you go buy a replacement and ditch the old board. If it happens still to be under its manufacturer's warranty, most likely you just take it back to the shop and swap it for a working one. What happens to the old one? Most likely, they throw it away. The cost of postage, packing, an engineer's time to find the problem, repairs, parts... it's more than the damn thing retails for anyway.
I think this is missing the point anyway. The integration idea goes like this: with today's technology, you could put the equivalent of an early Pentium processor, plus hard disk and graphics controllers, BIOS chipset, etc. onto a single piece of silicon. Pretty much all you'd be left with off-chip would be (a) RAM and (b) I/O circuitry, because they're both harder to integrate. So your computer is about four or five chips. This is approximately the case in palm-tops now.
The point is that you've lost all ability to choose your own components. That graphics block/macrocell has probably been chosen by the manufacturer because it was the best value for money (i.e. the cheapest they could find). If you're lucky, they will give you expansion ports so you can plug your own stuff in. But that costs money, and if they think you'll pay for the lesser product then they'll make that instead.
Does it matter? Probably not to the average user. But I think it would matter to the industry. The whole point of having standard architectures like PCI, SCSI, EIDE (and before them, ISA et al.) is that many vendors can compete to produce compatible products, which drives innovation and generally provides a good deal for the consumer.
But if the minimisation continues and the busses become subsumed into the very chips themselves, then the chances are the manufacturers will cut corners. They won't wait for the not-quite-standard-yet SuperBus2005 architecture... they'll design their own and make you buy their proprietary upgrades. Again, the economics work out such that you the consumer probably get a good deal. But trading off good deals today against innovation tomorrow is dangerous.
So, it would be much better to keep all those busses outside the individual components, right? But that's exactly what is keeping the PC architecture slow at the moment (which was the point of my previous post. I think.).
Re:So why do I need 64bits?
by
4im
·
· Score: 5, Interesting
One word: addressing. With 32 bits, you can typically address files of up to 2 gig on your machine, a limit easily hit once you start working with video, for instance. It took hacks to get 4 gig of RAM working on x86 with the Linux kernel.
Go 64-bit, and that limit vanishes. You keep your linear addressing, with none of those ugly segments from the infamous real mode of PC-XT times.
I don't see what's really new about it all though: we've had 64-bit since Alpha, and there are several 64-bit architectures around. It may not be mainstream yet, but will IA-64 or Hammer really change that (soon)? Allow me to have doubts.
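For anyone who wants the numbers behind those limits, here is a rough sketch in Python. It assumes the 2 gig file limit comes from a signed 32-bit file offset and the 4 gig limit from a flat 32-bit address; the function names are made up for illustration.

```python
GIB = 2**30  # bytes in one gibibyte

def max_signed_offset(bits: int) -> int:
    """Largest byte offset a signed integer of the given width can hold."""
    return 2**(bits - 1) - 1

def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses a flat pointer of the given width can name."""
    return 2**bits

# Signed 32-bit file offset: one byte short of 2 GiB.
print(max_signed_offset(32) + 1 == 2 * GIB)    # True
# Flat 32-bit address space: exactly 4 GiB.
print(address_space_bytes(32) // GIB)          # 4
# Flat 64-bit address space, expressed in GiB.
print(address_space_bytes(64) // GIB)          # 17179869184
```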
Intel learning from their mistakes
by
jazzyjez
·
· Score: 5, Insightful
Much as I hate to say it, the Intel McKinley looks like a very well designed piece of kit, and it appears Intel have learned from their mistakes with the P4 by including a big, fast 3-level cache on the McKinley. It's also good to see them reducing their pipeline size, which means it may finally be able to compete with the G4 in terms of efficiency. However, this is of course going to kick them in the teeth in terms of competing on processor speed, which they have been pushing so hard recently in their marketing.
The same can't be said of AMD's offering, although in fairness the Hammer is not directed at the server market unlike the McKinley. The pipeline is longer than both their previous design and the McKinley, which is going to give them a performance hit. We can only hope that their cache is as good as Intel's.
What amazes me is that they can still keep adding instruction extensions without too much of a performance hit. Anyone looked at the latest instruction set documentation for these processors? Eugh! The pain of backwards compatibility...
Re:Intel learning from their mistakes
by
nusuth
·
· Score: 4, Informative
IA-64 is a new and incompatible instruction set; Intel is not adding anything to their x86 ISA.
Hammer does not have a 3MB L3, but it has an integrated memory controller, which should drastically reduce the latency of cache misses.
Assuming AMD goes for an L1 cache bigger than 32 KB, and does not succeed in making cache hits as fast as McKinley's (speculation based on current offerings), the picture is a bit complicated:
Watch it: Hammer and McKinley each ask for an instruction or piece of data. If both hit, McKinley wins. A more probable scenario is that McKinley misses and Hammer hits, a clear win for Hammer. A still more probable scenario is that both miss. If the data is in L2, McKinley is faster, since it has a lower miss penalty and can fetch from L2 faster; but it is more probable that the data is in Hammer's cache and not in McKinley's, which would benefit Hammer. If L2 misses too but McKinley scores an L3 hit, McKinley wins. If it suffers an L3 miss as well, it has to pay both the L3 miss latency and the memory latency, while Hammer pays no L3 miss latency and its memory latency is probably much lower. So with huge data sets processed in not-so-tight loops Hammer wins hands down, while for medium-sized data that fits into L3 McKinley wins hands down.
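The case analysis above is basically the textbook average-memory-access-time calculation. A minimal sketch in Python follows; every hit rate and latency in it is a made-up placeholder, not a measured or published figure for either chip, and the function name is hypothetical.

```python
def average_access_cycles(levels, memory_latency):
    """Average cycles per access for a cache hierarchy.

    levels: list of (local_hit_rate, lookup_latency) from L1 outward, where
            local_hit_rate is the fraction of accesses reaching that level
            which hit there.
    memory_latency: cycles for an access that misses every cache level.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit_rate, latency in levels:
        total += reach * latency      # everyone who reaches the level pays its lookup
        reach *= (1.0 - hit_rate)     # only the misses go further out
    return total + reach * memory_latency

# Hypothetical chip A: small fast L1, L2 and L3, memory behind a chipset.
chip_a = average_access_cycles([(0.90, 1), (0.80, 6), (0.50, 20)], memory_latency=200)
# Hypothetical chip B: bigger slower L1, L2 only, on-die memory controller.
chip_b = average_access_cycles([(0.95, 3), (0.80, 12)], memory_latency=120)
print(chip_a, chip_b)
```

Changing the hit rates to model a working set that fits (or does not fit) in the outermost cache is what flips the winner, which is the point of the paragraph above.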
Although McKinley is a server product and Hammer is not (or so it is said), an integrated memory controller benefits Hammer in multiway systems so much that it may as well be positioned as a server product. No more asking the chipset to fetch a piece of data and waiting while the chipset serves other processors' requests; just go and grab it!
Finally, some of the Hammer line will have L3 caches, and the Hammer line will have a higher clock rate than McKinley. If AMD can deliver what they have promised, they have a clear winner overall. But I'm still a bit sceptical.
--
Gentlemen, you can't fight in here, this is the War Room!
Now we can wait for software support...
by
green+pizza
·
· Score: 5, Interesting
Once we get the 64-bit hardware, we still have the MMOS (minor matter of software) to worry about.
Cases in point:
Silicon Graphics machines with MIPS R4400 (and up) CPUs were 64-bit, but the additional address and pointer space weren't utilized until IRIX 6.0 in 1994 -- over 18 months later. (And, of course, certain SGIs still run in 32-bit mode due to RAM concerns -- 64-bit requires more RAM -- all Indys, all Indigos, all O2s, and R4400 Indigo2s).
Sun machines with UltraSPARC CPUs were 64-bit, but again, the additional address and pointer space had to wait for software support. (Multi-stage transition to 64-bit, starting with Solaris 2.5 and finally complete with Solaris 7 in 1998).
Then there's application optimization. Many apps can get slight speedups by processing data in larger (say, 32-bit or even 64-bit) chunks. Sometimes the difference is huge, many times it's small. But lots of little speedups can add up across an entire system. Still, someone has to make these changes to apps and compilers. It takes time, testing, and adoption. In better times, SGI did several such overhauls... they got some insane speed out of Netscape Enterprise and Netscape FastTrack web servers during the Everest project. One of their engineers also did some cool (but nonstandard) hacks to Apache, including the very first pure, clean 64-bit port/mod.
Newer, faster, wider, more-torque hardware is always great. But don't forget the software.
Re:Now we can wait for software support...
by
dunstan
·
· Score: 4, Informative
Even with a reference application (Oracle 8.1.6) on a reference OS (Solaris 8), the patch levels for the 64-bit version were 3 revs behind those for the 32-bit version when I last looked. What bothered me was that the bug I'd run into was fixed in the 32-bit version but still there in the 64-bit version. Guess which version I ran.
Dunstan
--
The last scintilla of doubt just rode out of town
Re:AMD's gonna win
by
tap
·
· Score: 4, Informative
SCSI's disconnect ability looks good in theory, but in practice it's not such a great advantage. With SCSI you can attach up to 15 devices to a single channel, and effectively access them all at the same time. With IDE you can attach up to two devices to a single channel, and only access one at a time. Sounds like SCSI is lots better, but only if you have a single IDE/SCSI channel and more than one drive. If you put each IDE drive on a separate channel, and you can buy IDE controllers with 8 channels, then there really is no advantage to SCSI's disconnect/reconnect ability.
Re:AMD is deceiving you
by
SQL+Error
·
· Score: 5, Informative
Bullshit.
AMD has stated *explicitly* that the Hammer is an evolutionary rather than revolutionary design. They've said all along that it is an Athlon with 64-bit extensions and some minor tweaks (SSE2, extended pipeline). They haven't deceived anyone.
Now, as to the relative performance of the two architectures (x86-64 vs. IA-64): the Athlon XP 1900+ achieves a SpecInt2000 score of 701 (peak) while the 800MHz Itanium manages... 314. On floating point the Itanium does rather better: 645 vs. 634 for the Athlon. (The current leader is the IBM Power4, which gets 814 SpecInt and 1169 SpecFP.)
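To put those scores in proportion, a quick bit of arithmetic in Python, using only the figures quoted in this post (the variable names are just for illustration):

```python
# SPEC CPU2000 peak scores as quoted in the post above.
athlon_int, itanium_int = 701, 314
athlon_fp, itanium_fp = 634, 645

int_ratio = athlon_int / itanium_int   # Athlon's integer lead
fp_ratio = itanium_fp / athlon_fp      # Itanium's floating-point lead

print(f"Integer: Athlon XP 1900+ is {int_ratio:.2f}x the 800MHz Itanium")
print(f"Floating point: 800MHz Itanium is {fp_ratio:.2f}x the Athlon XP 1900+")
```

So on these numbers the integer gap is better than 2x, while the floating-point gap is under 2%.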
Having 128 64-bit registers is good, but remember that the Athlon and Hammer have far more physical registers than are presented in the programming model, and automatically map them according to the requirements of instructions in the pipeline. And the predicates and wide issue of the Itanium are balanced against the ability of the Athlon to *automatically* issue instructions speculatively and re-order the instruction queue to improve ILP.
And on the subject of manipulating multiple values with a single instruction: ever heard of MMX? 3DNow? SSE? Athlon has all of these, and Hammer will add SSE2. What do you think these are for?
As to the value of 64-bit addressing: I've programmed for machines (Suns and Compaq Alphas) with as much as 64GB of memory. While you *can* address that much with a 32-bit CPU, it means that you have to constantly re-map your view of memory, which is a royal pain. Moving to 64 bit addressing makes the problem disappear. And with current memory prices, even small commodity servers could make good use of more than 4GB of memory.
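A toy model of that re-mapping pain, in Python. It assumes the program can see exactly one aligned window of a larger physical memory at a time and counts how often the view has to be switched; real systems have smaller usable windows and more machinery, and `count_remaps` is a made-up name.

```python
def count_remaps(addresses, window_bytes):
    """Count how many times the visible window must change to service
    the given sequence of physical addresses."""
    current_window = None
    remaps = 0
    for addr in addresses:
        window = addr // window_bytes   # which aligned window holds this address
        if window != current_window:
            current_window = window
            remaps += 1
    return remaps

# Bouncing between data below and above the 4 GiB line.
trace = [0, 1, 2**32, 5, 2**32 + 1]
print(count_remaps(trace, 2**32))   # 4 switches with a 32-bit view
print(count_remaps(trace, 2**64))   # 1 (the initial mapping) with a 64-bit view
```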
And 64-bit integer registers are good for a lot of things, and while you can certainly use 64-bit integers on a 32-bit CPU, making them faster won't hurt.
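Roughly what "64-bit integers on a 32-bit CPU" costs: each add becomes two 32-bit adds with a carry passed between them. The sketch below imitates that in Python with explicit 32-bit masking; it illustrates the idea and is not the output of any particular compiler.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def add64_with_32bit_ops(a: int, b: int) -> int:
    """Wrapping 64-bit add built from 32-bit halves, the way a 32-bit CPU
    does it with an add followed by an add-with-carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo
    carry = lo >> 32                       # 1 if the low halves overflowed 32 bits
    lo &= MASK32

    hi = (a_hi + b_hi + carry) & MASK32    # high halves plus the carry, wrapped
    return (hi << 32) | lo

print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))   # 0x100000000
```

A 64-bit register does the same thing in one instruction, with no carry chain to manage.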
So, Athlon currently has a huge performance advantage over Itanium on integer apps, and a huge price/performance advantage (with comparable absolute performance) on FP apps. AMD's aim with Hammer is to extend Athlon cheaply and effectively into the 64-bit realm.
Intel's aim with Itanium appears to be to crush all competition; unfortunately, they've placed a *huge* bet on improvements in compiler technology that just hasn't paid off yet, resulting in a high-end chip that lags behind not just the high-end RISC chips like Alpha and Power, but low-cost desktop chips. To achieve commercial success, the Itanium needs integer performance somewhere in the vicinity of their competitors, but they currently trail the pack by a huge margin. Even SGI do better, and they all but shut down their CPU design efforts years ago.
Maybe McKinley will be the answer - but it doesn't look like it, given that the promised speeds have dropped to 1GHz. IA-64 is an interesting architecture which may even have a future, but so far it just don't fly.
Re:AMD is deceiving you
by
SurfsUp
·
· Score: 4, Insightful
You don't have a clue. Let me just pick out a couple of the grossly wrong items...
> Why do we need 64bit processors? Addressing? Nah, current processors can address enough space.. with 386 processors FAR addressing was introduced, which expanded allocatable address space drastically. (those silly DS, SS,.. registers) And newest processors can deal with them with same ease as with non-far addressing.
Sheesh, where are you coming from? You can address 64 Gig of physical memory with an x86 now, but you can only address 4 Gig (at most!) of it linearly. 32 bit address registers, get it? Gosh, and far addressing was introduced with 386's was it? Give me a break, try 8086's.
> AMD's 64bit solution currently has no real value.. except for huge data storage (could work faster with 64bit data blocks) and probably some heavy encryption. x86-64 compiled Quake3 would make minimum use of 64bit registers.. and would probably be just a margin faster than IA32 compiled version.
Right, and I'm supposed to believe you on this, given your performance above. Um, you seem to have ignored the value of being able to crunch 8 byte integers, or pixels 8 bytes at a time, nicely matching the width of the MMX registers. For starters. Repeat this to yourself: "sledge hammer". "sledge hammer". Good, that's more like it.
> Is IA64 better? Yes it is. IA64 has 128 usable 64bit registers, predicates... But that is not all.. in single 64bit register you can store 4 16bit values(common integer). (or 8 8bit or 2 32bit)
Um, and guess how many 16 bit values you can store in a 64 bit sledgehammer register? Ah, and guess how many fp/mmx instructions sledge can retire per cycle?
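The answer is four, on any 64-bit register. A small Python sketch of packing four 16-bit lanes into one 64-bit word and adding them lane by lane with ordinary integer operations, similar in spirit to a packed-word add such as MMX's PADDW; the helper names are invented for the example.

```python
def pack16x4(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    assert len(values) == 4
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFFFF) << (16 * i)
    return word

def unpack16x4(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]

def add16x4(a, b):
    """Lane-wise wrapping add of two packed words without letting
    carries leak from one lane into the next."""
    HIGH = 0x8000800080008000   # top bit of every lane
    LOW = 0x7FFF7FFF7FFF7FFF    # lower 15 bits of every lane
    partial = (a & LOW) + (b & LOW)      # carries stop at each lane's top bit
    return partial ^ ((a ^ b) & HIGH)    # fold the top bits back in

x = pack16x4([1, 2, 3, 0xFFFF])
y = pack16x4([10, 20, 30, 1])
print(unpack16x4(add16x4(x, y)))   # [11, 22, 33, 0]
```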
> Clawhammer will be better for a year or so.. but soon it will hit the ceiling. Intel will be able to get better performance from 1/2 clocked IA64.
You don't have any idea why it's called itanic, do you? Moderators, take a look above. Remember, that's what 'random' looks like. Yes, I've got mod points right now. No, I won't waste them on you.
Life's a bitch but somebody's gotta do it.