Slashdot Mirror


Is IBM's Power4 A Threat To Alpha, Sparc, IA-64?

HiyaPower writes: "There is an interesting discussion here about the IBM Power4 chip. While it is most directly compared with the upcoming Alpha, it also has ramifications for the penetration of the IA-64 and/or Sledgehammer into the server market. The conclusion drawn is that the Alpha, etc., may be in for some very tough sledding. Now if only Apple could be persuaded to use these instead of what the article terms its 'embedded controller chips...'"

2 of 103 comments

  1. Re:Processor design... by ToLu the Happy Furby · Score: 5

    I think the big advantage that VLIW instruction sets will have is strictly architectural, and I'm not sure how IBM's approach fits in yet, but it looks interesting. Throwing more chips at the problem is one approach, but remember that your competitors can do that too, *and* make the chips do more as well...

    Not sure how IBM's approach fits in yet?? Read the article.

    Amongst other things, the POWER4 is *not* VLIW, it's straight-ahead modern RISC at its finest. With massively gigantic buffers, bandwidth and execution resources (8 functional units/core * 8 cores = wow), this chip'll do quite nicely on IPC/core, not to mention combined IPC for all 8. While presumably not quite as elegant, the design of the individual cores has a lot in common with the archetype of perfect RISC cores, the Alpha 21264, and it has even more aggressive resources.

    Essentially what this means is, assuming this design is as good as it appears, the only way the competition will be able to catch up (without going the way IBM has and deciding on a prohibitively expensive 8-in-one design and packaging) will be through the use of innovative design tricks. The upcoming P4 has a few of those, incidentally, but the big one--and the one the P4 *doesn't* have--appears to be SMT, Simultaneous MultiThreading. Alpha has an 8-way SMT core coming out in a bit, and it ought to compete well with IBM's much more expensive 8-way SMP design here. And AMD appears ready to do 2-way SMT (or something similar) with the Sledgehammer in about 15-18 months. And Sun is rumored to have SMT in the USV design due in several years. But the POWER4 looks to lead in the "big bad" category for quite some time to come.

    (As for Intel's EPIC, the VLIW-like design strategy for their IA-64 chips, at the moment it's looking like a rather poor competitor to SMT. A quick explanation of why:

    There are exactly two ways to make an MPU run faster: 1) increase the clock speed, or 2) increase the IPC (instructions per clock). Unfortunately, the best we've been able to do so far in the IPC realm is about 1.4 IPC on SPEC benchmarks (Alpha EV6x). IPC on a P3 runs about 40% lower. Now, these IPC numbers are despite the fact that the Alpha can theoretically retire 8 instructions/clock, and the P3 can retire 5 (5 internal ops, not 5 x86 ops). Furthermore, simulations show that when it comes to attacking the IPC problem by adding more functional units, we're nearing the point of diminishing returns.
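
    To put rough numbers on that, here is a small Python sketch of the arithmetic, assuming throughput is simply clock speed times IPC. The IPC and retire-width figures are the ones quoted above; the 1 GHz clock and the function names are made up purely for illustration.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the comment.
# Assumption: throughput = clock speed * IPC, nothing else modeled.

ALPHA_IPC = 1.4                  # quoted SPEC figure for Alpha EV6x
P3_IPC = ALPHA_IPC * (1 - 0.40)  # "about 40% lower"
ALPHA_PEAK = 8                   # quoted theoretical retire width
P3_PEAK = 5                      # quoted, in internal ops


def instructions_per_second(clock_hz, ipc):
    """The two levers: raise the clock or raise the IPC."""
    return clock_hz * ipc


def utilization(ipc, peak):
    """Fraction of the theoretical retire width actually achieved."""
    return ipc / peak


if __name__ == "__main__":
    hypothetical_clock = 1e9  # 1 GHz, an illustrative value only
    print("P3 IPC:", round(P3_IPC, 2))
    print("Alpha utilization:", round(utilization(ALPHA_IPC, ALPHA_PEAK), 3))
    print("P3 utilization:", round(utilization(P3_IPC, P3_PEAK), 3))
    print("Alpha instr/s at 1 GHz:",
          instructions_per_second(hypothetical_clock, ALPHA_IPC))
```

    On these figures both chips use well under a fifth of their theoretical width, which is the gap the rest of this aside is about.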

    The problem is, in order to run lots of instructions in parallel, you have to be able to safely extract parallelism from your code. And the problem with this is, you can't run two instructions in parallel if one of them depends on the other's result. And furthermore, nowadays all this parallelism has to be safely extracted in real-time by special hardware in the MPU itself; this makes your chip more complicated, and means you need to build a big buffer to hold instructions in flight so you can pick and choose which ones you want to run each clock.
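
    A toy model makes the dependency limit concrete. The Python sketch below is an idealized scheduler, not how any real MPU works: every instruction takes one cycle, the instruction buffer is unlimited, and `schedule_cycles` and the example streams are invented names for illustration.

```python
# Toy out-of-order issue model: each cycle, issue up to `width`
# instructions whose dependencies finished in an earlier cycle.
# Assumptions: 1-cycle latency, unlimited buffer, deps point backwards.

def schedule_cycles(deps, width):
    """deps[i] lists the earlier instructions that instruction i needs."""
    done_cycle = {}                      # instruction -> cycle it issued in
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for i in remaining:
            if len(issued) == width:
                break
            # Ready only if every dependency completed before this cycle.
            if all(d in done_cycle and done_cycle[d] < cycle for d in deps[i]):
                issued.append(i)
        if not issued:
            raise ValueError("dependency graph is not schedulable")
        for i in issued:
            done_cycle[i] = cycle
        remaining = [i for i in remaining if i not in done_cycle]
    return cycle


def ipc(deps, width):
    return len(deps) / schedule_cycles(deps, width)


# Eight instructions, each needing the previous one's result.
chain = [[]] + [[i - 1] for i in range(1, 8)]
# Eight instructions with no dependencies at all.
independent = [[] for _ in range(8)]

if __name__ == "__main__":
    print("chain IPC on an 8-wide core:", ipc(chain, 8))
    print("independent IPC on an 8-wide core:", ipc(independent, 8))
```

    In this model the fully dependent chain gets an IPC of 1 no matter how wide the core is, while the independent stream fills all eight slots. Real code sits somewhere in between, which is why extra functional units stop paying off.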

    So many many years ago, HP had the idea, which it later sold to Intel (and which wasn't really their idea at all but has indeed been used in DSP chips for years and years), of getting rid of all that complex instruction-level parallel-finding logic on the MPU and doing it all at compile-time instead. This is the basic idea behind EPIC, the philosophy of Intel's IA-64 line.

    It sounds very nice, especially because in theory it means simpler chips (no complicated control logic), and simpler chips mean faster chips. Heh heh heh. See, it turns out that the amount of instruction-level parallelism which can be safely discovered at compile-time is way way less than the amount that could be found in the chip at run-time (which, as we recall, is too small already). Thus EPIC was modified to allow the compiler to just place "hints" in the code. Well, this means you still need all that complicated control logic back in place, because you still don't have deterministically scheduled instructions. But following the "hints" and other changes to the ISA ends up making everything *more* complicated, not less. This, in a nutshell, is why Itanium is 3 years late, way over budget, unable to meet its very modest clock speed goal of 800 MHz, and fitted with a laugh-inducing 96 KB of on-die cache, lower even than the lowliest Celeron: all this added complexity means bigger, slower, more complicated chips that don't have the room for cache or the elegance for high (or even adequate) clock speeds. Plus we have very strong evidence that compiler technology is still not nearly good enough to make the kinds of insightful IPC-giving "hints" which are necessary to even make the damn fool scheme work. Thus the only benchmark Intel has "released" for the Itanium is that of RSA encryption--a routine simple enough to be hand-tuned in assembly. Meanwhile they have made the patently ridiculous claim that the SPEC benchmarks--directed precisely at the mid-cost server/workstation market which Itanium is aiming for--are "not relevant" to Itanium's market.

    A completely opposite approach is SMT, which uses a relatively small number of core changes to allow not just instruction-level parallelism to be gleaned, but also thread-level parallelism. In other words, the chip will run several threads in parallel, confident in the fact that their instructions will not have dependencies on each other, and thus be able to use much more of its full execution capabilities. Early indications are that SMT can improve IPC by remarkable amounts, like on the order of 2x the performance on otherwise similar cores!
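
    The same kind of toy model shows why thread-level parallelism helps. This Python sketch is a deliberately idealized illustration, not a description of any real SMT design: each thread is a pure dependency chain, every instruction takes one cycle, threads never depend on each other, cache and fetch contention are ignored, and lower-numbered threads simply get first pick of the issue slots. `run_smt` and `combined_ipc` are invented names.

```python
# Toy SMT model: several threads share one core's issue slots.
# Assumptions: 1-cycle latency, no cross-thread dependencies,
# no cache or fetch contention, fixed thread priority order.

def run_smt(threads, width):
    """threads[t][i] lists the earlier instructions (same thread) that
    instruction i of thread t needs. Returns total cycles taken."""
    done = [dict() for _ in threads]     # per thread: instr -> issue cycle
    total = sum(len(t) for t in threads)
    finished = 0
    cycle = 0
    while finished < total:
        cycle += 1
        slots = width
        issued = []
        for t, deps in enumerate(threads):
            for i, need in enumerate(deps):
                if slots == 0:
                    break
                if i in done[t]:
                    continue
                # Ready only if all dependencies finished in an earlier cycle.
                if all(done[t].get(d, cycle) < cycle for d in need):
                    issued.append((t, i))
                    slots -= 1
        if not issued:
            raise ValueError("dependency graph is not schedulable")
        for t, i in issued:
            done[t][i] = cycle
        finished += len(issued)
    return cycle


def combined_ipc(threads, width):
    return sum(len(t) for t in threads) / run_smt(threads, width)


# One thread: six instructions, each needing the previous result.
chain = [[]] + [[i - 1] for i in range(1, 6)]

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "thread(s) on a 4-wide core:", combined_ipc([chain] * n, 4))
```

    In this model one thread manages an IPC of 1, two threads manage 2, and the gain stops once the threads saturate the core's issue width. Real threads contend for caches and buffers, so treat the doubling as an upper bound for the toy case rather than a prediction.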

    Unfortunately, it is too early to tell whether SMT will be as easy a design enhancement as is being claimed. Furthermore I've heard tell that SMT on IA-64 will be a lot more difficult than on a RISC MPU, so Intel could be missing out on a huge speed-up with this technique.)

    However, IBM will have to make sure people design their apps with more than one processor in mind, which will be a Good Idea for the future, since more people might have multiprocessor computers.

    These chips are not to be confused with PowerPC chips. They are server chips only, intended for seriously expensive machines.

  2. "If only Apple could be persuaded to use these..." by ToLu the Happy Furby · Score: 5

    Heh! If only Apple would use these, the new iMacs wouldn't exactly be able to hit their price points. Paul (the author of the article) and some others were involved in a thread over on the tech forum at Ace's about (amongst other things) the expected cost of one of these puppies.

    To quote Paul's response:

    Maybe another way of looking at it is perhaps the price of four POWER4 known good die and the ceramic substrate and metal carrier totals $3000 (although I suspect that a tested and 100% functional ceramic substrate itself might approach or exceed $3000 in cost).

    The real question is the cost of a fully assembled and tested, 100% functional, POWER4 8-way module? After all what are the chances one of these can be reworked if even just one of the 20,000+ solder ball joints was bad?


    So for one of these 8-way on a chip jobs (unsure if they'll be offering 4-way configurations too or if those were just a prototype) it's looking like upwards of $10,000 just for IBM to fab, package, and test the darn things. Add in a system capable of feeding it the tremendous bandwidth it requires to run up to its full potential--8 GB/s to DRAM and a phenomenal 84 (!) GB/s I/O--and...ok, so I know Hemos was just joking when he made that comment about Apple, but you get the idea. These are MPUs you use to fold proteins and run gigantic dynamic-content websites, not surf the web and edit the home video of your kid's elementary school graduation.

    On a related note, man these things oughtta show Intel a thing or two about how to marry clever instruction scheduling to brute-force functional units--forget about Itanium; it's gonna take a several-way McKinley system to even take a swing at these. And it oughtta show Sun a thing or two about the dangers of resting on the laurels of your marketing success when designing new chips. And, as Paul notes in the article, it really oughtta make Alpha engineers worry that for the first time, having the most elegant design may not guarantee the best performance. Compaq has an 8-way SMT Alpha core on the way as well (EV8); too bad the Alpha group's customary position in the world--stepped on and neglected by their corporate masters--means they haven't got the money or manpower to bring it to market until well after POWER4.