Slashdot Mirror


Is IBM's Power4 A Threat To Alpha, Sparc, IA-64?

HiyaPower writes: "There is an interesting discussion here about the IBM Power4 chip. While it is most directly compared with the upcoming Alpha, it also has ramifications for the penetration of the IA-64 and/or Sledgehammer into the server market. Conclusion drawn is that the Alpha, etc., may be in for some very tough sledding. Now if only Apple could be persuaded to use these instead of what the article terms its "embedded controller chips...""

9 of 103 comments

  1. OS support is the real question by tmu · · Score: 3

    In the long run, we all know that what will count is OS support. IBM has a strong, stated Linux strategy, but we'll see where it goes.

    Don't get me wrong, I am one of the few who actually like AIX. I think it's a mature, useful operating system with some really cool characteristics (fairly integrated hw support and debugging, excellent logical volume manager (better than Veritas, imho)). Nevertheless, it remains to be seen whether IBM can actually bring Linux to their whole server platform (including these bad boys).

    (There have been instruction set changes in the IBM processor line in the past, particularly between the POWER, POWER2 and POWER3 architectures, so I'm interested to see what the differences in this instruction set are...).

  2. To IBM's credit... by _outcat_ · · Score: 4

    To IBM's credit, some geek friends and I were at a smallish local tech convention and some IBM guys were there talking about their new nomenclature and such in their server lines. For the really big stuff, the S-390's (I think pSeries and zSeries but I could be wrong), they run stuff that handles HUGE, HUGE payloads ... run AIX ... (isn't there OS-390 too?) but for their middle range stuff, we couldn't get the guys to shut up about Linux. One of our guys mentioned it and the talk was all about that the whole time. "Our customers like it because we don't have to package costly licenses. And it's very, very, very scalable and flexible, we can run it on everything. And it's a UNIX so we can integrate it with AIX ... " on and on...

    So just for that IBM's not a bit bad, and their NUMA-Q architecture looks REALLY neato. As for putting Alpha and Sparc out of business...Hey, you build a better mousetrap. Big Blue has always had great R&D and put out some of the best products out there. That doesn't mean Alpha and Sparc and such are going to plummet.

    I say kudos to Big Blue.

    --
    Angry IT woman in big clompy boots. And talking lint!.
  3. great review, but some nits.... by cabbey · · Score: 4
    • IBM isn't leapfrogging anyone, this isn't a new radical change from their current tech, it's just one of the few times they've actually told the industry what's going on inside. The "tour de force" is NOT the technology it's the fact that you, Joe Consumer, are being given a glimpse of the technology which is "ho-hum" inside IBM.
    • the core features "apple would die for" are just integrating existing IBM architectural feats from the S/390 and AS/400 (er, I meant zSeries and iSeries) architectures... which by the way, are routinely bad mouthed. I enjoy the irony in their being admired this time.
    • regarding the 10 to 12 levels of logic, one other case, that is only hinted at here, but was mentioned earlier, is the support for the old POWER instruction set... software trapping ain't cheap.
    • re the clock speed: let's all remember that it wasn't too long ago that the RISC camp decided to get rid of the gloves and step up to the CISC bigots' clock rate == length of manhood competition. Before then lots of RISC machines operated at significantly lower clock speed than CISC machines, e.g. I've got a Power2 that runs at 66MHz and smokes a Pentium II at 300MHz for a LOT of stuff. Now that someone has thrown off the gloves and said "ok Intel, we'll see your 1GHz" things will get REAL interesting in the PowerPC vs x86 world. I've got to thank Digital, er Compaq, for entering into the contest first on this one.
    • nobody said there weren't delays... even IBM isn't THAT good, least of all the managers. But after being through the antitrust crucible there is one sterling rule at IBM, you NEVER announce something UNTIL IT'S DONE - there are LOTS of procedures to ensure that, and lots of managers are employed just to conduct that stuff.
    • if the Alpha does show to be the worst hit by the POWER4 competition it will be a sad day indeed, Sun on the other hand.....
    • it already runs Linux; too bad Linux doesn't scale up to as many processors as they'll be putting in systems as well as AIX and OS/400 do. (note, not a troll or a flame, that's FACT that even the kernel folks don't disagree with. Linux DOES NOT handle 24 processors just now... we need to fix this!)
    • as someone else has already noted, to REALLY see the benefit of this you need your application to be well behaved, and preferably well threaded. This is sadly easier said than done, and the number of skilled engineers capable of writing this type of code at the application level is slim to none because this isn't the kind of stuff that interests application people. Instead it interests infrastructure types, and when they put out a drop dead gorgeous infrastructure to build your next generation application on, too many idiots refuse to climb the learning curve needed to fully exploit it and the accolades of those who did aren't enough to keep imbecilic pointy haired managers from killing off the infrastructure. (who, me bitter? no....)
    • the site is either slashdotted, or really slow
    1. Re:great review, but some nits.... by Tet · · Score: 3
      There is no explicit limit to the # of CPUs Linux kernel can handle.

      Actually, yes there is. Linux currently uses a bitmask to specify certain CPU operations, so the number of CPUs is limited to the word length. In other words, Linux supports up to 32 CPUs on 32-bit platforms, and up to 64 CPUs on real machines (Sparc64, Alpha, Itanium (and MIPS64?)). Of course the fact that it supports that many CPUs doesn't mean that it scales linearly, but it looks like the 2.4 kernel will be good for at least 16 CPUs before performance starts dropping off. Various people (Dave Miller, Ralf Baechle and others) are working to remove the bitmask, and allow more CPUs than the word length. SGI in particular needs Linux to be able to support more than 64 CPUs for some of their machines.
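
      A minimal sketch of why a one-word bitmask caps the CPU count, for anyone who wants to see it concretely. This is a toy model in Python, not kernel code: the class name `WordCpuMask` and the helper `max_cpus` are made up for illustration, and the only assumption taken from the comment above is that each CPU occupies one bit of a single machine word.

```python
# Toy model of a CPU set stored in one machine word (illustration only, not kernel code).

class WordCpuMask:
    """Set of CPU ids packed into a single word of `word_bits` bits (hypothetical helper)."""

    def __init__(self, word_bits):
        self.word_bits = word_bits
        self.bits = 0

    def set_cpu(self, cpu):
        # A CPU id at or beyond the word length has no bit to occupy.
        if not 0 <= cpu < self.word_bits:
            raise ValueError(f"cpu {cpu} does not fit in a {self.word_bits}-bit mask")
        self.bits |= 1 << cpu

    def has_cpu(self, cpu):
        return 0 <= cpu < self.word_bits and (self.bits >> cpu) & 1 == 1

    def count(self):
        # Number of CPUs currently present in the mask.
        return bin(self.bits).count("1")


def max_cpus(word_bits):
    """Largest number of CPUs a one-word mask can describe."""
    mask = WordCpuMask(word_bits)
    for cpu in range(word_bits):
        mask.set_cpu(cpu)
    return mask.count()
```

      With a 32-bit word the mask tops out at 32 CPUs and with a 64-bit word at 64, which matches the limits quoted above. Going past that means replacing the single word with something wider, such as an array of words.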

      --
      "The invisible and the non-existent look very much alike." -- Delos B. McKown
  4. Steve Jobs is as much to blame as others. by mr · · Score: 4

    According to Robert Morgan who runs Apple Recon

    Steve Jobs said in a visit to Motorola
    "It will be great in two years when we aren't using your chips."

    After this statement is when Motorola publicly started calling the PPC line 'embedded'

    How often in YOUR relationships can you walk up to your relationship partners and tell them 'to hell with you, I'll be leaving in 2 years.' and NOT expect said partner to keep giving a damn about you.

    Apple then made the problem WORSE by publicly calling AltiVec 'the future' and spent hours talking about how wonderful AltiVec is. Apple will have a hard time leaving AltiVec after all the statements about how wonderful AltiVec is.

    Jobs's ego put Apple in the place Apple is. Motorola only reacted to the actions Jobs took. It is not like Motorola NEEDS Apple, and it took actions to protect Motorola's investment.

    Jobs wants to be the 'saviour' of Apple, fine. Then Jobs must also take the mantle of the person who helps kill Apple also. Amazing how the history of Jobs repeats.

    --
    If it was said on slashdot, it MUST be true!
  5. Re:Processor design... by ToLu the Happy Furby · · Score: 5

    I think the big advantage that VLIW instruction sets will have is strictly architectural, and I'm not sure how IBM's approach fits in yet, but it looks interesting. Throwing more chips at the problem is one approach, but remember that your competitors can do that too, *and* make the chips do more as well...

    Not sure how IBM's approach fits in yet?? Read the article.

    Amongst other things, the POWER4 is *not* VLIW, it's straight-ahead modern RISC at its finest. With massively gigantic buffers, bandwidth and execution resources (8 functional units/core * 8 cores = wow), this chip'll do quite nicely on IPC/core, not to mention combined IPC for all 8. While presumably not quite as elegant, the design for the individual cores bears a lot in common with the archetype of perfect RISC cores, the Alpha 21264, and it has even more aggressive resources.

    Essentially what this means is, assuming this design is as good as it appears, the only way the competition will be able to catch up (without going the way IBM has and deciding on a prohibitively expensive 8-in-one design and packaging) will be through the use of innovative design tricks. The upcoming P4 has a few of those, incidentally, but the big one--and the one the P4 *doesn't* have--appears to be SMT, Simultaneous MultiThreading. Alpha has an 8-way SMT core coming out in a bit, and it ought to compete well with IBM's much more expensive 8-way SMP design here. And AMD appears ready to do 2-way SMT (or something similar) with the Sledgehammer in about 15-18 months. And Sun is rumored to have SMT in the USV design due in several years. But the POWER4 looks to lead in the "big bad" category for quite some time to come.

    (As for Intel's EPIC, the VLIW-like design strategy for their IA-64 chips, at the moment it's looking like a rather poor competitor to SMT. A quick explanation of why:

    There are exactly two ways to make an MPU run faster: 1) increase the clock speed, or 2) increase the IPC (instructions per clock). Unfortunately, the best we've been able to do so far in the IPC realm is about 1.4 IPC on SPEC benchmarks (Alpha EV6x). IPC on a P3 runs about 40% lower. Now, these IPC numbers are despite the fact that the Alpha can theoretically retire 8 instructions/clock, and the P3 5 (5 internal ops, not 5 x86 ops). Furthermore, simulations show that as far as attacking the IPC problem by adding more functional units, we're nearing the point of diminishing returns.
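
    The two knobs multiply, which is worth seeing as arithmetic. A small sketch using the IPC figures quoted above (1.4 for the Alpha, about 40% lower for the P3); the function names are made up for illustration, and any clock speed plugged in is a hypothetical input, not a claim about a shipping part.

```python
def instructions_per_second(ipc, clock_hz):
    """Sustained throughput = instructions per clock * clocks per second."""
    return ipc * clock_hz


def clock_needed_to_match(ipc_slow, ipc_fast, clock_fast_hz):
    """Clock the lower-IPC chip needs to equal the higher-IPC chip's throughput."""
    return clock_fast_hz * ipc_fast / ipc_slow


ALPHA_IPC = 1.4                    # figure quoted in the comment (SPEC, Alpha EV6x)
P3_IPC = ALPHA_IPC * (1 - 0.40)    # "about 40% lower", i.e. roughly 0.84
```

    On these numbers, `clock_needed_to_match(P3_IPC, ALPHA_IPC, 600e6)` comes out at about 1 GHz: a chip with 40% lower IPC needs roughly 1.67x the clock to break even. The 600 MHz input is only an example value.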

    The problem is, in order to run lots of instructions in parallel, you have to be able to safely extract parallelism from your code. And the problem with this is, you can't run instructions in parallel if they have dependencies, etc. And furthermore, nowadays all this parallelism has to be safely extracted in real-time by special hardware in the MPU itself; this makes your chip more complicated, and means you need to build a big buffer to hold instructions in flight so you can pick and choose which ones you want to run each clock.
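
    To make the dependency point concrete, here is a toy scheduler, assuming a deliberately simplified machine: every instruction takes one cycle, only true read-after-write dependencies count (as if register renaming removed the rest), and the core can pick any ready instruction. The function name `schedule` and the instruction encoding are hypothetical, chosen just for this sketch.

```python
def schedule(instrs, width):
    """Greedy issue: each cycle, issue up to `width` instructions whose inputs are ready.

    instrs: list of (dest, [sources]); every instruction has 1-cycle latency.
    Returns the number of cycles needed to issue everything.
    """
    # For each instruction, find the earlier instructions that produce its sources.
    last_writer = {}
    deps = []
    for i, (dest, srcs) in enumerate(instrs):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i

    issued_at = {}   # instruction index -> cycle in which it issued
    cycle = 0
    while len(issued_at) < len(instrs):
        cycle += 1
        # Ready means every producer issued in an earlier cycle.
        ready = [i for i in range(len(instrs))
                 if i not in issued_at
                 and all(d in issued_at and issued_at[d] < cycle for d in deps[i])]
        for i in ready[:width]:
            issued_at[i] = cycle
    return cycle


# Four instructions that each read the previous result: no parallelism to find.
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
# Four instructions with no inputs in common: fully parallel.
independent = [("r1", []), ("r2", []), ("r3", []), ("r4", [])]
```

    `schedule(chain, 4)` still takes 4 cycles while `schedule(independent, 4)` takes 1, so extra functional units only help when the code has independent work to give them.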

    So many many years ago, HP had the idea, which it later sold to Intel (and which wasn't really their idea at all but has indeed been used in DSP chips for years and years), of getting rid of all that complex instruction-level parallel-finding logic on the MPU and doing it all at compile-time instead. This is the basic idea behind EPIC, the philosophy of Intel's IA-64 line.

    It sounds very nice, especially because in theory it means simpler chips (no complicated control logic), and simpler chips mean faster chips. Heh heh heh. See, it turns out that the amount of instruction-level parallelism which can be safely discovered at compile-time is way way less than the amount that could be found in the chip at run-time (which, as we recall, is too small already). Thus EPIC was modified to allow the compiler to just place "hints" in the code. Well, this means you still need all that complicated control logic back in place, because you still don't have deterministically scheduled instructions. But following the "hints" and other changes to the ISA ends up making everything *more* complicated, not less. This, in a nutshell, is why Itanium is 3 years late, way over budget, unable to meet its very modest clock speed goal of 800 MHz, and fitted with a laugh-inducing 96 KB of on-die cache, lower even than the lowliest Celeron: all this added complexity means bigger, slower, more complicated chips that don't have the room for cache or the elegance for high (or even adequate) clock speeds. Plus we have very strong evidence that compiler technology is still not nearly good enough to make the kinds of insightful IPC-giving "hints" which are necessary to even make the damn fool scheme work. Thus the only benchmark Intel has "released" for the Itanium is that of RSA encryption--a routine simple enough to be hand-tuned in assembly. Meanwhile they have made the patently ridiculous claim that the SPEC benchmarks--directed precisely at the mid-cost server/workstation market which Itanium is aiming for--are "not relevant" to Itanium's market.

    A completely opposite approach is SMT, which uses a relatively small number of core changes to allow not just instruction-level parallelism to be gleaned, but also thread-level parallelism. In other words, the chip will run several threads in parallel, confident in the fact that their instructions will not have dependencies on each other, and thus be able to use much more of its full execution capabilities. Early indications are that SMT can improve IPC by remarkable amounts, like on the order of 2x the performance on otherwise similar cores!

    Unfortunately, it is too early to tell whether SMT will be as easy a design enhancement as is being claimed. Furthermore I've heard tell that SMT on IA-64 will be a lot more difficult than on a RISC MPU, so Intel could be missing out on a huge speed-up with this technique.)
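
    A back-of-the-envelope model of the SMT idea described above, under loud assumptions: each thread is reduced to a pair (instruction count, number of independent instructions it can offer per cycle), the core has a fixed number of issue slots, and threads are served in a fixed order. The names `run_one_at_a_time` and `run_smt` are invented for this sketch, and the inputs are chosen for illustration, so the resulting speedup is a property of the inputs rather than a measurement.

```python
def run_one_at_a_time(threads, width):
    """Run threads back to back; each fills at most min(ilp, width) slots per cycle."""
    cycles = 0
    for count, ilp in threads:
        per_cycle = min(ilp, width)
        cycles += -(-count // per_cycle)   # ceiling division
    return cycles


def run_smt(threads, width):
    """Each cycle, share the `width` issue slots among all unfinished threads."""
    remaining = [count for count, _ in threads]
    cycles = 0
    while any(remaining):
        cycles += 1
        slots = width
        for t, (_, ilp) in enumerate(threads):
            # A thread is limited by its own parallelism, its remaining work,
            # and whatever slots earlier threads left free this cycle.
            issued = min(ilp, remaining[t], slots)
            remaining[t] -= issued
            slots -= issued
    return cycles


# Two threads of 100 instructions, each able to offer 2 independent instructions per cycle.
two_threads = [(100, 2), (100, 2)]
```

    On a 4-wide core, running the two threads one after another takes 100 cycles while sharing the slots takes 50, because each thread alone leaves half the core idle. On a 2-wide core both approaches take 100 cycles: SMT only pays off when a single thread cannot fill the machine.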

    However, IBM will have to make sure people design their apps with more than one processor in mind, which will be a Good Idea for the future, since more people might have multiprocessor computers.

    These chips are not to be confused with PowerPC chips. They are server chips only, intended for seriously expensive machines.

  6. Where is AMD's Hammer? by NortonDC · · Score: 3

    The article is very strong, but it would have been enhanced if it also touched on AMD's upcoming 64-bit offerings, the Hammer family of chips.

    Hammer does not have a track record in the marketplace, but neither does Itanium, and it's odd to ignore an architecture that in all likelihood will sell in much greater volume than several of the chips profiled here. Even if AMD's 64-bit implementation turns out less than ideal, it will probably outsell the Power, Alpha and Sparc offerings by virtue of the vastly larger market it targets.

    A simulator for a Hammer chip has been released. A comparison, or at least an acknowledgement, would have made the article more valuable.

  7. Re:Drop Motorola like a hot potato by Detritus · · Score: 3
    If you want a "real processor", then you better be prepared to cough up "real money".

    PPC chips are optimized for cost, POWER chips are optimized for performance, screw the cost.

    --
    Mea navis aericumbens anguillis abundat
  8. "If only Apple could be persuaded to use these..." by ToLu the Happy Furby · · Score: 5

    Heh! If only Apple would use these, the new iMacs wouldn't exactly be quite able to hit their price points. Paul (the author of the article) and some others were involved in a thread over on the tech forum at Ace's about (amongst other things) the expected cost of one of these puppies.

    To quote Paul's response:

    Maybe another way of looking at it is perhaps the price of four POWER4 known good die and the ceramic substrate and metal carrier totals $3000 (although I suspect that a tested and 100% functional ceramic substrate itself might approach or exceed $3000 in cost).

    The real question is the cost of a fully assembled and tested, 100% functional, POWER4 8-way module? After all what are the chances one of these can be reworked if even just one of the 20,000+ solder ball joints was bad?


    So for one of these 8-way on a chip jobs (unsure if they'll be offering 4-way configurations too or if those were just a prototype) it's looking like upwards of $10,000 just for IBM to fab, package, and test the darn things. Add in a system capable of feeding it the tremendous bandwidth it requires to run up to its full potential--8 GB/s to DRAM and a phenomenal 84 (!) GB/s I/O--and...ok, so I know Hemos was just joking when he made that comment about Apple, but you get the idea. These are MPUs you use to fold proteins and run gigantic dynamic-content websites, not surf the web and edit the home video of your kid's elementary school graduation.

    On a related note, man these things oughtta show Intel a thing or two about how to marry clever instruction scheduling to brute-force functional units--forget about Itanium; it's gonna take a several-way McKinley system to even take a swing at these. And it oughtta show Sun a thing or two about the dangers of resting on the laurels of your marketing success when designing new chips. And, as Paul notes in the article, it really oughtta make Alpha engineers worry that for the first time, having the most elegant design may not guarantee the best performance. Compaq has an 8-way SMT Alpha core on the way as well (EV8); too bad the Alpha group's customary position in the world--stepped on and neglected by their corporate masters--means they haven't got the money or manpower to bring it to market until well after POWER4.