Slashdot Mirror


IBM One-Chip Dual Processor Due Next Year

PureFiction writes, "Looks like IBM is going to be scaling processors at the chip-die level. ZDnet has this story about plans for a dual-processor, single-die chip that will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500MHz. I wanna overclock one of these bad boys ..." Better get out your pocketbook, then -- they're slated to power RS/6000 servers rather than consumer PCs, at least for a while. 64 bits, copper interconnects, and plans to move down to a .13 micron fab show that IBM is thinking long-term. Similar technology may reach your desktop first, though, in products like AMD's Sledgehammer.

12 of 121 comments

  1. Superscalar vs. on-die SMP by Shaheen · · Score: 3

    When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?" (Superscalar basically means that the processor takes n instructions at a time, rather than just 1 at a time).

    It would be really interesting to see the results from using on-die SMP versus a chip that is just twice as wide (2n instructions, instead of n).

    Also in question is how the caching is done. Do both cores update the same cache? Or do they operate on separate caches?

    --
    You should never take life too seriously - You'll never get out of it alive.
    1. Re:Superscalar vs. on-die SMP by cperciva · · Score: 3

      When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?"

      Because of limited instruction level parallelism. Even with a 512 entry reorder window, 256 renaming registers, and a 256-way superscalar architecture, you still won't have ILP beyond about 10 on the gcc component of the spec benchmarks. Furthermore, as you increase the width of a machine, the difficulty of finding all the data dependencies grows quadratically, since each instruction must be compared with each other instruction. Ultimately it comes down to an issue of diminishing returns, and you find that it is cheaper and faster to run two threads at once than it is to allocate twice as many resources to a single thread.
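To put rough numbers on the quadratic growth described above, here is a minimal Python sketch. It is an illustration under assumptions, not how any real scheduler is built: the function names and the `(dest, sources)` instruction format are hypothetical, only read-after-write dependencies are checked, and real hardware performs these comparisons in parallel comparators rather than in a loop.

```python
from itertools import combinations

def dependency_checks(width):
    """Number of instruction pairs to compare in one issue group of `width`."""
    return width * (width - 1) // 2

def find_raw_dependencies(group):
    """Return (i, j) index pairs where instruction j reads what instruction i writes.

    `group` is a list of (dest, sources) tuples in program order
    (a hypothetical, simplified instruction format).
    """
    deps = []
    for (i, (dest, _)), (j, (_, srcs)) in combinations(enumerate(group), 2):
        if dest in srcs:
            deps.append((i, j))
    return deps

if __name__ == "__main__":
    # Doubling the width roughly quadruples the number of pairwise checks.
    for width in (4, 8, 16, 256):
        print(width, dependency_checks(width))

    group = [
        ("r1", ("r2", "r3")),  # r1 = r2 op r3
        ("r4", ("r1", "r5")),  # r4 = r1 op r5, depends on instruction 0
        ("r6", ("r7", "r8")),  # independent of the other two
    ]
    print(find_raw_dependencies(group))
```

Going from a 4-wide to an 8-wide group takes the pair count from 6 to 28, which is the "each instruction compared with each other instruction" cost in miniature.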

      As for the question of caching, I'd assume that they share the L2 cache the same way as in any other such system -- they share the bus, write to and read from the same cache, and snoop each other's actions. They of course would have their own internal L1 caches, with lower latency.

  2. Re:Power arch at 500 MHz! by RISCy+Business · · Score: 3

    No, POWER and PowerPC are not finally merging, nor do I think they ever will. The POWER architecture, however, since the POWER3, has fully supported the PowerPC instruction set in 32 and 64 bit implementations.

    Yeah, IBM and Motorola are in bed again. But it's been on again off again for years now. Don't count on it being a final merging of the two architectures.

    =RISCy Business

  3. Re:What took you all so long ? by Haven · · Score: 3

    What took you all so long ? SMP on a single chip is an obvious advance

    1 terahertz is an obvious advance too. Just because it's obvious doesn't make it easier. I'm sure that IBM has had prototypes of dual chips on one die before. They wanted the 7000 series (G4) of the PowerPC chips to have a high-end model with 4 processors in the processor's core. It is just hard to do. Just like it is hard to write an operating system that will make non-SMP programs utilize SMP. Windows 2000 has "load-balancing" where it will run processes that are processor intensive on the chip that isn't running the OS.

  4. Overclock? by Haven · · Score: 3

    How would you overclock a "production" type server (by production I mean RS/6000 or AS/400 type proprietary machines)? This isn't some BX motherboard with clock speed jumpers. You could "Kryotech" it, but I think there would be vast amounts of cooling already, given that it is 2 chips on one die running at 2 gigahertz, even with a .18 micron fabrication.

    Second of all, good luck on coming up with the cash to buy one. Even if where you worked got one they would still keep it under lock and key tighter than Fort Knox (to all you non-US people, Fort Knox is a place owned by the Treasury department where lots of precious metals are stored. It is locked up pretty tight.). I'm a super user for my network at work, and I'm not even allowed near some of the boxes we have.

  5. Re:Already here with current chips? by orz · · Score: 3
    Current chips are superscalar, meaning that they have multiple execution units, but all execution units are working on instructions from the same instruction stream (thread). Complicated hardware analyzes dependencies and tries to translate that single thread into a parallel mesh of instructions that can be executed simultaneously, but doing that is very difficult, and sometimes impossible.

    This would be different because two threads would be executing simultaneously, so as long as the OS could find two threads that need cpu-time, the hardware would gain a lot of parallelism without having to do more scheduling.

    This approach is good because it offers a way to use the excess die space without requiring too much extra effort from the designers. In the last decade or two the # of transistors per chip has gone up several orders of magnitude, while the # of man-years per chip-designer has not come close to keeping pace. It's also nice because the other common approaches are obviously reaching the point of diminishing return.

    What Compaq is doing is more interesting though... they are processing multiple threads simultaneously... on the same set of execution units! If one thread doesn't have enough parallelism... that's O.K.. The other 7 can pick up the slack!

  6. Better article on Power4 by slyfox · · Score: 3
    There is a good article on Power4 at IBM's web site.

    The article says the system will have 10 GBytes/second of memory bandwidth and a 45 GBytes/second multiprocessor interface. The article estimates the cache sizes as 1.5 MB for the shared on-chip L2, and 32 MB for the off-chip L3 cache. Each processor die has 5,500 pins and attaches directly to a multi-chip-module (MCM).

    The article also suggests that the system will support up to 32 processors (2 per die x 16), and even more processors using clustering technology.

    Looks like this is going to make for a fast server system.

  7. Power arch at 500 MHz! by Paul+Komarek · · Score: 4

    At one time, not too long ago, the Power 3 architecture was rated (by some) as second fastest in floating point, behind the 500 MHz Alpha 21264. The punchline is that the Power chip was running at 200 MHz!

    In the past, complications with multiprocessor computers have prevented their supremacy over single-CPU architectures. I'd love to see IBM succeed with their multi-CPU chips, as I believe this technology may solve the nagging parallel problems with processor interconnect. And the Power architecture is very nice.

    Does anyone know if the PowerPC and Power architectures will finally become one with this product, as was expected with previous Power revisions? Somehow, I really don't expect to see it ever happen, with the way Motorola and IBM have gotten along.

  8. overclocking by guacamole · · Score: 4

    I wanna overclock one of these bad boys ...

    Enough with overclocking already. This isn't your $70 Celeron toy. When you get to work with $5,000+ chips, you are free to overclock them, but I doubt it even occurred to anyone to overclock their $9000 UltraSparc CPU or similar. Yep, overclocking is stupid. flame on ..

  9. Re:Starting at 1.1GHz? by Haven · · Score: 4

    "...will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz..."

    Power 4 ::

    2+ gigahertz
    Dual processor on one die
    500mhz bus
    LARGE L2 cache (I would imagine 2-4MB)
    64 bit

    -------------------------------

    x86 CPU's ::

    1+ gigahertz
    One processor on die
    200mhz bus (I don't recall the bus of the willamette)
    512kB-2mB L2 cache
    32 bit

    This is not something you will see on Tom's Hardware. Clockspeed isn't everything. A 500mhz 21264 DEC Alpha is MUCH faster than a 500mhz PIII. The Power4 is not a desktop processor. Compaq will not ship computers with the Power4 processor in them. People need to understand this! When was the last time you saw a benchmark that was PIII vs. RS/6000? I have only seen it once, and that was the PIII Xeon compared to other server hardware, namely from Sun and DEC. That was on Intel's site.

  10. interesting details by orz · · Score: 4
    Having two processor cores is really cool, and something a lot of people have been hoping for for a long time, although not quite as cool as some of the stuff Compaq/Alpha is doing, but

    This article doesn't mention the most interesting detail I heard about the Power4: They're supposed to come in small rings of about four chips connected by ultra-high frequency 128 bit uni-directional buses that allow multiple chips to share their L2 caches, with fairly intelligent coherency stuff handled in hardware.

    The only bad stuff is that they're really targeting the high-end server market, whereas I want most of that stuff for the low-end too. It's supposed to be 400 mm^2 on a .18 micron process w/ copper, so even after it moves to .13 micron it'll still be too expensive for mainstream use.

    Other tidbits include: 1. It's dropping a few of the more complex instructions from its instruction set and depending on the OS to emulate them, 2. To simplify instruction scheduling, they're keeping track of packets of instructions instead of individual instructions, and 3. The per chip L2 size is supposed to be 1.5 megabytes.

  11. Explanation - Re:What took you all so long ? by Northern+Hunter · · Score: 5

    > SMP on a single chip is an obvious advance.

    Unfortunately if you multiply the amount of circuitry you are trying to deliver in one fully working device, you cut your yield exponentially. This is a SERIOUS problem if your yields aren't high enough to make the exponential nature a small effect.

    Say on one wafer you have 30 defects bad enough to wreck whatever chip they are on. Now normally you make 100 chips on that wafer. So (first approximations here, I won't actually do the statistics) 70 chips make it, your yield is 70 percent.

    But now you double the size of your chips, so that same wafer now only produces 50. But you still have those same 30 bad defects. Whoops, your yield is now 40 percent. Quadruple the size of your die... Whoops, now you will be lucky to get a handful from that entire wafer (you're trying to get 25 chips when there are 30 randomly distributed defects... I leave the answer as an exercise for the reader :)

    On the other hand if you do the same rough approximation with only 10 super bad defects per wafer, then you go from a 90 percent yield to an 80 percent yield when doubling the die size. Nowhere near as bad an effect on the economics.

    So, the only reason they are now considering it is that they expect to have defect rates reduced enough to make it reasonably economical.
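The rough arithmetic in this comment can be reproduced with a short Python sketch. `naive_yield` is the post's first approximation, where every defect is assumed to ruin a different chip. `poisson_yield` is an addition not found in the post: it assumes the simple Poisson yield model, in which defects land at random and several may fall on the same chip, which is one way to read "cut your yield exponentially". Both function names are hypothetical.

```python
import math

def naive_yield(chips_per_wafer, defects):
    """First approximation from the post: each defect kills a different chip."""
    good = max(chips_per_wafer - defects, 0)
    return good / chips_per_wafer

def poisson_yield(chips_per_wafer, defects):
    """Assumed Poisson model: probability that a chip receives zero defects.

    The expected number of defects per chip is defects / chips_per_wafer,
    so doubling the die area doubles the exponent.
    """
    return math.exp(-defects / chips_per_wafer)

if __name__ == "__main__":
    # 100 chips per wafer is the normal die; 50 and 25 are 2x and 4x die area.
    for defects in (30, 10):
        for chips in (100, 50, 25):
            print(
                f"defects={defects:2d} chips={chips:3d} "
                f"naive={naive_yield(chips, defects):.2f} "
                f"poisson={poisson_yield(chips, defects):.2f}"
            )
```

With 30 defects, the naive figure drops from 0.70 to 0.40 to 0.00 as the die doubles and quadruples, matching the numbers above; the Poisson figure falls less abruptly but still decays exponentially with die area.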

    -NH

    My apologies for avoiding the statistics and actual mathematics; my examples above use randomly chosen yields. I have an optoelectronics background that is a few years old, from back when some places producing III-V QWH lasers with simple integration with a few other devices had utterly pathetic yields... Like 10 percent!!