Slashdot Mirror


IBM One-Chip Dual Processor Due Next Year

PureFiction writes, "Looks like IBM is going to be scaling processors at the chip-die level. ZDnet has this story about plans for a dual-processor, single-die chip that will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz. I wanna overclock one of these bad boys ..." Better get out your pocketbook, then -- they're slated to power RS/6000 servers rather than consumer PCs, at least for a while. 64 bits, copper interconnects, and plans to move down to a .13 micron fab show that IBM is thinking long-term. Similar technology may reach your desktop first, though, in products like AMD's Sledgehammer.

35 of 121 comments

  1. Microprocessor Report article on power4 by Anonymous Coward · · Score: 2

    IBM announced Power4 at the Hot Chips conference last fall. There is an excellent article in Microprocessor Report detailing the processor. The report can be found on IBM's website here: http://www.chips.ibm.com/news/1999/microprocessor99.pdf

  2. Re:OverClock by Russ+Steffen · · Score: 2

    I once overclocked my watch - first time in my life I have ever been early for anything.

  3. What took you all so long ? by Forge · · Score: 2

    You == IBM, iNTEL, AMD etc..

    As for this IBM chip. What took you all so long ? SMP on a single chip is an obvious advance. When you vastly increase the amount of circuits on a chip as happens between a Celeron and a P3 without a matching increase in performance something has to give. Why not make that the number of cores on the chip? I hope this isn't patented because it really is obvious.

    This brings up something I have been thinking about with the Crusoe, which can convert 32 bit instructions to 128 bit meta instructions and have the finished product run as fast as on the genuine 32 bit CPU.

    What if the same technique is applied to an SMP setup in such a way that the software sees the processors as a single CPU. Right now this kind of abstraction is handled by the Operating system and except on the Mainframe that is very inefficient. To the point where 2X400MHz CPUs is a whole lot faster than 4X200MHz.

    Now if the whole thing including say 6 CPUs and 2 Megs of cache were put on a single chip at 500MHz to 2GHz, how fast would it be ? My guess is that this could easily be the fastest low end server or workstation chip by a good margin.

    --
    --= Isn't it surprising how badly I spell ?
    1. Re:What took you all so long ? by buysse · · Score: 2
      To the point where 2X400MHz CPUs is a whole lot faster than 4X200MHz.

      Depends on what you're doing, my boy. If you're running 4 different CPU-hungry jobs, a 4X200 may well be faster than a 2X400 -- assuming everything else about the processors is equal.
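      A back-of-the-envelope way to see the point, as a sketch only: the function name, the 1000-Mcycle job size and the `sharing_overhead` knob are all made up for illustration, and the model assumes identical single-threaded jobs that never migrate between CPUs.

```python
import math

def batch_time(jobs, cpus, mhz, work_mcycles=1000.0, sharing_overhead=0.0):
    """Seconds to finish `jobs` equal single-threaded jobs.

    Assumes each job needs `work_mcycles` million cycles and that jobs
    sharing a CPU run one after another. `sharing_overhead` is a
    hypothetical fractional penalty (context switches, cache thrashing)
    applied only when a CPU has to handle more than one job.
    """
    per_cpu = math.ceil(jobs / cpus)        # jobs queued on the busiest CPU
    seconds = per_cpu * work_mcycles / mhz  # Mcycles / MHz = seconds
    if per_cpu > 1:
        seconds *= 1.0 + sharing_overhead
    return seconds

for jobs in (1, 2, 4):
    print(jobs, batch_time(jobs, 2, 400), batch_time(jobs, 4, 200))
```

      With one or two jobs the 2X400 box finishes in half the time. With four jobs the two boxes tie in this idealized model, and the 4X200 only pulls ahead once you charge the 2X400 something for sharing each CPU between two jobs.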

      --
      -30-
    2. Re:What took you all so long ? by Haven · · Score: 2

      No, when running a process on a Windows 2000 box such as Quake II that doesn't do SMP, Windows 2000 will put the non-SMP program on its own processor. "Load Balancing"

    3. Re:What took you all so long ? by UnknownSoldier · · Score: 2

      > No, when running a process on a Windows 2000 box such as Quake II that doesn't do SMP, Windows 2000 will put the non-SMP program on its own processor. "Load Balancing"

      That is correct. To prove this is the case, you can set the affinity (which cpu a thread is bound to). Task Manager | Process | Right-click on process | Set affinity.
      (This setting doesn't show up on a single cpu.)
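      The same idea can be tried from a script. This is only a rough sketch and not the Windows mechanism described above: it assumes a Linux box, where Python exposes `os.sched_setaffinity`, and the helper names `pin_to_cpu` and `demo` are made up for illustration.

```python
import os

def pin_to_cpu(cpu):
    """Try to bind the current process to one CPU; return the resulting affinity set.

    Returns None where the platform lacks os.sched_setaffinity or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        os.sched_setaffinity(0, {cpu})   # pid 0 means "this process"
        return os.sched_getaffinity(0)
    except OSError:
        return None

def demo():
    """Pin to the lowest-numbered allowed CPU, report, then restore the old mask."""
    if not hasattr(os, "sched_getaffinity"):
        print("affinity calls not available on this platform")
        return
    allowed = os.sched_getaffinity(0)
    print("before:", sorted(allowed))
    print("after: ", pin_to_cpu(min(allowed)))
    try:
        os.sched_setaffinity(0, allowed)  # put the original mask back
    except OSError:
        pass

demo()
```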

      Another quick way to see this is the case is to start up Quake, and look at the cpu utilization. It will be around 50%, meaning the one cpu is taxed, while the other one isn't doing anything.

      One means of burning in a new dual system is to run 2 copies of Prime95: one on each cpu.
      For fun, I left 2 copies of prime95 and one copy of unreal running overnight. The one prime95 hadn't reached as many calculations as the 2nd one.

      Note: Windows NT runs the OS on both processors. It will not run a non-SMP aware process on both cpu's.

      For anyone looking for a cheap dual system, this is what I did:
      $35 cel/366 o/c to 550
      $140 Abit BP6
      Hard to beat the price !

      Cheers

    4. Re:What took you all so long ? by Haven · · Score: 3

      What took you all so long ? SMP on a single chip is an obvious advance

      1 terahertz is an obvious advance too. Just because it's obvious doesn't make it easier. I'm sure that IBM has had prototypes of dual chips on one die before. They wanted the 7000 series (G4) of the PowerPC chips to have a high end model that was 4 processors in the processor's core. It is just hard to do. Just like it is hard to write an operating system that will make non-SMP programs utilize SMP. Windows 2000 has "load-balancing" where it will run processes that are processor intensive on the chip that isn't running the OS.

  4. Re:overclocking by Tet · · Score: 2
    Do you know any engineers? They overestimate everything, like Scotty telling Kirk how long repairs will take. You bet that you can run that chip faster than it is rated.

    Yes, you can, if you're prepared to take the risk -- that's the whole basis of overclocking. Chips are rated at the speed the manufacturer can guarantee they'll operate as intended. Say you overclock your chip by 15%. You're now encroaching into the safety margin that the engineers and the manufacturer allowed to be sure that all chips will work correctly. Even so, perhaps 98% of all chips will be OK. Do you want to gamble on whether or not you've got one of the 1 in 50 chips that won't work? Personally, I don't like the odds, particularly when the chips cost as much as this one will...

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  5. Re:overclocking by Tet · · Score: 2
    Overclocking SMP is NOT suicide [...] What's the risk?

    The risk is both damage to the physical hardware and data corruption. The hardware can easily be replaced when it's a cheap Celeron, but not when it's a dual core IBM Power CPU. The data corruption can't be ignored, though. Don't believe me? Maybe you'd like to hear it directly from someone you might trust.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  6. Re:overclocking by Tet · · Score: 2
    Overclocking stupid, eh? Actually, many PowerPC processors overclock quite well, I know from personal experience.

    Maybe they do, maybe they don't. You're missing the point though. If you want faster speeds, go buy faster processors (or more of them). Overclocking is only for those who can't afford to do that. People buying these chips aren't going to fall into that category.

    The other point to consider is that overclocking an SMP system is tantamount to suicide, by all accounts. Now maybe that won't be the case here, because the cores are on the same die, and hence will be affected in exactly the same way, but I don't know enough about it to be sure, and I certainly wouldn't risk it.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  7. Q: How do I Overclock my Light Bulb? by Guppy · · Score: 2

    I've been trying to overclock my lightbulb, and I thought I'd ask you gurus on Slashdot for some pointers. My bulb says "60W" on it, and I want to get it up to 75 or 100.

    • I'm having a heck of a time getting the heatsink to stay put, it keeps sliding off the top of the bulb. Any suggestions?
    • Microsoft Lightswitch keeps crashing. Do I need to up the voltage to keep it stable? I've got a 220V line that I could try plugging it into.
    • My lightbulb is currently running at 60 Hz. I've heard that when you increase the frequency, the lightbulb will start emitting ultraviolet or even X-rays. My friends tell me I can protect myself by painting the lightbulb black. I need to know how many coats of paint to use, please help!!!
  8. Not consumer level, thats for certain. by Mr.+Flibble · · Score: 2

    I suspect that it will be some time before this technology ends up in consumer PCs. The fact that it's meant for servers aside, most stuff is not coded to support multi-threading.

    Sure, *nix is, BeOS, and NT (2000) are, but the majority of people still run 9X on their desktops.

    Quake 3 and Unreal Tournament support SMP, but there are few consumer level applications that support it. Apparently BeOS can force multithreading, and this is cool, but what we really need are more apps that can take advantage of parallel calculations. Even Carmack states that dual processors running Q3A only increase performance in the most demanding situations.

    Even the guys who maintain the Beowulf-How-to (someone is going to post this...) say that parallel computing is great for crunching data, well, IN PARALLEL. Quake is not parallel. Clock speed matters more in 3d shooters than overall crunching power (Unless you *like* a slideshow.)

    Don't get me wrong, I personally would love to have a machine running either Linux or BSD with one of these things in it (or many) but I don't know what the hell I would do with it.

    Until then I will stick with a BP6 and dual-celerons, heck, maybe flip-chips or the new Jalapeno's from VIA/Cyrix.

    I think that this is the way of the future, but we won't see it on the desktop for at least 5 years. (IMHO)

    --
    Try to hack my 31337 firewall!
  9. I wonder... by Graymalkin · · Score: 2

    If Motorola plans to incorporate this into their PPC lines. Take the 604e for example: from what I understand of its architecture it could have easily been made to do two-chips-on-one-die. I would sorta like to see chips of this caliber in the next generation or so of Mac servers, maybe even non-Mac PPC systems (Linux, BeOS). The benefit of SMP over superscalar is that SMP allows you to have multiple superscalar processing units: if a processor can do n processes with a single superscalar processing unit, then with SMP it can do x*n processes, where x is the number of processors. Most people know this already. What really interests me is the high bus speed. Intel and AMD's offerings may be nice for server platforms because of their price but they would get their asses chomped off by the sheer system speed from the Power4. I'm sick of hearing about the Athlon's EV6 bus, the memory (read the entire system besides the processor) only runs at 100MHz. IIRC AMD is going to be using DDR SDRAM with the Sledgehammer to boost its overall system performance and the system will clock at 133, I would still rather have a 500MHz system bus.

    --
    I'm a loner Dottie, a Rebel.
  10. Re:How about hundreds of small processors in one d by Graymalkin · · Score: 2

    The problem there lies in the large datasets. If you were running 16bit code it would be fine, but for many applications today (games, graphics, voice recognition, encryption/decryption, etc.) you need more than 16 bits. If you had to emulate 2^n bits higher than 4 you'd have major system slowdown. Having a bunch of identical cores would mean they would need to be small. Small cores mean they won't have the space to have optimized cores. Today's chips have highly optimized cores, like AltiVec, that can handle large data sets at high speeds. It's like with Rambus memory: it has really high frequencies but a teeny tiny data bus, which means lots of latency. Sometimes faster is more valuable.

    --
    I'm a loner Dottie, a Rebel.
  11. Re:overclocking by Azog · · Score: 2

    Overclocking SMP is NOT suicide. I know several people with overclocked dual Celerons that work fine. And why not? They are cheap, if one burns up, you throw it away and get another one.

    Heck, throw them away every four months and upgrade anyway. Celerons are cheap as dirt, and when overclocked, are as fast as far more expensive P-III's.

    What's the risk?


    --
    Torrey Hoffman (Azog)
    "HTML needs a rant tag" - Alan Cox
  12. Re:overclocking by akey · · Score: 2

    Enough with overclocking already. This isn't your $70 Celeron toy. When you get to work +$5.000 chips , you are free to overclock them but I doubt it even occurred to anyone to overclock their $9000 UltraSparc cpu or similar. Yep, overclocking is stupid. flame on ..

    Actually, when I used to work at Ross (used to manufacture CPUs for Suns) in their modules lab, one of the things that we routinely did was to overclock the CPUs (not to mention other nasty little tricks involving soldering, cutting traces on the MB with an exacto knife, etc.). Mostly it's just a matter of providing proper heat sinks and air circulation. So it did actually occur to at least someone. :-) But you're right in that no serious business customer is going to overclock their high-end workstations and risk invalidating the warranty.

    --

    ---
    "Go Metallica. Die RIAA." -- Linus Torvalds
  13. Enough with the cynicism! This is desktop tech! by xtal · · Score: 2

    Hey guys, are we quick to forget history? The more people get up and proclaim that a given technology is too expensive / not needed / 640k is enough for the desktop, someone goes and proves them flat wrong.

    One of two things happens: Consumer technology just blows away these so called "elite" chips, (anyone want to compare one of those "elite" Alpha 150Mhz machines - once a VERY expensive minicomputer chip - with a 1GHz consumer athlon?). The other is that "poof", it appears.

    There are issues with semiconductor yields as people mentioned previously. But with Celerons going for $70, it won't be too long before someone figures out how to do it cheaply.

    Ahhh, SMP on chip. Long way from the 6502 babyee :)

    Kudos

    --
    ..don't panic
  14. Re:Two sets of register files by be-fan · · Score: 2

    No, it's two complete cores. And what do you mean the second proc would be sitting there unused? The stuff that this proc is going to be used for is highly parallel. Even most media stuff is parallel. Load BeOS up on a dual proc box and run a few media apps. You'll see that both procs have pretty high utilization.

    --
    A deep unwavering belief is a sure sign you're missing something...
  15. Re:Performance by be-fan · · Score: 2

    Still wrong. It does process a stream of instructions, but that is exactly what a thread is! What's to say that it can't process 2 streams of instructions? The guy above is still wrong, the POWER4 is two chips, but multiple threads CAN be done on the proc level. I think (don't quote me, I read it a long time ago on /.) that the Sun MAJC can process two threads. It goes like this. If one thread is, say, an OpenGL transform thread, while the other is a rasterization thread, what's to keep the transform thread from using the fp units while the raster thread uses the integer units? Or two transform threads sharing 4 fp units? 3D in general is hideously parallel. Again, I'm not quite sure, but I think someone is working on a multithreaded OpenGL implementation. Seriously, though, it makes sense. What's to stop one proc from doing the matrix multiplies on vertices 1-1000, while the other does it on 1001-2000?
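    The vertex split is easy to sketch. This is only an illustration, not any real OpenGL implementation: the function names are made up, and Python threads won't actually run the arithmetic on two procs at once because of the interpreter lock, so it shows the partitioning rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(matrix, vertices):
    """Apply a 4x4 row-major matrix to each (x, y, z, w) vertex."""
    return [
        tuple(sum(row[k] * v[k] for k in range(4)) for row in matrix)
        for v in vertices
    ]

def transform_split(matrix, vertices, workers=2):
    """Cut the vertex list into contiguous chunks, one per worker, and transform each."""
    size = max(1, -(-len(vertices) // workers))  # ceiling division, at least 1
    chunks = [vertices[i:i + size] for i in range(0, len(vertices), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands results back in chunk order, so the output stays in vertex order
        parts = pool.map(lambda chunk: transform(matrix, chunk), chunks)
        return [v for part in parts for v in part]

# Translation by (1, 2, 3) applied to 2000 vertices: 0-999 go to one worker, 1000-1999 to the other.
translate = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]
verts = [(float(i), 0.0, 0.0, 1.0) for i in range(2000)]
print(transform_split(translate, verts)[0])
```

    The two halves never touch each other's data, which is why this kind of work splits so cleanly.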

    --
    A deep unwavering belief is a sure sign you're missing something...
  16. Re:Superscalar vs. on-die SMP by orz · · Score: 2
    1. Each core in the Power4 is very superscalar, possibly more-so than any processor shipping today.

    2. I don't think that such a test (superscalar vs. SMP) would be useful, as the results would be very, very, VERY heavily influenced by the multi-threadedness (or lack thereof) of the benchmarks, and any two processors available will have enough other differences in architecture to invalidate the tests.

    3. Both cores have small (16 or 32 k, I think) L1 caches, but share a large (1.5M or 2M) L2 cache. Furthermore, several chips share L2s via a ring-arrangement of uni-directional 128-bit 500 MHz buses, moving things around such that all cached data exists in the L2 of the chip that most recently accessed it, and in no other L2.

  17. On the Desktop? by fcd · · Score: 2

    You will see this technology on the desktop. Beyond the fact that the Power Series is related to the PowerPC series (IBM uses both in their RS/6000 series), multiple cores has been on the PowerPC Roadmap for a while. (Yes I know that is a rumors site. I have seen something similar on Motorola's site I believe, but can't find it right now). Yeah I know the info is a little out of date...but its just a matter of time.

  18. Re:Starting at 1.1GHz? by The+Variable+Man · · Score: 2

    The really interesting design feature of this architecture is that the chips work very well in SMP. 4 chips can be placed together each rotated through 90 degrees so that their fast interconnects align.

  19. OverClock by Nastard · · Score: 2

    I wanna overclock one of these bad boys ...

    Always someone willing to ruin good hardware. Is there *anything* you people wont overclock?

  20. Alpha has similar plans for long while now. by Kernel+Corndog · · Score: 2

    It just so happens I was visiting alphalinux.org today and saw Compaq has "just released" a document detailing the Alpha 21364 EV7 SMP on-chip processor. However this document has been out since, I believe, the October 1998(?) Microprocessor Forum. However, IBM's proposed 2 GHz at 500MHz FSB is quite intriguing. I know... I know... Compaq seems to be letting the Alpha wilt away on its once strong vine but I'm still rooting for it. I remember when Alpha had reached 600MHz and Intel/x86 were sputtering along at half the speed. It wasn't until after the settlement between Digital and Intel that x86 started speeding up. Hmmm...anyone else smell fish? Well here's hoping that the Alpha can bring itself back to its glory as speed king. And hopefully before the Merced/Itanium "Marchitecture" infects the corporate world.

  21. Superscalar vs. on-die SMP by Shaheen · · Score: 3

    When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?" (Superscalar basically means that the processor takes n instructions at a time, rather than just 1 at a time).

    It would be really interesting to see the results from using on-die SMP versus a chip that is just twice as wide (2n instructions, instead of n).

    Also in question is how the caching is done. Do both cores update the same cache? Or do they operate on separate caches?

    --
    You should never take life too seriously - You'll never get out of it alive.
    1. Re:Superscalar vs. on-die SMP by cperciva · · Score: 3

      When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?"

      Because of limited instruction level parallelism. Even with a 512 entry reorder window, 256 renaming registers, and a 256-way superscalar architecture, you still won't have ILP beyond about 10 on the gcc component of the spec benchmarks. Furthermore, as you increase the width of a machine, you increase the difficulty of finding all the data dependencies quadratically, since each instruction must be compared with each other instruction. Ultimately it comes down to an issue of decreasing returns, and you find that it is cheaper and faster to run two threads at once than it is to allocate twice as many resources to a single thread.
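      To make the quadratic part concrete, here is a toy sketch. The function name and the tiny (dest, sources) instruction format are made up for illustration, it only looks for read-after-write dependencies and ignores register renaming, and real hardware does these comparisons with parallel comparators rather than a loop.

```python
def find_raw_dependencies(window):
    """Return (dependencies, comparisons) for a list of (dest, sources) instructions.

    Every instruction is checked against every earlier one in the window,
    so the comparison count is n * (n - 1) / 2 for n instructions.
    """
    deps, comparisons = [], 0
    for later in range(len(window)):
        for earlier in range(later):
            comparisons += 1
            dest, _ = window[earlier]
            _, sources = window[later]
            if dest in sources:              # later instruction reads what earlier wrote
                deps.append((earlier, later))
    return deps, comparisons

window = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # needs instruction 0
    ("r6", ("r7", "r8")),  # independent
    ("r9", ("r4", "r6")),  # needs instructions 1 and 2
]
deps, comparisons = find_raw_dependencies(window)
print(deps, comparisons)
```

      Four instructions cost 6 comparisons; a window of 8 would cost 28 and a window of 256 would cost 32640, so doubling the width roughly quadruples the checking work.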

      As for the question of caching, I'd assume that they share the L2 cache the same way as in any other such system -- they share the bus, write to and read from the same cache, and snoop each other's actions. They of course would have their own internal L1 caches, with lower latency.

  22. Re:Power arch at 500 MHz! by RISCy+Business · · Score: 3

    No, POWER and PowerPC are not finally merging, nor do I think they ever will. The POWER architecture, however, since the POWER3, has fully supported the PowerPC instruction set in 32 and 64 bit implementations.

    Yeah, IBM and Motorola are in bed again. But it's been on again off again for years now. Don't count on it being a final merging of the two architectures.

    =RISCy Business

  23. Overclock? by Haven · · Score: 3

    How would you overclock a "production" (by production I mean RS/6000 AS/400 type proprietary machines) type server? This isn't some BX motherboard with clock speed jumpers. You could "Kryotech" it, but I think there would be vast amounts of cooling already, it being 2 chips on one die running at 2 gigahertz, even with a .18 micron fabrication.

    Second of all, good luck on coming up with the cash to buy one. Even if where you worked got one they would still keep it under lock and key tighter than Fort Knox (to all you non-US people, Fort Knox is a place owned by the Treasury department where lots of precious metals are stored. It is locked up pretty tight.). I'm a super user for my network at work, and I'm not even allowed near some of the boxes we have.

  24. Re:Already here with current chips? by orz · · Score: 3
    Current chips are superscalar, meaning that they have multiple execution units, but all execution units are working on instructions from the same instruction stream (thread). Complicated hardware analyzes dependencies and tries to translate that single thread into a parallel mesh of instructions that can be executed simultaneously, but doing that is very difficult, and sometimes impossible.

    This would be different because two threads would be executing simultaneously, so as long as the OS could find two threads that need cpu-time, the hardware would gain a lot of parallelism without having to do more scheduling.

    This approach is good because it offers a way to use the excess die space without requiring too much extra effort from the designers. In the last decade or two the # of transistors per chip has gone up several orders of magnitude, while the # of man-years per chip-designer has not come close to keeping pace. It's also nice because the other common approaches are obviously reaching the point of diminishing return.

    What Compaq is doing is more interesting though... they are processing multiple threads simultaneously... on the same set of execution units! If one thread doesn't have enough parallelism... that's O.K.. The other 7 can pick up the slack!

  25. Better article on Power4 by slyfox · · Score: 3
    There is a good article on Power4 at IBM's web site.

    The article says the system will have 10 GBytes/second of memory bandwidth and a 45 GBytes/second multiprocessor interface. The article estimates the cache sizes as 1.5 MB for the shared on-chip L2, and 32MB for the off-chip L3 cache. Each processor die has 5,500 pins and attaches directly to a multi-chip-module (MCM).

    The article also suggests that the system will support up to 32 processors (2 per die x 16), and even more processors using clustering technology.

    Looks like this is going to make for a fast server system.

  26. Power arch at 500 MHz! by Paul+Komarek · · Score: 4

    At one time, not too long ago, the Power 3 architecture was rated (by some) as the second fastest floating point chip, behind the Alpha 21264 at 500MHz. The punchline is that the Power chip was running at 200 MHz!

    In the past, complications with multiprocessor computers have prevented their supremacy over single cpu architectures. I'd love to see IBM succeed with their multicpu chips, as I believe this technology may solve the nagging parallel problems with processor interconnect. And the Power architecture is very nice.

    Does anyone know if the PowerPC and Power architectures will finally become one with this product, as was expected with previous Power revisions? Somehow, I really don't expect to see it ever happen, with the way Motorola and IBM have gotten along.

  27. overclocking by guacamole · · Score: 4

    I wanna overclock one of these bad boys ...

    Enough with overclocking already. This isn't your $70 Celeron toy. When you get to work +$5.000 chips , you are free to overclock them but I doubt it even occurred to anyone to overclock their $9000 UltraSparc cpu or similar. Yep, overclocking is stupid. flame on ..

  28. Re:Starting at 1.1GHz? by Haven · · Score: 4

    "...will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz..."

    Power 4 ::

    2+ gigahertz
    Dual processor on one die
    500MHz bus
    LARGE L2 cache (I would imagine 2-4MB)
    64 bit

    -------------------------------

    x86 CPU's ::

    1+ gigahertz
    One processor on die
    200MHz bus (I don't recall the bus of the Willamette)
    512kB-2MB L2 cache
    32 bit

    This is not something you will see on Tom's Hardware. Clockspeed isn't everything. A 500MHz 21264 DEC Alpha is MUCH faster than a 500MHz PIII. The Power4 is not a desktop processor. Compaq will not ship computers with the Power4 processor in them. People need to understand this! When was the last time you saw a benchmark that was PIII vs. RS/6000? I have only seen it once, and that was the PIII Xeon compared to other server hardware, namely from Sun and DEC. That was on Intel's site.

  29. interesting details by orz · · Score: 4
    The two processor cores are really cool, and something a lot of people have been hoping for for a long time, although not quite as cool as some of the stuff Compaq/Alpha is doing.

    This article doesn't mention the most interesting detail I heard about the Power4: They're supposed to come in small rings of about four chips connected by ultra-high frequency 128 bit uni-directional buses that allow multiple chips to share their L2 caches, with fairly intelligent coherency stuff handled in hardware.

    The only bad stuff is that they're really targeting the high-end server market, where I want most of that stuff for the low-end too. It's supposed to be 400 mm^2 on a .18 micron process w/ copper, so even after it moves to .13 micron it'll still be too expensive for mainstream use.

    Other tidbits include: 1. It's dropping a few of the more complex instructions from its instruction set and depending on the OS to emulate them, 2. To simplify instruction scheduling, they're keeping track of packets of instructions instead of individual instructions, and 3. The per chip L2 size is supposed to be 1.5 megabytes.

  30. Explanation - Re:What took you all so long ? by Northern+Hunter · · Score: 5

    > SMP on a single chip is an obvious advance.

    Unfortunately if you multiply the amount of circuitry you are trying to deliver in one fully working device, you cut your yield exponentially. This is a SERIOUS problem if your yields aren't high enough to make the exponential nature a small effect.

    Say on one wafer you have 30 defects bad enough to wreck whatever chip they are on. Now normally you make 100 chips on that wafer. So (first approximations here, I won't actually do the statistics) 70 chips make it, your yield is 70 percent.

    But now you double the size of your chips, so that same wafer now only produces 50. But you still have those same 30 bad defects. Whoops, your yield is now 40 percent. Quadruple the size of your die... Whoops, now you will be lucky to get a handful out of that entire wafer (you're trying to get 25 chips when there are 30 randomly distributed defects... I leave the answer as an exercise for the reader :)

    On the other hand if you do the same rough approximation with only 10 super bad defects per wafer, then you go from a 90 percent yield to an 80 percent yield when doubling the die size. Nowhere near as bad an effect on the economics.

    So, the only reason they are now considering it is that they expect to have defect rates reduced enough to make it reasonably economical.

    -NH

    My apologies for avoiding the statistics and actual mathematics; my examples above use randomly chosen yields. I have an optoelectronics background that is a few years old, from back when some places producing III-V QWH lasers with simple integration with a few other devices had utterly pathetic yields... Like 10 percent!!
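    For anyone who wants the skipped statistics, here is a rough sketch. It sets the first approximation used above next to a simple Poisson yield model, which is an assumption of this example and not something the numbers above were derived from: it treats killer defects as landing independently and uniformly across the wafer. The function names are made up for illustration.

```python
import math

def naive_yield(defects, dies):
    """First approximation from the post: every defect kills a different die."""
    return max(0.0, 1.0 - defects / dies)

def poisson_yield(defects, dies):
    """Fraction of dies expected to catch zero defects under a Poisson model.

    defects / dies is the average number of killer defects per die.
    """
    return math.exp(-defects / dies)

for defects in (30, 10):
    for dies in (100, 50, 25):
        print(defects, dies,
              round(naive_yield(defects, dies), 2),
              round(poisson_yield(defects, dies), 2))
```

    Under that model the 30-defect wafer goes from roughly 74 percent to 55 percent to 30 percent as the die doubles and quadruples, while the 10-defect wafer only drops from about 90 percent to 82 percent on the first doubling, which is the same qualitative story as the rough numbers.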