Slashdot Mirror


IBM One-Chip Dual Processor Due Next Year

PureFiction writes, "Looks like IBM is going to be scaling processors at the chip-die level. ZDnet has this story about plans for a dual-processor, single-die chip that will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz. I wanna overclock one of these bad boys ..." Better get out your pocketbook, then -- they're slated to power RS/6000 servers rather than consumer PCs, at least for a while. 64 bits, copper interconnects, and plans to move down to a .13 micron fab show that IBM is thinking long-term. Similar technology may reach your desktop first, though, in products like AMD's Sledgehammer.

121 comments

  1. Re:Starting at 1.1GHz? by Anonymous Coward · · Score: 0

    Sorry but your specs are wrong too !

    1+ gigahertz (2 Ghz is later, it starts at 1.1 in 2001)
    Dual processor on one die
    100GB/sec sustainable interprocessor & L2 connect
    35GB/sec sustainable interchip connect
    10GB/sec sustainable connect to L3 & main memory
    1.5MB L2
    32MB L3
    64 bit
    -------------------------------
    x86 CPU's ::
    1+ gigahertz (this year, should be 1.5 to 2 Ghz by the time the Power4 launch)
    One processor on die
    400mhz bus (not 200 Mhz)
    512kB-2mB L2 cache (most probably 1 MB, but the Forster will have up to 4 MB since it is a Willamette for servers)
    32 bit

    Telling the Mhz rating for busses is largely useless considering that a lot of the busses on the RS/6Ks are ridiculously wide by PC standards (128-512 bit). The rates listed for bandwidth are sustainable, and I doubt an x86 box could reach 100GB/sec even for a burst on any bus. And as for processor speed, a Power3 at 200mhz has about the same SPEC FP performance as an Alpha 21264 at 600Mhz, and they're both 3-4 times what any x86 scores. What do you think a dual processor Power4 at 1-2Gigahertz can do?

  2. Re:overclocking by Anonymous Coward · · Score: 0

    Considering that the AS/RS boxes have a service processor to initialize and monitor the state of the system, convincing the SP to allow overclocking would be quite an, uhm, challenge. So much for you overclocking peoples, yippee skippee..

  3. Re:Starting at 1.1GHz? by Anonymous Coward · · Score: 0

    The wavefront bus looks pretty interesting also..I take it you saw mention of the rotated cores in the Microprocessor Report? That was one good article.

  4. Re:old news by Anonymous Coward · · Score: 0
    Well aside from the fact that this is a recycled story:

    I would like to know the clearly subtle distinctions between flame, flamebait, and troll.

    Cheers

  5. Sun has similar plans for long while now. by Anonymous Coward · · Score: 0

    Where is Sun in all this discussion?

    Seems like AMD, Compaq, IBM & Intel are in the race. But Sun (continues to) get the "tail end charlie" award from The Microprocessor Report (12/99).

    And Sun's SMPs aren't even competitive with the Compaq X86 boxes any more. They have a 12gbyte/s memory subsystem in the UE10000 split between 64 processors, which is a max of 200 mbytes a second (assuming you bought the bandwidth to make use of it). While the compaqs can support 400-800 mbytes/s per processor. And compaq seems to have licked the scaling problem (largest tpc/c numbers ever and by a large margin - http://www.tpc.org).
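    The per-processor figure above is just the shared bandwidth divided evenly across the CPUs. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and a perfectly even split; the function name is made up for illustration:

```python
# Even per-CPU share of a shared memory subsystem, using the figures
# quoted in the comment (12 GB/s across 64 processors).
# Assumption: decimal units and a perfectly even split.

def per_cpu_bandwidth_mb(total_gb_per_s: float, cpus: int) -> float:
    """Return the even per-CPU share in MB/s."""
    return total_gb_per_s * 1000 / cpus

ue10000_share = per_cpu_bandwidth_mb(12, 64)
print(f"UE10000 share per CPU: {ue10000_share:.1f} MB/s")
```

    That works out to a bit under the "max of 200 mbytes a second" quoted, so the rounded figure is, if anything, generous.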

    It certainly shows up in the benchmarks. Hard to believe that the P3 800s have better floating point performance now than everything but the alphas barely (http://www.specbench.org). They leave Sun in the dirt. But this seems to be what happens when an industry is on a faster Moore's law curve than the earlier more proprietary (lower volume and less aggregate wealth for the industry's participants) incumbents.

    I can't wait to see the Sparc3. Sun started talking about it 5 years ago, so it must be dynamite. 100 specbase95 fp&int or so I'd guess if it's going to keep up with IBM & Intel. And if IBM can do 100gbyte/s to cache for the power4 next year, next year's Sparc3 must be able to do 200gbytes/s. Maybe then Sun can catch up on "computing power per cubic inch". Looks like compaq & ibm are regularly trading this title.

    Neat stuff - in a few years anyone will be able to afford & run an amazon.com from their home.

    Ari

  6. Re:Starting at 1.1GHz? Keep dreaming. by Anonymous Coward · · Score: 0
    I'd be interested in where you're getting those SPECs, last time I checked their website the 575 MHz Alpha 21264 was getting 30.1INT and 44.8FP. And I doubt the same chip running 25mhz faster would see that big a jump in its FP performance unless they made some fairly major architecture changes. And I don't consider 100GB/s an unrealistic number for sustained bandwidth between processors and L2 cache considering the theoretical peak burst rate is more than five times that.
    And the Power3 is built on .18um process. how do you think they are going to get to 1Ghz? on the same process. they will need to do less per cycle. which means less performance.
    I think you're REALLY oversimplifying here, there are so many things that determine a process other than minimum geometry that I can't even think of where to start.
  7. Re:overclocking by Anonymous Coward · · Score: 0

    Actually, my SMP system is overclocked just fine... It's a real mongrel too. One processor is a normal PII/450. The other is an overclocked and rewired Celeron 300A. Current uptime of 30 days. I've got 3 other overclocked systems too, tho no other SMP. (On a side note, I have burned out a Celeron before... The smell is awful...)

  8. More on Power3 is PowerPC by Anonymous Coward · · Score: 0

    About a month ago, I downloaded a Microprocessor Report pdf article on the upcoming Power4. Apparently, starting with the POWER3, the Power series had dumped the old POWER instruction set in favor of the PowerPC ISA.

    From what I gathered, the POWER ISA consisted of extra instructions that were not part of the PowerPC ISA (I think the PowerPC 601 also had the extra POWER instructions). In order to run legacy apps, software traps the old POWER instructions that are not included in the PowerPC ISA and translates them to their PowerPC equivalents.

    Theoretically, I think this means that you could run MacOSX (or BeOS or LinuxPPC) on a Power4 system. Of course, you would need to have the proper drivers and other hardware interfaces...

  9. Re:Fort Knox, phsa! by Anonymous Coward · · Score: 0

    Yeah, Goldfinger took it.... oh wait, he was foiled by James Bond...

  10. Apple needs to MacOS X the RS/6000 line! by Anonymous Coward · · Score: 0

    If Apple would stand behind CHRP to the extent that IBM has, then they would kick ass in the server OS market. Can you imagine an application server that allows you to leverage all the MacOS X APIs on one of these dual processor RS/6000s?

  11. Yeah, _L2 cache_ runs at 500 MHz, not the Power 4 by Anonymous Coward · · Score: 0

    which will, as the guy previously stated, run up to 2 GHz.

  12. Re:Starting at 1.1GHz? by Anonymous Coward · · Score: 0

    I'm not impressed. Sure Intel and AMD have had trouble even getting up to 1GHz, but the power4 should be compared to the upcoming Willamette cpus, which are going to be starting at 1.3GHz -- and that is in the middle of *this* year, not the second half of next year when the power4 will be shipping. Sure it will be cool to have two processors within a single die, sure it will be cool to have a 500MHz bus... but the article makes it sound like the clock speed will be something really great, while in fact it is a little disappointing.

    You also have to realize that the Power series supplies a lot more oomph per MHz, especially in server applications. The way the processor handles branching is particularly telling. Intel's CPUs will throw away any instructions in the pipeline when it hits a branch, where the Power series will take both ends of a branch and determine what needs to be thrown away. The use of multithreading is also more efficient.

    Dr. Frank Soltis is one of the designers of the CPU (as well as the architecture of the AS/400) and there are some articles of his at as400network.com (registration required; not sure whether or not it's free).

  13. Re:Power arch at 500 MHz! by Anonymous Coward · · Score: 0

    No, POWER and PowerPC are not finally merging, nor do I think they ever will. The POWER architecture, however, since the POWER3, has fully supported the PowerPC instruction set in 32 and 64 bit implementations.

    The Power series was done a lot for the benefit of the AS/400 in the early 90s. A 32-bit RISC architecture wouldn't present much of an improvement over the 48-bit CISC architecture, so they added the 64-bitedness to it and called it Power AS.

    The interesting thing is that the AS/400 has been updated with an environment which lets AIX applications be hosted natively on an AS/400 using the Power 32-bit mode. Cool.

  14. Re:Power3 is PowerPC by Anonymous Coward · · Score: 0

    I asked this question at an IBM seminar in the fall. The answer was "yes, the POWER and PowerPC are merged into POWER4".

  15. Re:Network cards by Anonymous Coward · · Score: 0

    actually we once underclocked ethernet cards to get signal thru long and poor quality cable. :)

  16. Re:Starting at 1.1GHz? by Anonymous Coward · · Score: 0

    The clock rate itself doesn't convey information about CPU performance. CPU's today are pipelined. A processor clock cycle is only the time needed to complete a standard pipeline stage. A given instruction might have the same execution time on different architectures, but end up taking a different number of clock cycles. Now when companies report that an add operation only takes one clock cycle, that's assuming you have a filled pipeline. An actual add operation will take many clock cycles, but since each stage of a pipeline can work on a different instruction, you get the net effect of having an add operation take 1 clock cycle.

    But here's the big catch: this only happens when you can fill the pipeline! Filling a pipeline is very very hard! Why? Because of data dependencies between instructions and conditional branches.

    Data dependencies force dependent operations to wait for a previous operation to complete. In other words, the pipeline is stalled. And if you have a conditional branch, you won't know which instructions to load after the branch is executed until that branch is evaluated. That stalls the instruction fetch pipeline. A data dependency is encountered once every 5 instructions or so! And what happens if you end up having to wait for a memory load or store operation to complete? Even cache memory takes 15 or so cycles to access. And RAM access takes hundreds of clock cycles!

    The main reason that the PowerPC is lagging on the clock rate front is that the pipelines aren't very deep. But given the fact that you have a stall every 5 instructions, you get hardly any gain from a deep pipeline! Sure, you can brag about having high clock rates with a deep pipeline. And you don't have to make your transistors run any faster.

    Take a simple example: suppose you start with a pipeline of 4 cycles for every instruction. And let's say your processor runs at 500Mhz. But you end up stalling on a data dependency every 5 cycles. Now suppose you increase the pipeline depth to 8 cycles by just subdividing the previous cycles in half. Now you can easily run your processor at 1Ghz. But you still can't fit more than 4 instructions in that 8 cycle pipeline because of data dependency stall. Now you still end up with 4 instructions executed in the same amount of time. Sure, there is speculative execution and branch prediction. But they don't predict very well. And the techniques have ceased to get any better.
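    The 4-stage versus 8-stage example can be put into a toy model. This is only a sketch under a deliberately pessimistic assumption that is not in the comment itself: after every run of 5 instructions the pipeline drains completely and has to refill from empty. The function name and the refill rule are made up for illustration.

```python
# Toy model: a run of `run_length` instructions enters an empty pipeline,
# then a stall drains it and the next run starts from empty again.
# Assumption (not from the comment): each stall costs a full refill.

def time_per_group_ns(depth: int, clock_mhz: float, run_length: int = 5) -> float:
    """Nanoseconds to complete one run of instructions between stalls."""
    cycles = depth + run_length - 1      # fill latency, then one result per cycle
    return cycles * 1000 / clock_mhz     # 1000 / MHz = nanoseconds per cycle

shallow = time_per_group_ns(depth=4, clock_mhz=500)    # 4 stages at 500 MHz
deep = time_per_group_ns(depth=8, clock_mhz=1000)      # 8 stages at 1 GHz

print(f"4-stage @ 500 MHz: {shallow} ns per 5 instructions")
print(f"8-stage @ 1 GHz:   {deep} ns per 5 instructions")
print(f"speedup from doubling the clock: {shallow / deep:.2f}x")
```

    Under these assumptions the doubled clock buys roughly a third more throughput rather than double. The stronger claim of no gain at all would need stalls that cost even more than one refill.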

    Now even if you legitimately increase the performance of your electronics, there are still massive roadblocks to real performance improvements. Processor speeds have been increasing exponentially while memory performance (main and cache) has only been increasing at an average of 7% yearly, if that. These days, a processor spends most of its time waiting for I/O. Memory operations are hundreds of times slower than computation. That's why people use games as benchmarks. Normal programs are so memory dependent that processor speed gains are nearly unnoticeable.

    That's why IBM has invested so much in this single-chip multiprocessor technology. You want to be able to break a program into threads, so that the threads have no local data dependence on each other. And those threads run with their own set of registers on different processors. That way you don't have to spend as much time waiting for loads and stores. No one has yet found a way to make memory fast enough.

  17. Interesting, however... by Anonymous Coward · · Score: 0

    This story was submitted by "PureFiction"? Doesn't help the validity.

  18. Already here with current chips? by Anonymous Coward · · Score: 0

    Don't the current chips have multiple instruction pipelines already?

    1. Re:Already here with current chips? by Deflatamouse! · · Score: 1

      Number of man-years per chip stays relatively constant simply because we are not redesigning everything from scratch. Usually new chips are just a slight modification of the old architecture, i.e. elimination of bottlenecks, addition of newer advanced microarchitectural techniques, etc.

      Of course there comes a time when everything needs to be redesigned, and that usually takes a lot longer, for example, the Intel Willamette chip... how long has that chip been in development? The last chip that Willamette team designed was the original Pentiums... that was almost 5-6 years ago! (if not longer...)

      We are seeing many chips coming out in short periods of time because both Intel and AMD have multiple development teams that leapfrog each other to release new CPUs. Intel, for example, has at least 3 teams, the Itanium team, the Willamette team, and the Pentium {II, III} team.

    2. Re:Already here with current chips? by orz · · Score: 3
      Current chips are superscalar, meaning that they have multiple execution units, but all execution units are working on instructions from the same instruction stream (thread). Complicated hardware analyzes dependencies and tries to translate that single thread into a parallel mesh of instructions that can be executed simultaneously, but doing that is very difficult, and sometimes impossible.

      This would be different because two threads would be executing simultaneously, so as long as the OS could find two threads that need cpu-time, the hardware would gain a lot of parallelism without having to do more scheduling.

      This approach is good because it offers a way to use the excess die space without requiring too much extra effort from the designers. In the last decade or two the # of transistors per chip has gone up several orders of magnitude, while the # of man-years per chip-designer has not come close to keeping pace. It's also nice because the other common approaches are obviously reaching the point of diminishing return.

      What Compaq is doing is more interesting though... they are processing multiple threads simultaneously... on the same set of execution units! If one thread doesn't have enough parallelism... that's O.K.. The other 7 can pick up the slack!
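      The difference between one dependent instruction stream and two independent ones can be sketched with a tiny scheduler. This is only an illustration with made-up instruction lists, assuming unlimited execution units, one-cycle latency, and perfect register renaming (so only true read-after-write dependencies matter):

```python
# Each instruction is (dest_register, [source_registers]).
# Assumptions: unlimited execution units, 1-cycle latency, and each register
# written once (i.e. renaming removes false dependencies).

def schedule_length(instrs):
    """Return the number of cycles needed to execute `instrs`."""
    ready = {}   # register -> cycle at which its value becomes available
    last = 0
    for dest, srcs in instrs:
        # An instruction can issue once all of its sources are available.
        issue = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = issue + 1
        last = max(last, issue + 1)
    return last

# One thread: every instruction depends on the previous result.
thread_a = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
# A second, unrelated thread using its own registers.
thread_b = [("w", []), ("x", ["w"]), ("y", ["x"]), ("z", ["y"])]

one = schedule_length(thread_a)
two = schedule_length(thread_a + thread_b)
print(f"one thread:  {len(thread_a)} instructions in {one} cycles")
print(f"two threads: {len(thread_a + thread_b)} instructions in {two} cycles")
```

      The single dependent chain never gets above one instruction per cycle no matter how many execution units exist, while the second thread doubles the work done in the same number of cycles. That holds whether the two streams run on separate cores or share one set of execution units.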

  19. Re:What took you all so long ? by Anonymous Coward · · Score: 0

    The scenario you use as an example is HIGHLY cache architecture sensitive.

  20. Re:Starting at 1.1GHz? by Anonymous Coward · · Score: 0

    There is simply no way to push a CISC design to a clockrate that AMD and Intel have. A CISC ISA no longer implies a CISC implementation.

    RISC systems had better CPI, *and* higher clockspeeds, yes, but they also require more instructions. Of course, none of this directly applies to modern architectures except perhaps for marketing purposes.

  21. Re:Network cards by Anonymous Coward · · Score: 0

    Man, what I really want to overclock is the battery in my laptop. =)

  22. Can't sleep, clown will eat me by Anonymous Coward · · Score: 0

    The clown has lots of money (don't forget that John Wayne Gacy was wealthy). He will be able to buy these chips and overclock them, then run algorithms finding solutions to Hamiltonian Cycle Problems that will lead him to me. I have already disassociated myself with all former friends to eliminate his finding me through solving Clique Problems with his Sun workstations. Once he reads on Slashdot that Perl 5.0631459265358979 is released he will search the internet for all traces of me. My grandfather died last year... he was a member of the Masonic Lodge and my father did not want the Masons to put their satanic shroud on my grandfather's body, but they were able to get close enough to the casket to collect his DNA. I am sure they are working with the clown to use technology developed by Scientologists to clone my grandfather and learn more about me. In case you don't think I am serious, consider that my father was abducted by aliens, who must also be working with the clown, in 1962. This was witnessed by state troopers of a state I will not mention because it would give too many details to the clown. I should stop now, this is giving away too many details and I am very tired.

  23. Re:Starting at 1.1GHz? Keep dreaming. by Anonymous Coward · · Score: 0

    The Power4 is designed for commercial applications.
    and its memory bandwidth is quite good.

    but you are dreaming if you think 100GB/s is
    sustained.
    (this number is the L2 cache bandwidth)

    and the power4 will start at ~1Ghz in .18um in 2001.
    (I don't even think they will meet this).

    Just take a look at power2. before it came out IBM
    was saying it would do 500Mhz.
    when it did come out it was at 200Mhz.

    and the 375Mhz (not 200mhz) power3 does not
    have the FP performance of a 600Mhz 21264.

    21264 667Mhz 40 SpecINT95 83 SpecFP95
    21264 600Mhz 35 SpecINT95 74 SpecFP95
    Power3 400Mhz 24 SpecINT95 50 SpecFP95

    And the Power3 is built on .18um process.
    how do you think they are going to get to 1Ghz?
    on the same process. they will need to do
    less per cycle. which means less performance.

    My guess is that each CPU in Power4 will be
    less powerful than Power3.

    This is not that bad when you are doing
    transaction processing where memory
    latency/bandwidth is a more important factor in
    overall performance.

  24. as stated... by Anonymous Coward · · Score: 0

    its not for the consumer.

  25. Re:On the Desktop? by Anonymous Coward · · Score: 0

    That info is over a year old and not at all reliable.

  26. Re:overclocking by Anonymous Coward · · Score: 0

    Overclocking stupid, eh? Actually, many PowerPC processors overclock quite well, I know from personal experience. My 220 mhz rated G3 is running at 337 Mhz (that's right a 65% increase) right now, and has been, stably, for over 6 months. You might think the heat would be causing things to melt but... surprise, it only runs at 31 C (87 F).

  27. Re:Network cards by Anonymous Coward · · Score: 0

    There are analog sides to this as well. Doubling the clock speed won't solve everything. Certain components will need to be redesigned to accommodate this change, and you will most likely get a lot of CRC errors when you transfer between these modified cards.. (assuming it even works)

  28. old news by gammatron · · Score: 0
    http://slashdot.org/articles/99/08/05/2051234.shtml



    August 5, 1999. Same frikkin story. Thanks, guys.

    YAWN
    --

  29. Damnit.. by cryptonix · · Score: 0

    who in the hell needs that kind of power?
    RAM IS THE ANSWER!*#
    yeah rdram is great, if you feel like selling your soul or your liver to buy any of it let alone the system to go w/ it. for gods sake, give the consumers something they can afford for once.

    1. Re:Damnit.. by cperciva · · Score: 1

      who in the hell needs that kind of power?

      Anyone who runs a website which gets mentioned on /. of course.

  30. Re:What took you all so long ? by Anonymous Coward · · Score: 1

    "Windows 2000 has "load-balancing" where it will run processes that are processor intensive on the chip that isn't running the OS."

    Either this shows your fundamental misunderstanding of how SMP is implemented in modern OSes, or W2K's SMP is where Linux 2.0.x's was (Megalock. Only one process in kernel at a time). Judging from the benchmark scores, I'd put a lot of money on option A.

    BTW, SMP OSes don't call that "load-balancing" they call it scheduling.

  31. I'm violating an NDA here but... by Anonymous Coward · · Score: 1

    I saw a VLSI layout of one of these puppies about a year ago. It's one of those things where you've got to do a double-take. One side was a mirror image of the other. At first I thought it was an old POWER mirrored for double-redundant mission critical stuff. But then I noticed the linewidth...

    Hiding before IBM lawyers get to me...

  32. Re:Power arch at 500 MHz! Correction by Paul+Komarek · · Score: 1

    Heck, I just noticed it was the bus running at 500MHz, with the cpu better than double this. Now I'm really impressed!

    -Paul Komarek

  33. Re:Fort Knox, phsa! by Eccles · · Score: 1

    Yeah, Goldfinger took it.... oh wait, he was foiled by James Bond...

    Actually, Goldfinger tried to irradiate it and make it unusable, there was simply too much to carry away.

    But there's still 140 million ounces of gold at Fort Knox according to the U.S. Mint's web site.

    --
    Ooh, a sarcasm detector. Oh, that's a real useful invention.
  34. PowerPC is approx 40-50% faster. by Colin+Smith · · Score: 1

    At the same clock speed, PowerPC chips run approx 40-50% faster than the PIII equivalent. So your 550MHz PPC is approx equivalent to a 750MHz PIII.

    Though (Hint Hint IBM/Motorola) it *would* be really nice to have a 1+GHz PPC! A 1GHz PPC would be approx equivalent to a 1.5GHz Intel.

    Of course in *real* life, CPU speed is largely irrelevant. RAM and disk performance is much much more important. (It's all about I/O)

    --
    Deleted
  35. Re:What took you all so long ? by Lally+Singh · · Score: 1
    As for this IBM chip. What took you all so long ? SMP on a single chip is an obvious advance.
    Actually, the IBM Power architecture was always designed for multiple cores on one die. It not only is not a surprise, but quite common amongst high end CPU architectures. It gets you the speed of one-die SMP, but involves the cost of cooling one of these SOBs.

    --
    Care about electronic freedom? Consider donating to the EFF!
  36. Re:overclocking by Mr.+Flibble · · Score: 1

    I think his point is if you toast a $87 celeron no great damage is done. But if you toast a $5000-9000 processor you are either Bill Gates or you are out one processor that is worth more than my car.

    I think few people have the cash to "risc" overclocking such expensive processors.

    --
    Try to hack my 31337 firewall!
  37. Re:Not consumer level, thats for certain. by Mr.+Flibble · · Score: 1

    I did not think that it did either, but I found a link off the utah GLX project that said it did. I cannot confirm this though. I could be talking out of my ass...

    --
    Try to hack my 31337 firewall!
  38. My bad. by Mr.+Flibble · · Score: 1

    I was wrong, the link I was thinking of is here and it has nothing to do with Unreal SMP. Oops.

    --
    Try to hack my 31337 firewall!
  39. Other IBM developments by Zoyd · · Score: 1

    Also announced today by IBM are the two newest world's highest capacity hard drives. These also sport IBM's first glass disk platters.

    Should be a good match for these new CPUs.

  40. Re:Not consumer level, thats for certain. by red_one · · Score: 1

    AFAIK, UT does not support SMP.

  41. Re:Explanation - Re:What took you all so long ? by zzg · · Score: 1

    No, I'm saying 486SX were 486DX chips with a defective FPU so they just disabled the FPU and sold them as SX. I think there were ways to enable the FPU afterwards, never tried myself though. Also, later they built special 486SX chips without the FPU.

  42. could they pull the SX trick? by zzg · · Score: 1

    That is, if the defect only affects one of the cores, disable that one and sell the chip as a singlecore chip. Seems like it would work even better on quadcore chips where one core more or less wouldn't really affect performance.
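    How much the "SX trick" could recover depends on the defect rate. A rough sketch using the common Poisson yield model (yield = exp(-area x defect density)); the model choice, the die areas, and the defect density below are all assumptions for illustration, not figures from IBM or from this thread:

```python
import math

# Poisson yield model: probability that a region of silicon has no defects.
# All numbers here are hypothetical and only illustrate the idea.

def dual_core_yield(defects_per_cm2, core_cm2, shared_cm2):
    """Return (fraction with both cores good, fraction salvageable as single-core)."""
    core_ok = math.exp(-defects_per_cm2 * core_cm2)
    shared_ok = math.exp(-defects_per_cm2 * shared_cm2)   # cache, bus, I/O must work
    both = shared_ok * core_ok ** 2
    one = shared_ok * 2 * core_ok * (1 - core_ok)         # exactly one core is bad
    return both, one

both, one = dual_core_yield(defects_per_cm2=0.5, core_cm2=1.0, shared_cm2=1.0)
print(f"sellable as dual-core:   {both:.1%}")
print(f"sellable as single-core: {one:.1%}")
```

    With these made-up numbers the salvaged single-core parts outnumber the fully working dual-core ones, which is the whole appeal of the trick. A defect in the shared logic still kills the die either way.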

  43. Power PC G4 by kennedy · · Score: 1

    Have you all forgotten about the G4 7400 chips already? they can have up to *4* cores on each chip.

    now all i need is a quad core-quad processor G4 and i can take over the world....

  44. Re:overclocking by Bald+Wookie · · Score: 1

    Yeah, and I doubt that anything that comes with a $5000+ processor also has SoftMenu II. Although I would like to see the service engineer's face when I ask him what jumpers to change...


    -BW

  45. Re:Overclock? by Bob+Dobbs · · Score: 1

    The RS/6000s aren't that expensive. A Power3 based 44P Model 170 lists at around $10,000. A bit more than the typical PC, but no need to keep it under lock and key.

    The Power4 chip is expected to show up in similar models and, I would expect, in similar price ranges.

  46. Re:interesting details by Mike+Miller · · Score: 1
    Simply putting two cores on a die is no big deal. What IBM is doing is near-insane (in a good way). From what I recall from MPR...

    OK, each die has 2 independent cores, with a shared 4MB L3 and their own memory controller to RAM. They also have two ultra-high speed links to connect to other chips.

    Each cartridge (IBM's famous ceramic substrate) contains 4 dies, connected to each other via their high speed interconnects and for the power, ground, memory and I/O they have in excess of 2000 BGA 'pins' requiring something like half a ton of force to hold it to the motherboard!

    It gets even better :-) The power estimates are around 125W/die. So for the cartridge, you are looking at Half a Kilowatt of power! For a 32 way system, you would have 4KW of power in the processors alone. You still have to add drives, memory, I/O processors and fans. Is that just nifty or what?

    That's no computer... That's an industrial heating system!

    - Mike
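    The power figures above are simple multiplication, but the 32-way total depends on what a "way" is. A quick sketch using only the numbers quoted in the comment (125 W per die, 4 dies per cartridge, 2 cores per die); the two readings of "32-way" are an assumption about what was meant:

```python
# Figures quoted in the comment above; nothing here is an official IBM number.
WATTS_PER_DIE = 125
DIES_PER_CARTRIDGE = 4
CORES_PER_DIE = 2

def processor_power_watts(dies: int) -> int:
    """Total processor power for a given number of dies."""
    return dies * WATTS_PER_DIE

cartridge = processor_power_watts(DIES_PER_CARTRIDGE)
by_cores = processor_power_watts(32 // CORES_PER_DIE)   # "32-way" = 32 cores
by_dies = processor_power_watts(32)                      # "32-way" = 32 dies

print(f"one cartridge:          {cartridge} W")
print(f"32-way counted by core: {by_cores} W")
print(f"32-way counted by die:  {by_dies} W")
```

    The 4KW figure matches counting 32 dies. If "32-way" means 32 cores, the same per-die estimate gives half that.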

  47. Fort Knox, phsa! by eriks · · Score: 1
    to all you non-US people, Fort Knox is a place owned by the Treasury department where lots of precious metals are stored. It is locked up pretty tight

    Only thing in Fort Knox is a few "Guards" sitting around playing cards. No gold or silver there, was gone long ago... :>

  48. PowerPC WAS approx 40-50% faster. by Valdrax · · Score: 1

    That's how it used to be. AMD and Intel have caught up now. Read the Ars Technica article comparing a G4 to an Athlon. While PPC has a vastly superior SIMD architecture, the x86 family has been catching up on integer and FP performance. Nowadays, an Athlon at the same clock speed slightly outperforms a G4 for the common integer and FP ops that your system spends more than 80-90% of its time doing, but Athlons are soon to be out in 1 GHz models while PPC 7400s are still languishing at 500 MHz.

    I mean, this SUCKS. IBM had the first demo silicon at 1.1 GHz almost 2 years ago now. We were promised 1 GHz chips with multiple processors on the core by the AIM consortium's projections 2 years ago by 2000. For servers, they seem to have only slipped 3-6 months, but us desktop PPC users are stuck with x86 envy.

    x86 envy! Of all the *#@! architectures out there, it has to be one of the most arcane, messed up designs that is beating the pants off of everyone. I mean, all the addressing modes, the stack-based FPU, variable-length instructions, and MMX/3DNow/SSE! Have you ever downloaded volume 2 of the "Intel Architecture Software Developer's Manual", the instruction set reference? It's 854 pages! How can companies with so much baggage to work with beat everyone else to the punch on 1 GHz?

    Gah. It's bad enough to realize that we'll probably never see the end of that hideous kludge of a design, much less that it's because it's beating the pants off of cleaner designs due to production problems. Makes me nauseous...

    For that matter where's our 1 GHz+ Alphas?

    --
    If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
  49. IBM Chip Developments by Maclir · · Score: 1
    Despite IBM's (well stated) commitment to Linux and Open source, it is their proprietary product lines - AS/400's, RS/6000's and the big iron dinosaur mainframes that still make the profits that allow them to undertake this R&D. These certainly sound like impressive CPUs - and one wonders at how much money IBM is still spending on R&D each year to continue to come up with these devices.

    Provided there is always a market for the top end, proprietary (and expensive) closed architectures, then IBM (and others) will continue to generate the profits to research and build leading edge stuff. Will you and I ever have one of these babies on our desktop, or will my server at home running Linux or FreeBSD or whatever thump along at 2+GHZ? Not likely, but you can bet that in the next few years, most consumer level chips will use some of these features.

    Moore's Law will last at least a few more years, I expect.

    Maclir

    Disclaimer: I once worked for IBM in the bad old days.

  50. Re:Starting at 1.1GHz? by Betcour · · Score: 1

    Sorry but your specs are wrong !

    1+ gigahertz (2 Ghz is later, it starts at 1.1 in 2001)
    Dual processor on one die
    500mhz bus
    LARGE L2 cache (I would imagine 2-4mB)
    64 bit
    -------------------------------
    x86 CPU's ::
    1+ gigahertz (this year, should be 1.5 to 2 Ghz by the time the Power4 launch)
    One processor on die
    400mhz bus (not 200 Mhz)
    512kB-2mB L2 cache (most probably 1 MB, but the Forster will have up to 4 MB since it is a Willamette for servers)
    32 bit

    Doesn't look so bad for x86 anymore...

  51. Re:Overclock? by jovlinger · · Score: 1

    nitpick:

    negroponte, I thought, defined bits as the immaterial thingies, and atoms as the material ones.

    So shouldn't that be atoms are atoms!?

    Or perhaps you were referring to something else

  52. Re:Overclock? by jovlinger · · Score: 1

    I imagine that the secure storage of the hardware is due less to the price of the servers than it is to the fact that most system security is predicated on the machine remaining physically inviolate. I get root read privs on any machine I can rip the harddrive out of, for example.

    Johan

  53. Re:Performance by jovlinger · · Score: 1

    well, the Tera chip (which I constantly rant about -- I love that thing) and the MAJC do know about threads. However the IBM thingies, and indeed pretty much every other chip out there indeed do not know a thread from a twinned up piece of animal hair.

    They need OS support to switch from one instruction stream to the other. Without that OS support, it is up in the air what the IBM multichip would do (I guess I should read the link, eh?), but I imagine that one of the cores is a master and is the only one which is activated on startup. It is up to it to run the OS code that initialises the other cores with the appropriate instruction streams.

  54. Re:Explanation - Re:What took you all so long ? by RickyD · · Score: 1

    You are ignoring the fact that the chip geometry is also getting much smaller. Yes there are more transistors on the die, but the die is still small because you can get more of them in a given area. Thus, the yields on wafers are still good.

  55. Re:OverClock by Cerberus7 · · Score: 1

    I draw the line at my large intestine. Ick.

    --
    I don't know about you, but my servers run on the power of cotton candy and happy thoughts. -Anonymous Coward
  56. Re:overclocking by dgb2n · · Score: 1

    Uhh... He's joking. By any chance has your sense of humor been surgically removed?

  57. Re:Explanation - Re:What took you all so long ? by Northern+Hunter · · Score: 1

    That's true. But as your geometry gets smaller, so does your vulnerability to smaller bits of dust and smaller defects / imperfections.

    Of course there are other things: we know they've gotten yields up higher and higher per transistor, because they keep packing so many on... I think another poster implied that it must have simply been more cost/performance effective to design more complex single core chips than to try and do multi-cpu chips with the less complex cores...
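    To put rough numbers on the area side of this argument, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D), with A the die area and D the defect density. The choice of model and every number in it are illustrative assumptions, not figures from IBM or from anyone in this thread.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of good dies under the simple Poisson model Y = exp(-A * D)."""
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical values, chosen only for illustration.
DEFECT_DENSITY = 0.5  # killer defects per cm^2 (assumed)
CORE_AREA = 1.0       # cm^2 occupied by one core (assumed)

single = poisson_yield(CORE_AREA, DEFECT_DENSITY)
dual = poisson_yield(2 * CORE_AREA, DEFECT_DENSITY)

print(f"single-core die yield: {single:.3f}")
print(f"dual-core die yield:   {dual:.3f}")
```

    Under this model, doubling the die area squares the yield, so a process shrink that halves the area of each core is what makes a two-core die affordable.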

    There must be a technical/trade paper/review out there somewhere which details not only what all the issues, sub-issues, and permutations of issues have been over the past 10 years on this, but what the actual numbers/progress on each item have been, and how the math actually worked out along the way, and thus show what things were actually important in getting the yields high enough to do this. It would be an interesting read.

    -NH

    Hey zzg: Are you saying that the 486 SX's were chips which had defects/failures in their caches, and thus 'selected' for cache-disabling? (I knew they had their caches disabled, but I don't think I knew/figured that it was a by-product of the yield failures... I think I just figured it was a corporate decision to hobble and sell into the lower cost market...)

  58. I wonder.... by Darth+Yoshi · · Score: 1

    I wonder if Transmeta could do that with their Crusoe processor?

    Hmmm, four Crusoe processors on one chip....

    --
    // TODO: fix sig
  59. How about hundreds of small processors in one die? by porttikivi · · Score: 1

    I have the impression that implementing a 2n-bit instruction set requires exponentially more die area compared to an n-bit architecture?

    Modern Intel 32-bit processors have tens of millions of transistors. The 80286 processor, with its 16-bit architecture, had something like 125 000 transistors.

    Imagine a processor die which would run hundreds of tightly coupled equal 16-bit cores, with a modern clock frequency of 1 GHz.

    The arithmetic that a program does can be broken into sub-problems that work within 16-bit number sets. When you need bigger numbers, you could use software emulation of bigger numbers. Accessing large data sets should be done through object methods anyway, not by direct addressing, so you don't necessarily need the traditionally desirable large, flat address spaces.
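    As a rough illustration of the "software emulation of bigger numbers" idea, here is a minimal sketch that adds wide integers using only 16-bit pieces plus a carry. The function names are made up for this example, and Python stands in for whatever a 16-bit core would actually run.

```python
MASK16 = 0xFFFF

def to_limbs(value, count):
    """Split a non-negative integer into `count` 16-bit limbs, least significant first."""
    return [(value >> (16 * i)) & MASK16 for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from 16-bit limbs, least significant first."""
    return sum(limb << (16 * i) for i, limb in enumerate(limbs))

def add_limbs(a, b):
    """Add two equal-length limb lists; return (result limbs, carry out)."""
    result, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry          # never needs more than 17 bits
        result.append(total & MASK16)  # keep the low 16 bits
        carry = total >> 16            # pass the 17th bit to the next limb
    return result, carry

limbs, carry_out = add_limbs(to_limbs(0xFFFFFFFF, 4), to_limbs(1, 4))
print(hex(from_limbs(limbs)), carry_out)
```

    Each wide addition costs one 16-bit add per limb plus the carry handling, which is the slowdown you accept in exchange for the smaller cores.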

    All in-die processor cores could have a sizable private memory for highly dynamic small objects like run-time system constructs and such.

    It should be easy to design and optimize a processor which is built of small, equal, cloned parts.

    Of course you would need parallel programming techniques to use the power. But modern languages like Bell Labs' Alef and Limbo make it fairly easy and straightforward to write highly parallelizable programs. Most current systems use threading anyway. The channel abstraction in Hoare's "Communicating Sequential Processes", later in the Occam language and in those Bell Labs languages I mentioned, works nicely in this kind of architecture.
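    For readers who have not seen the channel style, here is a minimal sketch of it, assuming Python threads and `queue.Queue` as a stand-in. Real CSP, Occam, Alef and Limbo channels are synchronous rendezvous points; a queue of size 1 only approximates that, and the stage names are invented for the example.

```python
import queue
import threading

def producer(channel, items):
    """Send every item down the channel, then a None sentinel to signal the end."""
    for item in items:
        channel.put(item)
    channel.put(None)

def squarer(inbox, outbox):
    """Read numbers from one channel, write their squares to another."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)  # forward the end-of-stream marker
            return
        outbox.put(item * item)

def run_pipeline(items):
    """Wire producer -> squarer -> collector using two small channels."""
    first, second = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(first, items)),
        threading.Thread(target=squarer, args=(first, second)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = second.get()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3]))
```

    The stages share no memory and only talk through channels, which is why the model maps naturally onto many small cores with private memories.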

    --
    Anssi Porttikivi / app@iki.fi
  60. Network cards by cperciva · · Score: 1

    There was a serious discussion on Usenet recently about whether it was possible to replace the oscillator on a network card, to get it to run at 200 Mbps instead of 100 Mbps.
    The discussion was cut short when it was pointed out that you would have to change all the network cards connected to the same network for it to work.
    As far as I know, nobody has tried this yet.

    1. Re:Network cards by cperciva · · Score: 1

      Man, what I really want to overclock is the battery in my laptop. =)

      You want to make the battery run faster? How odd. I'd prefer it to run slower, and thus last correspondingly longer. ;-)

  61. Re:Starting at 1.1GHz? by cperciva · · Score: 1

    You missed my point. RISC processors in theory run at frequencies at least as high as their CISC counterparts, if not significantly higher. Instead we have Intel releasing 1.3GHz Willamettes in Q3 00, and IBM releasing 1.1GHz Power4s in Q3 01, almost a year later. Sure, the memory bandwidth, dual processors, increased cache, and massive superscalarity will make the Power4 faster anyway... but it used to be that RISC systems had better CPI *and* higher clockspeeds.
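      The CPI-versus-clock trade-off is just arithmetic: instructions per second = clock rate / CPI. A minimal sketch follows; the two clock rates are the ones quoted above, but the CPI values are invented for illustration and are not measurements of any real chip.

```python
def instructions_per_second(clock_hz, cpi):
    """Throughput of one core: clock rate divided by average cycles per instruction."""
    return clock_hz / cpi

# Assumed CPI values, for illustration only.
high_clock = instructions_per_second(1.3e9, 1.0)  # faster clock, CPI of 1.0 (assumed)
low_cpi = instructions_per_second(1.1e9, 0.5)     # slower clock, CPI of 0.5 (assumed)

print(f"high-clock part: {high_clock / 1e9:.1f} billion instructions/s")
print(f"low-CPI part:    {low_cpi / 1e9:.1f} billion instructions/s")
```

      With those made-up numbers the slower-clocked part still wins, which is the "faster anyway" case; the complaint is only that the clock term used to favour RISC as well.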

  62. Starting at 1.1GHz? by cperciva · · Score: 1

    I'm not impressed. Sure, Intel and AMD have had trouble even getting up to 1GHz, but the Power4 should be compared to the upcoming Willamette CPUs, which are going to be starting at 1.3GHz -- and that is in the middle of *this* year, not the second half of next year when the Power4 will be shipping.
    Sure it will be cool to have two processors within a single die, sure it will be cool to have a 500MHz bus... but the article makes it sound like the clock speed will be something really great, while in fact it is a little disappointing.

    1. Re:Starting at 1.1GHz? by The+Variable+Man · · Score: 1

      I read it in Microprocessor Report several months ago. It was a very good article, but the chip rotation really impressed me.

      Nice to see lateral thinking is alive and well!

    2. Re:Starting at 1.1GHz? by The+Variable+Man · · Score: 2

      The really interesting design feature of this architecture is that the chips work very well in SMP. 4 chips can be placed together each rotated through 90 degrees so that their fast interconnects align.

    3. Re:Starting at 1.1GHz? by Haven · · Score: 4

      "...will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz..."

      Power4 ::

      2+ gigahertz
      Dual processor on one die
      500 MHz bus
      LARGE L2 cache (I would imagine 2-4 MB)
      64 bit

      -------------------------------

      x86 CPUs ::

      1+ gigahertz
      One processor on die
      200 MHz bus (I don't recall the bus of the Willamette)
      512 KB-2 MB L2 cache
      32 bit

      This is not something you will see on Tom's Hardware. Clockspeed isn't everything. A 500 MHz 21264 DEC Alpha is MUCH faster than a 500 MHz PIII. The Power4 is not a desktop processor. Compaq will not ship computers with the Power4 processor in them. People need to understand this! When was the last time you saw a benchmark that was PIII vs. RS/6000? I have only seen it once, and that was the PIII Xeon compared to other server hardware, namely from Sun and DEC. That was on Intel's site.

  63. Specs on the power3...predecessor of the power4 by joedumb · · Score: 1

    http://bwrc.eecs.berkeley.edu/CIC/summary/local/ Scroll to the bottom for the specs (including the SPEC values) for the Power3. The page covers practically all the mainstream processors.

  64. Re:What took you all so long ? by billybob+jr · · Score: 1

    What took you all so long ? SMP on a single chip is an obvious advance.

    Memory bandwidth is a problem in all SMP systems AFAIK. Maybe what they were waiting for wasn't the capability to put two or more chip cores on a chip. Maybe it was multiple cores + on die L2 cache to alleviate the memory bottleneck problems.

  65. Re:On the Desktop? by gaudior · · Score: 1
    Of course it will arrive on the desktop. It will have an Apple logo, come in a translucent graphite case, and be declared a munition by the DOD.

    I can't wait!

  66. The guys wrote this article is a dick! by DeeezNutz · · Score: 1
    Pure Fiction said in the article:
    I wanna overclock one of these bad boys

    Seriously, the chip runs at over 2 GHz and has a 500 MHz bus, and the first thing Pure Fiction says is "I wanna overclock one of these bad boys"? Get a life dick.

  67. Run Linux on this? Of course! by Bug-Y2K · · Score: 1


    Thanks to the folks at Terra Soft: Yellow Dog Linux!

    See it in action on a prototype.... Applefritter

  68. OSX - Hoza bout it Steve by HiyaPower · · Score: 1

    If our buddy can get out from under Motorola's thumb and offer a multi-processor Mach kernel for this puppy...

  69. Re:Two sets of register files by Deflatamouse! · · Score: 1

    You missed my point.... (not like anyone will read this since this thread/topic is pretty old...)

    The utilization of this CPU (percentage-wise) would be the same as the utilization of just one of the cores, since it is two cores and not just one with two register files. One core cannot dump extra instructions into the other core, since the internal issue buffers are separate and probably not shared.

  70. Re:Performance by Deflatamouse! · · Score: 1

    Completely missed my point again.

    You mentioned threads... yes, this cpu can process two threads at the same time... so can SMP machines...

    A thread is a concept of the operating system... by definition, a thread shares the same memory space as other threads of the same task, etc. The CPU has no idea what you want to do; it only crunches numbers/streams of instructions.

    The OS schedules the instructions into the CPU. Therefore, in order to crunch vertices 1-1000 and 1001-2000 simultaneously, it is up to the OS to tell the processor to do that, i.e., set the two program counters to the correct locations. Assuming this machine does run an OS (most if not all machines do these days), the OS will need to schedule other processes/threads, which means overhead... as much as on SMP machines.

    Your message also contradicts your previous one. If the CPU is two complete cores, the execution units cannot be shared since the issue buffers are not shared. You cannot issue an instruction that resides in one core into the other core; there are no paths connecting the two. The article only mentioned that the L2 can be shared, not other buffers lower down the abstraction.

    The architecture they described from the article seems to me like SMP on a single chip, which is different from multithreading. Go read about these two, and learn the differences...

    It's funny how a message that is totally wrong still gets a score of 2... just shows that the majority of Slashdot readers, even the moderators, do not know much about the internals of computers... not that I expect them to know, though... but it's just funny.

  71. Two sets of register files by Deflatamouse! · · Score: 1

    Last I heard, the Power4 chip supports multithreading by having two separate register files on chip (not two separate complete cores). Perhaps the number of execution units in the core is also increased, but I am not too sure about that.

    But IBM could have changed the architecture since then...

    I think the whole point was to keep CPU utilization (and execution core unit utilization) high. Seems to me that slapping two complete cores together is kind of dumb because both chips will still be underutilized... half the silicon will be sitting there unused...

    1. Re:Two sets of register files by be-fan · · Score: 2

      No, it's two complete cores. And what do you mean the second proc would be sitting there unused? The stuff that this proc is going to be used for is highly parallel. Even most media stuff is parallel. Load BeOS up on a dual proc box and run a few media apps. You'll see that both procs have pretty high utilization.

      --
      A deep unwavering belief is a sure sign you're missing something...
  72. Re:Performance by Deflatamouse! · · Score: 1

    Unfortunately, the CPU has no concept of processes or threads. It just processes streams of instructions like a fatass with a stream of hamburgers. It won't care whether the burgers are from Burger King or MacDonalds or Wendy's.

    This means that in order to utilize this dual core, single die chip, we still need an OS, which means overhead... I would think the overhead is as much as SMP machines.

  73. Re:Performance by Deflatamouse! · · Score: 1

    One benefit of this dual core chip is of course, as mentioned in the article, the bandwidth between the two cores, and the ability to share L2 caches.

    This would improve raw performance and will be much better than SMP setups, although I think the overhead _percentage_ would be about the same.

  74. Performance by vchoy · · Score: 1

    It will be interesting to see if dual processor on-die will double the performance. I wonder what the SMP overhead will be?

    1. Re:Performance by be-fan · · Score: 2

      Still wrong. It does process a stream of instructions, but that is exactly what a thread is! What's to say that it can't process 2 streams of instructions? The guy above is still wrong: the POWER4 is two complete cores, but multiple threads CAN be done at the proc level. I think (don't quote me, I read it a long time ago on /.) that the Sun MAJC can process two threads. It goes like this. If one thread is, say, an OpenGL transform thread, while the other is a rasterization thread, what's to keep the transform thread from using the fp units while the raster thread uses the integer units? Or two transform threads sharing 4 fp units? 3D in general is hideously parallel. Again, I'm not quite sure, but I think someone is working on an OpenGL implementation that uses multiple threads. Seriously, though, it makes sense. What's to stop one proc from doing the matrix multiplies on vertices 1-1000, while the other does it on 1001-2000?
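      The vertex split is easy to sketch in software. The following is only an illustration of the partitioning, assuming Python's `concurrent.futures`; the function names are made up, and because of CPython's global interpreter lock these threads will not actually run the arithmetic on two processors at once.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(matrix, vertex):
    """Multiply a 3x3 matrix by a 3-component vertex."""
    return tuple(sum(row[i] * vertex[i] for i in range(3)) for row in matrix)

def transform_range(matrix, vertices, start, stop):
    """Transform vertices[start:stop]; this is the work handed to one worker."""
    return [transform(matrix, v) for v in vertices[start:stop]]

def transform_split(matrix, vertices, workers=2):
    """Split the vertex list into contiguous chunks, one chunk per worker."""
    chunk = max(1, (len(vertices) + workers - 1) // workers)
    ranges = [(i, min(i + chunk, len(vertices)))
              for i in range(0, len(vertices), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: transform_range(matrix, vertices, *r), ranges)
        # pool.map yields results in submission order, so the output stays ordered.
        return [v for part in parts for v in part]

SCALE2 = ((2, 0, 0), (0, 2, 0), (0, 0, 2))
vertices = [(i, i + 1, i + 2) for i in range(2000)]
result = transform_split(SCALE2, vertices)
print(result[0], result[-1])
```

      Each worker touches only its own half of the list, so the two halves need no locking; something still has to start both workers, which is the part the OS (or here, the thread pool) does.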

      --
      A deep unwavering belief is a sure sign you're missing something...
  75. New Technology by Splitzy · · Score: 1
    A while ago there was a Slashdot article about how IBM now has the ability to put "tens of billions" of transistors on a single chip, while Intel only sports 27 million.

    What I want to know is when IBM will make some chips with this technology, seeing as how chips will probably push well past 2 GHz?

    Overclock that bud...

    Do not provoke me to violence, for you could no more evade my wrath than you could your own shadow.

  76. Power3 is PowerPC by Wesley+Felter · · Score: 1

    The info I read (which is admittedly vague) says that Power3 implements the PowerPC ISA (both 32 and 64 bit versions).

  77. Re:overclocking by ILikeRed · · Score: 1

    What many people fail to realize is that more than your chip is at risk. I have seen motherboards go bad - and video, audio, and hard drives are at risk as well. I have replaced motherboards in systems where the owner overclocked their AMD system - and the CPU was fine. I have also seen corrupted data.... not what I would call a good solution.

    --
    I have come to a conclusion that one useless man is a shame, two is a law firm, and three or more is a congress -J Adams
  78. Re:overclocking by Pxtl · · Score: 1

    Idunno, overclocking high-end processors is doable in some cases (not necessarily this one), you just need more coolant than your average nuclear power plant.

  79. Re:Overclock? by Spoing · · Score: 1

    How would you overclock a "production (by production I mean RS/6000 AS/400 type proprietary machines)" type server? This isn't some BX motherboard with clock speed jumpers.

    The old-fashioned way would probably be the easiest: change the frequency that the chip uses for timing. Either swapping out the crystal or modifying the traces that are used to set the timing frequency would do it. That's what the BX boards do.

    Remember: bits are bits!

    You could "Kryotech" it, but I think there would be vast amounts of cooling already, it being 2 chips on one die running at 2 gigahertz, even with a .18 micron fabrication.

    Cooling is a necessity after you actually increase the frequency...and we're back to the crystal again.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  80. Hoo hoo, Magic Bus! by effer · · Score: 1

    Where and when will we have a board speed to use this uniformly? I want this for loading high cache calculations, but I also want it to (gak) handle the DSP aspects in a handheld. Very cool, very hot!!

  81. Re:overclocking by toph42 · · Score: 1
    Do you know any engineers? They overestimate everything, like Scotty telling Kirk how long repairs will take. You bet that you can run that chip faster than it is rated.

    Topher

    "I've not met a human I thought was worth cloning, yet. Lot's of cows though." -- Mark Westhusin

  82. Re:overclocking by Sir+Nimrod · · Score: 1

    Do you know any engineers?

    I am an engineer; or, at least, I pretend to be one most of the day. I help design chipsets for high-end systems.

    Yes, we provide some margin when we set operating frequencies. But we spend an awful lot of time determining the operating boundaries. The word "estimate" doesn't give the full flavor of what we do.

    I'm not going to provide details, because they're probably confidential. But I will say this: We know the voltage/frequency/temperature points at which our chips stop working properly. And it's in our interests to push the frequency as high as it will go.

    Call me a wimp. Go ahead. But I don't overclock my system, and I definitely wouldn't overclock anyone else's.

    --
    The United States of America: We mean well.
  83. Microprocessor Report article on power4 by Anonymous Coward · · Score: 2

    IBM announced the Power4 at the Hot Chips conference last fall. There is an excellent article in Microprocessor Report detailing the processor. The report can be found on IBM's website here: http://www.chips.ibm.com/news/1999/microprocessor99.pdf

  84. Re:OverClock by Russ+Steffen · · Score: 2

    I once overclocked my watch - first time in my life I have ever been early for anything.

  85. What took you all so long ? by Forge · · Score: 2

    You == IBM, iNTEL, AMD etc..

    As for this IBM chip. What took you all so long ? SMP on a single chip is an obvious advance. When you vastly increase the amount of circuits on a chip as happens between a Celeron and a P3 without a matching increase in performance something has to give. Why not make that the number of cores on the chip? I hope this isn't patented because it really is obvious.

    This brings up something I have been thinking about with the Crusoe. Suppose you can convert 32-bit instructions to 128-bit meta-instructions and have the finished product run as fast as on a genuine 32-bit CPU.

    What if the same technique is applied to an SMP setup in such a way that the software sees the processors as a single CPU. Right now this kind of abstraction is handled by the Operating system and except on the Mainframe that is very inefficient. To the point where 2X400MHz CPUs is a whole lot faster than 4X200MHz.

    Now if the whole thing including say 6 CPUs and 2 Megs of cache were put on a single chip at 500MHz to 2GHz, how fast would it be ? My guess is that this could easily be the fastest low end server or workstation chip by a good margin.

    --
    --= Isn't it surprising how badly I spell ?
    1. Re:What took you all so long ? by buysse · · Score: 2
      To the point where 2X400MHz CPUs is a whole lot faster than 4X200MHz.

      Depends on what you're doing, my boy. If you're running 4 different CPU-hungry jobs, a 4X200 may well be faster than a 2X400 -- assuming everything else about the processors is equal.
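      Here is a back-of-the-envelope sketch of that comparison. It is a deliberately crude model: every job is single-threaded, runs to completion on one CPU without time-slicing, and needs the same number of cycles; the workload size is an arbitrary assumption.

```python
import math

def makespan_seconds(jobs, cpus, clock_mhz, work_mcycles=1000):
    """Time to finish `jobs` equal single-threaded jobs, run in rounds of `cpus` at a time."""
    rounds = math.ceil(jobs / cpus)
    return rounds * work_mcycles / clock_mhz

for jobs in (1, 2, 4):
    quad = makespan_seconds(jobs, cpus=4, clock_mhz=200)
    dual = makespan_seconds(jobs, cpus=2, clock_mhz=400)
    print(f"{jobs} job(s): 4X200 = {quad:.1f} s, 2X400 = {dual:.1f} s")
```

      In this idealised model the 2X400 wins with one or two jobs and the two machines tie at four, so with four jobs the result comes down to whatever the model leaves out, such as context switches and memory contention.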

      --
      -30-
    2. Re:What took you all so long ? by Haven · · Score: 2

      No, when running a process on a Windows 2000 box such as Quake II that doesn't do SMP, Windows 2000 will put the non-SMP program on its own processor. "Load Balancing"

    3. Re:What took you all so long ? by UnknownSoldier · · Score: 2

      > No, when running a process on a Windows 2000 box such as Quake II that doesn't do SMP, Windows 2000 will put the non-SMP program on its own processor. "Load Balancing"

      That is correct. To prove this is the case, you can set the affinity (which cpu a thread is bound to). Task Manager | Process | Right-click on process | Set affinity.
      (This setting doesn't show up on a single cpu.)
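      The Task Manager route is Windows-only. As a rough analogue, and assuming a Linux box with Python, the same pinning can be sketched with `os.sched_getaffinity` and `os.sched_setaffinity`; the helper name below is made up, and it simply does nothing on platforms that lack those calls.

```python
import os

def pin_to_one_cpu():
    """Pin the current process to one allowed CPU; return it, or None if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # these calls are not available on Windows or macOS
    allowed = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    cpu = min(allowed)                 # pick the lowest-numbered CPU we may use
    os.sched_setaffinity(0, {cpu})
    return cpu

if __name__ == "__main__":
    print("pinned to CPU:", pin_to_one_cpu())
```

      Once pinned, a CPU-bound process should show up as roughly one fully loaded processor, much like the Quake observation.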

      Another quick way to see this is the case is to start up Quake, and look at the cpu utilization. It will be around 50%, meaning the one cpu is taxed, while the other one isn't doing anything.

      One means of burning in a new dual system is to run 2 copies of Prime95: one on each cpu.
      For fun, I left 2 copies of prime95 and one copy of unreal running overnight. The one prime95 hadn't reached as many calculations as the 2nd one.

      Note: Windows NT runs the OS on both processors. It will not run a non-SMP aware process on both cpu's.

      For anyone looking for a cheap dual system, this is what I did:
      $35 cel/366 o/c to 550
      $140 Abit BP6
      Hard to beat the price !

      Cheers

    4. Re:What took you all so long ? by Haven · · Score: 3

      What took you all so long ? SMP on a single chip is an obvious advance

      1 terahertz is an obvious advance too. Just because it's obvious doesn't make it easier. I'm sure that IBM has had prototypes of dual chips on one die before. They wanted the 7000 series (G4) of the PowerPC chips to have a high-end model that was 4 processors in the processor's core. It is just hard to do. Just like it is hard to write an operating system that will make non-SMP programs utilize SMP. Windows 2000 has "load-balancing" where it will run processes that are processor intensive on the chip that isn't running the OS.

  86. Re:overclocking by Tet · · Score: 2
    Do you know any engineers? They overestimate everything, like Scotty telling Kirk how long repairs will take. You bet that you can run that chip faster than it is rated.

    Yes, you can, if you're prepared to take the risk -- that's the whole basis of overclocking. Chips are rated at the speed the manufacturer can guarantee they'll operate as intended. Say you overclock your chip by 15%. You're now encroaching into the safety margin that the engineers and the manufacturer allowed to be sure that all chips will work correctly. Even so, perhaps 98% of all chips will be OK. Do you want to gamble on whether or not you've got one of the 1 in 50 chips that won't work? Personally, I don't like the odds, particularly when the chips cost as much as this one will...

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  87. Re:overclocking by Tet · · Score: 2
    Overclocking SMP is NOT suicide [...] What's the risk?

    The risk is both damage to the physical hardware and data corruption. The hardware can easily be replaced when it's a cheap Celeron, but not when it's a dual core IBM Power CPU. The data corruption can't be ignored, though. Don't believe me? Maybe you'd like to hear it directly from someone you might trust.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  88. Re:overclocking by Tet · · Score: 2
    Overclocking stupid, eh? Actually, many PowerPC processors overclock quite well, I know from personal experience.

    Maybe they do, maybe they don't. You're missing the point though. If you want faster speeds, go buy faster processors (or more of them). Overclocking is only for those who can't afford to do that. People buying these chips aren't going to fall into that category.

    The other point to consider is that overclocking an SMP system is tantamount to suicide, by all accounts. Now maybe that won't be the case here, because the cores are on the same die, and hence will be affected in exactly the same way, but I don't know enough about it to be sure, and I certainly wouldn't risk it.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  89. Q: How do I Overclock my Light Bulb? by Guppy · · Score: 2

    I've been trying to overclock my lightbulb, and I thought I'd ask you gurus on Slashdot for some pointers. My bulb says "60W" on it, and I want to get it up to 75 or 100.

    • I'm having a heck of a time getting the heatsink to stay put, it keeps sliding off the top of the bulb. Any suggestions?
    • Microsoft Lightswitch keeps crashing. Do I need to up the voltage to keep it stable? I've got a 220V line that I could try plugging it into.
    • My lightbulb is currently running at 60 Hz. I've heard that when you increase the frequency, the lightbulb will start emitting ultraviolet or even X-rays. My friends tell me I can protect myself by painting the lightbulb black. I need to know how many coats of paint to use, please help!!!
  90. Not consumer level, thats for certain. by Mr.+Flibble · · Score: 2

    I suspect that it will be some time before this technology ends up in consumer PCs. The fact that it's meant for servers aside, most stuff is not coded to support multi-threading.

    Sure, *nix is, BeOS, and NT (2000) are, but the majority of people still run 9X on their desktops.

    Quake 3 and Unreal Tournament support SMP, but there are few consumer level applications that support it. Apparently BeOS can force multithreading, and this is cool, but what we really need are more apps that can take advantage of parallel calculations. Even Carmack states that dual processors running Q3A only increase performance in the most demanding situations.

    Even the guys who maintain the Beowulf-How-to (someone is going to post this...) say that parallel computing is great for crunching data, well, IN PARALLEL. Quake is not parallel. Clock speed matters more in 3d shooters than overall crunching power (Unless you *like* a slideshow.)

    Don't get me wrong, I personally would love to have a machine running either Linux or BSD with one of these things in it (or many) but I don't know what the hell I would do with it.

    Until then I will stick with a BP6 and dual-celerons, heck, maybe flip-chips or the new Jalapeno's from VIA/Cyrix.

    I think that this is the way of the future, but we won't see it on the desktop for at least 5 years. (IMHO)

    --
    Try to hack my 31337 firewall!
  91. I wonder... by Graymalkin · · Score: 2

    If Motorola plans to incorporate this into their PPC lines. Take the 604e for example: from what I understand of its architecture, it could have easily been made to do two-chips-on-one-die. I would sorta like to see chips of this caliber in the next generation or so of Mac servers, maybe even non-Mac PPC systems (Linux, BeOS). The benefit of SMP over superscalar is that SMP allows you to have multiple superscalar processing units: if a processor can do n processes with a single superscalar processing unit, then with SMP it can do x*n processes, where x is the number of processors. Most people know this already. What really interests me is the high bus speed. Intel and AMD's offerings may be nice for server platforms because of their price, but they would get their asses chomped off by the sheer system speed of the Power4. I'm sick of hearing about the Athlon's EV6 bus; the memory (read: the entire system besides the processor) only runs at 100 MHz. IIRC AMD is going to be using DDR SDRAM with the Sledgehammer to boost its overall system performance and the system will clock at 133; I would still rather have a 500 MHz system bus.

    --
    I'm a loner Dottie, a Rebel.
  92. Re:How about hundreds of small processors in one d by Graymalkin · · Score: 2

    The problem there lies in the large datasets: if you were running 16-bit code it would be fine, but for many applications today (games, graphics, voice recognition, encryption/decryption, etc.) you need more than 16 bits. If you had to emulate 2^n bits higher than 4 you'd have major system slowdown. Having a bunch of identical cores would mean they would need to be small. Small cores mean they won't have the space to have optimized cores. Today's chips have highly optimized cores, like AltiVec, that can handle large data sets at high speeds. It's like with Rambus memory: they have really high frequencies but a teeny tiny data bus, which means they have lots of latency; sometimes faster is more valuable.

    --
    I'm a loner Dottie, a Rebel.
  93. Re:overclocking by Azog · · Score: 2

    Overclocking SMP is NOT suicide. I know several people with overclocked dual Celerons that work fine. And why not? They are cheap, if one burns up, you throw it away and get another one.

    Heck, throw them away every four months and upgrade anyway. Celerons are cheap as dirt, and when overclocked, are as fast as far more expensive P-III's.

    What's the risk?


    Torrey Hoffman (Azog)

    --
    Torrey Hoffman (Azog)
    "HTML needs a rant tag" - Alan Cox
  94. Re:overclocking by akey · · Score: 2

    Enough with overclocking already. This isn't your $70 Celeron toy. When you get to work +$5.000 chips , you are free to overclock them but I doubt it even occurred to anyone to overclock their $9000 UltraSparc cpu or similar. Yep, overclocking is stupid. flame on ..

    Actually, when I used to work at Ross (used to manufacture CPUs for Suns) in their modules lab, one of the things that we routinely did was to overclock the CPUs (not to mention other nasty little tricks involving soldering, cutting traces on the MB with an X-Acto knife, etc.). Mostly it's just a matter of providing proper heat sinks and air circulation. So it did actually occur to at least someone. :-) But you're right in that no serious business customer is going to overclock their high-end workstations and risk invalidating the warranty.

    --

    ---
    "Go Metallica. Die RIAA." -- Linus Torvalds
  95. Enough with the cynicism! This is desktop tech! by xtal · · Score: 2

    Hey guys, are we quick to forget history? The more people get up and proclaim that a given technology is too expensive / not needed / 640k is enough for the desktop, someone goes and proves them flat wrong.

    One of two things happens: Consumer technology just blows away these so-called "elite" chips (anyone want to compare one of those "elite" 150 MHz Alpha machines - once a VERY expensive minicomputer chip - with a 1 GHz consumer Athlon?). The other is that, "poof", it appears.

    There are issues with semiconductor yields, as people mentioned previously. But with Celerons going for $70, it won't be too long before someone figures out how to do it cheaply.

    Ahhh, SMP on chip. Long way from the 6502 babyee :)

    Kudos

    --
    ..don't panic
  96. Re:Superscalar vs. on-die SMP by orz · · Score: 2
    1. Each core in the Power4 is very superscalar, possibly more-so than any processor shipping today.

    2. I don't think that such a test (superscalar vs. SMP) would be useful, as the results would be very, very, VERY heavily influenced by the multi-threadedness (or lack thereof) of the benchmarks, and any two processors available will have enough other differences in architecture to invalidate the tests.

    3. Both cores have small (16 or 32 KB, I think) L1 caches, but share a large (1.5 MB or 2 MB) L2 cache. Furthermore, several chips share L2s via a ring arrangement of uni-directional 128-bit 500 MHz buses, moving things around such that all cached data exists in the L2 of the chip that most recently accessed it, and in no other L2.
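    For a sense of scale on those buses, peak bandwidth is simply width times clock. A minimal sketch, assuming one transfer per clock (my assumption, not something stated in the article); the 128-bit and 500 MHz figures are the ones in point 3, and the 64-bit 100 MHz bus is a hypothetical comparison point.

```python
def peak_bandwidth_gb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bus bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9

ring_bus = peak_bandwidth_gb_per_s(128, 500)  # figures from point 3
narrow_bus = peak_bandwidth_gb_per_s(64, 100)  # hypothetical 64-bit, 100 MHz bus

print(f"128-bit @ 500 MHz: {ring_bus:.1f} GB/s")
print(f" 64-bit @ 100 MHz: {narrow_bus:.1f} GB/s")
```

    This is why quoting the MHz figure alone says little: width matters just as much as clock.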

  97. On the Desktop? by fcd · · Score: 2

    You will see this technology on the desktop. Beyond the fact that the Power Series is related to the PowerPC series (IBM uses both in their RS/6000 series), multiple cores have been on the PowerPC Roadmap for a while. (Yes, I know that is a rumors site. I have seen something similar on Motorola's site, I believe, but can't find it right now.) Yeah, I know the info is a little out of date... but it's just a matter of time.

  98. OverClock by Nastard · · Score: 2

    I wanna overclock one of these bad boys ...

    Always someone willing to ruin good hardware. Is there *anything* you people won't overclock?

  99. Alpha has similar plans for long while now. by Kernel+Corndog · · Score: 2

    It just so happens I was visiting alphalinux.org today and saw Compaq has "just released" a document detailing the Alpha 21364 EV7 SMP on-chip processor. However, this document has been out since, I believe, the October 1998(?) Microprocessor Forum. Still, IBM's proposed 2 GHz at 500MHz FSB is quite intriguing. I know... I know... Compaq seems to be letting the Alpha wilt away on its once strong vine but I'm still rooting for it. I remember when Alpha had reached 600MHz and Intel/x86 were sputtering along at half the speed. It wasn't until after the settlement between Digital and Intel that x86 started speeding up. Hmmm...anyone else smell fish? Well here's hoping that the Alpha can bring itself back to its glory as speed king. And hopefully before the Merced/Itanium "Marchitecture" infects the corporate world.

  100. Superscalar vs. on-die SMP by Shaheen · · Score: 3

    When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?" (Superscalar basically means that the processor takes n instructions at a time, rather than just 1 at a time).

    It would be really interesting to see the results from using on-die SMP versus a chip that is just twice as wide (2n instructions, instead of n).

    Also in question is how the caching is done. Do both cores update the same cache? Or do they operate on separate caches?

    --
    You should never take life too seriously - You'll never get out of it alive.
    1. Re:Superscalar vs. on-die SMP by cperciva · · Score: 3

      When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?"

      Because of limited instruction level parallelism. Even with a 512 entry reorder window, 256 renaming registers, and a 256-way superscalar architecture, you still won't have ILP beyond about 10 on the gcc component of the spec benchmarks. Furthermore, as you increase the width of a machine, you increase the difficulty of finding all the data dependencies quadratically, since each instruction must be compared with each other instruction. Ultimately it comes down to an issue of decreasing returns, and you find that it is cheaper and faster to run two threads at once than it is to allocate twice as many resources to a single thread.

      As for the question of caching, I'd assume that they share the L2 cache the same way as in any other such system -- they share the bus, write to and read from the same cache, and snoop each other's actions. They of course would have their own internal L1 caches, with lower latency.
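
      The decreasing-returns argument can be put into rough numbers. The sketch below is a toy model, not a description of any real scheduler: it counts the instruction pairs that have to be compared in an issue group of a given width, and caps the useful work per cycle at the ILP of about 10 quoted above. Both function names and the hard cap are assumptions made for illustration.

```python
def pairwise_dependency_checks(width: int) -> int:
    """Instruction pairs to compare when `width` instructions issue together.

    Every instruction is compared with every other one: width choose 2.
    """
    return width * (width - 1) // 2


def useful_issue_per_cycle(width: int, ilp_limit: int = 10) -> int:
    """Toy model: work per cycle is capped by available ILP (assumed ~10)."""
    return min(width, ilp_limit)


if __name__ == "__main__":
    for width in (4, 8, 16, 32, 256):
        checks = pairwise_dependency_checks(width)
        useful = useful_issue_per_cycle(width)
        print(f"width {width:3d}: {checks:6d} comparisons, "
              f"at most {useful:2d} useful instructions/cycle")
```

      In this model, going from width 8 to width 16 more than quadruples the comparisons while the useful work per cycle only rises from 8 to 10, which is the point where a second thread starts to look like the cheaper way to spend the transistors.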

  101. Re:Power arch at 500 MHz! by RISCy+Business · · Score: 3

    No, POWER and PowerPC are not finally merging, nor do I think they ever will. The POWER architecture, however, since the POWER3, has fully supported the PowerPC instruction set in 32 and 64 bit implementations.

    Yeah, IBM and Motorola are in bed again. But it's been on again, off again for years now. Don't count on it being a final merging of the two architectures.

    =RISCy Business

  102. Overclock? by Haven · · Score: 3

    How would you overclock a "production (by production I mean RS/6000 AS/400 type proprietary machines)" type server? This isn't some BX motherboard with clock speed jumpers. You could "Kryotech" it, but I think there would be vast amounts of cooling already, given that it's 2 chips on one die running at 2 gigahertz, even with a .18 micron fabrication.

    Second of all, good luck on coming up with the cash to buy one. Even if where you worked got one they would still keep it under lock and key tighter than Fort Knox (to all you non-US people, Fort Knox is a place owned by the Treasury department where lots of precious metals are stored. It is locked up pretty tight.). I'm a super user for my network at work, and I'm not even allowed near some of the boxes we have.

  103. Better article on Power4 by slyfox · · Score: 3
    There is a good article on Power4 at IBM's web site.

    The article says the system will have 10 GBytes/second of memory bandwidth and a 45 GBytes/second multiprocessor interface. The article estimates the cache sizes as 1.5 MB for the shared on-chip L2, and 32MB for the off-chip L3 cache. Each processor die has 5,500 pins and attaches directly to a multi-chip-module (MCM).

    The article also suggests that the system will support up to 32 processors (2 per die x 16), and even more processors using clustering technology.

    Looks like this is going to make for a fast server system.

  104. Power arch at 500 MHz! by Paul+Komarek · · Score: 4

    At one time, not too long ago, the Power 3 architecture was rated (by some) as the second fastest at floating point, behind the Alpha 21264 at 500MHz. The punchline is that the Power chip was running at 200 MHz!

    In the past, complications with multiprocessor computers have prevented their supremacy over single-CPU architectures. I'd love to see IBM succeed with their multi-CPU chips, as I believe this technology may solve the nagging parallel problems with processor interconnect. And the Power architecture is very nice.

    Does anyone know if the PowerPC and Power architectures will finally become one with this product, as was expected with previous Power revisions? Somehow, I really don't expect to see it ever happen, with the way Motorola and IBM have gotten along.

  105. overclocking by guacamole · · Score: 4

    I wanna overclock one of these bad boys ...

    Enough with overclocking already. This isn't your $70 Celeron toy. When you get to work with $5,000+ chips, you are free to overclock them, but I doubt it even occurred to anyone to overclock their $9000 UltraSparc cpu or similar. Yep, overclocking is stupid. flame on ..

  106. interesting details by orz · · Score: 4
    The two processor cores are really cool, and something a lot of people have been hoping for for a long time, although not quite as cool as some of the stuff Compaq/Alpha is doing, but

    This article doesn't mention the most interesting detail I heard about the Power4: They're supposed to come in small rings of about four chips connected by ultra-high frequency 128 bit uni-directional buses that allow multiple chips to share their L2 caches, with fairly intelligent coherency stuff handled in hardware.

    The only bad stuff is that they're really targeting the high-end server market, where I want most of that stuff for the low-end too. It's supposed to be 400 mm^2 on a .18 micron process w/ copper, so even after it moves to .13 micron it'll still be too expensive for mainstream use.

    Other tidbits include: 1. It's dropping a few of the more complex instructions from its instruction set and depending on the OS to emulate them, 2. To simplify instruction scheduling, they're keeping track of packets of instructions instead of individual instructions, and 3. The per chip L2 size is supposed to be 1.5 megabytes.

  107. Explanation - Re:What took you all so long ? by Northern+Hunter · · Score: 5

    > SMP on a single chip is an obvious advance.

    Unfortunately if you multiply the amount of circuitry you are trying to deliver in one fully working device, you cut your yield exponentially. This is a SERIOUS problem if your yields aren't high enough to make the exponential nature a small effect.

    Say on one wafer you have 30 defects bad enough to wreck whatever chip they are on. Now normally you make 100 chips on that wafer. So (first approximations here, I won't actually do the statistics) 70 chips make it, your yield is 70 percent.

    But now you double the size of your chips, so that same wafer now only produces 50. But you still have those same 30 bad defects. Whoops, your yield is now 40 percent. Quadruple the size of your die... Whoops, now you will be lucky to get a handful from that entire wafer (you're trying to get 25 chips when there are 30 randomly distributed defects... I leave the answer as an exercise for the reader :)

    On the other hand, if you do the same rough approximation with only 10 super bad defects per wafer, then you go from a 90 percent yield to an 80 percent yield when doubling the die size. Nowhere near as bad an effect on the economics.

    So, the only reason they are now considering it is that they expect to have defect rates reduced enough to make it reasonably economical.
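
    For anyone who wants the numbers, here is a small sketch of the approximation used above (each defect kills a different chip) next to a simple Poisson model. The Poisson model is an assumption brought in for comparison, not part of the figures above: it takes defects as landing independently and uniformly over the wafer, so a die survives with probability exp(-defects / chips). The function names are made up for illustration.

```python
import math


def naive_yield(chips: int, defects: int) -> float:
    """First approximation from the comment: every defect wrecks a distinct chip."""
    return max(0.0, (chips - defects) / chips)


def poisson_yield(chips: int, defects: int) -> float:
    """Assumed Poisson model: probability that a die catches zero defects."""
    return math.exp(-defects / chips)


if __name__ == "__main__":
    for defects in (30, 10):
        for chips in (100, 50, 25):
            print(f"{defects} defects, {chips:3d} chips/wafer: "
                  f"naive {naive_yield(chips, defects):.0%}, "
                  f"Poisson {poisson_yield(chips, defects):.0%}")
```

    With 30 defects and 25 big dies the naive estimate bottoms out at zero, while the Poisson model still leaves roughly 30 percent, because several defects can land on the same die. Either way the yield falls off quickly as the die grows, and much less so once the defect count is down around 10.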

    -NH

    My apologies for avoiding the statistics and actual mathematics; my examples above use randomly chosen yields. I have an optoelectronics background that is a few years old, from back when some places had utterly pathetic production yields for III-V QWH lasers with simple integration with a few other devices... Like 10 percent!!