IBM One-Chip Dual Processor Due Next Year
PureFiction writes, "Looks like IBM is going to be scaling processors at the chip-die level. ZDnet has this story about plans for a dual-processor, single-die chip that will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz. I wanna overclock one of these bad boys ..." Better get out your pocketbook, then -- they're slated to power RS/6000 servers rather than consumer PCs, at least for a while. 64 bits, copper interconnects, and plans to move down to a .13 micron fab show that IBM is thinking long-term. Similar technology may reach your desktop first, though, in products like AMD's Sledgehammer.
When I initially read this, I thought to myself, "Why didn't IBM just do a machine that was super-superscalar?" (Superscalar basically means that the processor takes n instructions at a time, rather than just 1 at a time).
It would be really interesting to see the results from using on-die SMP versus a chip that is just twice as wide (2n instructions, instead of n).
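One rough way to frame the comparison is a toy throughput model. This is only a sketch under loud assumptions, not a description of the Power4: the function names, the idea of a fixed "available parallelism per thread", and the sample numbers are all hypothetical, and caches, branches and memory stalls are ignored entirely.

```python
def wide_core_ipc(width, ilp_per_thread):
    """Instructions per cycle for ONE core that can issue up to `width`
    instructions per cycle, all taken from a single thread.

    ilp_per_thread is an assumed cap on how many independent
    instructions one thread can offer each cycle.
    """
    return min(width, ilp_per_thread)


def dual_core_ipc(width, ilp_per_thread, runnable_threads):
    """Instructions per cycle for TWO cores of `width` each (on-die SMP).

    Each core runs its own thread, so the second core only helps
    when the OS has at least two runnable threads.
    """
    busy_cores = min(2, runnable_threads)
    return busy_cores * min(width, ilp_per_thread)


if __name__ == "__main__":
    n = 4  # assumed issue width of one "normal" core
    for ilp, threads in [(3, 2), (8, 1), (8, 2)]:
        wide = wide_core_ipc(2 * n, ilp)
        dual = dual_core_ipc(n, ilp, threads)
        print(f"ilp={ilp} threads={threads}: 2n-wide={wide} dual n-wide={dual}")
```

In this model the 2n-wide chip only wins when a single thread really can supply that many independent instructions per cycle; with two runnable threads of modest parallelism, the two n-wide cores come out ahead.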
Also in question is how the caching is done. Do both cores update the same cache? Or do they operate on separate caches?
You should never take life too seriously - You'll never get out of it alive.
No, POWER and PowerPC are not finally merging, nor do I think they ever will. The POWER architecture, however, since the POWER3, has fully supported the PowerPC instruction set in 32 and 64 bit implementations.
Yeah, IBM and Motorola are in bed again. But it's been on-again, off-again for years now. Don't count on it being a final merging of the two architectures.
=RISCy Business
your company here.
shelby != ford
What took you all so long? SMP on a single chip is an obvious advance.
1 terahertz is an obvious advance too. Just because it's obvious doesn't make it easy. I'm sure that IBM has had prototypes of dual processors on one die before. They wanted the 7000 series (G4) of PowerPC chips to have a high-end model with 4 processors in the processor's core. It is just hard to do, just as it is hard to write an operating system that makes non-SMP programs utilize SMP. Windows 2000 has "load balancing", where it will run processor-intensive processes on the chip that isn't running the OS.
How would you overclock a "production" server (by production I mean RS/6000 or AS/400 type proprietary machines)? This isn't some BX motherboard with clock speed jumpers. You could "Kryotech" it, but I think it would already need vast amounts of cooling, being 2 chips on one die running at 2 gigahertz, even with a .18 micron fabrication.
Second of all, good luck on coming up with the cash to buy one. Even if where you worked got one they would still keep it under lock and key tighter than Fort Knox (to all you non-US people, Fort Knox is a place owned by the Treasury department where lots of precious metals are stored. It is locked up pretty tight.). I'm a super user for my network at work, and I'm not even allowed near some of the boxes we have.
This would be different because two threads would be executing simultaneously, so as long as the OS could find two threads that need cpu-time, the hardware would gain a lot of parallelism without having to do more scheduling.
This approach is good because it offers a way to use the excess die space without requiring too much extra effort from the designers. In the last decade or two the # of transistors per chip has gone up several orders of magnitude, while the # of man-years per chip-designer has not come close to keeping pace. It's also nice because the other common approaches are obviously reaching the point of diminishing return.
What Compaq is doing is more interesting though... they are processing multiple threads simultaneously... on the same set of execution units! If one thread doesn't have enough parallelism... that's O.K.. The other 7 can pick up the slack!
The article says the system will have 10 GBytes/second of memory bandwidth and a 45 GBytes/second multiprocessor interface. The article estimates the cache sizes as 1.5 MB for the shared on-chip L2, and 32 MB for the off-chip L3 cache. Each processor die has 5,500 pins and attaches directly to a multi-chip module (MCM).
The article also suggests that the system will support up to 32 processors (2 per die x 16), and even more processors using clustering technology.
Looks like this is going to make for a fast server system.
At one time, not too long ago, the Power3 architecture was rated (by some) as second in floating-point performance only to the 500 MHz Alpha 21264. The punchline is that the Power chip was running at 200 MHz!
In the past, complications with multiprocessor computers have prevented their supremacy over single-CPU architectures. I'd love to see IBM succeed with their multi-CPU chips, as I believe this technology may solve the nagging parallel problems with processor interconnect. And the Power architecture is very nice.
Does anyone know if the PowerPC and Power architectures will finally become one with this product, as was expected with previous Power revisions? Somehow, I really don't expect to see it ever happen, with the way Motorola and IBM have gotten along.
I wanna overclock one of these bad boys ...
..
Enough with overclocking already. This isn't your $70 Celeron toy. When you get to work with $5,000+ chips, you are free to overclock them, but I doubt it even occurred to anyone to overclock their $9,000 UltraSparc CPU or similar. Yep, overclocking is stupid. flame on
"...will operate at upward of 2 gigahertz. It will be called the Power4, will use a .18 micron fab process, and feature on-chip L2 cache (supposedly quite large, though no numbers mentioned), and bus speeds of 500Mhz..."
::
::
Power4
2+ gigahertz
Dual processor on one die
500 MHz bus
LARGE L2 cache (I would imagine 2-4 MB)
64 bit
-------------------------------
x86 CPUs
1+ gigahertz
One processor on die
200 MHz bus (I don't recall the bus of the Willamette)
512 KB-2 MB L2 cache
32 bit
This is not something you will see on Tom's Hardware. Clock speed isn't everything. A 500 MHz 21264 DEC Alpha is MUCH faster than a 500 MHz PIII. The Power4 is not a desktop processor. Compaq will not ship computers with the Power4 processor in them. People need to understand this! When was the last time you saw a benchmark that was PIII vs. RS/6000? I have only seen it once, and that was the PIII Xeon compared to other server hardware, namely from Sun and DEC. That was on Intel's site.
This article doesn't mention the most interesting detail I heard about the Power4: They're supposed to come in small rings of about four chips connected by ultra-high frequency 128 bit uni-directional buses that allow multiple chips to share their L2 caches, with fairly intelligent coherency stuff handled in hardware.
The only bad stuff is that they're really targeting the high-end server market, whereas I want most of that stuff for the low end too. It's supposed to be 400 mm^2 on a .18 micron process w/ copper, so even after it moves to .13 micron it'll still be too expensive for mainstream use.
Other tidbits include: 1. It's dropping a few of the more complex instructions from its instruction set and depending on the OS to emulate them, 2. To simplify instruction scheduling, they're keeping track of packets of instructions instead of individual instructions, and 3. The per-chip L2 size is supposed to be 1.5 megabytes.
> SMP on a single chip is an obvious advance.
Unfortunately if you multiply the amount of circuitry you are trying to deliver in one fully working device, you cut your yield exponentially. This is a SERIOUS problem if your yields aren't high enough to make the exponential nature a small effect.
Say on one wafer you have 30 defects bad enough to wreck whatever chip they are on. Now normally you make 100 chips on that wafer. So (first approximations here, I won't actually do the statistics) 70 chips make it, your yield is 70 percent.
But now you double the size of your chips, so that same wafer now only produces 50. But you still have those same 30 bad defects. Whoops, your yield is now 40 percent. Quadruple the size of your die... Whoops, now you will be lucky to get a handful from that entire wafer (you're trying to get 25 chips when there are 30 randomly distributed defects... I leave the answer as an exercise for the reader :)
On the other hand, if you do the same rough approximation with only 10 super bad defects per wafer, then you go from a 90 percent yield to an 80 percent yield when doubling the die size. Nowhere near as bad an effect on the economics.
So, the only reason they are now considering it is that they expect to have defect rates reduced enough to make it reasonably economical.
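For anyone who wants to try the numbers, here is a minimal sketch of the arithmetic. The first function is the rough approximation used in this comment (every defect wrecks a different chip). The other two are not from the comment: one assumes each defect lands on a uniformly random chip, and the last is the simple Poisson yield model, swapped in here as a common textbook approximation. All function names are made up for illustration.

```python
import math


def naive_yield(chips_per_wafer, killer_defects):
    """First approximation: every killer defect wrecks a different chip."""
    good = max(chips_per_wafer - killer_defects, 0)
    return good / chips_per_wafer


def random_defect_yield(chips_per_wafer, killer_defects):
    """Expected yield if each defect lands on a uniformly random chip.

    A chip survives only if every defect misses it, so the chance is
    (1 - 1/N) ** D for N chips and D defects (assumes independence).
    """
    return (1 - 1 / chips_per_wafer) ** killer_defects


def poisson_yield(chips_per_wafer, killer_defects):
    """Poisson yield model: exp(-average killer defects per chip)."""
    return math.exp(-killer_defects / chips_per_wafer)


if __name__ == "__main__":
    for defects in (30, 10):
        for chips in (100, 50, 25):
            print(
                f"defects={defects:2d} chips={chips:3d} "
                f"naive={naive_yield(chips, defects):.2f} "
                f"random={random_defect_yield(chips, defects):.2f} "
                f"poisson={poisson_yield(chips, defects):.2f}"
            )
```

With 30 defects and 25 big chips, the random-placement version comes out to roughly 29 percent, i.e. about 7 good chips per wafer, which matches the "handful" guess. The naive version bottoms out at zero there because it pretends no two defects ever share a chip.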
-NH
My apologies for avoiding the statistics and actual mathematics; my examples above use randomly chosen yields. I have an optoelectronics background that is a few years old, from back when production of III-V QWH lasers with simple integration with a few other devices had utterly pathetic yields at some places... Like 10 percent!!