HP Shows Off PA-8800 SMP-On-A-Chip CPU Plans
Eric^2 writes: "At last week's MicroProcessor Forum, HP's David J. C. Johnson unveiled the details of HP's latest RISC processor destined to redefine performance in Server-Class processors. Following a relatively simple strategy, the PA-8800 processor combines two PA-8700 cores on a single chip to enable symmetric multiprocessing (SMP) on a single processor. Aside from bumping the core speed up to an initial 1 GHz, enhancements include the addition of combined 35 MB L1+L2 cache. The article contains the full text. AMD, please steal an idea..."
...a 1 GHz processor may not sound like much, even in this dual-core configuration, but keep in mind that this is a RISC processor. None of that Super-mega-ultra-long-50-bazillion-stage pipeline crap that Intel uses to pump up their MHz rating. The article kind of sells this point a little bit short. The RISC architecture allows this processor to do roughly twice as much work in the same amount of time - or, to put it in a more concrete scenario: imagine a pair of 2 GHz Pentium 4s running in SMP configuration.
Now that's FAST.
Did that say 35MB of L1 + L2 cache? I may be rusty, but I think I remember reading in my Processor Design for Dummies book that increasing cache size actually can slow down processor performance after a certain amount. Could someone please clarify this?
today is spelling optional day.
Earlier steps in the multi-CPU direction included the 8-way DEC Alpha (killed in the merger with HP?) and a little National Semiconductor product for embedded systems with two very modest CPUs on a chip.
PA-8800 lets you create two opposite predicates in one instruction, for example the predicate a=b.
This seems to indicate that there are no separate "do this if predicate is true" and "do this if predicate is false" instructions, so for opposite predication you would have to specify two different predicates.
The processor cannot know that these two predicates are related, so this would give you quite a problem.
As has been publicly disclosed, in general in PA-8800, an instruction reading any resource (such as a predicate) must be in a later instruction group (cycle) than the instruction writing that resource. As a special case, branches are allowed to use a predicate written by another instruction in the same instruction group (as shown in the IDF slides).
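To make that rule concrete, here is a rough Python sketch of a checker for it. Everything in it is my own modelling, not anything from HP or the slides: the Instr tuple, the ins() helper, and the assumption that two instructions guarded by complementary predicates never both execute and so do not conflict with each other. It also only flags a reader that comes after the writer within the same group.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    text: str
    reads: frozenset
    writes: frozenset
    pred: Optional[str] = None   # controlling predicate, if any
    is_branch: bool = False

PREDICATES = {"pLT", "pNLT"}
# Assumption: predicates written together by one compare are complementary.
COMPLEMENTARY = {frozenset({"pLT", "pNLT"})}

def ins(text, reads, writes, pred=None, is_branch=False):
    """Build an Instr; the controlling predicate counts as a read."""
    reads = set(reads)
    if pred is not None:
        reads.add(pred)
    return Instr(text, frozenset(reads), frozenset(writes), pred, is_branch)

def mutually_exclusive(x, y):
    """True if x and y are guarded by a known complementary predicate pair."""
    if x.pred is None or y.pred is None:
        return False
    return frozenset({x.pred, y.pred}) in COMPLEMENTARY

def violations(groups):
    """List (group index, reader text, resource) for same-group read-after-write."""
    found = []
    for gi, group in enumerate(groups):
        for wi, writer in enumerate(group):
            for reader in group[wi + 1:]:
                if mutually_exclusive(writer, reader):
                    continue  # assumed: at most one of the two executes
                for res in sorted(writer.writes & reader.reads):
                    # Special case: a branch may use a predicate
                    # written earlier in the same group.
                    if reader.is_branch and res in PREDICATES:
                        continue
                    found.append((gi, reader.text, res))
    return found

# The three-group schedule discussed in this thread.
slow = [
    [ins("cmp.lt pLT, pNLT = a, 0", {"a"}, {"pLT", "pNLT"})],
    [ins("(pLT) add b = b, a", {"a", "b"}, {"b"}, pred="pLT"),
     ins("(pNLT) sub b = b, a", {"a", "b"}, {"b"}, pred="pNLT")],
    [ins("add c = c, b", {"b", "c"}, {"c"}),
     ins("add d = d, b", {"b", "d"}, {"d"})],
]

# Same code with the compare squeezed into the group that uses its predicates.
too_tight = [slow[0] + slow[1], slow[2]]

# A compare feeding a branch in the same group (branch text is a placeholder).
cmp_and_branch = [[
    ins("cmp.lt pLT, pNLT = a, 0", {"a"}, {"pLT", "pNLT"}),
    ins("(pLT) br target", set(), set(), pred="pLT", is_branch=True),
]]

if __name__ == "__main__":
    print(violations(slow))
    print(violations(too_tight))
    print(violations(cmp_and_branch))
```

Under these assumptions the three-group version passes, the squeezed version is flagged on pLT and pNLT, and the compare-plus-branch group passes because of the special case.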
So, the straightforward (but slow) PA-8800 schedule for the earlier example:
if (a < 0)
    b += a;
else
    b -= a;
c += b;
d += b;
would be:
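cmp.lt pLT, pNLT = a, 0    // pLT & pNLT are 2 complementary preds
;;
(pLT) add b = b, a         // add to b [then]
(pNLT) sub b = b, a        // or sub from b [else]
;;
add c = c, b               // uses of b
add d = d, b
;;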
which takes 5 instructions in 3 cycles. (Note: In PA-8800 assembly, ";;" indicates the end of an instruction group, "=" separates the target operand(s) from the source(s), "//" begins a comment, and (pred) specifies the controlling predicate.)
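If you want to check the instruction and cycle counts yourself, the notation described in the note is simple enough to count mechanically. Below is a small Python sketch (the function name schedule_stats and the listing strings are mine) that strips "//" comments, treats ";;" as the end of an instruction group, and assumes one group issues per cycle, as the "instruction group (cycle)" wording suggests.

```python
def schedule_stats(listing):
    """Return (instruction count, group count) for a ';;'-delimited listing."""
    instructions = 0
    groups = 0
    pending = 0  # instructions seen since the last ';;'
    for raw in listing.splitlines():
        code = raw.split("//", 1)[0].strip()   # drop the comment, if any
        ends_group = code.endswith(";;")
        if ends_group:
            code = code[:-2].strip()           # allow ';;' after an instruction
        if code:
            instructions += 1
            pending += 1
        if ends_group and pending:
            groups += 1
            pending = 0
    if pending:                                # last group without a closing ';;'
        groups += 1
    return instructions, groups

SLOW = """
cmp.lt pLT, pNLT = a, 0   // pLT & pNLT are 2 complementary preds
;;
(pLT) add b = b, a        // add to b [then]
(pNLT) sub b = b, a       // or sub from b [else]
;;
add c = c, b              // uses of b
add d = d, b
;;
"""

FAST = """
sub bTmp = b, a           // speculatively sub from b (into temp)
add b = b, a              // and add to b
cmp.lt pLT, pNLT = a, 0
;;
(pLT) add c = c, b        // uses of b [then]
(pLT) add d = d, b
(pNLT) add c = c, bTmp    // uses of b (temp) [else]
(pNLT) add d = d, bTmp
(pNLT) mov b = bTmp       // move bTmp to b [else]
;;
"""

if __name__ == "__main__":
    print(schedule_stats(SLOW))
    print(schedule_stats(FAST))
```

This only counts groups; it says nothing about whether a real chip has enough issue slots or functional units to retire a whole group in one cycle.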
An alternate (faster) schedule in PA-8800 is as follows:
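sub bTmp = b, a            // speculatively sub from b (into temp)
add b = b, a               // and add to b
cmp.lt pLT, pNLT = a, 0
;;
(pLT) add c = c, b         // uses of b [then]
(pLT) add d = d, b
(pNLT) add c = c, bTmp     // uses of b (temp) [else]
(pNLT) add d = d, bTmp
(pNLT) mov b = bTmp        // move bTmp to b [else]
;;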
This takes 8 instructions in 2 cycles and one extra register. The final move of bTmp to b can be eliminated if b isn't live out at that point.
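For anyone who doubts the speculative version computes the same thing, here is a quick Python sketch that models both forms and compares them. The function names are mine, the predicates are modelled as plain booleans, and Python integers don't wrap around like machine registers, so this checks the logic only, not overflow behaviour.

```python
def branchy(a, b, c, d):
    """The original if/else source code."""
    if a < 0:
        b += a
    else:
        b -= a
    c += b
    d += b
    return b, c, d

def speculative(a, b, c, d):
    """The two-group schedule: compute both outcomes, then select by predicate."""
    # Group 1: both candidate values of b, plus the compare.
    b_tmp = b - a          # sub bTmp = b, a (reads the old b)
    b = b + a              # add b = b, a
    p_lt = a < 0           # cmp.lt pLT, pNLT = a, 0
    p_nlt = not p_lt
    # Group 2: predicated uses of whichever value is the right one.
    if p_lt:
        c += b             # (pLT) add c = c, b
        d += b             # (pLT) add d = d, b
    if p_nlt:
        c += b_tmp         # (pNLT) add c = c, bTmp
        d += b_tmp         # (pNLT) add d = d, bTmp
        b = b_tmp          # (pNLT) mov b = bTmp
    return b, c, d

def same_everywhere(values):
    """Compare the two forms for every (a, b) pair drawn from values."""
    return all(
        branchy(a, b, 10, 20) == speculative(a, b, 10, 20)
        for a in values
        for b in values
    )

if __name__ == "__main__":
    print(same_everywhere(range(-4, 5)))
```

The trade is the usual one for this kind of if-conversion: extra work (both the add and the sub always execute) and an extra register, in exchange for one fewer dependent step.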
...is that you actually can go out and buy a new mainframe using Power4. Nothing wrong with looking ahead, but if you remember, AMD said that the Athlon should have been made in an "Athlon Ultra" version sporting 8MB L2 cache... I still stick to the motto: "I'll believe it when I can buy it"
Thomas S. Iversen
It doesn't seem too practical to me. Most apps don't benefit greatly from SMP anyway.
They don't? What kind of server do you run? Most all pieces of production-class server software that I know of benefit from multiple processes. Look at Apache, forking off five, ten, or even more processes to handle requests. MySQL, I believe, uses threads. PostgreSQL forks off a new backend for each connection. Shoot, even your telnet, ftp, ssh, and mail daemons will fork off for each connection, allowing you to take advantage of more than one CPU.
If you're sitting at home working on a spreadsheet, you're right, SMP isn't for you - and this machine isn't targeted at you. When you're running a server that may have tens, hundreds, or thousands of SIMULTANEOUS processes fighting for CPU time, every processor counts.
And, to make things even better, even if you're only running a single, non-threaded process, having two processors still makes the machine much more "responsive", as the second CPU can handle kernel code for file IO, network code, interrupt handling, writing to logs, and a lot of other tasks. Ever seen how much CPU time even syslog can chew up?
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
AIUI, there are two competing methods of scaling CPUs now - Simultaneous Multithreading (SMT), and Chip-level Multiprocessing (CMP). HP is going CMP because SMT is too difficult in terms of writing the compilers. Both Compaq (with the Alpha CPU) and IBM (PowerX) are going SMT. In fact, the biggest thing Intel got out of its purchase of Alpha technology, other than the engineers themselves, is the Alpha SMT work.