HP Shows Off PA-8800 SMP-On-A-Chip CPU Plans
Eric^2 writes: "At last week's MicroProcessor Forum, HP's David J. C. Johnson unveiled the details of HP's latest RISC processor destined to redefine performance in Server-Class processors. Following a relatively simple strategy, the PA-8800 processor combines two PA-8700 cores on a single chip to enable symmetric multiprocessing (SMP) on a single processor. Aside from bumping the core speed up to an initial 1 GHz, enhancements include the addition of combined 35 MB L1+L2 cache. The article contains the full text. AMD, please steal an idea..."
These companies tend to patent anything that will give them a competitive edge in the marketplace. "Stealing an idea" would probably get them into some legal hot water, just like stealing a TV, or your car.
The IBM p690 server uses POWER4 processors. Each chip has 2 POWER cores with high-speed interconnects. Even better is that each chip is connected to 3 other chips to make up 8-CPU packs.
then i'd be happy.
If you ask the average slashbot, he (I can say he with confidence, cause no women use linux) would tell you that he would want the troll off of slashdot as soon as possible. I argue that slashdot needs trolls to be what it is today. Most posters to slashdot are repressed geeks that no one cares about and everybody picks on. They are never on the winning team and are always left out. Slashdot readers have latched on to Linux and the community that has arisen around it; they love to defend it, even if they are wrong.
Most trolls are helpful and caring folk who care about their fellow man. Trolls help the poor linux user by giving him an easy target to flame, or to attempt to flame, as the case may be. Trolls try to be there for slashdot readers of a wide variety of mental capacities. Some trolls are easy to spot; they are designed to be that way. But, to give some of the smarter (and that term is very subjective) slashdot readers a challenge, some of the trolls are harder to spot. The troll might be an easy topic, like the death of BSD, or it might be a harder topic like Natalie Portman and the grits that occupy her pants.
No matter what the level of trolling, it can be said that slashdot readers love responding to trolls as much as the trolls like being responded to. Keep in mind that when a (logged in) troll makes a first post or gives a link to goatse.cx, it is his way of saying "I love you, man!"
Please discuss
Michael Loves Me!
The new edger attachement on my 3-stroke Ryobi (john deere) is just fcukign fantastic! I mean jeses fucking raspberries! It is queit and powerful and even more even efficient that the older two stroke technology of my fathers day! I read that it causes less pollutione than electirc too, when you factor in all the powre plants that make the juice.
A spork who is benevolent.
Wow... And I thought the 8MB L2 cache on UltraSPARC IIIs was a lot, not to mention the 16MB on some IBMs. Now we're talking about 3MB just in L1 with 32MB L2 cache. This beasty should have some impressive benchmark scores (yeah, I know, benchmarks aren't everything...)
A Beowulf cluster of these babies!
Didn't HP dump the PA-RISC line for the Intel/HP joint venture?
wow, that sounds a lot like IBM's release re: the Power4... except not as interesting
...a 1 GHz processor may not sound like much, even in this dual-core configuration, but keep in mind that this is a RISC processor. None of that Super-mega-ultra-long-50-bazillion-stage pipeline crap that Intel uses to pump up their MHz rating. The article kind of sells this point a little bit short. The RISC architecture allows this processor to do roughly twice as much work in the same amount of time - or, to put it in a more concrete scenario: imagine a pair of 2GHz Pentium 4s running in SMP configuration.
Now that's FAST .
Intel jammed two 486 cores on a chip and called it a "Pentium."
Gross simplification is a viable debating tactic, BTW.
I am very small, utmostly microscopic.
It doesn't seem too practical to me. Most apps don't benefit greatly from SMP anyway. Add to that the potential heat problems caused by two cores on one chip...Why not just go with a more traditional SMP approach? At least you won't have to worry too much about heat then.
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
Did that say 35MB of L1 + L2 cache? I may be rusty, but I think I remember reading in my Processor Design for Dummies book that increasing cache size actually can slow down processor performance after a certain amount. Could someone please clarify this?
today is spelling optional day.
This is true, and it sucks. I really don't care what you mods have to say to me. That's right... give it to me low. Knock me down! See if I won't just get back up again.
Cocksmokers
Why hasn't someone else done something like this? I would pay whatever it cost to get even an 8MB L1 & L2 Cache. Anyone want to make me one?
Um, this is my sig.
Just imagine a Beowulf cluBLAM!!! Thud.
What does this have to do with goat sex? Seriously... what has slashdot become... FIRST POST, BITCHES!
The most interesting parallel architecture I heard about at the MPF was Siroyan's OneDSP architecture. This is a clustered VLIW machine that can execute up to 64 instructions each cycle! See the EE times article and their MPF paper
Do you work for apple? Are you 13? And if yes to the second question, are you female and nubile?
In other news:
Anonymous Coward Dies at IQ 58
Anonymous coward died messily today in a pool of its own feces at the tender IQ of 58. Sources close to the fag say that he died in a gay penis festival gone horribly awry. Timothy Henchfaglet, another close asshumping homosexual friend of the AC, said he died while jumping from a 3 foot step ladder onto a 48 inch didlo held erect by the cum guzzling queer-extroardinaire, michael the censor.
A spork who is benevolent.
" One guy wrote that we should take all these Legos and build giant robots with which to attack Afghanastan. " -- Rob Malda, Founder of Slashdot, a "News for Nerds" website, in a NPR report on post WTC gen-X, 10/22/2001
I, for one, would like to take a moment to thank Rob for setting us "Nerds" back where we belong. Way to make us look like a bunch of childish tech-heads with no conception of the real world! As a troll, I think it's high time that you slashdotters got slapped down for the idiotic geeks that you are! (That was sarcasm, you nincompoop!)
The official HP presentation on the PA-8800 is available as a PDF from http://www.cpus.hp.com/technical_references/mpf_2001.pdf.
Y.
It has the clock frequency of a 300bps modem's dsp. That's still pretty darn cool! *shrug*
Earlier steps in the multi-CPU direction included the 8-way DEC Alpha (killed in the merger with HP?) and a little National Semiconductor product for embedded systems with two very modest CPUs on a chip.
'nuff said
Doesn't Chuck Moore's 25x already do SMP-like things, at a few billion instructions per second? Last time I checked he was using a 20-word instruction set on a stack-based computer, which IMO counts as RISC.
This is hardly new, but HP's version probably uses some fancy new lithography, and wins when it comes to clock speed.
"Look at me, I invented the stove!" -- Ben Franklin
1GHz? Man my Pentium 4 2GHz beats all these supposedly "fast" chips. It's HALF the speed, for christ's sake. That is sooooooooo quarter 3 2000.
Acting stupid isn't much fun when there's someone around who knows better
PA-8800 lets you create two opposite predicates in one instruction, for example the predicate a=b.
This seems to indicate that there are no separate "do this if predicate is true" and "do this if predicate is false" instructions, so for opposite predication you would have to specify two different predicates.
The processor cannot know that these two predicates are related, so this would give you quite a problem.
As has been publicly disclosed, in general in PA-8800, an instruction reading any resource (such as a predicate) must be in a later instruction group (cycle) than the instruction writing that resource. As a special case, branches are allowed to use a predicate written by another instruction in the same instruction group (as shown in the IDF slides).
So, the straightforward (but slow) PA-8800 schedule for the earlier example:
    if (a < 0)
        b += a;
    else
        b -= a;
    c += b;
    d += b;
would be:
    cmp.lt pLT, pNLT = a, 0     // pLT & pNLT are 2 complementary preds
    ;;
    (pLT)  add b = b, a         // add to b [then]
    (pNLT) sub b = b, a         // or sub from b [else]
    ;;
    add c = c, b                // uses of b
    add d = d, b
    ;;
which takes 5 instructions in 3 cycles. (Note: In PA-8800 assembly, ";;" indicates the end of an instruction group, "=" separates the target operand(s) from the source(s), "//" begins a comment, and (pred) specifies the controlling predicate.)
An alternate (faster) schedule in PA-8800 is as follows:
    sub bTmp = b, a             // speculatively sub from b (into temp)
    add b = b, a                // and add to b
    cmp.lt pLT, pNLT = a, 0
    ;;
    (pLT)  add c = c, b         // uses of b [then]
    (pLT)  add d = d, b
    (pNLT) add c = c, bTmp      // uses of b (temp) [else]
    (pNLT) add d = d, bTmp
    (pNLT) mov b = bTmp         // move bTmp to b [else]
    ;;
This takes 8 instructions in 2 cycles and one extra register. The final move of bTmp to b can be eliminated if b isn't live out at that point.
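To check that the two schedules really compute the same thing as the original if/else, here is a minimal Python sketch. It is only a model, not anything from the vendor: the function names are made up for illustration, and it assumes the rule quoted above, that every read inside an instruction group sees the values from before that group.

```python
def reference(a, b, c, d):
    """The original if/else, written directly."""
    if a < 0:
        b += a
    else:
        b -= a
    c += b
    d += b
    return b, c, d


def straightforward_schedule(a, b, c, d):
    """Hypothetical model of the 5-instruction, 3-group schedule."""
    # Group 1: one compare writes two complementary predicates.
    p_lt = a < 0
    p_nlt = not p_lt
    # Group 2: exactly one of the two predicated updates takes effect.
    if p_lt:
        b = b + a
    if p_nlt:
        b = b - a
    # Group 3: the uses of b must wait for group 2 to finish.
    c = c + b
    d = d + b
    return b, c, d


def speculative_schedule(a, b, c, d):
    """Hypothetical model of the 8-instruction, 2-group schedule."""
    # Group 1: both outcomes are computed before the predicate is known.
    # All three instructions read the old value of b.
    b_tmp = b - a          # speculative "else" result
    b_then = b + a         # "then" result, written to b
    p_lt = a < 0
    p_nlt = not p_lt
    b = b_then
    # Group 2: predicated uses pick whichever result was the right one.
    if p_lt:
        c = c + b
        d = d + b
    if p_nlt:
        c = c + b_tmp
        d = d + b_tmp
        b = b_tmp          # the final move; removable if b is dead
    return b, c, d


if __name__ == "__main__":
    for a in range(-5, 6):
        args = (a, 10, 1, 2)
        assert straightforward_schedule(*args) == reference(*args)
        assert speculative_schedule(*args) == reference(*args)
    print("both schedules match the if/else")
```

The second function trades one extra temporary and three extra operations for one fewer dependent step, which is the same trade the 8-instruction schedule makes.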
a beowulf cluster of these!!!
Following a relatively simple strategy, the PA-8800 processor combines two PA-8700 cores on a single chip to enable symmetric multiprocessing (SMP) on a single processor.
It doesn't enable SMP "on a single processor". It provides two processors on a single die. There is a distinction.
AMD, please steal an idea...
The big rumor regarding the third version of Hammer is that it'll be a dual-CPU module. Any guesses as to Hammer's clock speed on release?
299,792,458 m/s...not just a good idea, its the law!
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
He's not even smart enough to be a truly funny troll, do the people who vote for this have so little sophistication that they consider this well disguised sarcasm?
...is that you actually can go out and buy a new mainframe using Power4. Nothing wrong with looking ahead, but if you remember, AMD said that the Athlon should have been made in an "Athlon Ultra" version sporting 8MB L2 cache. .... I still stick to the motto: "I'll believe it when I can buy it"
Thomas S. Iversen
This is getting ridiculous. Symmetric Multiprocessing on a single chip? That's impossible, unless you're just screwing around with semantics. I mean, think about it. You need TWO chips, at least, in order to engage in SMP, and anyone who says otherwise is putting out meaningless hype. SMP on one chip, or just vaporware?
Slashdot: Open Source, Closed Minds.
IBM unveiled its SMP-on-a-chip solution, the Power4, almost 2 weeks ago. 64 bit PowerPC. And only 2 OS'es run it.
One of them is Linux.
With no such niceties as virtual memory, large address spaces, fast additions etc etc there is not a lot of software which would run well on them.
That seems practical enough to me.
You know, when AMD first brought out the Athlon, it was supposed to be compatible with Alpha 21264 boards too.
AMD even made a couple of engineering samples in Slot B packages for testing, but that's as far as it went.
If someone could hack a Slot A/Slot B adaptor then they could hypothetically do the same thing. They might have to hack a BIOS update too, though.
compared to say a 2.2GHz P4 or an Athlon XP 1800+. Inquiring minds want to know.
http://www-1.ibm.com/servers/eserver/pseries/hardware/whitepapers/p690_config.html#arch
You asked for it; here's some yummy troll food fer ya:
For the last time, clock speed doesn't compare between architectures. This is a RISC processor with a short pipeline, the pentium 4 you drool about (but don't really have) is a CISC with an extra-long needlessly-clock-boosting pipeline. Of course, you'd know this if you read the article.
If you read the specs, you'd see: "Speaking of performance, each PA-8700 RISC core delivers a SPEC performance of around 550 (for both Int and FP) at 750 MHz and the dual core PA-8800 running at 1 GHz will start out at a minimum of 900 / 1000 SPEC2000 int/fp scores, according to very conservative estimates."
A 2GHz P4 is in the 650 SPECint2000/670 SPECfp2000 range, so going by those estimates each PA-8800 core would be roughly 40 to 50% faster, at half the clock speed.
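Taking the figures quoted in this comment at face value (they are the poster's numbers and HP's estimates, not measurements), the ratios can be worked out directly. A small Python sketch; the variable and function names are just for illustration:

```python
# Scores as quoted in the comment above; treat them as rough estimates.
PA8800 = {"mhz": 1000, "int": 900, "fp": 1000}
P4 = {"mhz": 2000, "int": 650, "fp": 670}


def score_ratio(chip_a, chip_b, kind):
    """How many times higher chip_a scores than chip_b on one benchmark."""
    return chip_a[kind] / chip_b[kind]


def per_clock_ratio(chip_a, chip_b, kind):
    """The same comparison, normalized by clock frequency."""
    a = chip_a[kind] / chip_a["mhz"]
    b = chip_b[kind] / chip_b["mhz"]
    return a / b


if __name__ == "__main__":
    for kind in ("int", "fp"):
        print(kind,
              round(score_ratio(PA8800, P4, kind), 2),
              round(per_clock_ratio(PA8800, P4, kind), 2))
```

On these inputs the raw score ratio comes out near 1.4 to 1.5, and the per-clock ratio near 3, which is the point about clock speed not comparing across architectures.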
Was it trolliscious? satisfied? oh, furry troll, why do I even bother feeding you?
Benevolent_spork is an asshole
I thought HP had committed itself to ditching the PA-RISC and moving to Itanic, err, Itanium.
Who says the OS needs to know there are two or 4 or 6 CPUs in a system? Threaded programming works best only on Clusters. SMP scales poorly on Intel Pentium IVs because maybe that isn't a reason to use two Pentium IV CPUs...Each CPU provides more resource management. Someone should've written some supporting text on "the SMP myth" which includes why SMP is not a good and efficient solution to increase calculation performance on a given workstation. Pentium Pro CPUs provide BUS Mastering in SMP mode and the secondary CPU provides up to 30% of extra system performance. Of course, for every extra CPU in a Pentium Pro system, it adds 30% - (NumCPUs*5) performance due to scheduling in the software. It is always the Primary CPU in a multi-cpu system that must schedule events for the Secondary, Tertiary, and Quaternary cpu in software. On a Pentium Pro CPU, that requires somewhere around 10% of that Primary CPU's processing power to schedule software for that second CPU; it's around 15%/20% for the 3rd, etc. That's why you see Dual Pentium IV Workstations not performing up to par with another workstation with only one Pentium IV. Its SMP doesn't scale well. The only value of multiple CPUs is for BUS Mastering and providing more system resource management. Intel abandoned BUS Mastering in SMP systems after the Pentium Pro. So, for the extra cost of using a second Pentium IV CPU, it isn't worth it. Just get a nice Pentium Pro Server on eBay and you will get your money's worth for those extra CPUs, which provide BUS Mastering. Pentium Pro has always been a nice CPU. You can scramble an egg in 5 minutes, versus 15 for the Pentium IV.
But I'm sure you already Gnu that.
Reading through the article, this design seems to share a lot in common with Sun's MAJC architecture. Both allow for multiple cores on a single chip. Anyone else notice the similarities?
I guess the biggest difference would be that the HP chip is actually going to be built, while the MAJC chip seems to still just be a design.
It is interesting that a number of designs lately seem to be looking to the integration of multiple CPU cores on a single chip to increase performance in server applications.
zor_prime
"We all do no end of feeling, and we mistake it for thinking." -Mark Twain
sorry, couldn't resist, there was not a single grep hit for beowulf.
EEtimes Story
Everyone in the high-performance CPU market (except itanic) is doing either this or multiple concurrent thread contexts to speed overall system computational throughput.
from The Daily Telegraph
You have to go the site and search on "Gore":
Did Al Gore win after all? US newspapers
would rather not say
By Charles Laurence in New York
(Filed: 21/10/2001)
THE most detailed analysis yet of the contested Florida
votes from last year's presidential election - with the
potential to question President Bush's legitimacy - is being
withheld by the news organisations that commissioned it.
Results of the inspection of more than 170,000 votes
rejected as unreadable in the "hanging chad" chaos of last
November's vote count were ready at the end of August.
The study was commissioned early this year by a
consortium including the Wall Street Journal, the
Washington Post and the New York Times, the nation's
most powerful newspapers, and the broadcaster CNN.
It was regarded as a means of supplying final answers to
the nagging questions over President Bush's razor-thin
victory margin. The cost was more than £700,000.
Now, however, spokesmen for the consortium say that
they decided to "postpone" the story of the analysis by
the National Opinion Research Centre (NORC) at the
University of Chicago for lack of resources and lack of
interest in the face of the enormous story of the
September 11 attacks and the subsequent "war on
terrorism".
Newspapers were saying last week that the final phase of
the analysis, the actual counting of the 170,000 votes,
had been "postponed" but would become known at an
appropriate time.
America's liberal newspaper establishment originally set
up the commission in the belief that it would discover that
Al Gore was the winner of the Florida count.
Their hope for a Gore victory appears to have been
sacrificed on the altar of patriotism and a perception that
America needs to be led into war by a strong president.
"Our belief is that the priorities of the country have
changed, and our priorities have changed," said Steven
Goldstein, the vice-president of corporate communications
at Dow Jones and Co, the owners of the Wall Street
Journal.
Catherine Mathis, a spokesman for the New York Times,
said: "The consortium agreed that because of the war,
because of our lack of resources, we were postponing the
vote-count investigation. But this is not final. The intention
is to go forward."
However David Podvin, an investigative journalist who
runs an independent web page, Make Them Accountable,
said he had been tipped off that the consortium was
covering up the results.
He refused to disclose his source other than to describe
him as a former media executive whom he knew "as an
accurate conduit of information" and who claimed that the
consortium "is deliberately hiding the results of its recount
because Gore was the indisputable winner".
He also claims that a New York Times journalist who was
involved in the recount project had told "a former
companion" that the Gore victory margin was big enough
to create "major trouble for the Bush presidency if this
ever gets out".
Man oh man. Natalie portman was on that concert for NY on VH1 with some firefighter introducing Elton John. She probably ran backstage and sucked that guy off in the dressing room as soon as that faggot Sir Elton took the stage. Doesn't that just burn you? Hehe get it?
Now, before you start flaming little old me, remember that it's not me you're pissed off at, you're just "projecting". You're really pissed out at those greedy pussy hoarders at the NY fire department like Steve Buscemi. Fuckin' Bastards!
When you consider that the PA-RISC team has been transferred to that "evil" company Intel.
Conformity is the jailer of freedom and enemy of growth. -JFK
I *thought* the cache density looked a bit high for ordinary SRAM - the article mentions something they're calling "single-transistor SRAM".
Does anyone know how on earth they're managing this? Or is this just some low-leakage variant of DRAM with added marketing spin?
Yes, that's right. Secretary of Defense Rumsfeld helped me out by licking some of those pesky Anthrax spores out of my rectum so I didn't get any cutaneous hemhorroids.
Then afterwords we sat around and watched TV, he was really pissed off that someone leaked to the press about the ground invasion over the weekend. Then he licked my asshole again just to make sure.
AC RULEZ!!!!!
...a Furbeowulf cluster of these things!
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
In news today, a small chunk of Austin TX vaporized when an engineer tripped over a Thermaltake vortex containment field, causing an experimental single-chip SMP AMD processor to go critical in its 1024 pin socket...
Less crack. Go study modern OSs and stay away from SunOS and old Slackware "SMP" kernels.
Imagine a Beowolf Cluster of THESE!!!
AIUI, there are two competing methods of scaling CPUs now - Simultaneous Multi-threading (SMT), and Chip-level Multi Processing (CMP). HP is going CMP because SMT is too difficult in terms of writing the compilers. Both Compaq (with the Alpha CPU) and IBM (PowerX) are going SMT. In fact, the biggest thing Intel got out of its purchase of Alpha technology, other than the engineers themselves, is the Alpha SMT work.
I heard it will start at 3 GHz?
Sorry, while it may be true for Pentium series, it is not true for SMP in general.
1) It is actually possible to get better than linear improvement under certain conditions (like if something is already in a shared cache because it was fetched by the other cpu).
2) It is possible to have each cpu schedule itself based on contents of ram.
Yes, there is overhead of having two cpus, but it is very variable dependent on OS and workload.
it looks like artificial breathing attempts when resources don't allow for better chip designs anymore. 3dfx did it with voodoo2, and it's such a cheap solution that I'm surprised HP even bothers with this show'n'tell. "look, we didn't have enough money to do 1 good and new solution so we slapped together 2 old ones!! all hail the new ultrafast processor!"
Sounds like more kernel work. I won't be happy until I can mount file systems in my cache. Think about it. My 286 only had a 40 MB hard drive. Hello, solid state!
WARNING: there is a trojan on your
well yes HP PA-RISC is nice but really its catch up
MIPS 1GHz Dual core on same die for a while
and that its 64bit
check
http://www.electronicstimes.com/story/OEG20010612S0002
or
http://www.pmc-sierra.com/products/details/rm9000x2/index.asp
oh yeah did I mention that PA-RISC is a MIPS decendant
but shhh they made so many changes they fscked the pipeline(they might have got it working again but I dont know any more)
may the SPECINT and SPECFP fight it out
regards
john jones
p.s. I wonder what the HP layout guys think of Intel chips (-;
I'll bet you could fry eggs on it pretty well with that much silicon cranking out heat in one chip.
As has been pointed out above, this is just HP playing catchup to IBM. IBM has taken a leap ahead of their competitors and now they have to play catchup.
HP's announcement is nothing compared to what IBM has in development.
HP workstations certainly seem to be very solid and nifty and they have a lot of potential for linux boxes. Assembly programmers will appreciate all of the registers that are available.
Clickety Click
With an agenda based on scale, you don't get there by introducing a new CPU in a dead line. HP's SuperDome line is getting creamed by Sun and IBM - HP cannot afford to go back to the front lines with another enterprise offering unless SuperDome pans out a hell of a lot more than it is currently.
HP has always had impressive technology but still loses market share . HP-UX has dwindling market share and software support. The merger with Compaq will derail any plans for further proprietary architectures.
If you want to look at the gee-whiz value here, fine, but don't expect to see this in a product.
HP is going CMP because SMT is too difficult in terms of writing the compilers.
Actually, I think they're doing it because it means they don't have to design a new processor core.
As far as each thread being executed in an SMT chip is concerned, they're running on a single-thread processor. The same scheduling optimizations that benefit code in a single-thread system will benefit the code running SMT with other threads. SMT actually makes this job a bit easier, by reducing the effective latency of instructions (if neither thread's stalled, each thread will execute every other clock, making a 10-cycle-latency instruction look like a 5-cycle-latency instruction, which in turn makes each thread less _likely_ to stall; nice feedback loop here).
The only extra complexity would be in the operating system's scheduling and context switching routines, and that wouldn't be much more complicated than on a multiprocessor system.
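The "10-cycle latency looks like 5" point can be illustrated with a toy model. This Python sketch assumes strict round-robin issue between threads with nobody stalled, which is a simplification for illustration and not a description of any real SMT core; the function name is hypothetical.

```python
def apparent_latency(latency_cycles, n_threads):
    """Count how many of one thread's own issue slots go by before the
    result of an instruction issued at cycle 0 is ready.

    Assumes strict round-robin issue: the thread gets one slot every
    n_threads cycles, and no thread is stalled.
    """
    ready_at = latency_cycles   # the instruction was issued at cycle 0
    cycle = 0
    slots = 0
    while cycle < ready_at:
        cycle += n_threads      # the thread's next turn to issue
        slots += 1
    return slots


if __name__ == "__main__":
    # One thread: 10 of its own slots pass. Two threads sharing the
    # pipeline: only 5 of its own slots pass before the result is ready.
    print(apparent_latency(10, 1), apparent_latency(10, 2))
```

From the thread's point of view, fewer of its own instructions need to be scheduled around the long-latency one, which is why the same single-thread scheduling optimizations still apply.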
Tricky. This time Spootnik copied and pasted from not one but two Usenet articles, neither of which has anything to do with PA-8800.
Not just article hijacking, but blatantly false article hijacking.