One Processor, 128 32-bit Cores
Max Entropy writes: "EETimes reports that a German company named Pact GmbH has developed a chip containing 128 microprocessor cores as part of the company's 'Extreme Processor Platform' (XPP). 'Each of the XPP's 128 processor cores sports its own 32-bit fixed-point multiplier, yielding a theoretical output of 12.8 billion multiply-accumulate operations per second at an expected clock frequency of 100 MHz. Pact claims the architecture will scale to produce devices capable of more than 400 giga operations/s in 2002 and into the peta-ops range within a decade.' The transistor budget for this behemoth is 30M, fabricated on a 0.21-micron process." Of course, each one of those processor nodes is completely proprietary and requires some peculiar programming.
Mother of God! The first time some fool runs Quake IV Slaughter on a Beowulf cluster of these puppies (you knew that had to get in here somewhere), it'll instantly self-evolve into Quake X^100 and wipe out the human race!
A truly excellent pizza parlor is a delight unto the heavens. Treasure the sauce and the toppings!
It's not Rob's job to write new pieces of slash right now. Slash is open source, and as Rob stated over and over again in #forum last week, if someone adds a feature to the code, he will consider it.
I suspect that if someone added a method of moderating articles, and defining user thresholds, it would make its way from slash to slashdot.org.
So, in short, if you want something done, do it yourself.
As already mentioned, as Moore's law runs out of steam, something else will be needed. Besides, it really doesn't matter how long Moore's law still holds; for any extremely powerful single processor it still holds true that for certain problems X of them will be X times more powerful still. While the theoretical processing power of some as-yet-unimplemented molecular or quantum computing devices could be quite high, it will still be very finite and fall short of the requirements of some types of processing--such as nuclear reaction simulations, for that next generation of smart atomic weapons that can single you out and kill you based on certain profiles.
Besides, this 128-processor device isn't really any different from most other multi-processing systems. The holy grail is still the development of smart compilers and algorithms that can allow even dumb programmers to write effective multi-processing code.
You should care. The x86 has had so much bolted onto it and the clock pushed up so high it's not that much of a surprise the 1.13GHz PIII ran out of steam recently. I program VLIW machines and there's some mileage left there, but I can see the writing on the wall; we need something else.
We've got at least a decade of Moore's Law left and we have to find some way of really using huge chip complexities. Putting many processors on a die is simple enough for the hardware guys (not to underestimate what they do at all). Just bloating a processor to make use of a whole chip is do-able, but what do you suggest other than tons of cache?
Figuring out how to use parallel processors is a big issue for the future IMHO. Maybe this one will bomb, but we should support their innovation.
Also, programming weird architectures is fun and teaches you stuff - as an example, I went to a lecture on optimising code at SIGGRAPH; people liked it and the content was good, but some of the stuff was already second nature to us VLIW programmers.
What do you call a "proprietary" processor ?
As opposed to a "standard" one ? How do you define a "proprietary" processor ? This statement is simply ridiculous, and has probably been expressed by someone who doesn't know what a processor is.
-- javaDragon is an instance of JavaDragon.
Will Taco flame me on IRC for this? Damn, I hope so!
To this day, I cannot understand why the powers that be feel that it is beneath them to participate in the discussion threads.
I just don't get it. Oh well, another item for the list.
A now-defunct company called MasPar of Santa Clara, California developed a massively parallel computer based on putting dozens of CPUs on a single chip. They were trying to beat Thinking Machines, a Defense Department-funded massively parallel company that was looking good at the time. MasPar had a nice machine and several dozen customers. However, as with most of the 1980s and early-90s "mini-super" business, the people who made custom CPUs and ASICs could not keep up with the commodity CPU super-clusters (ironically pioneered by Thinking Machines). At best a custom company could engineer a new generation every three years, while Intel (and Sun, IBM and MIPS) came out with a new chip on an annual basis or faster. These mini-supers were often obsolete by the time they shipped.
You don't need anything so fancy.
Just analyze the responses to the editor's articles and rank them by the cumulative karma gains of all respondents. Editors whose articles generate lots of interesting, insightful and funny quotes score high, editors whose articles generate lots of flamebait score low.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
The ever-sexy Connection Machine used bit serial processors. At least in the first generation, get-me-a-thesis version. Later versions that had to be applicable to more than fluid dynamics used commodity cpus, I believe.
Can anyone summarize why they went belly up? Too hard to program? What was their value-added after they moved to Centinode commodity CPU systems? Doesn't everyone and their brother have huge systems? Or was CM special in that it had a good architecture for shared memory busses (based on the hypercube, at least in mk 1)?
You moderate an editor below a negative number, they are booted off the site. :)
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
the hell there won't be beowulf references!!
Wow! Could you imagine a beowulf cluster of these?!?
Let's see. There is a mention of this that is still on the front page. Uhm... you guys really need to get together and talk, or at least read Slashdot before you post new articles.
Seriously, the editors need to start reading Slashdot more often. I know timothy posted this at 5:23 AM and he probably wasn't thinking clearly, but the quality of the articles is becoming a joke.
I have lost count of the number of duplicate articles that appeared on the same day. Or is this a side effect of Slashdot getting hacked (rogue processes non-deterministically posting articles)?
it's in my head
I think we should have a new moderation option - and that it should apply to stories:
Score -1, BEOWULF BAIT
This is specialized hardware. NASA used to have similar beasts with 2^16 16 bit ALUs for satellite image processing. Not on one chip of course. On multiple.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Each ALU runs at 100 MHz. Why so slow? It makes the chip much less impressive than it seems. I think that an Athlon can theoretically perform 6 integer multiply-accumulates per clock cycle. A 1 GHz Athlon can then theoretically perform 6 giga multiply-accumulates per second. The XPU128 theoretically performs 12.8 giga multiply-accumulates per second. About twice as fast. Big deal. So why is the XPU128 clock rate so slow? Athlon info: http://www.azillionmonkeys.com/qed/cpujihad.shtml
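For anyone who wants to check the arithmetic, here is a minimal Python sketch of the peak-rate comparison. It assumes one multiply-accumulate per core per cycle for the 128-core part (which is what the quoted 12.8 billion figure implies) and takes the poster's unverified guess of 6 per cycle for a 1 GHz Athlon at face value. The function name is made up for illustration, and these are theoretical peaks, not measurements.

```python
def peak_macs_per_second(units, macs_per_unit_per_cycle, clock_hz):
    """Theoretical peak: every unit issues its MACs on every clock cycle."""
    return units * macs_per_unit_per_cycle * clock_hz

# Figures from the story: 128 cores, 100 MHz, assumed 1 MAC per core per cycle.
xpp_peak = peak_macs_per_second(128, 1, 100e6)

# The poster's guess (an assumption, not a verified spec): 6 MACs per cycle at 1 GHz.
athlon_peak = peak_macs_per_second(1, 6, 1e9)

ratio = xpp_peak / athlon_peak

print(f"128-core part: {xpp_peak / 1e9:.1f} GMAC/s")
print(f"Athlon guess:  {athlon_peak / 1e9:.1f} GMAC/s")
print(f"ratio:         {ratio:.2f}x")
```

Under those assumptions the ratio comes out a little over 2x, so the "twice as fast" estimate holds only as far as the 6-per-cycle guess does.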
Support the organizations that make up the Global Internet Liberty Campaign http://www.gilc.org/
If I were a Troll Brigadier, I might seriously think about posting stories straight into comments, and then having them "voted on" by moderators. Hence (once again!) you've hijacked Slashdot.
Will Taco flame me on IRC for this? Damn, I hope so! Think of the Dark Side Geek Aura!
be well;
JC.
--
"Don't declare a revolution unless you are prepared to be guillotined." - Anon.
Classical Liberalism: All your base are belong to you.
He eventually went to work around Beaverton Oregon for one of those silicon foundries, but I think he got more interested in parallel, hardware regular expression evaluators.
Seastead this.
I don't understand the people who moderate posts like this down. Isn't Slashdot about news for nerds? And isn't it nerdy to salivate over high-end computing hardware? And if we built a Beowulf Cluster out of these bad boys with Linux as the OS, wouldn't that be extremely nerdy? Hell, wouldn't it be cool?!
I'm imagining a Beowulf Cluster of these. And I'm imagining running dozens of instances of any distributed.net client. Oh hellz yeah =)
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
Maybe I should post this as a separate thread, but your post made me think about it, so here goes.
An old saying is that if you find a way to make an O(n^2) operation into an O(n lg n) one, the world will beat a path to your door finding ways to use it. It was originally said about FFT, which is used in any number of situations where you wouldn't expect it.
Anyways, I've been wondering about cellular simulations, like fluid dynamics or nuclear modelling.
How feasible would it be to make a special-purpose SIMD chip that takes a simplish formula and applies it to each of a large number of cells? The driving insight behind this proposition is that a cell has fairly predictable communication needs, so that you can hard-wire efficient communications, and also that an n-way multicell (as would be expressed as a unit of silicon in my proposition) has comm needs that rise as n, but contains n^2 functional units. So the bigger you can make it, the cheaper communication becomes.
So this would be a very specialised piece of computing machinery. My question to you lot is how applicable the "beating a path to your door" would be. Can cellular computing be applied to much apart from the Game of Life?
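The scaling claim is easy to put numbers on. This is a rough Python sketch, assuming a square 2D grid where each cell only talks to its nearest neighbours, so only the cells on the edge of an n-by-n block have to communicate off-chip. The function name and the grid model are illustrative assumptions, not anything from a real design.

```python
def block_costs(n):
    """Return (cells, boundary_cells) for an n-by-n block of cells.

    cells grows as n^2; boundary_cells (the ones that must talk to
    neighbouring blocks) grows only linearly, as 4n - 4.
    """
    cells = n * n
    boundary = 4 * n - 4 if n > 1 else 1
    return cells, boundary

for n in (2, 8, 32, 128):
    cells, boundary = block_costs(n)
    # Off-chip communication per unit of compute shrinks roughly as 1/n.
    print(f"n={n:4d}  cells={cells:6d}  boundary={boundary:4d}  "
          f"comm/compute={boundary / cells:.4f}")
```

In three dimensions the same argument gives a surface that grows as n^2 against a volume that grows as n^3, so the ratio still falls off as 1/n.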
It's somewhat strange. Whenever a new piece of good hardware is published, everybody seems to think: "When can I use it for playing quake?".
Hardly anyone thinks of the possibility that there are potential customers who will surely buy this processor because they simply need it and can't meet that need by other means.
Just think of weather forecasting, or scientific simulations.
Michael
Michael Bergbauer (michael.bergbauer@gmx.net)
Timothy doesn't read Slashdot
Posted by timothy on 04:20 AM April 1st, 2001
from the at-least-read-the-front-page dept.
Frac writes: "It seems rather obvious that timothy doesn't read Slashdot, considering that an article still on the main page mentions the exact same news." Interesting stuff. And in other news, there are now proton polymer batteries available, results from ICANN elections, and a really interesting article at ZDNET on reverse-engineering.
> This would refute EVERYTHING that you said.
It's interesting that in the three lengthy paragraphs preceding your link you don't offer a single valid explanation for why even single-threaded apps would benefit from SMP, only strong assertions that such would be the case. In fact, the three paragraphs smack suspiciously of marketing-speak, conveniently sidestepping anything resembling an argument.
Microkernels are great, not many people would dispute that, but what the hell do they have to do with SMP per se? In order for an efficient MP architecture (be it symmetrical, distributed etc) to be effective, applications have to be coded to take advantage of it. This includes multi-threading it in meaningful ways. Just putting QNX and Neutrino on your system won't make your linear app any more efficient than running it under DOS--except for the GUI being more responsive, possibly. Just because system resources are multi-threaded doesn't mean squat--your app is still sitting around waiting for disk-bound operations to complete, no matter how those are implemented.
Unless you know how to effectively multi-thread your app, neither QNX nor BeOS will do much to improve your code. You can make any assertions and include any links to the opposite you want, but that's a pretty fundamental tenet of current OSs.
I was kind of looking forward to more and easier to find information on the big chip here. Instead, I find all of these complaints which are even less interesting than other chip news.
Friends don't help friends install M$ junk.
Compared to the power consumed by many boxes, a cluster of these things will be a very useful tool. I can imagine 10 of these in a single box doing great work. It will be very nice when they scale up the clock speed.
Friends don't help friends install M$ junk.
This sounds a lot more impressive than it really is. Getting all these processors to actually do something useful isn't easy. The posted maximum number of operations/second is for pure calculations only. A lot of real-life applications, even if they are scalable to a large number of processors, still need to have access to large data structures.
Having to use a special language is going to shy people away as well. Anybody remember transputers ? Occam ?
These processors, made by small start-ups also have lots of real-life issues that need to be solved. For example, you also need a good motherboard, with appropriate chip sets to access peripherals (network/hard disks) as well as a high-bandwidth memory bus. Who's going to make those ? Also, what's the quality of the development tools, like the compiler? Even if all those obstacles are overcome, users still need to spend time and money to get acquainted with this platform, and they are risking that the manufacturer(s) will be out of business a couple of years later.
For applications requiring more than just pure calculations, it's not going to be easy to offer a solution that offers users more value for money than a bunch of networked SMP machines based on off-the-shelf hardware, and using development tools that they are already used to, and can be assumed to be bug-free.
Before you go off on tirades, maybe analyse what you're about to say first. First, there's nothing magical about QNX or BeOS. Sure, BeOS apps may be automatically multithreaded, but only in a relatively superficial way (by putting the message loop into a separate thread, if I'm not mistaken). BeOS certainly doesn't take your linear code and somehow magically extract multiple threads of execution--you still have to do the leg work. Same with QNX. Same with WinNT/2K: if you multi-thread your apps, they will spread quite nicely across CPUs (though HOW nicely is a matter of debate for the religious). Don't get me wrong, I think what BeOS is doing is perfectly fine and laudable, but it's not exactly what you're implying.
The fact is, in your model the unit of execution is the thread. What you yourself don't know how to de-serialise and pull into separate threads (and properly synchronise), the compiler or OS certainly won't do for you. So even if your app is multithreaded, if your threads are big fat chains of serial code, the app won't benefit any from multiple CPUs. The holy grail of MP is a compiler that could, for a trivial example, look at your loop and be able to unroll it into x smaller loops working on subunits of the data.
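To make the "x smaller loops" example concrete, here is a hand-written Python sketch of the transformation such a compiler would have to do for you. It assumes the loop iterations are fully independent, which is exactly the part a compiler has to prove. All function names are invented for illustration. Note that in CPython, threads share the interpreter lock, so this shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def scale_chunk(data, out, start, stop, factor):
    """One of the smaller loops: works only on its own slice of the data."""
    for i in range(start, stop):
        out[i] = data[i] * factor

def scale_parallel(data, factor, workers=4):
    """The original loop `out[i] = data[i] * factor`, split into `workers` loops."""
    out = [0] * len(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scale_chunk, data, out, start, stop, factor)
                   for start, stop in split_range(len(data), workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return out

result = scale_parallel(list(range(10)), 3, workers=4)
print(result)
```

No synchronisation is needed here only because each chunk writes to a disjoint part of `out`; as soon as one iteration reads what another writes, the split stops being this easy.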
One of the big promises of MP is to avoid the end of Moore's law through parallel computing, IN GENERAL PURPOSE COMPUTERS, not just arcane research machines. We already know how to tickle esoteric MP hardware today into doing our bidding, but it's no trivial task and takes a lot of skill. If MP is to give us a mainstream migration path from single processing, it can't expect more from programmers than they can give today. In other words, MP machines will have to deliver even with mediocre programmers, because they form the bulk of the work force. You can't stipulate as a condition for effective MP an overall higher quality work force, because it ain't happening.
Note that this is NOT necessarily a general-purpose system. It seems currently intended more for high-volume data manipulation. On the other hand, I think that it would do a peachy job on many image rendering problems (for ray tracing, you could assign one processor to a group of rays). It would also be great for multi-threaded applications (each process gets a handful of processors). For Seti@Home, I think that it would kick butt. On the other hand, it would suck on a single task that was indivisibly serial (only a handful of the 128+ processors doing anything).
Note that some processes that seem inherently serial (summing data from a single stream) are actually quite parallelizable (N processors each gets 1/Nth of the stream and pass their intermediate results, on demand, to a supervisor processor that totals the intermediate values.)
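A minimal Python sketch of that stream-summing scheme, assuming the stream has already been buffered into a list and is dealt out round-robin so each of the N workers sees 1/Nth of it. The names `partial_sum` and `supervised_sum` are hypothetical, and Python threads stand in for the processors.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(values, worker, n_workers):
    """Worker `worker` sums every n_workers-th element, starting at its own index."""
    return sum(values[worker::n_workers])

def supervised_sum(values, n_workers=4):
    """Supervisor: collect each worker's intermediate result and total them."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(
            lambda worker: partial_sum(values, worker, n_workers),
            range(n_workers)))
    return sum(partials), partials

total, partials = supervised_sum(list(range(1, 101)), n_workers=4)
print(total, partials)
```

This works because addition is associative; with floating-point data the regrouped sum can differ from the serial one in the last few bits.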
`ø,,ø`ø,,ø!
Free Software: Like love, it grows best when given away.