Panic in Multicore Land
MOBE2001 writes "There is widespread disagreement among experts on how best to design and program multicore processors, according to the EE Times. Some, like senior AMD fellow Chuck Moore, believe that the industry should move to a new model based on a multiplicity of cores optimized for various tasks. Others disagree on the grounds that heterogeneous processors would be too hard to program. The only emerging consensus seems to be that multicore computing is facing a major crisis. In a recent EE Times article titled 'Multicore puts screws to parallel-programming models', AMD's Chuck Moore is reported to have said that 'the industry is in a little bit of a panic about how to program multicore processors, especially heterogeneous ones.'"
AMD's Chuck Moore presumably has a lot of self-interest in pushing heterogeneous cores. They are combining ATI and AMD cores on a single die and selling the benefits in a range of environments, including scientific computing.
So take it all with a grain of salt.
--Q
It is portable, scalable, standardized and supports many languages.
That's why it's so important that languages begin to adopt threading primitives and immutable data structures. Java does a good job. Newer languages, like Clojure are built from the ground up with concurrency in mind.
This article is referring to AMD's Charles R. "Chuck" Moore, who worked on the POWER4 and PowerPC 601, not the language and chip designer Charles H. "Chuck" Moore who invented Forth, ColorForth, et al. and was interviewed on Slashdot.
...functional programming languages? Or flow programming?
Yeah, but if you extrapolate to where things are going, we're going to have CPUs with dozens if not hundreds of cores on them (see Intel's 80-core technology demo as an example of where their research is going). Can you write or use general-purpose software that takes advantage of that many cores? Right now I expect there is a bit of panic because it's relatively easy to build these behemoths, but not so easy to use them efficiently. Outside of some specialized disciplines like computational science and finance (which have already been taking advantage of parallel computing for years), there won't be much demand for uber-multicore CPUs if the programming models don't drastically improve. And those innovations need to happen now to be ready in time for the CPUs of five years from now. Since no real breakthroughs have come, however, the chip companies are smart to be rethinking their strategies.
I was speaking of the last 5-7 years.
I have an old AMD XP-something running Windows XP at home; it is about 5 years old. I also have a Core 2 Duo machine I sometimes use. I don't see much difference in day-to-day usage. Even if there is one, I would attribute most of it to faster drives and I/O.
Unless you're speaking of AMD SMP systems, Intel systems until recently shared the front-side bus (FSB) among all the CPUs. So on the Intel side of things, SMP vs. multi-core is nearly the same (save for L2 cache sharing and whatnot). The only notable exception on the Intel side that I have noticed is that the recent Xeon systems (within the last two years or so) seem to use two "northbridges". For example, take my "quad-core" Mac Pro tower that I bought in April 2007: it has two dual-core Xeons, and the motherboard has two northbridges (though Intel doesn't refer to its chipsets that way, last I checked; they like to talk about "hubs").
And frankly, it helps a lot to write code that is microprocessor-friendly to begin with:
If the node code is bad enough, it can make any parallelism look good to the user. But writing good node code is hard ;-( As a reviewer, I have recommended rejection of a parallel-processing paper that claimed 80% parallel efficiency on 16 processors for the author's air-quality model. But I knew of a well-coded equivalent model that outperformed the paper's 16-processor result on a single processor, and still got 75% efficiency on 16 processors (more than 10x faster than the paper author's timing).
fwiw.
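The reviewer's comparison follows from the usual definitions of speedup, $S_p = T_1/T_p$, and parallel efficiency, $E_p = S_p/p$. Here $T_1$ is the single-processor time of the same code. The superscripts "paper" and "good" are labels introduced here for the two models:

```latex
\begin{aligned}
E_p &= \frac{S_p}{p} = \frac{T_1}{p\,T_p} \\
\text{paper: } E_{16} = 0.80 &\;\Rightarrow\; T_{16}^{\text{paper}} = \frac{T_1^{\text{paper}}}{0.80 \times 16} = \frac{T_1^{\text{paper}}}{12.8} \\
\text{good: } E_{16} = 0.75 &\;\Rightarrow\; T_{16}^{\text{good}} = \frac{T_1^{\text{good}}}{0.75 \times 16} = \frac{T_1^{\text{good}}}{12} \\
T_1^{\text{good}} < T_{16}^{\text{paper}} &\;\Rightarrow\; T_{16}^{\text{good}} < \frac{T_{16}^{\text{paper}}}{12}
\end{aligned}
```

So on 16 processors the well-coded model runs more than 12 times faster than the paper's model, which is consistent with "more than 10x". The higher efficiency figure belonged to the slower code.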
Well, yes, I believe the thingie taking care of that is even called "ThreadWeaver" :)
Yep, here it is: http://api.kde.org/4.0-api/kdelibs-apidocs/threadweaver/html/index.html
Rich
The Cell has one PowerPC core (the "PPU"), which is a general-purpose PowerPC processor. Nothing exotic at all about programming it. But then you have 6 (on the PlayStation 3) or 8 (on other computers) "SPE" cores that you can program. Transferring data to and from them is a pain, they have small working memories (256 KB each), and you can't use all C++ features on them (no C++ exceptions, so most of the STL is out). They also have poor double-precision floating-point performance.
I find that the most useful parts of the STL don't even use exceptions; they just have a lot of undefined behavior. There is only one call, at(), on vectors and deques that will throw an exception directly. The STL is mostly concerned with being exception-safe. Do you have a reference on C++ programming for the Cell processor regarding exceptions?
Ah, sorry: I didn't mean to imply that it is unnecessary for the applications of tomorrow. Where I work we also do those massive simulations mentioned by another poster, and we welcome _any_ number of cores. (One thing we simulated was the ATV, mentioned a few days ago on Slashdot. The simulator runs on two machines with a total of ten cores between them, and when we started the work, we were afraid our state-of-the-art 1 GHz CPUs (single core, at that time) might not be fast enough. Hahaha, it seems so quaint now! ;-) )
What I did mean to imply, though, is that something fundamental needs to change in the rest of the system before this becomes important, since right now most of the time I'm not waiting for the CPU, I'm waiting for the hard disk. That guy waiting for the address bar in IE? I'd bet a dollar that he is really waiting for his hard disk. Possibly IE is scanning some history file each time he types a character, there might be some paging going on, he might have some severe fragmentation issues and some torrents open, and all of those would combine to make something that should be lightning fast unbearably slow.
My dual-core 2.4 GHz machine with a staggering 3 GB of RAM occasionally feels slower than my ancient Amiga 500 (7.14 MHz, 512 KB of RAM, no hard disk, and no paging file!). As soon as your application is swapped out (and that is something Windows does as a hobby, just to spite you), you will lose significant time when you want it to come back to life.
And as long as systems remain mostly limited by the hard disk rather than the CPU, adding threads will not help. Even the massively parallel monster applications of tomorrow will just spend their time waiting to be paged in.
Clock frequency is not a reliable indicator of CPU performance. For example, the Core 2 chips, despite generally running at lower frequencies than the Pentium 4, outperform it significantly.
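The point can be stated with the standard relation for CPU time, sometimes called the "iron law" of processor performance. Here $N$ is the number of instructions executed, IPC is the number of instructions completed per clock, and $f$ is the clock frequency:

```latex
T_{\text{CPU}} = \frac{N}{\text{IPC} \times f}
```

A chip with a lower $f$ therefore finishes the same work sooner whenever its IPC is higher by a larger factor than its clock deficit.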
But massive improvements in instructions per clock (IPC) do not happen very often in the x86 chip industry. In fact, I can count all the major improvements of the last 15 years on one hand:
1995: the Intel Pentium Pro (approximately 2 integer and 2 FP operations per clock, best case) introduces dynamic, out-of-order instruction scheduling to the x86 world. The design can decode 3 instructions per clock. Yes, I am disregarding the Pentium, because you got NO performance improvement from it without an optimizing compiler.
1997: MMX increases the maximum number of integer operations to 8 per cycle. But because of the 64-bit register width, you see little improvement unless you use 16-bit or 8-bit types.
1998/1999: 3DNow! and SSE double the potential throughput for 32-bit floating point, again not all that impressive.
2000: the Pentium 4 actually REDUCES performance per clock, with a single instruction decoder and heavy reliance on the trace cache to make up for it. SSE2 gives the potential to increase FP throughput to 4 instructions per clock per SSE unit, but a half-assed implementation by both Intel and AMD means nothing changes.
2006/2007: the addition of more decode units on the Core 2, packed SSE instructions executed at full 128-bit width on both the Core 2 and the Phenom, and TWO 128-bit SIMD units mean we see the first improvements in instructions per clock in years.
Read http://view.eecs.berkeley.edu/wiki/The_Landscape_of_Parallel_Computing_Research:_A_View_From_Berkeley (specifically the white paper linked from it).
1) Is there a programming language that tries to make programming for multiple cores easier?
2) Is programming for parallel cores the same as parallel programming?
3) Is anybody aware of anything in this direction on the C++ front that does not rely on OS APIs?
1) Yes.
2) Maybe.
3) Yes.