SW Weenies: Ready for CMT?
tbray writes "The hardware guys are getting ready to toss this big hairy package over the wall: CMT (Chip Multi Threading) and TLP (Thread Level Parallelism). Think about a chip that isn't that fast but runs 32 threads in hardware. This year, more threads next year. How do you make your code run fast? Anyhow, I was just at a high-level Sun meeting about this stuff, and we don't know the answers, but I pulled together some of the questions."
Whatever the clock rate, multiply it by eight and it's pretty obvious that this puppy is going to be able to pump through a whole lot of instructions in aggregate.
Ho hum.
On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second, provided it has 8 independent threads not blocking on I/O to execute.
It only has one floating-point execution unit attached to one of those 8 cores, so if you have a thread that needs to do some FP, it has to make its way over to that core and then has to be scheduled to be executed, and then it can only do one floating-point instruction.
Superb.
The thing is, all of the other CPU vendors with have super-scalar, out-of-order 2- and 4- core 64- bit processors running at over twice to three times the clock frequency.
You do the mathematics.
Stick Men
Some people have predicted this move for quite some time. I remember hearing about it back in the late 80's early 90's and I'm sure it goes way back before then. The analogy was to Steam Engines and why they lost out over Diesels. You can only make a Steam engine so big but you cannot connect them together to get more power. With diesels you can hook many of them together for more power. Chips are finally getting to the same point -- It is more cost efficient to chain them together than to create a monsterous one. I'm surprised it has take this long to get to this point.
In games the AI of non-player-characters (-objects) can profit a lot from threading.
But for common apps
It sounds like you want a cell.
and take some advance architecture courses.
The BEST a single core multi-thread design can hope for is the performance of a single core single thread design...
I'm sorry but that turns out not to be the case.
When you have a system that is running lots of different threads simultaneously the amount of time that it takes to do a context switch from one thread to another becomes an issue. In the real world, threads often do things like I/O which cause them to block or they wait on a lock. If you can do a fast context switch you get back the time that you would have wasted saving registers off to RAM and pulling back another set. Faster thread switching means that your multi-thread single core now runs its total load (all of the threads) faster than a single core single thread design. Also, things like microkernels become a lot more feasible (microkernels are notorious for being slow because context switches are slow).
When you have looked beyond your desktop machine maybe you'll have earned the right to sneer at your professors. I don't think you're there yet.
From the parent post:
"Current programming languages are insufficiently descriptive to permit compilers to generate usefully multi-threaded code."
The portion of importance is:
"insufficiently descriptive"
In C, C++, and Java, you must program with concurrency in mind to obtain any benefit from multiple threads of execution. In a functional programming language, the restrictions placed on the behavior of functions often imply concurrency without the programmer necessarily intending that as the result. If you write a C program without concurrency in mind and want to adapt your solution later to take advantage of multiple threads, you may need to code a completely different solution and also locate a compiler that knows how to take advantage of concurrency. In a functional language, you may only need to get an updated version of your compiler/interpreter. This is why C, C++, and Java are in the "insufficiently descriptive" category and functional programming languages are not.