Intel Says to Prepare For "Thousands of Cores"

← Back to Stories (view on slashdot.org)

Intel Says to Prepare For "Thousands of Cores"

Posted by ScuttleMonkey on Wednesday July 2, 2008 @08:42AM from the viva-la-coding-revolucion dept.

Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed their plans for the future and it goes well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousand or cores, it seems Intel is pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of the their software's existence.'"

9 of 638 comments (clear)

Min score:

Reason:

Sort:

Re:Not Sure I'm Getting It by Delwin · 2008-07-02 08:46 · Score: 5, Informative

Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.

Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that.
Re:Ok.. so how do I do that? by Phroggy · 2008-07-02 09:02 · Score: 4, Informative

A year or so ago, I saw a presentation on Thread Building Blocks, which is basically an API thingie that Intel created to help with this issue. Their big announcement last year was that they've released it open-source and have committed to making it cross-platform. (It's in Intel's best interest to get people using TBB on Athlon, PPC, and other architectures, because the more software is multi-core aware, the more demand there will be for multi-core CPUs in general, which Intel seems pretty excited about.)

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Not Sure I'm Getting It by k8to · 2008-07-02 09:56 · Score: 5, Informative

True but misleading. The major cost of task switching is a hardware-derived one. It's the cost of blowing caches. The swapping of CPU state and such is fairly small by comparison, and the cost of blowing caches is only going up.

--
-josh
Re:Not Sure I'm Getting It by blahplusplus · 2008-07-02 10:50 · Score: 4, Informative

"Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that."
Unfortunately all this is going to lead to bus and memory bandwidth contention, you're just shifting the burden from one point to another. Although their is a 'penalty' for task switching, there is an even greater bottleneck at the bus and memory bandwidth level.
IMHO intel would have to release a cpu on a card with specialized ram chips and segment the ram like GPU's do to get anything out of multicore over the long term, ram is not keeping up and the current architecture for PC ram is awful for multicore. CPU speed is far outstripping bus and memory bandwidth. I am quite dubious of multi-core architecture, there is fundamental limits of geometry of circuits. I'd be sinking my money into materials research not glueing cores together and praying CS and math guys come up with solutions that take advantage of it.
The whole of human history of engineering and tool use, is to take something extremely complicated and offload complexity, and compartmentalize it so that it's mangable. I see the opposite happening with multi-core.
Re:Databases and implimentation-neutrality by Shados · 2008-07-02 10:54 · Score: 4, Informative

By "a lot of processing can potentially be converted into DB queries", what you discovered is functional programming :) LINQ in .NET 3.5/C# 3.0 is an example of functional programming that is made to look like DB queries, but it isn't the only way. It is a LOT easier to convert that stuff and optimize it to the environment (like how SQL is processed), since it describes the "what" more than the "how". It is already done, and one (out of many examples) is Parallel LINQ, which smartly execute LINQ queries in parallel, optimized for the amount of cores, etc. (And I'm talking about LINQ in the context of in memory process, not LINQ to SQL, which simply convert LINQ queries into SQL ones).
Functional programming, tied with the concept of transactional memory to handle concurency, is a nice medium term solution to the multi-core problem.
Re:Not Sure I'm Getting It by skulgnome · 2008-07-02 10:57 · Score: 5, Informative

No. I/O is the slowdown in multitasking OSes.
Re:Not Sure I'm Getting It by Salamander · 2008-07-02 13:23 · Score: 5, Informative

Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
OK, so now the piece that's running on each core runs really really fast . . . until it needs to wait for or communicate with the piece running on some other core. If you can do your piece in ten instructions but you have to wait 1000 for the next input to come in, whether it's because your neighbor is slow or because the pipe between you is, then you'll be sitting and spinning 99% of the time. Unfortunately, the set of programs that decompose nicely into arbitrarily many pieces that each take the same time (for any input) doesn't extend all that far beyond graphics and a few kinds of simulation. Many, many more programs hardly decompose at all, or still have severe imbalances and bottlenecks, so the "slow neighbor" problem is very real.
Many people's answer to the "slow pipe" problem, on the other hand, is to do away with the pipes altogether and have the cores communicate via shared memory. Well, guess what? The industry has already been there and done that. Multiple processing units sharing a single memory space used to be called SMP, and it was implemented with multiple physical processors on separate boards. Now it's all on one die, but the fundamental problem remains the same. Cache-line thrashing and memory-bandwidth contention are already rearing their ugly heads again even at N=4. They'll become totally unmanageable somewhere around N=64, just like the old days and for the same reasons. People who lived through the last round learned from the experience, which is why all of the biggest systems nowadays are massively parallel non-shared-memory cluster architectures.
If you want to harness the power of 1000 processors, you have to keep them from killing each other, and they'll kill each other without even meaning to if they're all tossed in one big pool. Giving each processor (or at least each small group of processors) its own memory with its own path to it, and fast but explicit communication with its neighbors, has so far worked a lot better except in a very few specialized and constrained cases. Then you need multi-processing on the nodes, to deal with the processing imbalances. Whether the nodes are connected via InfiniBand or an integrated interconnect or a common die, the architectural principles are likely to remain the same.
Disclosure: I work for a company that makes the sort of systems I've just described (at the "integrated interconnect" design point). I don't say what I do because I work there; I work there because of what I believe.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Not Sure I'm Getting It by Erich · 2008-07-02 15:17 · Score: 5, Informative
Single Address Space is horrible.
It's a huge kludge for idiotic processors (like arm9) that don't have physically-tagged caches. On all non-incredibly-sucky processors, we have physically tagged caches, and so having every app have its own address space, or having multiple apps share physical pages at different virtual addresses, all of these are fine.
Problems with SAS:
- Everything has to be compiled Position-independent, or pre-linked for a specific location
- Virtual memory fragmentation as applications are loaded and unloaded
- Where is the heap? Is there one? Or one per process?
- COW and paging get harder
- People start using it and think it's a good idea.
Most people... even people using ARM... are using processors with physically-tagged caches. Please, Please, Please, don't further the madness of single-address-space environments. There are still people encouraging this crime against humanity.
Maybe I'm a bit bitter, because some folks in my company have drunk the SAS kool-aid. But believe me, unless you have ARM9, it's not worth it!
--
-- Erich
Slashdot reader since 1997
Re:The thing's hollow - it goes on forever by dryeo · 2008-07-02 17:00 · Score: 5, Informative

And before they made it into a movie it was an interesting short story. http://en.wikipedia.org/wiki/The_Sentinel_(short_story)
If you'd like to read it, seems it is this PDF, http://econtent.typepad.com/TheSentinel.pdf

--
https://en.wikipedia.org/wiki/Inverted_totalitarianism