IEEE Says Multicore is Bad News For Supercomputers

Posted by timothy on Friday December 5, 2008 @12:04AM from the unexpected-downsides dept.

Richard Kelleher writes "It seems the current design of multi-core processors is not good for the design of supercomputers. According to IEEE: 'Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16-core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores.'"

Re:Time for vector processing again by Retric · 2008-12-05 01:17 · Score: 3, Informative

Modern CPUs have 8+ megabytes of L2/L3 cache on chip, so RAM is only a problem when your working set is larger than that. The problem supercomputing folks are having is that they want to solve problems that don't really fit in L3 cache, which creates significant problems, but they still need a large cache. Because of speed-of-light limits, off-chip RAM is always going to be high latency, so you need to use some type of on-chip cache or stream lots of data to the chip.

There are really only 2 options for modern systems when it comes to memory: you can have lots of cores and a tiny cache, like GPUs, or lots of cache and fewer cores, like CPUs (ignoring type-of-core issues, on-chip interconnects, etc.). So there is little advantage to paying 10x per chip to go custom versus using more, cheaper chips when they can build supercomputers out of CPUs, GPUs, or something between them like the Cell processor.
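
As a rough illustration of the working-set point above (a sketch, not something from the poster): assume accesses are spread uniformly over the working set, so the fraction served from cache is simply cache size divided by working-set size. The function name and the latency figures (10 ns for cache, 100 ns for off-chip RAM) are made-up round numbers, not measurements.

```python
MIB = 1024 * 1024


def average_access_time_ns(working_set_bytes, cache_bytes, cache_ns, dram_ns):
    """Crude average memory access time under uniform random access.

    Assumption: the share of accesses that hit cache equals the share of
    the working set that fits in cache. Real access patterns are rarely
    this simple.
    """
    if working_set_bytes <= 0:
        raise ValueError("working set must be positive")
    hit_fraction = min(1.0, cache_bytes / working_set_bytes)
    return hit_fraction * cache_ns + (1.0 - hit_fraction) * dram_ns


if __name__ == "__main__":
    # Hypothetical 8 MiB cache, 10 ns cache latency, 100 ns DRAM latency.
    for working_set_mib in (4, 8, 80, 800):
        t = average_access_time_ns(working_set_mib * MIB, 8 * MIB, 10.0, 100.0)
        print(f"{working_set_mib:4d} MiB working set: about {t:.1f} ns per access")
```

Under these assumptions the average cost stays at cache speed until the working set outgrows the cache, then climbs toward DRAM latency, which is the regime the supercomputing workloads are said to live in.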
The problem allegedly being.. by Junta · 2008-12-05 01:21 · Score: 4, Informative

For a given node count, we've seen increases in performance. The claimed problem is that, for the workloads that concern these researchers, no one is projecting enhancements to the fundamental memory architecture that keep pace with the growth in core counts. So you buy a 16-core system to upgrade your quad-core system and hypothetically gain little despite the expense. Power efficiency drops, and getting more performance requires more nodes. Additionally, who is to say that clock speeds won't drop if programming models in the mass market change such that distributed workloads are common and single-core performance stops being the selling point?
All that said, talk beyond 6-core/8-core is mostly grandstanding at this time. Because memory architecture for the mass market is not considered intrinsically exciting, I would wager there will be advancements that no one talks about. For example, Nehalem leapfrogs AMD's memory bandwidth by a large margin (roughly a factor of 2). That means if Shanghai parts are considered satisfactory today, memory-wise, for supporting four cores, then Nehalem, by that particular metric, supports 8 equally satisfactorily. The whole picture is a tad more complicated (e.g. latency, for which I don't know the numbers offhand), but that one metric is a highly important one in the supercomputer field.
For all the worry over memory bandwidth, though, it hasn't stopped supercomputer purchasers from buying into Core2 all this time. Despite improvements in its chipsets, Intel's Core2 still doesn't match AMD's memory performance. Even so, people spending money to get into the Top500 still generally chose to put their money on Core2. Sure, the Cray and IBM supercomputers in the top 2 used AMD, but since its release Core2 has decimated AMD's supercomputer market share despite an inferior memory architecture.
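
The bandwidth-per-core argument above can be sketched with a simple roofline-style bound: sustained throughput is the smaller of what the cores can compute and what the memory system can feed them. This is only a back-of-the-envelope model; the function name and every number in it (10 GFLOP/s per core, 25 GB/s per socket, 2 flops per byte moved) are illustrative assumptions, not figures for Shanghai, Nehalem, or the Sandia simulations.

```python
def sustained_gflops(cores, gflops_per_core, mem_bw_gbs, flops_per_byte):
    """Roofline-style upper bound on sustained performance for one socket.

    compute_bound: what the cores could do if data were always available.
    memory_bound:  what the memory bandwidth can feed, given how many
                   floating-point operations the code does per byte moved.
    """
    compute_bound = cores * gflops_per_core
    memory_bound = mem_bw_gbs * flops_per_byte
    return min(compute_bound, memory_bound)


if __name__ == "__main__":
    for bandwidth in (25, 50):  # hypothetical GB/s; the second is "2x"
        for cores in (4, 8, 16, 32):
            perf = sustained_gflops(cores, 10, bandwidth, 2)
            print(f"{bandwidth} GB/s, {cores:2d} cores: {perf} GFLOP/s")
```

With these made-up numbers the 25 GB/s part stops scaling after about 5 cores, while doubling the bandwidth moves the ceiling out to about 10 cores. The model only shows performance leveling off; it does not capture the contention effects that could make it decline.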

--
XML is like violence. If it doesn't solve the problem, use more.
Re:It's so obvious... by AlXtreme · 2008-12-05 01:44 · Score: 3, Informative

You mean something like a CPU cache? I assume you know that every core already has a cache (L1) on multi-core systems, and shares a larger cache (L2) between all cores.
The problem is that on/near-core memory is damn expensive, and your average supercomputing task requires significant amounts of memory. When the bottleneck for high performance computing becomes memory bandwidth instead of interconnect/network bandwidth you have something a lot harder to optimize, so I can understand where the complaint in IEEE comes from.
Perhaps this will lead to CPUs with large L1 caches specifically for supercomputing tasks, who knows...

--
This sig is intentionally left blank
Re:Time for vector processing again by yttrstein · 2008-12-05 02:04 · Score: 3, Informative

We still have different processors for desktops and supercomputers.

http://www.cray.com/products/XMT.aspx

Rest assured, there are still people who know how to build them. They're just not quite as popular as they used to be, now that a middle manager who has no idea what the hell they're talking about can go to an upper manager with a spec sheet that's got 8 thousand processors on it and say "look! This one's got a whole ton more processors than that dumb Cray thing!"
Re:Time for vector processing again by TapeCutter · 2008-12-05 02:04 · Score: 4, Informative

"Multi-core technology is good for desktop systems as it is meant to run a lot of relatively small apps, rarely taking advantage of more than 1 or 2 cores per app. In other words, it allows multitasking without a penalty. We don't use supercomputers that way. We use them to perform 1 app that takes huge resources that would take hours or years on your PC and spit out results in seconds or days."

Sorry, but that's not entirely correct. Most supercomputers work on highly parallel problems using numerical analysis techniques. By definition the problem is broken up into millions of smaller problems that make ideal "small apps"; a common consequence is that the bandwidth of the communications between the 'small apps' becomes the limiting factor.
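
One way to see why communication ends up as the limit (a sketch under stated assumptions, not the poster's own analysis): take a cubic 3D grid split into equal cubic subdomains, where each subdomain computes on its interior cells and exchanges one layer of boundary ("halo") cells across each of its 6 faces. The function name and the grid sizes are hypothetical.

```python
def halo_to_compute_ratio(grid_side, parts_per_side):
    """Ratio of cells exchanged to cells computed for one cubic subdomain.

    Assumes a grid_side^3 grid cut into parts_per_side^3 equal cubes and a
    one-cell-thick halo on each of the 6 faces (edges and corners ignored).
    """
    if parts_per_side <= 0 or grid_side % parts_per_side != 0:
        raise ValueError("grid_side must be a positive multiple of parts_per_side")
    n = grid_side // parts_per_side   # subdomain side length in cells
    compute_cells = n ** 3            # work scales with volume
    halo_cells = 6 * n ** 2           # traffic scales with surface area
    return halo_cells / compute_cells


if __name__ == "__main__":
    for parts in (1, 4, 16, 64):
        ratio = halo_to_compute_ratio(1024, parts)
        print(f"{parts ** 3:7d} subdomains: {ratio:.4f} halo cells per computed cell")
```

Cutting the same problem into more pieces shrinks the work per piece faster than it shrinks the data each piece must exchange, so the ratio grows as 6/n and the interconnect (or memory system) becomes the bottleneck.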

"Back in the early-mid 90's we had different processors for Desktop and Super Computers."

The Earth Simulator was referred to in some parts as 'computenick'. Its speed jump over its nearest rival, and its longevity at the top, marked the renaissance of "vector processing" after it had been largely ignored during the 90's.

In the end a supercomputer is a purpose-built machine; if cores fit the purpose then they will be used.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:Time for vector processing again by David+Greene · 2008-12-05 04:45 · Score: 3, Informative

Cray did not stream vectors from memory. One of the advances of the Cray-1 was the use of vector registers as opposed to, for example, the Burroughs machines which streamed vectors directly to/from memory.
We know how to build memory systems that can handle large vectors. Both the Cray X1 and Earth Simulator demonstrate that. The problem is that those memory systems are currently too expensive. We are going to see more and more vector processing in commodity processors.
