Intel's "Terascale" Vision

Posted by ryuzaki0 on Wednesday September 27, 2006 @08:15AM from the 80-cores-and-nothing-on dept.

Vigile writes, "Intel is pushing the envelope with its latest vision — 80 cores on a single processor. Dubbed 'Terascale' computing, Intel aims to bring low-powered, massively interconnected cores and unleash a new era in data-mining, media creation, and entertainment." For balance, read Tom Yager over at InfoWorld imploring AMD to stop at 8 cores while everybody gets the architecture right.

7 of 220 comments

We have a dupe! by Anonymous Coward · 2006-09-27 08:19 · Score: 1, Informative

dupe

doioioioioi
Re:Why have 8 strong ox? by Anonymous Coward · 2006-09-27 08:36 · Score: 1, Informative

FYI, the plural of ox is oxen.
Someone needs to relearn SI by ncc05 · 2006-09-27 08:39 · Score: 5, Informative

[A] teraflop is approximately 1000 Megaflops.
Is there such a thing as a gigaflops? What happened to that?
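
For reference, the SI arithmetic behind the complaint, as a small Python sketch. The `convert` helper and the `PREFIX` table are made up here for illustration; only the powers of ten are standard.

```python
# SI prefixes as powers of ten
PREFIX = {"mega": 10**6, "giga": 10**9, "tera": 10**12}

def convert(value, from_prefix, to_prefix):
    """Convert a FLOPS figure from one SI prefix to another."""
    return value * PREFIX[from_prefix] / PREFIX[to_prefix]

print(convert(1, "tera", "giga"))  # 1000.0
print(convert(1, "tera", "mega"))  # 1000000.0
```

So a teraflop is 1000 gigaflops, or a million megaflops.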
Erlang would shine! by Anonymous Coward · 2006-09-27 08:41 · Score: 1, Informative

This is the sort of hardware environment that would allow a programming language like Erlang to thrive. We could very well see it being used more and more often in the future, as even regular consumers have access to machines with well over 16 cores.

For many applications, Erlang provides a far superior model to that of languages like C and C++ (with pthreads), Java, C# and Perl when it comes to massively multithreaded programming. Very high reliability is possible using Erlang, as witnessed by the many telephony products in which it is used. So for consumer applications, it could help developers build very solid systems that easily take advantage of many processors.
Re:80 Submissions by archen · 2006-09-27 10:30 · Score: 2, Informative

I think that was part of what the article was implying. Assuming you had a perfectly optimized kernel and a zillion cores, performance still isn't going to scale all that well. There are just too many bottlenecks in the way the general-purpose PC is designed today. And let's not forget how far hard drive tech lags behind the rest of the system. It's funny because everyone acts like this is so new, despite the fact that high-end stuff like supercomputers has been dealing with these issues for decades. The PC architecture is going to have to change in more than a couple of ways, but before that happens everyone is going to have to get used to the fact that their system has more than one core. Maybe that's one of the tricks AMD has up its sleeve. Buying ATI may have been a step towards re-engineering the parts of the PC that are going to be bottlenecks.
Not a good way to speed up general purpose apps by mpaque · 2006-09-27 12:22 · Score: 2, Informative

I like the idea of an 80 core processor. Multithreaded applications will work better

Multithreading models from the Windows/Unix/Linux community all assume equal access to system resources such as memory across all threads. They like Uniform Memory Access (UMA) models.

An 80 core system can't really provide a uniform memory access model, as it runs into severe switching and coherency problems. (You want to snoop HOW MANY L1 caches?!??) Fancy interconnects like hyperchannel and Monte Carlo stochastic schemes start getting pinched for bandwidth around 8 cores. With this many cores, you'll wind up with computing meshes of local processors and memory interconnected using some interesting switching scheme. The article even mentions this, with a bit of hand-waving over the issues of bandwidth in shared system resources. "Intel's answer is to attach 256 Mbit of SRAM directly to EACH core." Interconnect topology is left at a simple tiling scheme, but they are exploring ring topologies.
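
To put a rough number on the snooping problem, here is a back-of-the-envelope Python sketch. It assumes the most naive scheme, where every cache miss is broadcast to every other core's cache. Real coherency protocols filter a lot of this, so treat the figures as an upper-bound illustration, not a model of Intel's design. `snoop_probes` is a made-up helper.

```python
def snoop_probes(cores, misses_per_core=1):
    """Probes generated when every miss is broadcast to all other cores' caches."""
    return cores * misses_per_core * (cores - 1)

for n in (2, 8, 80):
    print(n, snoop_probes(n))
# 2 2
# 8 56
# 80 6320
```

Ten times the cores gives roughly a hundred times the probe traffic, which is why a broadcast scheme stops being attractive long before 80 cores.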

The result looks remarkably like a transputer mesh. I've programmed these in the past, and the model is rather different than simple multithreading. Being able to decompose the programming problem into a number of independent steps with relatively low communications demands is essential. The ability to reconfigure the interconnect topology to match the problem's data flow is essential to being able to get as much out of the processor set as possible. Without this, one can wind up with lots of idle processors, blocked on data starvation.
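
A minimal Python sketch of that style, assuming a simple linear pipeline: each "node" keeps its work private and only talks to its neighbour through a channel. The names `stage` and `run_pipeline` are invented for this example, and Python threads only model the structure here; they won't give real parallel speedup for CPU-bound work.

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """One node: does its own work, communicates only through its channels."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down and pass it along
            outbox.put(None)
            return
        outbox.put(fn(item))

def run_pipeline(fns, inputs):
    """Chain the stages node-to-node and push the inputs through."""
    channels = [queue.Queue() for _ in range(len(fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, channels[i], channels[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for w in workers:
        w.start()
    for x in inputs:
        channels[0].put(x)
    channels[0].put(None)
    results = []
    while True:
        item = channels[-1].get()
        if item is None:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results

print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))  # [4, 6, 8]
```

If one stage is much slower than the others, the downstream nodes sit blocked on `inbox.get()`, which is the data starvation problem in miniature.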
Re:Arrgghhh by Procyon101 · 2006-09-28 09:31 · Score: 2, Informative

To clarify, by hierarchical I was referring to HFC (Hierarchical Fair Competition).

GA/GP searches a problem space for answers using a distributed hill climbing algorithm. It isn't a magic bullet for all problems, but performs well when the fitness of the solution set is a continuous function and the slopes of the fitness landscape are not too extreme. If the fitness landscape is not continuous, then GA/GP is unlikely to outperform random search by very much.

For instance: If your problem is "Devise a key that will open this lock", then GA/GP is not a very good way to go about searching because the feedback for whether the solution is good or not is purely binary... it either works, or it doesn't. In a failed attempt there are no "hints" as to whether this failure was closer to or further away from any solution. If, however, we can gauge how good the solution is in comparison with other attempts, we can hill-climb the fitness landscape and home in on good solutions.
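
A toy Python sketch of the difference. It uses a single hill climber rather than a full GA/GP population, and the integer "key" `TARGET`, the two fitness functions and `hill_climb` are all made up for illustration.

```python
import random

def hill_climb(fitness, start, steps, rng):
    """Accept a random +/-1 neighbour only when it scores strictly better."""
    best = start
    for _ in range(steps):
        candidate = best + rng.choice((-1, 1))
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

TARGET = 42  # hypothetical "key" we are searching for

def graded(x):
    """Feedback that says how close each attempt is."""
    return -abs(x - TARGET)

def binary(x):
    """Lock-style feedback: it works or it doesn't."""
    return 1 if x == TARGET else 0

rng = random.Random(0)
print(hill_climb(graded, 0, 1000, rng))  # 42
print(hill_climb(binary, 0, 1000, rng))  # 0
```

With graded feedback the climber walks straight to the key. With binary feedback every neighbour scores the same as the current guess, so it never moves.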

Traditionally, however, GA/GP has been limited by the fact that it tends to home in on one good solution, to the exclusion of all others, even if it's not the "best" solution. The algorithms tend to refine a good solution forever, never escaping their local maximum in the fitness landscape. The only way to get them to find another solution is to restart the evolution from the beginning. HFC allows the run to continuously probe the entire solution set and converge on all maxima.
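
The local-maximum trap is easy to show with a toy Python sketch. This is a plain deterministic hill climber with restarts, not HFC, and the two-peak `fitness` function is invented for the example.

```python
def fitness(x):
    """Two peaks: a low one at x=10 (height 5) and a higher one at x=30 (height 9)."""
    return max(5 - abs(x - 10), 9 - abs(x - 30))

def climb(start):
    """Move to the better neighbour until neither neighbour is an improvement."""
    x = start
    while True:
        best = max((x - 1, x + 1), key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best

print(climb(0))    # 10, stuck on the lower peak
print(climb(25))   # 30, the higher peak
print(sorted({climb(s) for s in range(0, 41, 5)}))  # [10, 30]
```

A climber that starts near the lower peak never leaves it. Only restarting from other points turns up the second maximum.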

Your question about optimal subsolutions is actually debated. Koza's original premise was that GP works by combining good subtrees in a solution. More recent research has brought this into question, and often random mutation will outperform crossover, or at least come close. There is work being done to see if we can increase the role of solution "building blocks", but there is no consensus. Your NP problem does not need to be highly decomposable for most GA/GP systems to work, but it does need to have a smooth fitness landscape.