Intel Says to Prepare For "Thousands of Cores"

Posted by ScuttleMonkey on Wednesday July 2, 2008 @08:42AM from the viva-la-coding-revolucion dept.

Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed its plans for the future, and they go well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousands of cores, it seems Intel is pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence.'"

31 of 638 comments

The thing's hollow - it goes on forever by stoolpigeon · 2008-07-02 08:42 · Score: 5, Funny

- and - oh my God - it's full of cores!

--
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
1. Re:The thing's hollow - it goes on forever by Maxo-Texas · 2008-07-02 10:49 · Score: 5, Funny
  
  Don't give up! Stay the cores!
  
  --
  She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
2. Re:The thing's hollow - it goes on forever by dryeo · 2008-07-02 17:00 · Score: 5, Informative
  
  And before they made it into a movie it was an interesting short story. http://en.wikipedia.org/wiki/The_Sentinel_(short_story)
  If you'd like to read it, seems it is this PDF, http://econtent.typepad.com/TheSentinel.pdf
  
  --
  https://en.wikipedia.org/wiki/Inverted_totalitarianism
Not Sure I'm Getting It by gbulmash · 2008-07-02 08:44 · Score: 5, Insightful

I'm no software engineer, but it seems like a lot of the issue in designing for multiple cores is being able to turn large tasks into many independent discrete operations that can be processed in tandem. But it seems that some tasks lend themselves to that compartmentalization and some don't. If you have 1,000 half-gigahertz cores running a 3D simulation, you may be able to get 875 FPS out of Doom X at 1920x1440, but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?

--
Start a happiness pandemic
1. Re:Not Sure I'm Getting It by Delwin · 2008-07-02 08:46 · Score: 5, Informative
  
  Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
  
  Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that.
2. Re:Not Sure I'm Getting It by Mordok-DestroyerOfWo · 2008-07-02 08:47 · Score: 5, Funny
  
  My friends and I have lots of conversations about girls, how to get girls, how to please girls. However until anything other than idle talk actually happens this goes into the "wouldn't it be nice" category
  
  --
  "Never let your sense of morals prevent you from doing what is right" - Salvor Hardin
3. Re:Not Sure I'm Getting It by pla · 2008-07-02 09:10 · Score: 5, Insightful
  
  "I'm no software engineer [...] but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?"
  
  As a software engineer, I wonder the same thing.
  
  Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.
  
  Within that, only those programs that wait for a particular hardware resource - CPU time - even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale (without a complete rewrite to choose entirely different algorithms, if they even exist to accomplish the intended purpose) to more than a handful of cores.
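
The scaling limit described in the comment above is commonly formalized as Amdahl's law, which the thread itself does not name. A minimal sketch, assuming a hypothetical program in which a fixed fraction of the run time parallelizes perfectly; the function name and the 90% figure are illustrative assumptions, not measurements:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 90% of the run time parallelizes perfectly.
for cores in (1, 2, 4, 64, 1000):
    print(f"{cores:>5} cores -> {amdahl_speedup(0.90, cores):.2f}x")
```

Under these assumptions the speedup stays below 1 / 0.10 = 10x no matter how many cores are added, which is one way to read "will not scale to more than a handful of cores".
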
4. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 09:21 · Score: 5, Insightful
  
  That is what most current processors do and use branch prediction for. Even if you have a thousand cores, that's only 10 binary decisions ahead. You need to guess really well very often to keep your cores busy instead of syncing. Also, the further you're executing ahead, the more ultimately useless calculations are made, which is what drives power consumption up in long pipeline cores (which you're essentially proposing).
  In reality parallelism is more likely going to be found by better compilers. Programmers will have to be more specific about the type of loops they want. Do you just need something to be performed on every item in an array or is order important? No more mindless for-loops for not inherently sequential processes.
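
The two points in the comment above can be sketched in Python. The "10 binary decisions" figure is roughly log2(1000), and the loop distinction is between work that is independent per item and work where each step needs the previous result. The names `square`, `data`, and `running_totals` are hypothetical, and `ThreadPoolExecutor` only stands in for whatever a compiler or runtime might do; in CPython, CPU-bound work would typically need processes rather than threads to run in parallel.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Following both sides of every branch doubles the work at each decision,
# so 1000 cores cover only about log2(1000) decisions ahead.
print(round(math.log2(1000), 2))  # 9.97


def square(x: int) -> int:
    return x * x


data = list(range(8))

# Order does not matter: every item is independent, so the work can be
# handed to a pool of workers. map() still returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, data))

# Order matters as written: each step reads the previous step's result.
running_totals = []
total = 0
for x in data:
    total += x
    running_totals.append(total)

print(squares)
print(running_totals)
```

Stating which kind of loop is intended is the extra information the comment asks programmers to give the compiler.
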
5. Re:Not Sure I'm Getting It by 192939495969798999 · 2008-07-02 09:33 · Score: 5, Insightful
  
  I concur, furthermore I'd like to see one core per pixel, that would certainly solve your high-end gaming issues.
  
  --
  stuff |
6. Re:Not Sure I'm Getting It by k8to · 2008-07-02 09:56 · Score: 5, Informative
  
  True but misleading. The major cost of task switching is a hardware-derived one. It's the cost of blowing caches. The swapping of CPU state and such is fairly small by comparison, and the cost of blowing caches is only going up.
  
  --
  -josh
7. Re:Not Sure I'm Getting It by cpeterso · 2008-07-02 10:29 · Score: 5, Interesting
  
  Now that 64-bit processors are so common, perhaps operating systems can spare some virtual address space for performance benefits.
  The OPAL operating system was a University of Washington research project from the 1990s. OPAL uses a single address space for all processes. Unlike Windows 3.1, OPAL still has memory protection and every process (or "protection domain") has its own pages. The benefit of sharing a single address space is that you don't need to flush the cache (because the virtual-to-physical address mappings do not change when you context switch). Also, pointers can be shared between processes because their addresses are globally unique.
  
  --
  cpeterso
8. Re:Not Sure I'm Getting It by painehope · 2008-07-02 10:36 · Score: 5, Funny
  
  They have been simulating it, that's why he said "My friends and I". *shudders*
  
  --
  PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
9. Re:Not Sure I'm Getting It by LandDolphin · 2008-07-02 10:41 · Score: 5, Insightful
  
  "Having 2 cores is enough for most consumers"
  
  Before, having 1 core was enough, and having 512 MB of RAM was enough for most consumers. Computing power grows, and software developers make use of that additional power. However, this will mainly affect the gaming industry.
  
  --
  Spelling and Grammar errors have been added to this post for your enjoyment
10. Re:Not Sure I'm Getting It by skulgnome · 2008-07-02 10:57 · Score: 5, Informative
  
  No. I/O is the slowdown in multitasking OSes.
11. Re:Not Sure I'm Getting It by kv9 · 2008-07-02 11:08 · Score: 5, Funny
  
  "I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?"
  29 hours 59 minutes 17 seconds?
  
  --
  Stop Computers/Cars Analogies on S
12. Re:Not Sure I'm Getting It by kesuki · 2008-07-02 12:35 · Score: 5, Interesting
  
  Yes, but if you have 1000 cores each with 64k of cache, then you start to run into problems with memory throughput when computing massively parallel data.
  Memory throughput has been the Achilles' heel of graphics processing for years now, and as we all know, splitting up a graphics screen into smaller segments is simple. So GPUs went massively parallel long before CPUs; in fact, you will soon be able to get over 1000 stream processing units in a single desktop graphics card.
  So the real problem is memory technology: how can a single memory module consistently feed 1000 cores? For instance, if you want to do real-time n-pass encoding of an HD video stream... while playing an FPS online, and running IM software, and a strong anti-virus suite...
  I have a horrible, horrible, ugly feeling that you'll never be able to get a system that can reliably do all that at the same time, just because they'll skimp on memory tech or interconnects, so you'll have most of the capabilities of a 1,000-core system wasted.
  
  --
  https://www.gnu.org/philosophy/free-sw.html
13. Re:Not Sure I'm Getting It by Salamander · 2008-07-02 13:23 · Score: 5, Informative
  
  "Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible."
  OK, so now the piece that's running on each core runs really really fast . . . until it needs to wait for or communicate with the piece running on some other core. If you can do your piece in ten instructions but you have to wait 1000 for the next input to come in, whether it's because your neighbor is slow or because the pipe between you is, then you'll be sitting and spinning 99% of the time. Unfortunately, the set of programs that decompose nicely into arbitrarily many pieces that each take the same time (for any input) doesn't extend all that far beyond graphics and a few kinds of simulation. Many, many more programs hardly decompose at all, or still have severe imbalances and bottlenecks, so the "slow neighbor" problem is very real.
  Many people's answer to the "slow pipe" problem, on the other hand, is to do away with the pipes altogether and have the cores communicate via shared memory. Well, guess what? The industry has already been there and done that. Multiple processing units sharing a single memory space used to be called SMP, and it was implemented with multiple physical processors on separate boards. Now it's all on one die, but the fundamental problem remains the same. Cache-line thrashing and memory-bandwidth contention are already rearing their ugly heads again even at N=4. They'll become totally unmanageable somewhere around N=64, just like the old days and for the same reasons. People who lived through the last round learned from the experience, which is why all of the biggest systems nowadays are massively parallel non-shared-memory cluster architectures.
  If you want to harness the power of 1000 processors, you have to keep them from killing each other, and they'll kill each other without even meaning to if they're all tossed in one big pool. Giving each processor (or at least each small group of processors) its own memory with its own path to it, and fast but explicit communication with its neighbors, has so far worked a lot better except in a very few specialized and constrained cases. Then you need multi-processing on the nodes, to deal with the processing imbalances. Whether the nodes are connected via InfiniBand or an integrated interconnect or a common die, the architectural principles are likely to remain the same.
  Disclosure: I work for a company that makes the sort of systems I've just described (at the "integrated interconnect" design point). I don't say what I do because I work there; I work there because of what I believe.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
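
The "sitting and spinning 99% of the time" figure in the comment above follows from simple arithmetic. A minimal sketch, assuming each instruction (working or waiting) costs one time unit; `busy_fraction` is a hypothetical helper name:

```python
def busy_fraction(work_instructions: int, wait_instructions: int) -> float:
    """Fraction of each work/wait cycle in which the core does useful work."""
    return work_instructions / (work_instructions + wait_instructions)


# Figures from the comment: 10 instructions of work, then a wait of 1000.
busy = busy_fraction(10, 1000)
print(f"busy {busy:.1%}, idle {1 - busy:.1%}")  # busy 1.0%, idle 99.0%
```

Making the core itself faster shrinks only the 10, so the idle share grows rather than shrinks.
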
14. Re:Not Sure I'm Getting It by Erich · 2008-07-02 15:17 · Score: 5, Informative
  Single Address Space is horrible.
  It's a huge kludge for idiotic processors (like arm9) that don't have physically-tagged caches. On all non-incredibly-sucky processors, we have physically tagged caches, and so having every app have its own address space, or having multiple apps share physical pages at different virtual addresses, all of these are fine.
  Problems with SAS:
  
  Everything has to be compiled Position-independent, or pre-linked for a specific location
  
  Virtual memory fragmentation as applications are loaded and unloaded
  
  Where is the heap? Is there one? Or one per process?
  
  COW and paging get harder
  
  People start using it and think it's a good idea.
  
  Most people... even people using ARM... are using processors with physically-tagged caches. Please, Please, Please, don't further the madness of single-address-space environments. There are still people encouraging this crime against humanity.
  Maybe I'm a bit bitter, because some folks in my company have drunk the SAS kool-aid. But believe me, unless you have ARM9, it's not worth it!
  --
  -- Erich
  Slashdot reader since 1997
Memory bandwidth? by Brietech · 2008-07-02 08:45 · Score: 5, Interesting

If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?

--
I'm perfect in every way, except for my humility.
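
The starvation concern raised above can be put in rough numbers. A minimal sketch, assuming bandwidth is shared evenly and using made-up figures (4 memory channels at 10 GB/s each); neither the figures nor the helper name `bandwidth_per_core` come from the thread:

```python
def bandwidth_per_core(channels: int, gb_per_s_per_channel: float, cores: int) -> float:
    """Average memory bandwidth per core in GB/s, if shared evenly."""
    return channels * gb_per_s_per_channel / cores


# Illustrative, assumed figures: 4 memory channels at 10 GB/s each.
for cores in (4, 64, 1000):
    share = bandwidth_per_core(4, 10.0, cores)
    print(f"{cores:>4} cores: {share * 1000:.0f} MB/s per core")
```

With the channel count fixed, the per-core share falls in direct proportion to the core count, so on-chip caches would have to absorb most of the traffic.
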
Disagreement about this trend by Raul654 · 2008-07-02 08:46 · Score: 5, Interesting

At Supercomputing 2006, they had a wonderful panel where they discussed the future of computing in general, and tried to predict what computers (especially supercomputers) would look like in 2020. Tom Sterling made what I thought was one of the most insightful observations of the panel -- most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc. is not going to benefit much from having all that compute power. (Gamers are obviously not included in this.) Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
1. Re:Disagreement about this trend by RailGunSally · 2008-07-02 08:57 · Score: 5, Funny
  
  Sure! 64 cores should be enough for anybody!
2. Re:Disagreement about this trend by MojoRilla · 2008-07-02 09:34 · Score: 5, Insightful
  
  This seems silly. If you create more compute power, someone will think of ways to use it.
  
  Web applications are becoming more AJAX'y all the time, and they are not sequential at all. Watching a video while another tab checks my Gmail is a parallel task. All indications are that people want to consume more and more media on their computers. Things like the MLB mosaic allow you to watch four games at once.
  
  Have you ever listened to a song through your computer while coding, running an email program, and running an instant messaging program? There are four highly parallelizable tasks right there. Not compute intensive enough for you? Imagine the song compressed with a new codec that is twice as efficient in terms of size but twice as compute intensive. Imagine the email program indexing your email for efficient search, running algorithms to assess the email's importance to you, and virus checking new deliveries. Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.
  
  "Normal" users are doing more and more with computers as well. Now that fast computers are cheap, people who never edited video or photos are doing it. If you want a significant market besides gamers who need more cores, it is people making videos, especially HD videos. Sure, my Grandmother isn't going to be doing this, but I do, and I'm sure my children will do it even more.
  
  And don't forget about virus writers. They need a few cores to run on as well!
  
  Computer power keeps its steady progress higher, and we keep finding interesting things to do with it all. I don't see that stopping, so I don't see a limit to the number of cores people will need.
3. Re:Disagreement about this trend by drinkypoo · 2008-07-02 09:42 · Score: 5, Interesting
  
  "Architectures have changed and other stuff allow a current single core of a 3.2 to easily outperform the old 3.8's but then still why don't we see new 3.8's?"
  The Pentium 4 is, well, it's scary. It actually has "drive" stages because it takes too long for signals to propagate between functional blocks of the processor. This is just wait time, for the signals to get where they're going.
  The P4 needed a super-deep pipeline to hit those kinds of speeds as a result, and so the penalty for branch misprediction was too high.
  What MAY bring us higher clock rates again, though, is processors with very high numbers of cores. You can make a processor broad, cheap, or fast, but not all three. Making the processors narrow and simple will allow them to run at high clock rates and making them highly parallel will make up for their lack of individual complexity. The benefit lies in single-tasking performance; one very non-parallelizable thread which doesn't even particularly benefit from superscalar processing could run much faster on an architecture like this than anything we have today, while more parallelizable tasks can still run faster than they do today in spite of the reduced per-core complexity due to the number of cores - if you can figure out how to do more parallelization. Of course, that is not impossible.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
been there, done that by frovingslosh · 2008-07-02 08:49 · Score: 5, Funny

Heck, my original computer had 229376 cores. They were arranged in 28k 16 bit words.

--
I'm an American. I love this country and the freedoms that we used to have.
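
A quick check of the arithmetic in the comment above, assuming one bit per magnetic core and k = 1024 (both are assumptions; the comment does not say):

```python
CORES = 229_376  # figure from the comment; one bit per core is assumed
K = 1024

print(CORES // 8 // K)   # 8-bit bytes, in units of k  -> 28
print(CORES // 16 // K)  # 16-bit words, in units of k -> 14
```

Under those assumptions, 229376 cores correspond to 28k 8-bit bytes, or 14k 16-bit words.
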
Re:Useless by CastrTroy · 2008-07-02 08:59 · Score: 5, Insightful

Well, parallel programming is hard. It's not so hard that it can't be done, but it's harder than sequential programming. Unless your app will have a specific advantage because of this parallel programming, it isn't worth the effort to do it in the first place. The nice thing, however, would be that you could run each process on a separate core, and there wouldn't be any task switching needed. This would speed things up quite a bit. Also, if you locked a process or thread to each core, then one slowdown wouldn't take out the entire system.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Generic jokes by TaoPhoenix · 2008-07-02 09:03 · Score: 5, Funny

In the Soviet Union ...
Oh wait... the Soviet Union already broke into smaller cores.

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Re:We all saw it coming anyway by ClosedSource · 2008-07-02 09:10 · Score: 5, Insightful

"So whether programmers find this move acceptable or not is irrelevant because this path is probably the only way to design faster CPU:s once we've hit the nanometer wall."
I guess you should put "faster" in quotes.
In any case, it is absolutely relevant what programmers think since any performance improvements that customers actually experience are dependent on our code.
Historically a primary reason to buy a new computer is because a faster system makes legacy applications run faster. To a large extent this won't be true with a new multicore PC. So why would people buy them?
That's why Intel wants us to redesign our software - so that in the future their customers will still have a reason to buy a new PC with Intel Inside.
Imagine a Beowulf cluster.... by davidwr · 2008-07-02 09:11 · Score: 5, Funny

oh nevermind, what's the point?

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Profit!!! by DeVilla · 2008-07-02 10:03 · Score: 5, Funny

Hi. I make processors. I know a lot about processors. I think a big change is coming to processors. I think you should learn to use a lot of processors. A whole lot of processors. You need more processors. Oh, and did I tell you I make processors?
Re:It's all changing too fast by GatesDA · 2008-07-02 10:13 · Score: 5, Insightful

My dad's been programming for decades, and he's much more used to paradigm shifts than I am. His first programming job was translating assembly from one architecture to another, and now he's a proficient web developer. He understands concurrency and keeps up to date on new developments.
I'm reminded of an anecdote told to me during a presentation. The presenter had been introducing a new technology, and one man had a concern: "I've just worked hard to learn the previous technology. Can you promise me that, if I learn this one, it will be the last one I ever have to learn?" The presenter replied, "I can't promise you that, but I can promise you that you're in the wrong profession."
Bill Gates was just misquoted by Growlor · 2008-07-02 13:38 · Score: 5, Funny

He meant 640k CORES should be enough for anybody.