Slashdot Mirror


Intel Says to Prepare For "Thousands of Cores"

Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed their plans for the future and it goes well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousand or cores, it seems Intel is pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of the their software's existence.'"

10 of 638 comments (clear)

  1. Memory bandwidth? by Brietech · · Score: 5, Interesting

    If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?

    --
    I'm perfect in every way, except for my humility.
  2. Disagreement about this trend by Raul654 · · Score: 5, Interesting

    At Supercomputing 2006, they had a wonderful panel where they discussed the future of computing in general, and tried to predict what computers (especially Supercomputers) would look like in 2020. Tom Sterling made what I thought was one of the most insightful observations of the panel -- most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power. (Gamers are obviously not included in this) Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.

    --


    To make laws that man cannot, and will not obey, serves to bring all law into contempt.
    --E.C. Stanton
    1. Re:Disagreement about this trend by drinkypoo · · Score: 5, Interesting

      Architectures have changed and other stuff allow a current single core of a 3.2 to easily outperform the old 3.8's but then still why don't we see new 3.8's?

      The Pentium 4 is, well, it's scary. It actually has "drive" stages because it takes too long for signals to propagate between functional blocks of the processor. This is just wait time, for the signals to get where they're going.

      The P4 needed a super-deep pipeline to hit those kinds of speeds as a result, and so the penalty for branch misprediction was too high.

      What MAY bring us higher clock rates again, though, is processors with very high numbers of cores. You can make a processor broad, cheap, or fast, but not all three. Making the processors narrow and simple will allow them to run at high clock rates and making them highly parallel will make up for their lack of individual complexity. The benefit lies in single-tasking performance; one very non-parallelizable thread which doesn't even particularly benefit from superscalar processing could run much faster on an architecture like this than anything we have today, while more parallelizable tasks can still run faster than they do today in spite of the reduced per-core complexity due to the number of cores - if you can figure out how to do more parallelization. Of course, that is not impossible.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  3. Lookahead/predictive branching is one option... by Cordath · · Score: 4, Interesting

    Say you have a slow, plodding sequential process. If you reach a point where there are several possibilities and you have an abundance of cores, you can start work on each of the possibilities while you're still deciding which possibility is actually the right one. Many CPU's already incorporate this sort of logic. It is, however, rather wasteful of resources and provides a relatively modest speedup. Applying it at a higher level should work, in principle, although it obviously isn't going to be practical for many problems.

    I do see this move by Intel as a direct follow up to their plans to negate the processing advantages of today's video cards. Intel wants people running general purpose code to run it on their general purpose CPU's, not on their video cards using CUDA or the like. If the future of video game rendering is indeed ray-tracing (an embarrassingly parallel algorithm if ever there was one) then this move will also position Intel to compete directly with Nvidia for the raw processing power market.

    One thing is for sure, there's a lot of coding to do. Very few programs currently make effective use of even 2 cores. Parallelization of code can be quite tricky, so hopefully tools will evolve that will make it easier for the typical code-monkey who's never written a parallel algorithm in his life.

  4. Re:Not Sure I'm Getting It by Talennor · · Score: 4, Interesting

    While prefetching data can be done using a single core, your post in this context gives me a cool idea.

    Who needs branch prediction when you could just have 2 cores running a thread? Send each one executing instructions without a break in the pipeline and sync the wrong core to the correct one once you know the result. You'd still have to wait for results before any store operations, but you should probably know the branch result by then anyway.

    --

    //TODO: signature
  5. look what happened to ps3 by edxwelch · · Score: 4, Interesting

    So now we have a shit load of cores all we have to do is wait for the developers to put some multi-threading goodness in their apps.... or maybe not.
    The PS3 was ment to be faster than any other system because of it's multi-cores cell architecture, but in a interview John Carmack said, "Although it's interesting that almost all of the PS3 launch titles hardly used any Cells at all."

    http://www.gameinformer.com/News/Story/200708/N07.0803.1731.12214.htm

  6. Databases and implimentation-neutrality by Tablizer · · Score: 4, Interesting

    Databases provide a wonderful opportunity to apply multi-core processing. The nice thing about a (good) database is that queries describe what you want, not how to go about getting it. Thus, the database can potentially split the load up to many processes and the query writer (app) does not have to change a thing in his/her code. Whether a serial or parallel process carries it out is in theory out of the app developer's hair (although dealing with transaction management may sometimes come into play for certain uses.)

    However, query languages may need to become more general-purpose in order to have our apps depend on them more, not just business data. For example, built-in graph (network) and tree traversal may need to be added and/or standardized in query languages. And, we made need to clean up the weak-points of SQL and create more dynamic DB's to better match dynamic languages and scripting.

    Being a DB-head, I've discovered that a lot of processing can potentially be converted into DB queries. That way one is not writing explicit pointer-based linked lists etc., locking one into a difficult-to-parallel-ize implementation.

    Relational engines used to be considered too bulky for many desktop applications. This is partly because they make processing go through a DB abstraction layer and thus are not using direct RAM pointers. However, the flip-side of this extra layer is that they are well-suited to parallelization.
           

  7. Re:Not Sure I'm Getting It by cpeterso · · Score: 5, Interesting

    Now that 64-bit processors are so common, perhaps operating systems can spare some virtual address space for performance benefits.

    The OPAL operating system was a University of Washington research project from the 1990s. OPAL uses a single address space for all processes. Unlike Windows 3.1, OPAL still has memory protection and every process (or "protection domain") has its own pages. The benefit of sharing a single address space is that you don't need to flush the cache (because the virtual-to-physical address mapping do not change when you context switch). Also, pointers can be shared between processes because their addresses are globally unique.

  8. Re:It's all changing too fast by uncqual · · Score: 4, Interesting

    If a programmer has prospered for 20 or 30 years in this business, they probably have adapted to multiple paradigm shifts.

    For example, "CPU expensive, memory expensive, programmer cheap" is now "CPU cheap, memory cheap, programmer expensive" -- hence Java et al. (I am sometimes amazed when I casually allocate/free chunks of memory larger than all the combined memory of all the computers at my university - both in the labs and the administration/operational side - but what amazes me is that it doesn't amaze me!)

    Actually some of the "old timers" may be a more comfortable with some issues of highly parallel programming than some of the "kids" (term used with respect, we were all kids once!) who have mostly had them masked from them by high level languages. Comparing "old timers" to "kids" doing enterprise server software, the kids seem much less likely to understand issues like memory coherence models of specific architectures, cache contention issues of specific implementations, etc.

    Also, too often, the kids make assumptions about the source of performance/timing problems rather than gathering empirical evidence and acting on that evidence. This trait is particularly problematic because when dealing with concurrency and varying load conditions, intuition can be quite unreliable.

    Really, it's not all that scary - the first paradigm shift is the hardest!

    --
    Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
  9. Re:Not Sure I'm Getting It by kesuki · · Score: 5, Interesting

    yes, but if you have 1000 cores each with 64k of cache, then you start to run into problems with memory throughput when computing massively parallel data.

    memory throughput has been the achilles heel of graphic processing for years now. and as we all know, splitting up a graphic screen into smaller segments is simple. so GPUs went massively parallel long before CPUS, in fact you will soon be able to get over 1000 stream processing units in a single desktop graphic card.

    so, the real problem is memory technology, how can a single memory module consistently feed 1000 cores, for instance if you want to do real-time n-pass encoding of a hd video stream... while playing a FPS online, and running IM software, and a strong anti-virus suite...

    I have a horrible horrible ugly feeling that you'll never be able to get a system that can reliably do all that. at the same time, just because they'll skimp on memory tech or interconnects, so you'll have most of the capabilities of a 1,000 core system wasted.