Slashdot Mirror


Researchers Unveil Experimental 36-Core Chip

rtoz writes The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router, and data travels between cores in packets of fixed size. This week, at the International Symposium on Computer Architecture, Peh's group unveiled a 36-core chip that features just such a "network-on-chip." In addition to implementing many of the group's earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores' locally stored copies of globally accessible data remain up to date.

13 of 143 comments

  1. I'm still a bit skeptical. by nimbius · · Score: 3, Funny

    All this performance in just one chip. I mean, sure, it has 36 cores, but let's be rational here... does it seriously expect to run Crysis?

    --
    Good people go to bed earlier.
  2. Re:Moore's Law by Opportunist · · Score: 4, Interesting

    As an aside: It's been a while since we've seen any decent rise in processor GHz.

    Just to abuse a car analogy: Maybe it's time we stop revving up and instead shift gears.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  3. Re:Different Power Supply Voltage by fuzzyfuzzyfungus · · Score: 4, Interesting

    A higher high/low voltage swing (with a reasonable amount of other stuff being equal) will be more of a thermal nuisance; but if the perks make up for it, that's hardly a dealbreaker. The toasty end of boring desktop CPUs is somewhere north of 200 watts already, with a little shoving that they typically survive, so if somebody really wants 36 cache-coherent cores on-die, they'll suck it up and make it work.

    For applications that don't specifically demand that, I'd be interested to know how the costs and benefits of 'dealing with the cooling demands of a smaller number of denser parts' compare with 'dealing with the cooling demands of more, cooler, parts, closer to whatever the performance per watt sweet spot is; but with more cabling, PSUs, switches, and similar interconnect and support stuff to buy and power'...

  4. Re:Intel Knights Landing by Trepidity · · Score: 5, Informative

    Yes, as usual, the MIT press release oversells the research, while the original paper [pdf] is a bit more careful in its claims. The paper makes clear that the novel contribution isn't the idea of putting "little internets" (as the press release calls them) on a chip, but acknowledges that there is already a lot of research in the area of on-chip routing between cores. The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.

  5. The two hardest problems in CS: by magsol · · Score: 4, Funny

    pointer arithmetic, cache invalidation, and off-by-one errors

    --
    "I'd just like to emphasise that taking a million years isn't a metaphor here..." -Rich Bradshaw
  6. Interesting by Virtucon · · Score: 3, Informative

    Cache coherency has been one of the banes of multicore architecture for years. It's nice to see a different approach, but chip manufacturers are already getting high-performance results without introducing additional complexity. The Oracle (Sun) SPARC T5 architecture has 16 cores with 128 threads running at 3.6GHz. It gives a few more years to Solaris at least, but it's still a hell of a processor. For you Intel fans, the E7-2790 v2 sports 15 cores and 30 threads with a 37.5MB cache, so they're doing something right: it screams and is capable of 85GB/s memory throughput.

    I'm sure the chip architects are looking at this research, but somehow I think they're already ahead of the curve, because these kinds of core and thread counts are jumps ahead of where we were just a few years ago. Anybody remember the first Pentium Dual Core and the UltraSPARC T1?

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
  7. Re:Is there anything new here? by Trepidity · · Score: 4, Informative

    The basic idea isn't new. What the paper is really claiming is new is their particular cache coherence scheme, which (to quote from the Conclusion) "supports global ordering of requests on a mesh network by decoupling the message delivery from the ordering", making it "able to address key coherence scalability concerns".

    How novel and useful that is I don't know, because it's really a more specialist contribution than the headline claims, to be evaluated by people who are experts in multicore cache coherence schemes.
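
    To make the quoted phrase a bit more concrete, here is a toy sketch of what "decoupling the message delivery from the ordering" could look like. It is not the paper's protocol: the OrderedReceiver class, the run helper, and the request IDs are all made up for illustration, and the agreed order is simply handed in as a list. Requests may arrive in any order, but each node only handles them in the agreed order.

```python
from collections import deque


class OrderedReceiver:
    """Hypothetical node: buffers requests as they arrive, handles them in the agreed order."""

    def __init__(self):
        self.arrived = {}     # request id -> payload, in whatever order the network delivers
        self.order = deque()  # request ids in the globally agreed order
        self.processed = []   # (id, payload) pairs in the order they were actually handled

    def deliver(self, req_id, payload):
        """Called when a request reaches this node (delivery, in any order)."""
        self.arrived[req_id] = payload
        self._drain()

    def announce(self, req_id):
        """Called when this node learns the next slot in the agreed order."""
        self.order.append(req_id)
        self._drain()

    def _drain(self):
        # Handle requests only while the next one in the agreed order has arrived.
        while self.order and self.order[0] in self.arrived:
            req_id = self.order.popleft()
            self.processed.append((req_id, self.arrived.pop(req_id)))


def run(delivery_order, agreed_order):
    """Feed one node a delivery order and an agreed order; return the handling order."""
    node = OrderedReceiver()
    for req_id in delivery_order:
        node.deliver(req_id, "payload-" + req_id)
    for req_id in agreed_order:
        node.announce(req_id)
    return [req_id for req_id, _ in node.processed]


agreed = ["b", "a", "c"]
node_x = run(["a", "b", "c"], agreed)  # this node saw a, b, c arrive
node_y = run(["c", "a", "b"], agreed)  # this node saw c, a, b arrive
print(node_x, node_y)
```

    Both nodes end up handling the requests in the same order even though the network delivered them differently, which is the property a snoopy-style protocol needs.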

  8. Re:Moore's Law by Anonymous Coward · · Score: 3, Interesting

    A better analogy is that they keep adding seats and making the whole vehicle slower.

    Kawasaki Ninja == 10 GHz single core (fastest way to get anywhere alone)
    Ford Mustang == 4 GHz quad-core (most people only use the front two seats, but if desperate you can squeeze more people in)
    Chevy Suburban == 3.3 GHz 8-core (it seems like everyone wants one, but most people who have a full load just have a bunch of little kiddies)
    Mercedes Sprinter == 2.7 GHz 12-core (just meant to be a grinding people hauler)
    School Bus == 1.2 GHz Xeon Phi (slow as hell and very specialized, no normal person would ever want one)
    Double Decker Bus == Peh's stuff (probably a use for mass transit (i.e., virtualization) and as a cool novelty)

  9. Re:Moore's Law by Shoten · · Score: 3, Insightful

    Nope, liquid nitrogen cooling gets you past the speed limits. How about over 8GHz on a chip that costs less than $200? Go to helium and you can get over 8.5GHz, although both become a bit unwieldy when it comes to game play because I don't want my hard drives to freeze. I love that last video; there's some real country boy engineering there, including using a propane torch and a hair dryer to keep certain components from freezing.

    I'm a little confused as to why you're citing the chip's low low price of "less than $200" if you need liquid nitrogen to get it to perform the way you want it to. You do realize that cooling systems cost money, too...right? There's no point in being able to use a cheap processor to get to X performance benchmark if the required additional support systems cost thousands of dollars more than a more powerful and more expensive processor that can do it out of the box. Not to mention the fact that liquid nitrogen cooling isn't exactly hassle-free, especially in a household environment. And it's worth noting that even if you boost GHz, you eventually run into choke points related to pushing data to and from the chip anyway. You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.
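
    The conveyor-belt point can be put in numbers with a rough sketch. The function name and every figure below are hypothetical, picked only to illustrate the argument: for a workload that streams data through the chip, sustained throughput is capped by whichever is lower, the compute rate or the rate at which memory can feed it.

```python
def sustained_ops_per_sec(clock_hz, ops_per_cycle, mem_bytes_per_sec, bytes_per_op):
    """Upper bound on sustained operations per second for a streaming workload.

    The result is the smaller of the compute limit and the memory-feed limit.
    """
    compute_limit = clock_hz * ops_per_cycle
    memory_limit = mem_bytes_per_sec / bytes_per_op
    return min(compute_limit, memory_limit)


# Hypothetical numbers: 16 GB/s of memory bandwidth, 8 bytes fetched per operation.
stock = sustained_ops_per_sec(4e9, 1, 16e9, 8)        # 4 GHz core
overclocked = sustained_ops_per_sec(8e9, 1, 16e9, 8)  # 8 GHz core, same memory
print(stock, overclocked)
```

    Under these made-up numbers both configurations top out at the memory limit, so doubling the clock buys nothing for this kind of workload.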

    --
    For your security, this post has been encrypted with ROT-13, twice.
  10. Re:Different Power Supply Voltage by Moof123 · · Score: 4, Interesting

    Banging my head on the table right now.

    Why do people with zero actual semiconductor knowledge try to speak as an authority*?!

    It's a research chip, meaning they don't need to be on the latest process node to show their proof of concept. Larger nodes (much cheaper to design a chip on) have thicker gate passivation layers and run at higher voltage. From an architecture standpoint the process node/voltage are irrelevant. So if their architecture proves out, some bigger outfit can run with it while targeting the latest-greatest itty-bitty process node to increase the clock rate, drop the power, and reduce the area/price.

    *I am not a processor designer, just a mixed signal (mostly analog) guy, but I've been working in the semiconductor industry, including doing process bake-offs for over a dozen years.

  11. Re:36 cores? Network on a chip? Meh! by TheRaven64 · · Score: 5, Informative

    The core count isn't the interesting thing about this chip. The cores themselves are pretty boring off-the-shelf parts too. I was at the ISCA presentation about this last week and it's actually pretty interesting. I'd recommend reading the paper (linked to from the press release) rather than the press release, because the press release is up to MIT's press department's usual standards (i.e. completely content-free and focussing on totally the wrong thing). The cool stuff is in the interconnect, which uses the bounded latency of the longest path multiplied by single-cycle one-hop delivery times to define an ordering, allowing you to implement a sequentially consistent view of memory relatively cheaply.
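
    A rough way to picture the bounded-latency idea, as a toy model and not the chip's actual mechanism: assume a 6x6 mesh for the 36 cores, one cycle per hop, and a made-up tie-break rule (lowest node ID first). If nothing can take longer than the longest path to arrive, then every node knows the full set of requests from a time window once that window closes, and every node can sort them the same way.

```python
def window_length(width, height, cycles_per_hop=1):
    """Cycles needed to cross the longest minimal path of a width x height mesh."""
    longest_path_hops = (width - 1) + (height - 1)
    return max(1, longest_path_hops * cycles_per_hop)


def global_order(requests, window):
    """Order (inject_cycle, source_node) pairs the way every node would.

    Requests are grouped by the time window they were injected in; inside a
    window the tie-break is the source node ID (an assumed rule).
    """
    return sorted(requests, key=lambda r: (r[0] // window, r[1]))


window = window_length(6, 6)  # assumed 6x6 layout: 5 + 5 hops
requests = [(3, 20), (12, 1), (9, 5), (0, 35)]
order = global_order(requests, window)
print(window, order)
```

    Because the sort key depends only on the window and the source ID, not on when a request happened to arrive, every node computes the same sequence without any central arbiter.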

    Since I'm here, I'll also throw out a plug for the work we presented at ISCA, The CHERI capability model: Revisiting RISC in an age of risk. We've now open sourced (as a code dump, public VCS coming soon) our (64-bit) MIPS softcore, which is the basis for the experimentation in CHERI. It boots FreeBSD and there are a few sitting around the place that we can ssh into and run. This is pretty nice for experimentation, because it takes about 2 hours to produce and boot a new revision of the CPU.

    --
    I am TheRaven on Soylent News
  12. Re:Moore's Law by ColdWetDog · · Score: 4, Funny

    You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.

    Ah! The 21st Century version of the 'mythical man month' - so much more apropos for this audience than the pregnancy analogy.

    --
    Faster! Faster! Faster would be better!
  13. Re:Is there anything new here? by enriquevagu · · Score: 3, Informative

    Some knowledge about multicore cache coherence here. You are completely right: Slashdot's summary does not introduce any novel idea. In fact, a cache-coherent mesh-based multicore system with one router associated with each core was brought to market years ago by a startup from MIT, Tilera. Also, the article claims that today's cores are connected by a single shared bus -- that's far outdated, since most processors today employ some form of switched communication (an arbitrated ring, a single crossbar, a mesh of routers, etc).

    What the actual ISCA paper presents is a novel mechanism to guarantee total ordering on a distributed network. Essentially, when your network is distributed (i.e., not a single shared bus, which covers most current on-chip networks) there are several problems with guaranteeing ordering: i) it is really hard to provide a global ordering of messages (like a bus) without making all messages cross a single centralized point which becomes a bottleneck, and ii) if you employ adaptive routing, it is impossible to provide point-to-point ordering of messages.

    Coherence messages are divided into different classes in order to prevent deadlock. Depending on the coherence protocol implementation, messages of certain classes need to be delivered in order between the same pair of endpoints, and for this, some of the virtual networks can require static routing (e.g. Dimension-Ordered Routing in a mesh). Note that a "virtual network" is a subset of the network resources which is used by the different classes of coherence messages to prevent deadlock. This is a remedy for the second problem. However, a network that provided global ordering would allow for potentially huge simplifications of the coherence mechanisms, since many races would disappear (the devil is in the details), and a snoopy mechanism would be possible -- as they implement. Additionally, this might also impact the consistency model. In fact, their model implements sequential consistency, which is the most restrictive -- yet simple to reason about -- consistency model.
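
    For anyone who has not seen Dimension-Ordered Routing, here is a minimal sketch of the XY variant on a 2D mesh. The function name and the (x, y) coordinate convention are my own choices for illustration. A packet first travels along X until the column matches, then along Y, so the path between a given pair of routers is always the same.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:  # travel along the X dimension first
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:  # then travel along the Y dimension
        y += step
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)
```

    Since the route is a pure function of source and destination, two messages between the same endpoints follow the same links, and with FIFO links they cannot overtake each other. Adaptive routing gives up exactly that guarantee.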

    Disclaimer: I am not affiliated with their research group, and in fact, I have not read the paper in detail.