Researchers Unveil Experimental 36-Core Chip
rtoz writes The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router, and data travels between cores in packets of fixed size. This week, at the International Symposium on Computer Architecture, Peh's group unveiled a 36-core chip that features just such a "network-on-chip." In addition to implementing many of the group's earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores' locally stored copies of globally accessible data remain up to date.
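To make the "router per core, fixed-size packets" picture concrete, here is a minimal sketch of how a packet might hop between routers. It assumes the 36 cores sit in a 6x6 mesh and uses plain XY (dimension-order) routing, a common textbook scheme; the summary does not say which routing algorithm the chip uses, and the names `MESH_DIM`, `core_to_xy`, `xy_route`, and `hop_count` are invented for illustration.

```python
MESH_DIM = 6  # assumption: 36 cores laid out as a 6x6 mesh


def core_to_xy(core_id, dim=MESH_DIM):
    """Map a core number (0..dim*dim-1) to (x, y) mesh coordinates."""
    return core_id % dim, core_id // dim


def xy_route(src, dst, dim=MESH_DIM):
    """Return the list of router coordinates a packet visits.

    XY routing: travel along the x dimension first, then along y.
    This is a stand-in scheme, not necessarily what the chip does.
    """
    x, y = core_to_xy(src, dim)
    dst_x, dst_y = core_to_xy(dst, dim)
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst, dim=MESH_DIM):
    """Number of router-to-router hops between two cores."""
    return len(xy_route(src, dst, dim)) - 1
```

Under these assumptions the longest trip is corner to corner: `hop_count(0, 35)` is 10 hops, which is why worst-case latency on a mesh grows with the side length rather than with the total core count.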
As an aside: It's been a while since we've seen any decent rise in processor GHz.
Just to abuse a car analogy: Maybe it's time we stop revving up and instead shift gears.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
A higher high/low voltage swing (with a reasonable amount of other stuff being equal) will be more of a thermal nuisance; but if the perks make up for it, that's hardly a dealbreaker. The toasty end of boring desktop CPUs is somewhere north of 200 watts already, with a little shoving that they typically survive, so if somebody really wants 36 cache-coherent cores on-die, they'll suck it up and make it work.
For applications that don't specifically demand that, I'd be interested to know how the costs and benefits of 'dealing with the cooling demands of a smaller number of denser parts' compare with 'dealing with the cooling demands of more, cooler, parts, closer to whatever the performance per watt sweet spot is; but with more cabling, PSUs, switches, and similar interconnect and support stuff to buy and power'...
Yes, as usual, the MIT press release oversells the research, while the original paper [pdf] is a bit more careful in its claims. The paper makes clear that the novel contribution isn't the idea of putting "little internets" (as the press release calls them) on a chip, and it acknowledges that there is already a lot of research in the area of on-chip routing between cores. The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
pointer arithmetic, cache invalidation, and off-by-one errors
"I'd just like to emphasise that taking a million years isn't a metaphor here..." -Rich Bradshaw
The basic idea isn't new. What the paper is really claiming is new is their particular cache coherence scheme, which (to quote from the Conclusion) "supports global ordering of requests on a mesh network by decoupling the message delivery from the ordering", making it "able to address key coherence scalability concerns".
How novel and useful that is I don't know, because it's really a more specialist contribution than the headline claims, to be evaluated by people who are experts in multicore cache coherence schemes.
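For anyone wondering what "decoupling the message delivery from the ordering" could look like, here is a toy sketch. It is not the paper's protocol. It assumes each request carries a time-window number and a source core id, and that every node sorts by that pair no matter when the request actually arrived. The names `global_order` and `simulate` are made up for the example.

```python
import random


def global_order(delivered):
    """Order requests by (window, source id), ignoring arrival order."""
    return sorted(delivered, key=lambda req: (req[0], req[1]))


def simulate(num_nodes=4, num_windows=3, seed=0):
    """Deliver the same requests to every node in a different random order.

    Returns one ordered view per node. The shuffle stands in for an
    unordered delivery network; the sort stands in for the separate
    ordering step.
    """
    rng = random.Random(seed)
    # Each request is a (window, source, address) tuple; values are arbitrary.
    requests = [
        (window, src, 0x100 + 4 * src)
        for window in range(num_windows)
        for src in range(num_nodes)
    ]
    views = []
    for _ in range(num_nodes):
        delivered = list(requests)
        rng.shuffle(delivered)  # arrival order differs per node
        views.append(global_order(delivered))
    return views
```

Every node ends up with the same sequence even though each saw the requests arrive in a different order, which is the property a coherence scheme needs from the ordering step. A real design also has to bound how long a node waits before it can be sure a window is complete, which this sketch ignores.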
Banging my head on the table right now.
Why do people with zero actual semiconductor knowledge try to speak as an authority*?!
It's a research chip, meaning they don't need to be on the latest process node to show their proof of concept. Larger nodes (much cheaper to design a chip on) have thicker gate passivation layers and run at higher voltage. From an architecture standpoint the process node/voltage are irrelevant. So if their architecture proves out, some bigger outfit can run with it while targeting the latest-greatest itty-bitty process node to increase the clock-rate, drop the power, and reduce the area/price.
*I am not a processor designer, just a mixed signal (mostly analog) guy, but I've been working in the semiconductor industry, including doing process bake-offs for over a dozen years.
The core count isn't the interesting thing about this chip. The cores themselves are pretty boring off-the-shelf parts too. I was at the ISCA presentation about this last week and it's actually pretty interesting. I'd recommend reading the paper (linked to from the press release) rather than the press release, because the press release is up to MIT's press department's usual standards (i.e. completely content-free and focussing on totally the wrong thing). The cool stuff is in the interconnect, which uses the bounded latency of the longest path multiplied by single-cycle one-hop delivery times to define an ordering, allowing you to implement a sequentially consistent view of memory relatively cheaply.
Since I'm here, I'll also throw out a plug for the work we presented at ISCA, The CHERI capability model: Revisiting RISC in an age of risk. We've now open sourced (as a code dump, public VCS coming soon) our (64-bit) MIPS softcore, which is the basis for the experimentation in CHERI. It boots FreeBSD and there are a few sitting around the place that we can ssh into and run. This is pretty nice for experimentation, because it takes about 2 hours to produce and boot a new revision of the CPU.
I am TheRaven on Soylent News
You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.
Ah! The 21st Century version of the 'mythical man month' - so much more apropos for this audience than the pregnancy analogy.
Faster! Faster! Faster would be better!