Multicore Chips As 'Mini-Internets'
An anonymous reader writes "Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus. With a bus, only one pair of cores can talk at a time, which would be a serious limitation in chips with hundreds or even thousands of cores. Researchers at MIT say cores should instead communicate the same way computers hooked to the Internet do: by bundling the information they transmit into 'packets.' Each core would have its own router, which could send a packet down any of several paths, depending on the condition of the network as a whole."
This technology for networking cores can also serve another purpose: preventing damage from core failure and diagnosing such failures. If each core is connected to several others, data can be routed around a damaged core and processed elsewhere, making overheating or manufacturing defects serious but almost treatable problems. Who knows, cores might even become replaceable.
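To make the "bypass a damaged core" idea concrete, here is a minimal sketch in Python. It assumes the cores sit on a simple 2D mesh and uses a plain breadth-first search to find a path that avoids failed cores; the function name `route_around` and the mesh layout are illustrative assumptions, not anything described in the article.

```python
from collections import deque

def route_around(width, height, src, dst, failed):
    """Breadth-first search for a shortest path on a width x height mesh,
    skipping any core listed in `failed`. Returns a list of (x, y) hops,
    or None if dst cannot be reached."""
    if src in failed or dst in failed:
        return None
    parent = {src: None}          # also serves as the visited set
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            # Walk the parent links back to src to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_mesh = 0 <= nx < width and 0 <= ny < height
            if in_mesh and nxt not in failed and nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

# Hypothetical 3x3 mesh: the direct path, then the same trip with core (1, 0) dead.
healthy = route_around(3, 3, (0, 0), (2, 0), failed=set())
detour = route_around(3, 3, (0, 0), (2, 0), failed={(1, 0)})
```

With the middle core marked as failed, the path simply gets longer instead of breaking. A real on-chip router would not run a global search per packet, so treat this only as an illustration of the principle.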
This would work perfectly with a series of (very small) tubes.
I guess MIT has forgotten about the Transputer....
I started reading and immediately had flashbacks to the Transputer
Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
Ah, you're clever; but it's internets all the way down.
Errr... the internal "bus" between cores on modern x86 chips already is either a ring of point to point links or a star with a massive crossbar at the center.
ccNUMA?
Only the State obtains its revenue by coercion. - Murray Rothbard
And then each router, which is a processing unit in its own right, could have multiple cores, which would exhibit the same drawbacks... until you put a network of processors inside that!
I want to delete my account but Slashdot doesn't allow it.
Sounds like history... the history of the Hub in LAN technology.
Maybe it's time to move to a Switch, that can keep multiple core-pairs communicating simultaneously.
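The hub-versus-switch point can be illustrated with a toy model in Python. The assumptions here are mine: a bus carries exactly one transfer per cycle, while an idealized crossbar switch can carry any set of transfers in the same cycle as long as no port is used twice (a port is treated as busy whether it is sending or receiving). The function names are hypothetical.

```python
def bus_cycles(transfers):
    """A shared bus carries one transfer per cycle, whoever the endpoints are."""
    return len(transfers)

def switch_cycles(transfers):
    """Greedy schedule for an idealized crossbar switch: in each cycle, accept
    every pending transfer whose source and destination ports are both idle."""
    pending = list(transfers)
    cycles = 0
    while pending:
        busy = set()       # ports already claimed in this cycle
        deferred = []      # transfers that must wait for a later cycle
        for src, dst in pending:
            if src in busy or dst in busy:
                deferred.append((src, dst))
            else:
                busy.update((src, dst))
        pending = deferred
        cycles += 1
    return cycles

# Five transfers between eight cores; only the last one conflicts with others.
transfers = [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2)]
print(bus_cycles(transfers), switch_cycles(transfers))  # prints "5 2"
```

The greedy scheduler is not optimal in general, but it is enough to show why letting several core pairs talk at once matters as the core count grows.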
AMD uses HT and Intel has its ring bus, both of which use point-to-point links. Buses have serious trouble with the impedance jumps at the taps and clock skew between the lines; that's why nobody is using them in high-speed applications any more. Even the venerable SCSI and ATA buses went the way of the dodo. The only bus I can see in my system is DDR3 (and I think that will go away with DDR4 due to the same problems.)
thegodmovie.com - watch it
That's just plain inefficient use of silicon area. They wish to waste some of that limited space on additional logic that isn't strictly necessary. And it will create a significant bottleneck. Did they forget about DMA controllers or something? You already need a DMA controller no matter what, and it's perfectly capable of accessing the necessary memories as it is. Adding some extra capabilities to the DMA controller would be far more efficient in logic area and would most likely lead to better performance than this bad idea.
I admit that despite being a technical user, I was not aware that only two cores are allowed to "talk" at a given time. I had (erroneously, it would seem) assumed that in order for a 3+-core chip to be fully useful, such a switch/router would have to already be in place.
So, have Intel, AMD, and others simply tricked us into thinking that a 3+-core chip can actively use all its cores at once (as is the natural assumption), or am I misinterpreting something? If they have, why on earth didn't they include a "router" in the original designs? It seems entirely too obvious for the eggheads in R&D to have missed (or so one would think, anyway). I'm sure there are technical hurdles to overcome, but unless that can be managed, what is really the point of many-core CPUs that can't have all cores acting at once?
Yeah, great idea. Take the very fastest communication that we have on the entire planet, and replace it with the absolute slowest communication we have on the planet. Great idea. And with it, more complexity, more caches, more lookup tables, and more things to go wrong.
The best part is that it's totally unbalanced. Internet protocols are based on a network that's ever-changing and totally unreliable. The bus, on the other hand, is based on total reliability and is static.
I'd have thought that a pool concept, or a mailbox metaphor, or a message board analog would have been more appropriate. Something where streams are naturally quantized and sending is decoupled from receiving. Where a recipient can operate at its own rate, independent of the sender.
You know, like typical linux interactive sockets, for example. But what do I know.
No "typical" consumer chip 10 years ago had even 4 cores.
As mentioned in other comments, this has been done before. The method of message passing isn't as fundamental as one key point - that it is all explicit message passing.
Intel and AMD x86/x64 CPUs use coherent cache between cores to make sure that a thread running on CPU 1 sees the same RAM as a thread running on CPU 3. This leads to horrible bottlenecks and huge amounts of die tied up in trying to coordinate the writes and maintain coherency between N cores ((N-1)^2 connections!), and it all just goes to hell pretty fast. Intel has this super new transactional memory rollback thing, but it's turd polishing.
The next step is pretty obvious (see Barrelfish) and easy: no shared coherency. Everything is done with message passing. If two threads or processes (it doesn't really matter at that point) want to communicate they need to do it with messages. It's much cleaner than dealing with shared memory synchronization, and makes program flow much more obvious (to me at least - I use message queues even on x86/x64). If you need to share BIG MEMORY between threads, which is reasonable for something like image processing, you at least use messages to explicitly coordinate access to shared memory and the cores don't have to worry about coherency.
This scales extremely well for at least a couple thousand CPUs, which is where the 'local internet' becomes useful.
Where it becomes not easy is that almost all programs written for x86/x64 assume threads can share memory at will. They'd need to be rewritten for this model or would suddenly run a whole lot slower since you'd have to lock them to one core or somehow do the coordination behind their back. It'd be worth it for me!
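A minimal sketch of the explicit message-passing style described above, in Python. It uses the standard `queue` and `threading` modules; the names `worker` and `sum_via_messages` are made up for illustration. Python threads do share memory under the hood, so this only demonstrates the programming model (state owned by one thread, touched by others only through messages), not hardware without cache coherency.

```python
import queue
import threading

def worker(inbox, outbox):
    """Owns its running total privately; the only way in or out is a message."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel: no more work, report the result
            outbox.put(total)
            return
        total += msg

def sum_via_messages(values):
    """Send each value to the worker as a message and wait for its reply."""
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for v in values:
        inbox.put(v)
    inbox.put(None)              # tell the worker we are done
    result = outbox.get()        # blocks until the worker replies
    t.join()
    return result

print(sum_via_messages(range(10)))  # prints 45
```

Note that no lock appears anywhere in the caller or the worker: the queues are the only synchronization points, which is what makes the program flow easy to follow.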
oh, come on. buses have been dead for years (sata and pcie are great examples of the prevalence of point-to-point links). no reason we can't think of cachelines as packets (bigger than ATM packets were!). how about hypertransport and QPI?
who said anything about 10 years ago, and do you think in 10 years we will have typical consumer machines with "chips with hundreds or even thousands of cores"
in 10 years we will be honestly lucky to have serious machines with "hundreds or even thousands of cores" on the same plane and not strung together with networking.
What are you even referring to?
Your OP was implying this is all garbage because 6-8 cores is a high end chip, not a "typical" one.
Yet 10 years is not a long time - in the past decade 4 would've been a high-end chip, and before that having 2 physical processors would've been significant as well.
So I would think there is in fact a great deal of importance to this kind of work, seeing as how the number of cores per chip for consumer items has grown and grown. And then you undermine your own point by implying we might even be getting close to "hundreds" of cores on a chip in the next 10 years. If we are, then the typical consumer chip will be breaching 8-16 easily. Not to mention things like the Cell architecture, where Sony was thinking about pushing 24 work-cores onto the chip for the PS4 (backed off since then, but it shows where things are headed).
I can't seem to find the old story or my comment on it, but when Google acquired a 'stealth' startup a year or so ago the most interesting thing about it was that the primary investigator had a few patents for packet-switched CPUs.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Slashdot in 2012 is largely technical support people and Windows administrators who hold their MCSAs more dear than their first born. This is how it has to be explained.
what are you replying to, nowhere does it state "in 10 years"
here just in case you missed it, the very first sentence of the headline
"Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus"
in case you missed it again let me point it out to you TODAY, A TYPICAL CHIP MIGHT HAVE SIX OR EIGHT CORES
Come on people. Cores share information and suddenly it's just like the internet? Are these journalists' experiences so narrow that they have no other analogy? It's just a fricking bus! There are networks that exist which are not "the internet". Using the term "internet" implies global connectivity. OK, I expect journalists to be ignorant, but please, are Slashdot editors this confused about basic technology as well?
Didn't AMD just buy a company that did something similar to this? While not at the chip or core level, it seems kinda related
We still do. The only major difference (other than generational improvements) is that these days it's x86 instead of MIPS.
Cyrano de Maniac
The problem is not the hardware but the software. The hardware has been parallel for ages, even locally (GPU, GPU-memory, CPU, memory, HDD, DMA - memory processor, ...).
Software is a different problem across the networked/parallel arena. If you really think about an SMS, it is not much more than 'hello world'. You type it and you see text (no function has been performed other than transport, which isn't really a function), and testing it should be easy. This is not even about parallelism but about communication.
The best way to create software for networking is to not rewrite it for all these new parallel architectures/internet (which means you compile it for compartmentalized execution). This is, however, pretty hard to do (I don't know of such an implementation). The alternative is that everybody needs to put all the same glue in their software over and over (RMI, OpenMP, ...). We are doing #2.
By the way I think there is a big difference between networking, which has a premise that things fail, and local transport of data/code which is specced on its workings. Fundamentally different (price).
nosig today
isn't this a variation on Cell architecture? except, no one could figure out how to write the OS and compiler to fully realize the goal of programs that could be farmed out by the PowerPC CPU to the special processors on one chip, let alone farmed out to multiple Cells over a network.
AB HOC POSSUM VIDERE DOMUM TUUM
It is lolcats all the way down, in a pool of porn with an essence of "Me too" posts.
Anyway, I think the original poster needs to read up on what the Internet is. It is a network of networks. A number of CPUs networking together is just a network. If you could mix many different systems together, it would be an Internet.
If you could put an Intel CPU next to an AMD one and they would just work together seamlessly, THAT would be an Internet.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
The seminal paper proposing the use of switched/routed interconnection networks on-chip (NoCs) was published by Dally and Towles 11 years ago in DAC'01: Route packets, not wires: On-chip interconnection networks. The idea of associating a router with each core and replicating it in "tiles" is not new either; Tilera was (IIRC) the first company to sell processors based on a tiled design, which was an evolution of the Raw research project. A related research project, TRIPS, replicated functional units on each tile, rather than full cores. Intel has used a tiled design in the Polaris, SCC and MIC (which includes the forthcoming Knights Corner).
So no, the idea of using routed interconnects is not new at all. In fact, after reading the linked article, it turns out that two-thirds of the text introduces the idea, and the last section details the contributions: two ideas developed by the group of Li-Shiuan Peh that seek to improve performance (by using virtual bypassing, a form of routing precomputation) and reduce power consumption (using low-swing signaling).
The idea is that this is not 'a' bus, but many of them, making up several possible alternative routes.
A device that decides which route to take is a router.
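As a rough illustration of a router choosing between alternative routes, here is a Python sketch of minimal adaptive routing on a 2D mesh. Everything in it is an assumption made for the example: the mesh layout, the `congestion` dictionary of per-link load figures, and the function names. It ignores real concerns such as deadlock avoidance.

```python
def next_hop(current, dest, congestion):
    """Pick the next router on a 2D mesh. Only hops that move closer to
    `dest` are considered; among those, take the one whose link currently
    has the lowest load. `congestion` maps (from, to) pairs to a load
    figure; links missing from the map count as load 0."""
    (x, y), (dx, dy) = current, dest
    candidates = []
    if dx != x:
        candidates.append((x + (1 if dx > x else -1), y))
    if dy != y:
        candidates.append((x, y + (1 if dy > y else -1)))
    if not candidates:
        return None  # already at the destination
    return min(candidates, key=lambda hop: congestion.get((current, hop), 0))

def route(src, dest, congestion):
    """Follow next_hop decisions from src until dest is reached."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest, congestion))
    return path

# Same trip twice: once on an idle network, once with the first x-link loaded.
idle = route((0, 0), (2, 1), {})
busy = route((0, 0), (2, 1), {((0, 0), (1, 0)): 5})
```

Both paths have the same number of hops; the second one just leaves the source in the y direction because the x link is reported as loaded. That is the sense in which each hop decision depends on the condition of the network.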
cores should instead communicate the same way computers hooked to the Internet do
apparently never heard of beowulf clustering
> MIT please get out of the dreams lab once in a while
Actually, no chip designer wants to use a network-on-chip if they can avoid it, due to the added complexity. However, for future SoC designs with hundreds of modules it will simply not be efficient to have direct parallel links between every module on the chip. A network will therefore in many cases be the best trade-off between silicon area, bandwidth, and energy efficiency.
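A quick back-of-the-envelope calculation in Python shows why direct links between every pair of modules stop being practical. It only counts links, which is a crude stand-in for silicon area and says nothing about bandwidth or energy; the square 2D mesh is just one assumed network topology among many.

```python
def full_point_to_point_links(n):
    """Dedicated link between every pair of n modules: n choose 2."""
    return n * (n - 1) // 2

def mesh_links(rows, cols):
    """Links in a rows x cols 2D mesh, each module wired to its neighbours."""
    return rows * (cols - 1) + cols * (rows - 1)

# Module count, all-pairs link count, and mesh link count for square layouts.
for side in (4, 10, 32):
    n = side * side
    print(n, full_point_to_point_links(n), mesh_links(side, side))
```

For 100 modules this gives 4950 dedicated links versus 180 mesh links, and the gap widens quadratically from there.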
Also, note that a typical SoC used in, for example, a mobile phone already has significantly more than eight cores (although most of these cores are not processors, they still require communication links of some sort). (Take the OMAP4470 as an example [1]: it has at least two Cortex-A9 cores, one IVA3 accelerator, PowerVR graphics, a signal processor, SDRAM controller, flash controller, MMC controller, HDMI output, SPI controllers, I2C controllers, SDIO controller, UART controller, USB controller, GPIO controller, etc.) So if MIT is in a dream lab, the only thing they are doing is trying to come up with a way to handle the nightmare that future on-chip communication entails.
Then why do all Intel CPUs, except a very small number of Xeon CPUs, have only 4 cores max, even the new Ivy Bridge ones to be released this year, even though 5 years ago they already had chips with 4 cores?
... now my mother will finally have Internet in her computer!
The idea is to make people say 'MIT? They're full of really smart people!' As with the last dozen or so MIT press releases published on Slashdot, it describes, in very vague terms, an idea that people in the field have been working on in various institutions for a decade or so. I don't know what MIT is like for research these days, but their press office is probably the best of any university in the world.
I am TheRaven on Soylent News
I was going to say this seems to be the realisation that the Transputer had the answers decades ago, but it seems many others have said exactly the same thing. I shall resume my nap....
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
The idea only works until one of the cores starts sending spam. Hey core, want Vi@gra?
The Network on Chip has been around as a concept so long we even have an abbreviation (NoC). Maybe this isn't in commodity products, but basically if you want to do an NoC, you don't have to invent anything yourself. There are several conferences and journals that have been publishing papers on this for decades. But, OH, if a professor from MIT mentions it, it must be something NEW. Sheesh.
The idea is actually a couple ideas as to how to do that. The idea of meshed connectivity in CPUs is far from news. The news here is the call-based protocol they developed by which one CPU sets up another for cut-through switching, and their power-saving "low swing" wire encoding.
A problem in this sub-field and in the CPU architecture field at large is that complexity ramps up crazily the more interoperating time constraints get thrown into the mix. This means that if they want predictable, real-time results, programmers will need more intimate knowledge of the specific systems on which their code will be running, and along with supporting multiple platforms, this could get unmanageably complex. (For embarrassingly parallelizable, throughput-oriented code with few real-time performance expectations, it should not be quite so much of a problem, but even there the potential for overcomplexity exists.)
I don't doubt the technology they are developing will lay groundwork for on-silicon networking and will be useful at some point. It may even end up being used as they intend, but will also likely be useful for more heterogeneous circuits. The holy grail of course is a full mesh (likely using optics), and there's always the chance we might leapfrog straight to that should the right combination of innovation and investment occur.
Someone had to do it.
You'd think someone with a 7-digit UID wouldn't be so arrogant.
Support the EFF and Creative Commons. The war is coming, and they're supporting you...
I did speak about the general idea people have been working on, not MIT in particular.
The point is, this is not just "a glorified name for bus arbitrators" but a different concept...
Looks like a deja-vu, considering MIT's Connection Machine. While the interconnect will be less regular (not a hypercube), the message passing between cores will have to be routed in one way or the other, just as with the CM. So how is that news?
cpghost at Cordula's Web.
Sun pegged it right when they said "The Network Is The Computer."
The specific speed of the network interconnect, the topology of the network fabric, and whether you normally think of it as a network connection are all that distinguish any multi-core system from a distributed cluster. Cloud computing begins to scratch the implications of this at the cluster/site level, and now it would seem some VLSI gearheads are thinking in the same abstract model at the chip level.
Once you start thinking of all your compute and storage resources as nodes in a network, you can start applying some very interesting algorithms and research results to the problem of improving throughput and reducing latency within the network of networks.
But if the network is the computer, I guess that makes a distributed global collection of nodes the Cluster.
I do not fail; I succeed at finding out what does not work.
Been there, done that. http://en.wikipedia.org/wiki/Transputer
XDBus: a high-performance, consistent, packet-switched VLSI bus
This paper appears in:
Compcon Spring '93, Digest of Papers.
Date of Conference: 22-26 Feb 1993
Author(s): Sindhu, P. ; Frailong, J.-M. ; Gastinel, J. ; Cekleov, M. ; Yuan, L. ; Gunning, B. ; Curry, D.
Xerox Palo Alto Res. Center, CA
On Page(s): 338 - 344
The XDBus is a low-cost, synchronous, packet-switched VLSI bus designed for use in high-performance multiprocessors. The bus provides an efficient coherency protocol which guarantees processors a consistent view of memory in the presence of caches and IO. Low-voltage swing (GTL) CMOS drivers connected to balanced transmission line traces ensure low power as well as high speed for chip, board, and backplane applications. The signaling scheme and coherency protocol work together to promote a high level of system integration, while permitting a wide variety of configurations to be realized. These configurations include small single board systems, multiple bus systems, multiboard backplane systems, and multilevel cache systems. The bus is used in several commercial systems including Sun Microsystems' new SPARCcenter 2000 series.
David B. Chorlian
I was around long before they started handing out 7-digit UIDs. It isn't arrogance; it's an accurate observation.
Any modern motherboard has that, PCIe is developed around that, HyperTransport uses it, CPU cache architectures use it.
Contrary to the popular belief, there indeed is no God.
But not mainstream CPUs.
It was in "mainstream CPUs" since the first multicore designs. Better yet, HyperTransport, a CPU-attached bus, implemented routing. As in, rule-based packet forwarding.
Contrary to the popular belief, there indeed is no God.