Multicore Chips As 'Mini-Internets'
An anonymous reader writes "Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus. With a bus, only one pair of cores can talk at a time, which would be a serious limitation in chips with hundreds or even thousands of cores. Researchers at MIT say cores should instead communicate the same way computers hooked to the Internet do: by bundling the information they transmit into 'packets.' Each core would have its own router, which could send a packet down any of several paths, depending on the condition of the network as a whole."
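To make the bus-versus-packets contrast in the summary concrete, here is a rough sketch, not the MIT design. It assumes a 2D mesh of cores and simple dimension-ordered (XY) routing; the mesh layout and the names `xy_route` and `can_run_concurrently` are made up for illustration. Transfers whose paths share no link could all be in flight at once, whereas a single shared bus would serialize them.

```python
# Hypothetical sketch: dimension-ordered (XY) routing on a 2D mesh of cores.
# Coordinates and function names are illustrative assumptions only.

def xy_route(src, dst):
    """Return the directed links (hop pairs) from src to dst,
    moving along X first, then along Y."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def can_run_concurrently(transfers):
    """True if no two transfers share a directed link, so all of them
    could proceed at the same time."""
    used = set()
    for src, dst in transfers:
        for link in xy_route(src, dst):
            if link in used:
                return False
            used.add(link)
    return True


# Three core pairs talking at once on a 4x4 mesh.
transfers = [((0, 0), (3, 0)), ((0, 1), (3, 1)), ((0, 2), (0, 3))]
print(can_run_concurrently(transfers))
```

Real on-chip routers also have to deal with buffering, arbitration and deadlock avoidance, none of which is modeled here.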
Networking the cores can also serve another purpose: limiting the damage from a core failure and diagnosing such failures. If each core is connected to several others, the same data can be processed by routing around a damaged core, making overheating or manufacturing defects serious but largely treatable. Who knows, cores might even become replaceable.
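As a minimal sketch of the "bypass a damaged core" idea, assuming a small rectangular mesh and a known set of failed cores (the function name `route_around` and the mesh model are hypothetical, not anything from the article), a breadth-first search can find a shortest path that avoids the dead core:

```python
from collections import deque


def route_around(width, height, src, dst, failed):
    """Breadth-first search for a shortest path on a width x height mesh
    that never passes through a core listed in `failed`.
    Returns the path as a list of (x, y) cores, or None if dst is cut off."""
    if src in failed or dst in failed:
        return None
    parent = {src: None}
    pending = deque([src])
    while pending:
        node = pending.popleft()
        if node == dst:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_mesh = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if in_mesh and nxt not in failed and nxt not in parent:
                parent[nxt] = node
                pending.append(nxt)
    return None


# Core (1, 0) is dead, so traffic from (0, 0) to (2, 0) detours via row 1.
path = route_around(3, 3, (0, 0), (2, 0), failed={(1, 0)})
print(path)
```

A hardware router would not run a global search per packet; this only illustrates that a mesh leaves alternative paths where a single bus has none.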
This would work perfectly with a series of (very small) tubes.
Having worked on some of the technology used in bus arbiters within SoCs these days, I can understand the need for better bus arbiters, but calling it a mini-internet, with routers? C'mon.
I guess MIT has forgotten about the Transputer....
I started reading and immediately had flashbacks to the Transputer
Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
Ah, you're clever; but it's internets all the way down.
Errr... the internal "bus" between cores on modern x86 chips already is either a ring of point to point links or a star with a massive crossbar at the center.
...embedded SOPA and PIPA :-P
ccNUMA?
And then each router, which is a processing unit in its own right, could have multiple cores, which would exhibit the same drawbacks... until you put a network of processors inside that!
So they'll have multiple busses, then. That's a rather goofy way of wording it.
It's not the packet switching itself that is improving performance, it's the extra bandwidth.
Sounds like history... the history of the Hub in LAN technology.
Maybe it's time to move to a Switch, that can keep multiple core-pairs communicating simultaneously.
AMD uses HT and Intel has its ring bus, both of which use point-to-point links. Buses have serious trouble with the impedance jumps at the taps and clock skew between the lines; that's why nobody is using them in high-speed applications any more. Even the venerable SCSI and ATA buses went the way of the dodo. The only bus I can see in my system is DDR3 (and I think that will go away with DDR4 due to the same problems).
That's just plain inefficient use of silicon area. They wish to waste some of that limited space on additional logic that isn't strictly necessary. And it will create a significant bottleneck. Did they forget about DMA controllers or something? You already need a DMA controller no matter what, and it's perfectly capable of accessing the necessary memories as it is. Adding some extra capabilities to the DMA controller would be far more efficient in logic area and would most likely lead to better performance than this bad idea.
Yeah, because data that is chopped up, formatted, and sent down a narrow serial pipe is so much faster than data sent directly over a parallel link. And besides, no, a TYPICAL chip has 2 to 4 cores; 6-8 would imply a higher-end chip that currently is quite expensive and not in TYPICAL use by TYPICAL people.
MIT please get out of the dreams lab once in a while
I admit that despite being a technical user, I was not aware that only two cores are allowed to "talk" at a given time. I had (erroneously, it would seem) assumed that in order for a chip with three or more cores to be fully useful, such a switch/router would have to already be in place.
So, have Intel, AMD, and others simply tricked us into thinking that a 3+-core chip can actively use all its cores at once (as is the natural assumption), or am I misinterpreting something? If they have, why on earth didn't they include a "router" in the original designs? It seems entirely too obvious for the eggheads in R&D to have missed (or so one would think, anyway). I'm sure there are technical hurdles to overcome, but unless that can be managed, what is really the point of many-core CPUs that can't have all cores acting at once?
SGI did this in just about every computer it produced from the early 90s until they stopped making MIPS machines (or existing, really). You could use Craylink cables and R-bricks to turn multiple C-bricks (full-fledged Origin servers with 1-4 CPUs), into single-system-image ccNUMA machines. They had quite a few big Origin machines in the Top 500 back in the day.
Bonus points: my captcha was "networking".
Yeah, great idea. Take the very fastest communication that we have on the entire planet, and replace it with the absolute slowest communication we have on the planet. Great idea. And with it, more complexity, more caches, more lookup tables, and more things to go wrong.
The best part is that it's totally unbalanced. Internet protocols are designed for a network that's ever-changing and totally unreliable. The bus, on the other hand, is based on total reliability and a static topology.
I'd have thought that a pool concept, or a mailbox metaphor, or a message board analog would have been more appropriate. Something where streams are naturally quantized and sending is decoupled from receiving. Where a recipient can operate at its own rate, independent of the sender's.
You know, like typical linux interactive sockets, for example. But what do I know.
I had an idea for a MMO game, where people would use personal computer hardware up against an internet, but everyone would have multiple entry points because of processor complexity.
Unfortunately, as designed, this means an all out war on the internet, with no security nor privacy.
by Tandem computers, like a long time.
As mentioned in other comments, this has been done before. The method of message passing isn't as fundamental as one key point - that it is all explicit message passing.
Intel and AMD x86/x64 CPUs use coherent cache between cores to make sure that a thread running on CPU 1 sees the same RAM as a thread running on CPU 3. This leads to horrible bottlenecks and huge amounts of die tied up in trying to coordinate the writes, maintain coherency between N cores ((N-1)^2 connections!), and it all just goes to hell pretty fast. Intel has this super new transactional memory rollback thing, but it's turd polishing.
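To put rough numbers on how all-to-all coordination grows, here is a back-of-the-envelope sketch. It assumes one dedicated link per pair of cores for the "everyone talks to everyone" case, which gives N(N-1)/2 links (the same quadratic growth the post complains about, though not its exact figure), and compares that with a k x k mesh where each core only links to its neighbors. The function names are made up for illustration.

```python
def full_pairwise_links(n):
    """Links needed if every pair of n cores gets its own connection."""
    return n * (n - 1) // 2


def mesh_links(k):
    """Links in a k x k mesh: k rows of (k - 1) horizontal links
    plus k columns of (k - 1) vertical links."""
    return 2 * k * (k - 1)


for k in (2, 4, 8, 16, 32):
    n = k * k
    print(n, full_pairwise_links(n), mesh_links(k))
```

For 64 cores that works out to 2016 pairwise links versus 112 mesh links; the gap keeps widening as the core count rises.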
The next step is pretty obvious (see Barrelfish) and easy: no shared coherency. Everything is done with message passing. If two threads or processes (it doesn't really matter at that point) want to communicate they need to do it with messages. It's much cleaner than dealing with shared memory synchronization, and makes program flow much more obvious (to me at least - I use message queues even on x86/x64). If you need to share BIG MEMORY between threads, which is reasonable for something like image processing, you at least use messages to explicitly coordinate access to shared memory and the cores don't have to worry about coherency.
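For anyone who hasn't programmed this way, here is a minimal sketch of the explicit message passing style using Python's standard `queue` and `threading` modules. It only illustrates the programming model: Python's queues are themselves built on locks over shared memory, so this says nothing about how the hardware would do it. The `worker` function and the `None` stop sentinel are my own conventions, not anything from Barrelfish.

```python
import queue
import threading


def worker(inbox, outbox):
    """Owns its running total; other threads reach it only via messages."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:      # sentinel: no more work
            break
        total += msg
    outbox.put(total)        # reply with the result as a message


inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for value in range(1, 11):   # send the numbers 1..10 as messages
    inbox.put(value)
inbox.put(None)

t.join()
result = outbox.get()
print(result)
```

Nothing outside `worker` ever touches `total`, so there is no lock in user code and the flow of data is visible from the sends and receives alone.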
This scales extremely well for at least a couple thousand CPUs, which is where the 'local internet' becomes useful.
Where it becomes not easy is that almost all programs written for x86/x64 assume threads can share memory at will. They'd need to be rewritten for this model or would suddenly run a whole lot slower since you'd have to lock them to one core or somehow do the coordination behind their back. It'd be worth it for me!
oh, come on. buses have been dead for years (sata and pcie are great examples of the prevalence of point-to-point links). no reason we can't think of cachelines as packets (bigger than ATM packets were!). how about hypertransport and QPI?
I can't seem to find the old story or my comment on it, but when Google acquired a 'stealth' startup a year or so ago, the most interesting thing about it was that the principal investigator had a few patents for packet-switched CPUs.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Come on, people. Cores share information and suddenly it's just like the internet? Are these journalists' experiences so narrow that they have no other analogy? It's just a fricking bus! There are networks that exist which are not "the internet". Using the term "internet" implies global connectivity. OK, I expect journalists to be ignorant, but please, are Slashdot editors this confused about basic technology as well?
Didn't AMD just buy a company that did something similar to this? While not at the chip or core level, it seems kinda related.
The problem is not the hardware but the software. The hardware has been parallel for ages, even locally (GPU, GPU-memory, CPU, memory, HDD, DMA - memory processor, ...).
Software is a different problem across the networked/parallel arena. If you really think about an SMS, it is not much more than 'hello world'. You type it and you see text (no function other than transport, which isn't really a function, has been performed), and testing it should be easy. This is not even about parallelism but about communication.
The best way to create software for networking is to not rewrite it for all these new parallel architectures/internets (which means you compile it for compartmentalized execution). This is, however, pretty hard to do (I don't know of such an implementation). The alternative is that everybody needs to put the same glue in their software over and over (RMI, OpenMP, ...). We are doing #2.
By the way I think there is a big difference between networking, which has a premise that things fail, and local transport of data/code which is specced on its workings. Fundamentally different (price).
nosig today
Isn't this a variation on the Cell architecture? Except no one could figure out how to write the OS and compiler to fully realize the goal of programs that could be farmed out by the PowerPC core to the special processors on one chip, let alone farmed out to multiple Cells over a network.
AB HOC POSSUM VIDERE DOMUM TUUM
It is lolcats all the way down, in a pool of porn with an essence of "Me too" posts.
Anyway, I think the original poster needs to read up on what the Internet is. It is a network of networks. A number of CPUs networked together is just a network. If you could mix many different systems together, it would be an Internet.
If you could put an Intel CPU next to an AMD one and they would just work together seamlessly, THAT would be an Internet.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I believe this is the ultimate solution to parallelization. A total realization of actors.
That'd be all great and applicable to say... a Pentium D. Those processors had cores that, in fact, communicated via the local bus.
The seminal paper proposing the use of switched/routed interconnection networks on-chip (NoCs) was published by Dally and Towles 11 years ago at DAC'01: "Route packets, not wires: on-chip interconnection networks". The idea of associating a router with each core and replicating it in "tiles" is not new either; Tilera was (IIRC) the first company to sell processors based on a tiled design, which was an evolution of the RAW research project. A related research project, TRIPS, replicated functional units on each tile, rather than full cores. Intel has used a tiled design in the Polaris, SCC and MIC (which includes the forthcoming Knights Corner).
So no, the idea of using routed interconnects is not new at all. In fact, after reading the linked article, it turns out that two-thirds of the text introduces the idea, and the last section details the contributions: two ideas developed by the group of Li-Shiuan Peh, one seeking to improve performance (using virtual bypassing, a form of routing precomputation) and one reducing power consumption (using low-swing signaling).
Sort of like AMD's HyperTransport then. Multiple of them on each CPU, and packet switched... Add in variable-width connections downstream and it's pretty cool. Actually the best of both, since it is a common bus, packet switched, and the processors had multiples of them... And suitable for off-chip as well as on. Then Intel did a similar thing but sort of didn't exploit it in FBDIMM... Oh, and I am an Inmos fan from way back. Variable-width operations, 4 5MHz serial bus connections, and a massive matrix switch in the family, not to mention dedicated serial-bus-connected disk controller and graphics chips... Cool stuff killed off when SGS bought them. And they had a cool language, occam, with intrinsics for timers and interprocessor comm channels... And extended C for that too. And it was very easy to farm tasks out through an array of CPUs. They included parallel and serialized operations in the supported languages in the simplest manner ever.
cores should instead communicate the same way computers hooked to the Internet do
apparently never heard of beowulf clustering
Then why do all Intel CPUs, except a very small number of Xeons, have only 4 cores max, even the new Ivy Bridge ones to be released this year, even though 5 years ago they already had chips with 4 cores?
... now my mother will finally have Internet in her computer!
But now the patents have expired...
So anyone can implement the solutions.
I was going to say this seems to be the realisation that the Transputer had the answers decades ago, but it seems many others have said exactly the same thing. I shall resume my nap...
Karma: Chameleon
YOU HAVE INVENTED A BUS! It's time to start working on the first multitasking OS!
What is it with idiots coming out of the woodwork presenting old (and often obsolete and abandoned such as virtualization) technologies as some kind of new development?
The idea only works until one of the cores starts sending spam. Hey core, want Vi@gra?
The Network on Chip has been around as a concept so long we even have an abbreviation (NoC). Maybe this isn't in commodity products, but basically if you want to do an NoC, you don't have to invent anything yourself. There are several conferences and journals that have been publishing papers on this for decades. But, OH, if a professor from MIT mentions it, it must be something NEW. Sheesh.
Looks like a deja-vu, considering MIT's Connection Machine. While the interconnect will be less regular (not a hypercube), the message passing between cores will have to be routed in one way or the other, just as with the CM. So how is that news?
cpghost at Cordula's Web.
Old technology (NoC) being applied by someone famous is not news
NO NO NO YOU RETARDS
A CPU is not an Internet!
Go back to school and get some real education.
Sun pegged it right when they said "The Network Is The Computer."
The specific speed of the network interconnect, the topology of the network fabric, and whether you normally think of it as a network connection are all that distinguish any multi-core system from a distributed cluster. Cloud computing begins to scratch the implications of this at the cluster/site level, and now it would seem some VLSI gearheads are thinking in the same abstract model at the chip level.
Once you start thinking of all your compute and storage resources as nodes in a network, you can start applying some very interesting algorithms and research results to the problem of improving throughput and reducing latency within the network of networks.
But if the network is the computer, I guess that makes a distributed global collection of nodes the Cluster.
I do not fail; I succeed at finding out what does not work.
Been there, done that. http://en.wikipedia.org/wiki/Transputer
This is really starting to FINALLY catch up with the Silicon Graphics ccNUMA, S2MP and crossbar architectures. See section 3.1, Network Topology, in the following paper:
http://www.cs.washington.edu/education/courses/cse549/07wi/files/sgiorigin.pdf
Until you can MASSIVELY improve the backplane speed of the common PC, your bottleneck will be at the backplane. My $.02.
XDBus: a high-performance, consistent, packet-switched VLSI bus
This paper appears in:
Compcon Spring '93, Digest of Papers.
Date of Conference: 22-26 Feb 1993
Author(s): Sindhu, P.
Xerox Palo Alto Res. Center, CA
Frailong, J.-M. ; Gastinel, J. ; Cekleov, M. ; Yuan, L. ; Gunning, B. ; Curry, D.
On Page(s): 338 - 344
The XDBus is a low-cost, synchronous, packet-switched VLSI bus designed for use in high-performance multiprocessors. The bus provides an efficient coherency protocol which guarantees processors a consistent view of memory in the presence of caches and IO. Low-voltage swing (GTL) CMOS drivers connected to balanced transmission line traces ensure low power as well as high speed for chip, board, and backplane applications. The signaling scheme and coherency protocol work together to promote a high level of system integration, while permitting a wide variety of configurations to be realized. These configurations include small single board systems, multiple bus systems, multiboard backplane systems, and multilevel cache systems. The bus is used in several commercial systems including Sun Microsystems' new SPARCcenter 2000 series.
David B. Chorlian
For a good time, check out the GreenArrays GA-144: 144 Forth computers on one chip. No bus; each computer communicates only with its neighbors, as described in the article. http://www.greenarraychips.com/