Slashdot Mirror


Multicore Chips As 'Mini-Internets'

An anonymous reader writes "Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus. With a bus, only one pair of cores can talk at a time, which would be a serious limitation in chips with hundreds or even thousands of cores. Researchers at MIT say cores should instead communicate the same way computers hooked to the Internet do: by bundling the information they transmit into 'packets.' Each core would have its own router, which could send a packet down any of several paths, depending on the condition of the network as a whole."
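A minimal sketch of the routing idea in the summary. It assumes the cores sit on a 2D mesh. At each router, it considers only the next hops that bring a packet closer to its destination, and it picks the one whose outgoing link currently reports the least congestion. The mesh coordinates, the `congestion` counters, and the X-before-Y tie-break are illustrative assumptions, not the MIT design:

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]

def productive_hops(cur: Coord, dst: Coord) -> List[Coord]:
    """Neighbouring routers that move the packet one step closer to dst (X first, then Y)."""
    (x, y), (dx, dy) = cur, dst
    hops = []
    if dx != x:
        hops.append((x + (1 if dx > x else -1), y))
    if dy != y:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops

def route(src: Coord, dst: Coord, congestion: Dict[Link, int]) -> List[Coord]:
    """Minimal adaptive routing: at each router, take the productive hop whose
    outgoing link has the lowest congestion counter (a hypothetical load metric).
    Ties go to the X-direction hop because min() keeps the first minimum."""
    path = [src]
    cur = src
    while cur != dst:
        here = cur
        cur = min(productive_hops(here, dst),
                  key=lambda nxt: congestion.get((here, nxt), 0))
        path.append(cur)
    return path

# The X-direction link out of (0, 0) is busy, so the packet detours through Y first.
busy = {((0, 0), (1, 0)): 5}
print(route((0, 0), (2, 1), busy))  # [(0, 0), (0, 1), (1, 1), (2, 1)]
```

Because every hop is productive, a packet never moves away from its destination. Real network-on-chip designs also have to handle deadlock and buffer limits, which this sketch ignores.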

132 comments

  1. A fault-tolerant chip? by Anonymous Coward · · Score: 5, Interesting

    This technology that networks different cores can also serve another purpose: containing the damage from core failures and diagnosing them. If each core is connected to several other cores, the same data can be processed by routing around a damaged core, so overheating or manufacturing problems would still matter but would become almost treatable. Who knows, cores might even become replaceable.

    1. Re:A fault-tolerant chip? by Mitchell314 · · Score: 2

      What are the chances you damage the chip without damaging enough of it to render it inoperable?

      --
      I read TFA and all I got was this lousy cookie
    2. Re:A fault-tolerant chip? by Osgeld · · Score: 4, Interesting

      Pretty good. A few years ago I ran for months on a dual core with one core blown out. It worked fine until I fired up something that used both cores, then it would die.

    3. Re:A fault-tolerant chip? by AdamHaun · · Score: 4, Interesting

      This sort of technology already exists to an extent. TI's Hercules TMS570 microcontrollers have two CPUs that run in lockstep along with a bus comparison module. I think total fail-tolerance might take three CPUs, but this provides strong hardware fault detection in addition to the usual ECC and other monitoring/correction stuff.
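      A toy model of lockstep checking. It assumes two channels receive identical inputs every cycle and a comparator checks their outputs. The function names and the injected stuck-at fault are hypothetical, and nothing here reflects how the TMS570's comparison logic is actually built. It illustrates the limitation discussed further down the thread: two channels can detect a disagreement but cannot say which one is wrong.

```python
from typing import Callable, Iterable, Optional

def lockstep(chan_a: Callable[[int], int], chan_b: Callable[[int], int],
             inputs: Iterable[int]) -> Optional[int]:
    """Feed the same input to both channels each cycle. Return the first cycle
    whose outputs disagree, or None if they always agree."""
    for cycle, x in enumerate(inputs):
        if chan_a(x) != chan_b(x):
            return cycle  # fault detected; which channel is wrong is unknown
    return None

def healthy(x: int) -> int:
    return x * x + 1

def faulty(x: int) -> int:
    # Hypothetical stuck-at-0 fault on bit 4 of the result.
    return (x * x + 1) & ~0b10000

print(lockstep(healthy, healthy, range(10)))  # None
print(lockstep(healthy, faulty, range(10)))   # 4 (first result with bit 4 set: 4*4 + 1 = 17)
```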

      Note that run-time fault tolerance is mostly needed for safety-critical systems. The customers who buy these products do not do so to get better yield, they do so to guarantee that their airbags, anti-lock brakes, or medical devices won't kill anyone. As such, manufacturing quality is very high. Also, die size is significantly larger than comparable general market (non-safety) devices. This means they cost a small fortune. The PC equivalent would be MLC vs. SLC SSDs. Consumer products usually don't waste money on that kind of reliability unless they need it. Now a super-expensive server CPU, maybe...

      [Disclaimer: I am a TI employee, but this is not an official advertisement for TI. Do not use any product in safety-critical systems without contacting the manufacturer, or at least a good lawyer. I am not responsible for damage to humans, machinery, or small woodland creatures that may result from improper use of TI products.]

      --
      Visit the
    4. Re:A fault-tolerant chip? by Electricity+Likes+Me · · Score: 4, Informative

      Also this is exactly what chip makers already do to a great extent: the binning of CPUs by speeds is not a targeted process. You make a bunch of chips, test them, and then sell them as whatever clock speed they are robustly stable at.

    5. Re:A fault-tolerant chip? by Osgeld · · Score: 2

      Yep, it's also why overclocking is popular. "Robustly stable" and "stable" are two different things, depending on where the chips end up and on testing tolerances. That 2.5 GHz chip may run at 2.7 GHz just fine and dandy, but out of spec with regard to voltage or temperature, even if only by a little.

      You don't want Dell refusing a gigantic pile of chips because of a few bad parts and raising a quality alert, which is very costly and time-consuming for both parties.

    6. Re:A fault-tolerant chip? by Joce640k · · Score: 5, Interesting

      Also this is exactly what chip makers already do to a great extent: the binning of CPUs by speeds is not a targeted process. You make a bunch of chips, test them, and then sell them as whatever clock speed they are robustly stable at.

      Nope. The markings on a chip do NOT necessarily indicate what the chip is capable of.

      Chips are sorted by ability, yes, but many are deliberately downgraded to fill incoming orders for less powerful chips. Parts of them are disabled or underclocked even though they passed all stability tests, simply because that's what the day's incoming orders were for.
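      A rough sketch of the two comments above. It assumes each die has a single measured "highest stable clock" and that product bins are plain clock thresholds. Real binning also screens voltage, temperature, leakage, and defective units. The bin speeds, die measurements, and order quantities are made up for illustration:

```python
from typing import Dict, List, Tuple

BINS = [3.0, 2.7, 2.5]  # hypothetical product speeds in GHz, fastest first

def bin_dies(max_stable: List[float]) -> Dict[float, int]:
    """Put each die in the fastest bin it is stable at; slower dies are scrapped."""
    counts = {b: 0 for b in BINS}
    for f in max_stable:
        for b in BINS:
            if f >= b:
                counts[b] += 1
                break
    return counts

def fill_orders(stock: Dict[float, int],
                orders: Dict[float, int]) -> Tuple[Dict[float, int], int]:
    """Fill orders fastest bin first (only fast dies can serve them). When a bin
    runs short, relabel dies from the next-faster bins. Returns (shipped, downgraded)."""
    stock = dict(stock)
    shipped = {b: 0 for b in BINS}
    downgraded = 0
    for b in BINS:
        need = orders.get(b, 0)
        for src in sorted(s for s in BINS if s >= b):  # own bin, then next-faster
            take = min(need, stock[src])
            stock[src] -= take
            shipped[b] += take
            need -= take
            if src > b:
                downgraded += take  # a faster die sold as a slower part
    return shipped, downgraded

stock = bin_dies([3.2, 3.1, 2.9, 2.8, 2.6, 2.4])
print(stock)                                 # {3.0: 2, 2.7: 2, 2.5: 1}
print(fill_orders(stock, {2.7: 1, 2.5: 3}))  # ({3.0: 0, 2.7: 1, 2.5: 3}, 2)
```

      Two of the 2.5 GHz parts shipped in this example actually passed at 2.7 GHz or better. That is the "deliberately downgraded" case.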

      --
      No sig today...
    7. Re:A fault-tolerant chip? by Joce640k · · Score: 3, Interesting

      This sort of technology already exists to an extent. TI's Hercules TMS570 microcontrollers have two CPUs that run in lockstep along with a bus comparison module. I think total fail-tolerance might take three CPUs....

      This is just to detect when an individual CPU has failed. To build a fault-tolerant system you need multiple CPUs.

      nb. The 'three CPUs' thing isn't done for detection of hardware faults it's for software faults. The idea is to get three different programmers to write three different programs with a specified output. You then compare the outputs of the programs and if one is different it's likely to be a bug.

      --
      No sig today...
    8. Re:A fault-tolerant chip? by Joce640k · · Score: 1

      nb. The 'three CPUs' thing isn't done for detection of hardware faults it's for software faults.

      ...although it will detect non-catastrophic hardware faults as well, obviously.

      --
      No sig today...
    9. Re:A fault-tolerant chip? by Thiez · · Score: 1

      > nb. The 'three CPUs' thing isn't done for detection of hardware faults it's for software faults. The idea is to get three different programmers to write three different programs with a specified output. You then compare the outputs of the programs and if one is different it's likely to be a bug.

      Why would you need three CPUs when you can just have three threads that run on any number of CPUs?

    10. Re:A fault-tolerant chip? by morgauxo · · Score: 3, Interesting

      Years ago I had a single-core chip with a damaged FPU. It took me forever to figure out the problem: my computer could only run Gentoo. Windows and Debian, both of which it had run previously, gave me all sorts of weird errors I had never seen before. I had to keep using it because I was in college and didn't have money for another one, so I just got used to Gentoo. Even in Gentoo, anything that wasn't compiled from scratch was likely to crash in weird ways (a clue). I finally diagnosed the problem a couple of years later when a family member gave me a disk that boots up and runs all sorts of hardware tests. It turned out Gentoo worked because, when software was compiled, the build recognized the lack of an FPU and compiled in floating-point emulation, as if it were dealing with an old 486SX chip.

      So, anyway, if that can happen I would imagine damaging a single core of a multicore chip is quite possible.

    11. Re:A fault-tolerant chip? by TheLink · · Score: 1

      Also depends on how competitive the market is. Currently AMD isn't a strong competitor, so Intel can do stuff like release software-upgradeable CPUs. So it's no surprise that many recent Intel CPUs can be overclocked significantly. Seems like we're back in the days of the 50% overclock (anyone remember the Celeron 300A?). Even Intel is officially selling overclockable CPUs.

      --
    12. Re:A fault-tolerant chip? by Anonymous Coward · · Score: 0

      Use three different architectures for fault tolerance at the instruction-set level. So you have a Core i5, a 6502, and a Cell running equivalent code. A bad set of examples, of course, because they don't perform at anywhere near the same speeds, but you get the idea.

    13. Re:A fault-tolerant chip? by lars_stefan_axelsson · · Score: 1

      nb. The 'three CPUs' thing isn't done for detection of hardware faults it's for software faults. The idea is to get three different programmers to write three different programs with a specified output. You then compare the outputs of the programs and if one is different it's likely to be a bug.

      Yes it is. Specifically, you need three not only to detect that one is misbehaving, but also to determine which one is most likely misbehaving. This assumes you can trust your comparison node. If you cannot, then in general you need a minimum of 3n+1 nodes to tolerate 'n' misbehaving nodes under a Byzantine failure model. (That's why the Space Shuttle had four primary flight control computers all running the same software. And a fifth one that didn't, but that was different.) Many systems, e.g. in telecoms, still make do with two, and then go into a special fault recovery mode when a failure or error is detected.

      Indeed, having three separate channels, all running the same software, is the most common arrangement in e.g. flight control. Running "different" software doesn't actually work (Knight and Leveson demonstrated this quite some time ago; see e.g. http://sunnyday.mit.edu/critics.pdf, which contains their response to later criticisms and a reference to the original study): programmers don't make independent faults. Different software also wreaks havoc with running in parallel, so it's rarely (if ever) done in practice.
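      The voting idea in this comment can be sketched as follows, assuming the voter itself is trusted. A majority vote across channels masks one bad channel out of three. With two channels, a disagreement can only be detected. The second function just encodes the 3n+1 bound quoted above for Byzantine faults. Both function names are hypothetical:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of channels, or None
    if there is no such value (the fault is detected but cannot be masked)."""
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    return value if 2 * votes > len(outputs) else None

def min_nodes_byzantine(n_faulty: int) -> int:
    """Nodes needed to tolerate n_faulty arbitrarily (Byzantine) misbehaving nodes."""
    return 3 * n_faulty + 1

print(majority_vote([42, 42, 41]))  # 42: the odd channel is outvoted
print(majority_vote([42, 41]))      # None: two channels only detect the disagreement
print(min_nodes_byzantine(1))       # 4
```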

      --
      Stefan Axelsson
    14. Re:A fault-tolerant chip? by Raenex · · Score: 1

      That's good reading. Thanks for the link.

  2. Nanotubes! by Kraftwerk · · Score: 1

    This would work perfectly with a series of (very small) tubes.

  3. A glorified name for better bus arbitrators by Anonymous Coward · · Score: 0

    Having worked on some of the technology used in bus arbiters within SoCs these days, I can understand the need for better bus arbiters, but calling it a mini-internet, with routers? C'mon.

    1. Re:A glorified name for better bus arbitrators by mikkelm · · Score: 1, Informative

      Slashdot in 2012 is largely technical support people and Windows administrators who hold their MCSAs more dear than their first born. This is how it has to be explained.

    2. Re:A glorified name for better bus arbitrators by zAPPzAPP · · Score: 1

      The idea is that this is not 'a' bus but many of them, making up several possible alternative routes.
      A device deciding which route to take is a router.

    3. Re:A glorified name for better bus arbitrators by TheRaven64 · · Score: 1

      The idea is to make people say 'MIT? They're full of really smart people!' As with the last dozen or so MIT press releases published on Slashdot, it describes, in very vague terms, an idea that people in the field have been working on in various institutions for a decade or so. I don't know what MIT is like for research these days, but their press office is probably the best of any university in the world.

      --
      I am TheRaven on Soylent News
    4. Re:A glorified name for better bus arbitrators by skids · · Score: 1

      The idea is actually a couple of ideas about how to do that. The idea of meshed connectivity in CPUs is far from news. The news here is the call-based protocol they developed, by which one CPU sets up another for cut-through switching, and their power-saving "low-swing" wire encoding.

      A problem in this sub-field, and in the CPU architecture field at large, is that complexity ramps up crazily the more interoperating timing constraints get thrown into the mix. This means that if they want predictable, real-time results, programmers will need more intimate knowledge of the specific systems on which their code will be running, and along with supporting multiple platforms, this could get unmanageably complex. (For embarrassingly parallelizable, throughput-oriented code with few real-time performance expectations, it should not be quite so much of a problem, but even there the potential for overcomplexity exists.)

      I don't doubt the technology they are developing will lay groundwork for on-silicon networking and will be useful at some point. It may even end up being used as they intend, but will also likely be useful for more heterogeneous circuits. The holy grail of course is a full mesh (likely using optics), and there's always the chance we might leapfrog straight to that should the right combination of innovation and investment occur.

    5. Re:A glorified name for better bus arbitrators by dyingtolive · · Score: 2

      You'd think someone with a 7-digit UID wouldn't be so arrogant.

      --
      Support the EFF and Creative Commons. The war is coming, and they're supporting you...
    6. Re:A glorified name for better bus arbitrators by zAPPzAPP · · Score: 1

      I did speak about the general idea people have been working on, not MIT in particular.
      The point is, this is not just "a glorified name for bus arbitrators" but a different concept...

    7. Re:A glorified name for better bus arbitrators by mikkelm · · Score: 1

      I was around long before they started handing out 7-digit UIDs. It isn't arrogance; it's an accurate observation.

  4. way back machine by Anonymous Coward · · Score: 5, Insightful

    I guess MIT has forgotten about the Transputer....

  5. Back to the future moment? by GumphMaster · · Score: 4, Insightful

    I started reading and immediately had flashbacks to the Transputer.

    --
    Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
    1. Re:Back to the future moment? by tibit · · Score: 4, Interesting

      Alive and well as XMOS products. I love those chips.

      --
      A successful API design takes a mixture of software design and pedagogy.
    2. Re:Back to the future moment? by gman003 · · Score: 1

      Or, more recently, Intel's many-core prototypes used this. At the very least, the "Single-Chip Cloud Computer" used a mesh network, and I think Larrabee had such a thing as well...

    3. Re:Back to the future moment? by Anonymous Coward · · Score: 0

      Exactly. Company I worked at the time developed compiler backends for the T9000 (among others). Weird but elegant stack-based architecture and very integrated CPU networking concept.

    4. Re:Back to the future moment? by jd · · Score: 3, Informative

      The Transputer was a brilliant design. Intel came up with a next-gen variant, called the iWarp, but never did anything with it and eventually abandoned the concept.

      IIRC, each Transputer had four serial lines where each could be in transmit or receive mode. They each had their own memory management (16K on-board, extendable up to 4 gigs - it was a true 32-bit architecture) so there was never any memory contention. Arrays of thousands of Transputers, arranged in a Hypercube topology, were developed and could out-perform the Cray X-MP at a fraction of the cost.
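      For readers who haven't met the topology, here is how a d-dimensional hypercube is wired. Each node gets a d-bit address and is linked to the d nodes whose addresses differ from its own in exactly one bit. A node with four links therefore fits naturally into a 16-node 4D hypercube. A small sketch assuming that standard numbering follows, with dimension-order ("e-cube") routing shown as one simple way to pick a shortest path. The function names are illustrative:

```python
from typing import List

def neighbours(node: int, dims: int) -> List[int]:
    """Nodes one link away: flip each of the `dims` address bits in turn."""
    return [node ^ (1 << bit) for bit in range(dims)]

def hops(a: int, b: int) -> int:
    """Shortest path length = number of differing address bits (Hamming distance)."""
    return bin(a ^ b).count("1")

def ecube_route(src: int, dst: int) -> List[int]:
    """Dimension-order routing: correct differing bits from lowest to highest."""
    path, cur, diff, bit = [src], src, src ^ dst, 0
    while diff:
        if diff & 1:
            cur ^= 1 << bit
            path.append(cur)
        diff >>= 1
        bit += 1
    return path

print(neighbours(0b0000, 4))        # [1, 2, 4, 8]
print(hops(0b0000, 0b1111))         # 4: the diameter of a 4D hypercube
print(ecube_route(0b0011, 0b1100))  # [3, 2, 0, 4, 12]
```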

      Having a similar communications system in modern CPUs would certainly be doable. It would have the major benefit over a bus in that it's a local communications channel so you always have maximum bandwidth. Having said that, a switched network would have fewer interconnects and be simpler to construct and scale since the switching logic is isolated and not part of the core. You can also multicast and anycast on a switched network - technically doable on the Transputer but not trivial. Multicasting is excellent for MISD-type problems (multi-instruction, single-data) since you can have the instructions in the L1 cache and then just deliver the data in a single burst to all applicable cores.

      (Interestingly, although PVM and MPI support collective operations of this kind, they're usually done as for loops, which - by definition - means your network latency goes up with the number of processes you send to. Since collective operations usually end in a barrier, even the process you first send to has this extra latency built into it.)
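      To put numbers on the latency point, here is a back-of-the-envelope sketch. It assumes each send takes one time step and a process can send only one message at a time. Under that model, a root that loops over p - 1 destinations needs p - 1 steps. A binomial-tree broadcast, in which every process that already has the data forwards it, needs ceil(log2 p) rounds. This is a simplified cost model, not a description of how any particular MPI or PVM library implements its collectives:

```python
def linear_broadcast_steps(p: int) -> int:
    """Root sends to each of the other p - 1 processes in turn (a for loop)."""
    return p - 1

def tree_broadcast_steps(p: int) -> int:
    """Binomial tree: the number of processes holding the data doubles every round."""
    rounds, informed = 0, 1
    while informed < p:
        informed *= 2
        rounds += 1
    return rounds

for p in (4, 16, 64, 1024):
    print(p, linear_broadcast_steps(p), tree_broadcast_steps(p))
# 4 3 2
# 16 15 4
# 64 63 6
# 1024 1023 10
```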

      It's also arguable that it would be better if the networking in the CPU was compatible with the networking on the main bus since this would mean core-to-core communications across SMP would not require any translation or any extra complexities in the support chips. It would also mean CPU-to-GPU communications would be greatly simplified.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    5. Re:Back to the future moment? by Anonymous Coward · · Score: 0

      Want.

    6. Re:Back to the future moment? by WrecklessSandwich · · Score: 2

      Yep, thought of XMOS immediately when I saw the title. 16 quad-core CPUs linked together in a 4D hypercube: https://www.xmos.com/products/development-kits/xmp-64

    7. Re:Back to the future moment? by 91degrees · · Score: 3, Interesting

      My Computer Architecture lecturer at University was David May - lead architect for the Transputer. Our architecture notes consisted of a treatise on transputer design.

      Now that multiprocessing is becoming standard, it's interesting to see the same problems being rediscovered, and often the same solutions reinvented. Their next problem will be contention between two cores that happen to be running processes that require a lot of communication. Inmos had a simple solution to this one as well.

      Rather a shame that Inmos came up with the technology a quarter of a century too early. I've known a lot of engineers say wonderful things about them. The reason they weren't a huge success was because nobody had found a need for them yet. Extra silicon could be used to make the current generation faster much more easily than now.

    8. Re:Back to the future moment? by crutchy · · Score: 1

      T9000... sounds like a terminator model. that company you worked for wasn't cyberdyne systems by any chance?

    9. Re:Back to the future moment? by Anonymous Coward · · Score: 0

      Great, let me know when I can run all of my software on it.

    10. Re:Back to the future moment? by Anonymous Coward · · Score: 0

      Let me know when I can run all my software on any single platform.

    11. Re:Back to the future moment? by TheRaven64 · · Score: 1

      The reason they weren't a huge success was because nobody had found a need for them yet

      It was more the fact that processors at the time kept getting faster. The number of transistors doubled every 12-18 months, and this translated to at least a doubling in performance. As with other massively parallel systems, you needed to rewrite your software to take advantage of it, while you could just wait a year and your single-threaded system got faster. This is why multicore is suddenly interesting: chip designers have run out of obvious (and even not-so-obvious) things to do with extra transistors to make existing code faster. Extra cache worked for a while. FPUs, then vector units worked a bit. Wider superscalar systems did until we'd got as much ILP out of the code as was generally possible.

      --
      I am TheRaven on Soylent News
    12. Re:Back to the future moment? by tibit · · Score: 1

      I don't see immediate use for the hypercube, but the individual 1-, 2- and 4-core chips are phenomenal for implementing realtime Ethernet devices, such as IEEE-1588 switches, realtime industrial Ethernet protocols, etc. It's not hard to make a very low-latency timestamping switch using one of these. The hardware-assisted serialization, deserialization, and time-triggered sampling and update of ports let you be quite creative, because they decouple the timing of the I/O from the timing of the software.

      There are many applications where simple MCUs like PICs, or even Parallax's Propeller, are used for software-implements-hardware applications, but those ultimately need cycle counting and you are forced to use assembly. XMOS's XS-1 architecture decouples your software from this, and quite a lot of realtime code can be written in a high-level language like C or their quite lovely XC. The latter is a safe variant of C extended to support transputer communications and hardware-assisted ports.

      About the only limitation at the moment is the 64 KB of memory for all code and data, in all threads. Since many realtime applications imply little to no buffering, this hasn't been a problem for me, but it needs to be kept in mind. If one wants somewhat slower but larger-memory application code running at a couple of MIPS, it's certainly possible to emulate other architectures. ARM Thumb and Zylin Core can be made to work quite reasonably. There are lots of tools that generate code for Thumb, and there's a Zylin gcc port.

      --
      A successful API design takes a mixture of software design and pedagogy.
    13. Re:Back to the future moment? by Anonymous Coward · · Score: 0

      Unless you're running some super obscure software, I think x86-64 fits that description quite well. It can run all Windows, Linux, Mac OS, Android and DOS software just fine. That's not even mentioning the tons of existing emulators for various other legacy systems.

    14. Re:Back to the future moment? by Anonymous Coward · · Score: 0

      Nope, ACE (ace.nl).

    15. Re:Back to the future moment? by WrecklessSandwich · · Score: 1

      Minor correction: It's 64KB of memory per core. There are also software libraries for interfacing with an external SRAM chip, but you need to use something like two 16-bit ports (or a 16-bit and an 8-bit port for lower-capacity chips) and a few 1-bit ports.

    16. Re:Back to the future moment? by tibit · · Score: 1

      Good catch, I forgot to say I meant it per core (a core has up to 8 threads running on it). The "libraries" for SRAM are an overstatement, you need a dozen or two lines of XC for async sram, and maybe 2-3x that for synchronous one, even if you want it running in a separate thread and communicating via a channel with other threads. It's a good tutorial exercise, if one needs a tutorial that is.

      You're free to use a 4-bit port for SRAM control, of course, and it'll be sufficient for async SRAM. For sync SRAM you can IIRC dedicate a 1-bit port for the clock if you want to use a timer to trigger moving data around, but that's not necessary if you can live with a tad lower performance, if that.

      Admittedly my designs are often port-constrained, and on XS-1 there's a limited number of ports. Ports are logical resources that get allocated to physical pins at runtime. If you're brave, you can reallocate ports to different pins dynamically, but the tools essentially provide no support for that.

      That's another aspect of XS-1 that's quite different from most other chips. The typical granularity offered by most MCUs is that you have a pin, and you can twiddle some bits to select which one of a fixed number of alternate functions gets assigned to it. On XS-1, each pin can be accessed from a set of ports (possibly of various output widths), and those pin-to-port assignments are dynamic. There's a machine-code instruction to obtain a resource, such as a port, and you then use the handle thus obtained to operate on the port. The ports have built-in width conversion (a.k.a. serialization/deserialization), so if you have a port with, say, a 4-bit output width, you can feed it 32 bits at a time. If you need some higher-level functionality (timers, UARTs, PWMs, etc.), you do it in software. Yet the software is only loosely tied to the timing of the events on the port. So, there's a static timing analyzer that can prove that your software is fast enough given the timing constraints of your application! The analyzer uses the machine code, so it works regardless of your source language (C, C++, XC, assembly, ...).
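      The width-conversion idea can be modelled in a few lines. A 32-bit word written to a 4-bit port is shifted out as eight 4-bit nibbles, and a receiving port reassembles them. This is a plain-Python illustration, not XC. It assumes the least-significant nibble goes out first, and it does not model the real XS-1 port hardware, its bit order, or its timing:

```python
from typing import List

def serialize(word: int, port_width: int = 4, word_bits: int = 32) -> List[int]:
    """Split a word into port-width chunks, least-significant chunk first (assumed order)."""
    mask = (1 << port_width) - 1
    return [(word >> shift) & mask for shift in range(0, word_bits, port_width)]

def deserialize(chunks: List[int], port_width: int = 4) -> int:
    """Reassemble chunks produced by serialize()."""
    word = 0
    for i, chunk in enumerate(chunks):
        word |= chunk << (i * port_width)
    return word

nibbles = serialize(0x12345678)
print([hex(n) for n in nibbles])  # ['0x8', '0x7', '0x6', '0x5', '0x4', '0x3', '0x2', '0x1']
print(hex(deserialize(nibbles)))  # 0x12345678
```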

      --
      A successful API design takes a mixture of software design and pedagogy.
  6. But what does the internet stand on? by keekerdc · · Score: 4, Funny

    Ah, you're clever; but it's internets all the way down.

  7. Say what? by Anonymous Coward · · Score: 2, Insightful

    Errr... the internal "bus" between cores on modern x86 chips already is either a ring of point to point links or a star with a massive crossbar at the center.
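    For a rough sense of how the two topologies in this comment scale, here is a deliberately crude cost model. It assumes a bidirectional ring in which a message always takes the shorter way round, and a crossbar that connects any pair of ports in a single hop. Ring hop counts grow with the number of stops, while a crossbar's switch grows with the square of the port count. The function names are illustrative:

```python
def ring_hops(src: int, dst: int, n: int) -> int:
    """Hops between stops on a bidirectional ring of n stops (shorter direction)."""
    d = abs(src - dst) % n
    return min(d, n - d)

def ring_average_hops(n: int) -> float:
    """Mean hop count over all ordered pairs of distinct stops."""
    total = sum(ring_hops(a, b, n) for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))

def crossbar_crosspoints(n: int) -> int:
    """An n x n crossbar needs n*n crosspoints but reaches any port in one hop."""
    return n * n

for n in (4, 8, 16):
    print(n, round(ring_average_hops(n), 2), crossbar_crosspoints(n))
```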

    1. Re:Say what? by hamjudo · · Score: 4, Interesting

      Errr... the internal "bus" between cores on modern x86 chips already is either a ring of point to point links or a star with a massive crossbar at the center.

      The researchers can't be this far removed from the state of the art, so I am hoping that it is just a really badly written article. I hope they are comparing their newer research chips with their own previous generation of research chips. Intel and AMD aren't handing out their current chip designs to the universities, so many things have to be re-invented.

    2. Re:Say what? by Cyrano+de+Maniac · · Score: 1

      What AC said. It's the one and only comment on this story you need to read.

      --
      Cyrano de Maniac
    3. Re:Say what? by TheRaven64 · · Score: 3, Insightful

      The researchers can't be this far removed from the state of the art

      They aren't. The way this works is a conversation something like this:

      MIT PR: We want to write about your research, what do you do?
      Researcher: We're looking at highly scalable interconnects for future manycore systems.
      MIT PR: Interconnects? Like wires?
      Researcher: No, the way in which the cores on a chip communicate.
      MIT PR: So how does that work?
      Researcher: {long explanation}
      MIT PR: {blank expression}
      Researcher: You know how the Internet works? With packet switching?
      MIT PR: I guess...
      Researcher: Well, kind-of like that.
      MIT PR: Our researchers are putting the Internet in a CPU!!1!111eleventyone

      --
      I am TheRaven on Soylent News
    4. Re:Say what? by Anonymous Coward · · Score: 0

      When I was at university, AMD gladly gave me their chip manuals for a research project. I still have them.

  8. How long before... by Anonymous Coward · · Score: 0

    ...embedded SOPA and PIPA :-P

  9. Sounds like... by ArchieBunker · · Score: 2

    ccNUMA?

    --
    Only the State obtains its revenue by coercion. - Murray Rothbard
    1. Re:Sounds like... by jd · · Score: 4, Interesting

      For low-level ccNUMA, you'd want three things:

      • A CPU network/bus with a "delay tolerant protocol" layer and support for tunneling to other chips
      • An MTU-to-MTU network/bus which used a compatible protocol to the CPU network/bus
      • MTUs to cache results locally

      If you were really clever, the MTU would become a CPU with a very limited instruction set (since there's no point re-inventing the rest of the architecture and external caching for CPUs is better developed than external caching for MTUs). In fact, you could slowly replace a lot of the chips in the system with highly specialized CPUs that could communicate with each other via a tunneled CPU network protocol.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  10. Internets all the way down? by Gothmolly · · Score: 1

    And then each router, which is a processing unit in its own right, could have multiple cores, which would exhibit the same drawbacks... until you put a network of processors inside that!

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Internets all the way down? by ExploHD · · Score: 1

      And then each router, which is a processing unit in its own right, could have multiple cores, which would exhibit the same drawbacks... until you put a network of processors inside that!

      We need to go deeper!

  11. Hrmm? by Anonymous Coward · · Score: 0

    So they'll have multiple busses, then. That's a rather goofy way of wording it.

    It's not the packet switching itself that is improving performance, it's the extra bandwidth.

  12. So, why not move from "hub" to "switch"? by ivi · · Score: 1

    Sounds like history... the history of the Hub in LAN technology.

    Maybe it's time to move to a Switch, that can keep multiple core-pairs communicating simultaneously.

    1. Re:So, why not move from "hub" to "switch"? by Osgeld · · Score: 2

      I still think switches on tiny, low-traffic networks are a silly notion, though now that the cost of switches is insignificant (and when was the last time you saw a hub for sale?), I just go with the flow.

      Back in the day we had a client who replaced the hubs in each branch with switches, which were much more expensive at the time, then whined that there was no advantage. I replied: you insisted on putting your two 386s and a dot-matrix printer on it, and even threatened to take your business elsewhere; you got what you wanted, enjoy.

    2. Re:So, why not move from "hub" to "switch"? by PlusFiveTroll · · Score: 1

      Switches are much better when any two or more hosts on the network can use a significant percentage of the total bandwidth at once. Since just about every device on a modern network can easily transfer at a full 100 Mbps (at least until the memory buffers on the slower end fill or empty), a hubbed network would behave terribly. WiFi works much like the collision domain on a hub, and you see this reflected in the raw throughput between hosts.

      The other thing about a hub is that it sends all traffic to all hosts in its domain. This is nice as a poor man's port mirror when you're trying to figure out what's going on between black-box devices, but other than that it's a total security nightmare. On a switch, at least, you have to actively use ARP poisoning or MAC flooding to sniff traffic.
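      The difference described above can be sketched with a toy model. It assumes frames carry only source and destination MAC addresses, and it ignores broadcast frames and table aging. A hub repeats every frame out of every other port. A learning switch records which port each source address arrived on and forwards only there, flooding only while the destination is still unknown. The class and method names are hypothetical:

```python
from typing import Dict, List

class Hub:
    def __init__(self, n_ports: int) -> None:
        self.n_ports = n_ports

    def forward(self, in_port: int, src: str, dst: str) -> List[int]:
        """Every frame goes out of every other port, so every host sees it."""
        return [p for p in range(self.n_ports) if p != in_port]

class LearningSwitch(Hub):
    def __init__(self, n_ports: int) -> None:
        super().__init__(n_ports)
        self.table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def forward(self, in_port: int, src: str, dst: str) -> List[int]:
        self.table[src] = in_port  # learn where the sender lives
        if dst in self.table:
            out = self.table[dst]
            return [] if out == in_port else [out]
        return super().forward(in_port, src, dst)  # unknown destination: flood

sw = LearningSwitch(4)
print(sw.forward(0, "aa", "bb"))  # [1, 2, 3]: bb not learned yet, so flood
print(sw.forward(1, "bb", "aa"))  # [0]: aa was learned on port 0
print(sw.forward(0, "aa", "bb"))  # [1]: now unicast; ports 2 and 3 see nothing
```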

    3. Re:So, why not move from "hub" to "switch"? by Osgeld · · Score: 1

      I agree to a point. In the 2000s people were going ape shit for switches, and for what? Two or three computers on a 768k internet connection?

      Heck, even today that's all I really use (OK, maybe four computers on a 10 meg connection, but not all at the exact same time). It's actually rare that I transfer large amounts of data over my home network; frankly, when I have to do a serious full backup of personal data, it's just faster to pop in a 120 gig hard disk than to slog through the network.

      But heck, I was talking about people in the early '90s flipping out over switches, when a $40 hub was being compared to a $100 switch. Even in work situations there have only been two times when large amounts of data were flowing at the exact same time: one was in a refurb center with 60 clients at a time and a Ghost server, the other was a national photography development studio for schools and churches (where they used WiFi, so apparently it wasn't that important).

      Every other time it's been a 5-minute email check, and maybe a handful of bytes sent to a database at random.

      Oh well, I lost that battle. Switches are disposable now, and it doesn't matter anymore.

  13. Buses are so '90s by rrohbeck · · Score: 5, Informative

    AMD uses HT and Intel has its ring bus, both of which use point-to-point links. Buses have serious trouble with the impedance jumps at the taps and with clock skew between the lines; that's why nobody uses them in high-speed applications any more. Even the venerable SCSI and ATA buses went the way of the dodo. The only bus I can see in my system is DDR3 (and I think that will go away with DDR4 due to the same problems).

    1. Re:Buses are so '90s by eggfoolr · · Score: 1

      Bus? That is so 70's and 80's!

      What about the crossbar switch? They were in fashion in the 90's and are pretty much the core architecture of any multi CPU system.

      Next they'll be saying you can have multiple users on the same computer!!

    2. Re:Buses are so '90s by Anonymous Coward · · Score: 0

      Uh, crossbar switches were used from at least the '70s in telephone exchanges.

    3. Re:Buses are so '90s by Anonymous Coward · · Score: 0

      What a dummy I am, I should have said: crossbar switches have been used since 1919.

    4. Re:Buses are so '90s by eggfoolr · · Score: 1

      If you're going to be like that, then buses have been transporting people for over 100 years.

    5. Re:Buses are so '90s by tamyrlin · · Score: 1

      Actually, even the first computers used buses. For example the Z3, which was built in the early 40's, used buses to transport data. (Actually, the Z3 architecture was very advanced for its time and it is much closer to a modern simple processor than for example ENIAC.)

      Regarding the article summary, I could note that it is not only researchers from MIT who say that a network-on-chip (NoC) is a promising concept for the future of chip design. Almost every researcher I've talked to seems to agree that NoCs of some form are needed for future chips. Note that the concept of packet-switching networks is not new in computers. It has been used in supercomputers for a long time, and HyperTransport is based on a packet-switching architecture.

      That being said, the work the researchers have actually done seems interesting, especially the concept of virtual bypassing, which I'll have to read up on at some point.

    6. Re:Buses are so '90s by Anonymous Coward · · Score: 0

      If you want to compare AMD's HT, you need to pair it with Intel's QPI. It's true that Intel has a ring bus within the package, but the AMD equivalent with Bulldozer would be the system request queue.

  14. Inefficient... by solidraven · · Score: 2

    That's just plain inefficient use of silicon area. They wish to waste some of that limited space on additional logic that isn't strictly necessary, and it will create a significant bottleneck. Did they forget about DMA controllers or something? You already need a DMA controller no matter what, and it's perfectly capable of accessing the necessary memories as it is. Adding some extra capabilities to the DMA controller would be far more efficient in logic area and would most likely lead to better performance than this bad idea.

    1. Re:Inefficient... by Theovon · · Score: 2

      Silicon AREA is cheap, and it's getting cheaper. Today's processors dedicate half their die space to CACHE. Transistors per die, cores per die, and transistors per core are all increasing at (different) exponential rates. And with power density increasing at a quadratic rate, we're already facing the dark silicon problem, where if we power on the entire chip at nominal voltage, we have trouble delivering the power, and we can't dissipate the heat.

      With 16 cores, a bus is tolerable. At 64, it's a liability, and we NEED a more sophisticated network.

    2. Re:Inefficient... by tlhIngan · · Score: 1

      Silicon AREA is cheap, and it's getting cheaper. Today's processors dedicate half their die space to CACHE. Transistors per die, cores per die, and transistors per core are all increasing at (different) exponential rates. And with power density increasing at a quadratic rate, we're already facing the dark silicon problem, where if we power on the entire chip at nominal voltage, we have trouble delivering the power, and we can't dissipate the heat.

      Actually, no.

      Silicon area is *extremely* expensive. The larger the chip, the fewer you can fit on a wafer. Plus the wafer has randomly distributed flaws on it, which means that the larger the chip, the greater the chance of a flaw affecting operation. This all combines to reduce yield per wafer, and since a wafer has a fixed cost, the fewer good chips per wafer, the more each one costs, because the cost of the entire wafer is amortized over fewer chips.
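
      To put rough numbers on the yield argument, here is a minimal Python sketch. It assumes a common first-order dies-per-wafer estimate and a simple Poisson defect model (yield = e^(-die area x defect density)); the function names are made up for this example, and the $5000 wafer, 300 mm diameter, and 0.1 defects/cm^2 are hypothetical illustration values, not figures for any real process.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: wafer area / die area, minus partial dies lost at the edge."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Probability that a die has no randomly distributed defect (Poisson model)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def cost_per_good_die(wafer_cost: float, wafer_diameter_mm: float,
                      die_area_mm2: float, defects_per_mm2: float) -> float:
    """Spread the fixed wafer cost over only the dies expected to work."""
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_mm2))
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Hypothetical inputs: $5000 wafer, 300 mm diameter, 0.1 defects/cm^2 (= 0.001 per mm^2).
    for area in (100, 200):
        print(area, dies_per_wafer(300, area),
              round(poisson_yield(area, 0.001), 3),
              round(cost_per_good_die(5000, 300, area, 0.001), 2))
```

      With these made-up inputs, doubling the die from 100 to 200 mm^2 cuts the candidate dies per wafer from 640 to 306 and the yield from about 90% to about 82%, so the cost per good die goes from about $8.6 to about $20: more than double, because both effects compound.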

      Moore's law helps because reducing the size of a transistor means the silicon area is much lower which gives you more chips per wafer, each chip is less likely to be in a flawed region of silicon, which all increase yield and makes it cheaper to produce.

      What's cheap is transistors - the transistor density of non-memory devices is extremely low purely because most of the area is used up by wiring. Memory devices have a little bit of "random" logic beside a huge array of highly ordered regular memory cells, which can be made extremely dense.

      On a modern CPU, cache is the biggest consumer of transistors, but it makes up relatively little area because it's so dense. The other logic isn't dense, purely because of wiring. It's gotten to the point where every chip like this is built with a sea of extra transistors and logic gates that are sitting there unused. If a revision is needed, those spares can often be called into action by a metal rework (much cheaper), or by an ion-beam rework to confirm whether a fix to a problem would suffice (the ion rework lets you rewire parts of the chip).

      It's also why fixed-area devices like, say, a full-frame sensor for a digital camera still cost a LOT of money. And most camera sensors have flaws that the imaging processor has to work around (dead pixels, hot pixels, etc.).

    3. Re:Inefficient... by solidraven · · Score: 1

      That's where you're wrong: the cost per area is actually increasing significantly. It is indeed true that the cost per transistor is decreasing (at the moment at least). But since Intel and AMD, for example, want performance, they're willing to trade significant portions of area for an increase in speed. So the cost of your average desktop processor should actually keep increasing, something that luckily hasn't happened to any great extent compared to the increasing costs of new lithography machines and masks. In most cases where performance is the main factor, energy consumption is only a secondary concern, as long as it's possible to evacuate the thermal energy and the chip doesn't have to run off a battery.

      The number you're looking for is actually somewhere between 30 and 40% for most processors. But an additional factor is that memory is a regular structure, which means you can open up the lithography trick book. This allows a significant decrease in size compared to regular logic. But since this isn't very relevant to the original subject, I'll shut up about it. The actual issue is that you often have to wait for data; this is the major bottleneck of any processor (not to mention it can stall the pipeline). This is why DMA is so interesting: the waiting times are kept to a minimum. On the other hand, if you work on a packet-based network you're in for quite a few problems. You need a lot of additional logic to manage this new bus, logic that is very expensive in terms of silicon area. And it's questionable whether EUV lithography will be ready in time to save our asses on this one; ASML says it'll still take a while, and I think we can take their word on it. Another concern is the length of the logic chain causing even longer propagation delays.

      To keep it short: a more sophisticated network is indeed necessary, but just saying that packet-based will fix it is plain wrong. What I do see happening is groups of 4 or 16 cores being interconnected using a more sophisticated network, but even then I question whether it'll be packet-based, due to timing issues.

  15. Yea cause packet transmissions by Osgeld · · Score: 0

    after the data is chopped up, formatted, and sent down a narrow serial pipe are so much faster than sending it directly over a parallel link. And besides, no: a TYPICAL chip has 2 to 4 cores; 6-8 would imply a higher-end chip that is currently quite expensive and not in TYPICAL use by TYPICAL people.

    MIT please get out of the dreams lab once in a while

    1. Re:Yea cause packet transmissions by Electricity+Likes+Me · · Score: 1

      No "typical" consumer chip 10 years ago had even 4 cores.

    2. Re:Yea cause packet transmissions by Osgeld · · Score: 1

      Who said anything about 10 years ago? And do you think that in 10 years we will have typical consumer machines with "chips with hundreds or even thousands of cores"?

      In 10 years we will honestly be lucky to have serious machines with "hundreds or even thousands of cores" on the same plane and not strung together with networking.

    3. Re:Yea cause packet transmissions by Electricity+Likes+Me · · Score: 1

      What are you even referring to?

      Your OP was implying this is all garbage because 6-8 cores is a high-end chip, not a "typical" one.

      Yet 10 years is not a long time - in the past decade 4 would've been a high-end chip, and before that having 2 physical processors would've been significant as well.

      So I would think there is in fact a great deal of importance to this kind of work, seeing as how the number of cores per chip for consumer items has grown and grown. And then you undermine your own point by implying we might even be getting close to "hundreds" of cores on a chip in the next 10 years. If we are, then the typical consumer chip will be breaching 8-16 easily. Not to mention things like the Cell architecture, where Sony was thinking about pushing 24 work-cores onto the chip for the PS4 (backed off since then, but it shows where things are headed).

    4. Re:Yea cause packet transmissions by Osgeld · · Score: 1

      What are you replying to? Nowhere does it state "in 10 years".

      here just in case you missed it, the very first sentence of the headline

      ""Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus"

      in case you missed it again let me point it out to you TODAY, A TYPICAL CHIP MIGHT HAVE SIX OR EIGHT CORES

    5. Re:Yea cause packet transmissions by tamyrlin · · Score: 1

      > MIT please get out of the dreams lab once in a while

      Actually, no chip designer wants to use a network-on-chip if they can avoid it, due to the added complexity. However, for future SoC designs with hundreds of modules it will simply not be efficient to have direct parallel links between every module on the chip. A network will therefore in many cases be the best trade-off between silicon area, bandwidth, and energy efficiency.

      Also, note that a typical SoC used in, for example, a mobile phone already has significantly more than eight cores (although most of these cores are not processors, they still require communication links of some sort). (Take the OMAP4470 as an example [1]: it has at least two Cortex-A9 cores, one IVA3 accelerator, PowerVR graphics, a signal processor, SDRAM controller, flash controller, MMC controller, HDMI output, SPI controllers, I2C controllers, SDIO controller, UART controller, USB controller, GPIO controller, etc.) So if MIT is in a dream lab, the only thing they are doing is trying to come up with a way to handle the nightmare that future on-chip communication entails.

  16. They aren't doing this already? by DaneM · · Score: 2

    I admit that despite being a technical user, I was not aware that only 2 chips are allowed to "talk" at a given time. I had (erroneously, it would seem) assumed that in order for a 3+-core chip to be fully useful, such a switch/router would have to already be in place.

    So, have Intel, AMD, and others simply tricked us into thinking that a 3+-core chip can actively use all its cores at once (as is the natural assumption), or am I misinterpreting something? If they have, why on earth didn't they include a "router" in the original designs? It seems entirely too obvious for the eggheads in R&D to have missed (or so one would think, anyway). I'm sure there are technical hurdles to overcome, but unless that can be managed, what is really the point of many-core CPUs that can't have all cores acting at once?

    1. Re:They aren't doing this already? by Anonymous Coward · · Score: 2, Informative

      You are misinterpreting it. The chips CAN work independently. It is only when one needs to talk to another or use a shared resource (hard drive, main memory, network) that this becomes a potential issue. It is like a family of three sharing a single bathroom: not such a big deal. Bump that up to 20 using the same bathroom, and you start having serious issues.

    2. Re:They aren't doing this already? by DaneM · · Score: 1

      OK, I see. Thanks for the clarification. (Why post such an intelligent remark as Anonymous Coward?) This being an issue concerned only with shared resources seems to make the lack of concurrent interaction less of an issue, but as with your family/bathroom analogy, it will (predictably) become a major problem as the number of cores/processors in a system continues to increase.

      So, while I still wonder why this hasn't already been thought of and solved, I can see that it hasn't been an area that a (typically short-sighted) company would have invested much R&D into as of yet. I wonder if some independent technology firm has already come up with a solution that will soon be purchased by Intel or AMD. I see another patent battle coming...

    3. Re:They aren't doing this already? by Forever+Wondering · · Score: 4, Insightful

      I admit that despite being a technical user, I was not aware that only 2 chips are allowed to "talk" at a given time. I had (erroneously, it would seem) assumed that in order for a 3+-core chip to be fully useful, such a switch/router would have to already be in place.

      For [most] current designs, Intel/AMD have multilevel cache memory. The cores run independently and fully in parallel and if they need to communicate they do so via shared memory. Thus, they all run full bore, flat out, and don't need to wait for each other [there are some exceptions--read on]. They have cache snoop logic that keeps them up-to-date. In other words, all cores have access to the entire DRAM space through the cache hierarchy. When the system is booted, the DRAM is divided up (so each core gets its 1/N share of it).

      Let's say you have an 8 core chip. Normally, each program gets its own core [sort of]. Your email gets a core, your browser gets a core, your editor gets one, etc. and none of them wait for another [unless they do filesystem operations, etc.] Disjoint programs don't need to communicate much usually [and not at the level we're talking about here].

      But if you have a program designed for heavy computation (e.g. video compression or transcoding), it might be designed to use multiple cores to get its work done faster. It will consist of multiple sections (e.g. processes/threads). If a process/thread so designates, it can share portions of its memory space with other processes/threads. Each thread takes input data from a memory pool somewhere, does some work on it, and deposits the results in a memory output pool. It then alerts the next thread in the processing "pipeline" as to which memory buffer it placed the result in. The next thread does much the same. x86 architectures have some locking primitives to assist with this. It's a bit more complex than that, but you don't need a "router". If the multicore application is designed correctly, any delays for sync between pipeline stages occur infrequently and are on the order of a few CPU cycles.
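
      The hand-off pattern described above (take a buffer from an input pool, work on it, tell the next stage where the result is) can be sketched in Python with queue.Queue, whose internal locking stands in here for the x86 locking primitives. This is an illustrative sketch under assumptions, not how a real transcoder is built: the stage/run_pipeline names and the None end-of-stream marker are hypothetical, and because of CPython's global interpreter lock it shows the structure of the pipeline rather than real multicore speedup.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed down the pipeline


def stage(work, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start one pipeline stage: take a buffer, transform it, hand the result on."""
    def run():
        while True:
            buf = inbox.get()          # blocks until the previous stage hands over a buffer
            if buf is SENTINEL:
                outbox.put(SENTINEL)   # propagate shutdown to the next stage
                return
            outbox.put(work(buf))      # tells the next stage which buffer holds the result
    t = threading.Thread(target=run)
    t.start()
    return t


def run_pipeline(frames, *works):
    """Chain one thread per stage with a queue between each pair of stages."""
    queues = [queue.Queue() for _ in range(len(works) + 1)]
    threads = [stage(w, queues[i], queues[i + 1]) for i, w in enumerate(works)]
    for f in frames:
        queues[0].put(f)
    queues[0].put(SENTINEL)
    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Two toy "stages": scale every sample, then add an offset.
    print(run_pipeline([[1, 2], [3, 4]],
                       lambda b: [x * 2 for x in b],
                       lambda b: [x + 1 for x in b]))
```

      Because each stage is a single thread reading a FIFO queue, buffers come out in the order they went in; stages only wait on each other when a queue is empty.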

      This works fine up to about 16-32 cores. Beyond that, even the cache becomes a bottleneck. Or, consider a system where you have a 16 core chip (all on the same silicon substrate). The cache works fine there. But now suppose you want to have a motherboard that has 100 of these chips on it. That's right--16 cores/chip X 100 chips for a total of 160 cores. Now, you need some form of interchip communication.

      x86 systems already have this in the form of HyperTransport (AMD) or the PCI Express Bus (Intel) [there are others as well]. PCIe isn't a bus in the classic sense at all. It functions like an onboard store-and-forward point-to-point routing system with guaranteed packet delivery. This is how a SATA host adapter communicates with DRAM (via a PCIe link). Likewise for your video controller. Most current systems don't need to use PCIe beyond this (e.g. to hook up multiple CPU chips) because most desktop/laptop systems have only one chip (with X cores in it). But in the 100 chip example you would need something like this, and HT and PCIe already do something similar. Intel/AMD are already working on any enhancements to HT/PCIe as needed. Actually, Intel [unwilling to just use HT] is pushing "QuickPath Interconnect" or QPI.

      --
      Like a good neighbor, fsck is there ...
    4. Re:They aren't doing this already? by DaneM · · Score: 1

      Thanks for the enlightening "sip from the fire hose," Forever Wondering. I appreciate the explanation.

    5. Re:They aren't doing this already? by dkf · · Score: 1

      That's right--16 cores/chip X 100 chips for a total of 160 cores.

      16 * 100 = 160?

      You must be a hardware engineer. Did you work for Intel on the early Pentium floating point unit?

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    6. Re:They aren't doing this already? by Forever+Wondering · · Score: 1

      That's right--16 cores/chip X 100 chips for a total of 160 cores.

      16 * 100 = 160?

      You must be a hardware engineer. Did you work for Intel on the early Pentium floating point unit?

      Yep, I caught the math error, too, but only after posting. I was debating a one-liner reply to correct it, but didn't want to clutter things up with a reply just to correct the typo.

      I'm a computer engineer, which is 50% software, 50% hardware. While I could forgive a hardware engineer, a software engineer never makes misteaks [pun intended].

      --
      Like a good neighbor, fsck is there ...
    7. Re:They aren't doing this already? by Forever+Wondering · · Score: 1

      Thanks for the enlightening "sip from the fire hose," Forever Wondering. I appreciate the explanation.

      You're quite welcome. It's refreshing to get a thank you on slashdot--and much appreciated.

      It seemed like you had shelled out good money for a multicore system and were concerned that you weren't getting your money's worth.

      In fact, Intel/AMD cores work even harder for you than that using several techniques:
      Hyperthreading (http://en.wikipedia.org/wiki/Hyper-threading)
      out of order execution (http://en.wikipedia.org/wiki/Out-of-order_execution)

      Because of this and the sheer speed of a 3+ GHz CPU, the main bottleneck is actually fetching/storing to/from DRAM.

      --
      Like a good neighbor, fsck is there ...
  17. Remember SGI? by Anonymous Coward · · Score: 0

    SGI did this in just about every computer it produced from the early '90s until they stopped making MIPS machines (or existing, really). You could use CrayLink cables and R-bricks to turn multiple C-bricks (full-fledged Origin servers with 1-4 CPUs) into single-system-image ccNUMA machines. They had quite a few big Origin machines in the Top 500 back in the day.

    Bonus points: my captcha was "networking".

    1. Re:Remember SGI? by Cyrano+de+Maniac · · Score: 1

      We still do. The only major difference (other than generational improvements) is that these days it's x86 instead of MIPS.

      --
      Cyrano de Maniac
  18. the worst replaces the best by holophrastic · · Score: 3, Interesting

    Yeah, great idea. Take the very fastest communication that we have on the entire planet, and replace it with the absolute slowest communication we have on the planet. Great idea. And with it, more complexity, more caches, more lookup tables, and more things to go wrong.

    The best part is that it's totally unbalanced. Internet protocols are based on a network that's ever-changing and totally unreliable. The bus, on the other hand, is based on total reliability and is static.

    I'd have thought that a pool concept, or a mailbox metaphor, or a message board analog would have been more appropriate. Something where streams are naturally quantized and sending is decoupled from receiving. Where a recipient can operate at its own rate, independent of the sender's.

    You know, like typical linux interactive sockets, for example. But what do I know.

    1. Re:the worst replaces the best by tamyrlin · · Score: 1

      Actually, the networks used in Network-on-Chips are quite unlike the networks used for TCP/IP. For example, when you develop a System-on-Chip you have a very good idea of your workload, so you can optimize the network topology based on that information. The networks proposed in NoC research typically also have other features not found on the Internet such as guaranteed and in-order delivery of packets. (Which is fairly easy to do in a small network with low latencies.) In many cases you can also reserve bandwidth between nodes so that you can give real-time guarantees. However, in some systems circuit-switching may be better than packet switching, although most researchers seem to focus on packet-switching NoCs.
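
      As a concrete example of a deterministic NoC routing scheme, here is a sketch of dimension-order (XY) routing on a 2D mesh, assuming every core has a router at grid coordinates (x, y). This is a swapped-in textbook scheme, not the MIT design: the summary describes routers that can choose among several paths depending on network conditions (adaptive routing), whereas XY routing always picks the same path. That fixed path is one reason in-order delivery is comparatively easy over FIFO links. The coordinate convention and the function name are assumptions for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) routing: move along X until aligned, then along Y.

    Returns every router the packet visits, including the source and destination.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                      # first resolve the X offset one hop at a time
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                      # then resolve the Y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route, "hops:", len(route) - 1)
```

      The hop count is always the Manhattan distance between the two routers, and XY routing is also a standard way to avoid routing deadlock on a mesh, at the cost of ignoring congestion.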

      A good paper to read for an introduction to NoCs is "Route Packets, Not Wires: On-Chip Interconnection Networks" by Dally and Towles. (You can find it at http://www.cs.berkeley.edu/~vwen/backgrnd_papers/41_4.pdf if you are interested.)

      Anyway, the basic idea behind a NoC is that it is a good trade-off between the two extremes of a bus and a cross-bar. If you implement a chip with just a single bus on it, the silicon area used for communication will be very low, but the bandwidth will also be relatively low. On the other hand, if you create a huge cross-bar to which every module is connected, the silicon area used for communication is extremely high (the area for a cross-bar grows quadratically with the number of ports), although the theoretical maximum bandwidth is also very high. In most systems, the optimum point will be somewhere in between, where you have several buses and/or cross-bars connected by a network.
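
      A rough back-of-the-envelope comparison of the two extremes and one middle ground, counting only links and crosspoints. It ignores wire length, buffering, and router logic, and the square k x k mesh is an assumed example topology, so these are idealized counts rather than area estimates for any real chip; the function and key names are made up for this sketch.

```python
import math


def interconnect_cost(n: int) -> dict:
    """Idealized resource counts for connecting n modules three different ways.

    Counts links/crosspoints only; wire length, buffers and router logic are ignored.
    The mesh case assumes n is a perfect square laid out as a k x k grid.
    """
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch's mesh needs a square number of modules")
    return {
        # One shared medium: cheap, but only one pair can talk at a time.
        "bus_links": 1,
        "bus_concurrent_transfers": 1,
        # Full crossbar: a crosspoint for every (input, output) pair.
        "crossbar_crosspoints": n * n,
        "crossbar_concurrent_transfers": n,
        # k x k mesh: k*(k-1) horizontal plus k*(k-1) vertical neighbour links.
        "mesh_links": 2 * k * (k - 1),
        # Links cut when the mesh is split into two halves down the middle.
        "mesh_bisection_links": k,
    }


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, interconnect_cost(n))
```

      For 64 modules this gives 4096 crosspoints for the crossbar versus 112 links for an 8 x 8 mesh, while the bus still allows only one transfer at a time; the mesh's cost grows roughly linearly with n instead of quadratically.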

  19. Sounds like an idea of mine by Anonymous Coward · · Score: 0

    I had an idea for an MMO game, where people would use personal computer hardware up against an internet, but everyone would have multiple entry points because of processor complexity.

    Unfortunately, as designed, this means an all out war on the internet, with no security nor privacy.

  20. Really!? It has already been used by Anonymous Coward · · Score: 0

    by Tandem Computers, for like a long time.

  21. The important bit : No coherent shared cache by Sarusa · · Score: 5, Informative

    As mentioned in other comments, this has been done before. The method of message passing isn't as fundamental as one key point - that it is all explicit message passing.

    Intel and AMD x86/x64 CPUs use coherent cache between cores to make sure that a thread running on CPU 1 sees the same RAM as a thread running on CPU 3. This leads to horrible bottlenecks and huge amounts of die tied up in trying to coordinate the writes and maintain coherency between N cores (that's (N-1)^2 connections!), and it all just goes to hell pretty fast. Intel has this super new transactional memory rollback thing, but it's turd polishing.

    The next step is pretty obvious (see Barrelfish) and easy: no shared coherency. Everything is done with message passing. If two threads or processes (it doesn't really matter at that point) want to communicate they need to do it with messages. It's much cleaner than dealing with shared memory synchronization, and makes program flow much more obvious (to me at least - I use message queues even on x86/x64). If you need to share BIG MEMORY between threads, which is reasonable for something like image processing, you at least use messages to explicitly coordinate access to shared memory and the cores don't have to worry about coherency.
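
    A minimal sketch of the "coordinate big shared memory with messages" idea: the buffer is shared, but a worker only touches a region after a message hands it ownership, and it sends a message back when done. In Python the memory is coherent anyway, so this only illustrates the protocol discipline, not the hardware benefit; the function names, the None shutdown message, and the image-inversion workload are hypothetical.

```python
import queue
import threading


def worker(shared: bytearray, inbox: queue.Queue, replies: queue.Queue) -> None:
    """Only touch the region of `shared` that a message has handed to this worker."""
    while True:
        msg = inbox.get()
        if msg is None:                       # hypothetical shutdown message
            return
        start, end = msg                      # ownership of shared[start:end] moves here
        for i in range(start, end):
            shared[i] = 255 - shared[i]       # e.g. invert one strip of an 8-bit image
        replies.put((start, end))             # hand ownership back to the coordinator


def invert_in_chunks(image: bytearray, chunk: int, n_workers: int = 2) -> bytearray:
    """Coordinator: split the image into regions and give each one to exactly one worker."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    replies = queue.Queue()
    threads = [threading.Thread(target=worker, args=(image, q, replies)) for q in inboxes]
    for t in threads:
        t.start()
    regions = [(s, min(s + chunk, len(image))) for s in range(0, len(image), chunk)]
    for k, region in enumerate(regions):
        inboxes[k % n_workers].put(region)    # round-robin, never two owners per region
    for _ in regions:
        replies.get()                         # don't read the image until all regions are back
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    return image


if __name__ == "__main__":
    print(list(invert_in_chunks(bytearray([0, 1, 2, 250, 255]), chunk=2)))
```

    No locks guard the buffer itself: correctness comes entirely from the rule that a region has one owner at a time and ownership only changes hands through a message.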

    This scales extremely well for at least a couple thousand CPUs, which is where the 'local internet' becomes useful.

    Where it becomes not easy is that almost all programs written for x86/x64 assume threads can share memory at will. They'd need to be rewritten for this model or would suddenly run a whole lot slower since you'd have to lock them to one core or somehow do the coordination behind their back. It'd be worth it for me!

    1. Re:The important bit : No coherent shared cache by Anonymous Coward · · Score: 1

      IPC has been a PITA and slow for decades; you don't want that to be the only option in the future.

    2. Re:The important bit : No coherent shared cache by Anonymous Coward · · Score: 1

      Seems like you are talking about switching from a "strong memory model" to a "weak memory model" and TBQH I know my share of developers that can barely handle multithreaded programming as it is... throwing this at them could be a disaster on the software side.

    3. Re:The important bit : No coherent shared cache by dkf · · Score: 2

      Seems like you are talking about switching from a "strong memory model" to a "weak memory model" and TBQH I know my share of developers that can barely handle multithreaded programming as it is... throwing this at them could be a disaster on the software side.

      Depends on the model. If the model is "oh, you got one big space of memory; anything goes but you'd better sprinkle a few locks in" then yes, that will suck boulders when the hardware switches to message passing, but there are other parallelism models in use in programming. Those that have each thread as being essentially isolated and only communicating with the other threads by sending messages will adapt much more easily; that's basically MPI, and that's known to scale massively. It's also a heck of a lot easier to reason about message passing parallelism; that's been known since at least the '80s. What's more, there are actually quite a lot of programmers who have experience with distributed component programming; they just tend to work at a much higher level than a single process (or single computer).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    4. Re:The important bit : No coherent shared cache by Sarusa · · Score: 1

      As part of this the messaging has to be as fast as possible, which is where the article comes in. Newer cores/chips designed for this kind of thing have multi-gigabytes/sec just for the messaging and tiny latencies.

      The threads/processes still shouldn't be so tightly coupled that they're talking more than working (or waiting), or something's probably wrong with the design. Even in a shared memory model it's probably spending massive amounts of time twiddling mutexes and trying to keep memory synced between the cores (if they're running on separate cores).

      There's still the option of shared RAM for passing around large data - readers just have to know when to invalidate their cache, which is where the coordination by message comes in. So messaging isn't the only option, just preferred.

      Finally, current IPC can be slow but doesn't have to be. For instance, when I send a message to another thread with ThreadX it puts the message on the (pre-allocated) queue, checks if the other thread is higher priority and waiting on the queue, and if it is, *boom*, receiver gets the message, near instant context switch. We use this for tiny embedded systems and overhead is noise level.
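
      The ThreadX behaviour described above (a send into a pre-allocated queue that immediately wakes a waiting receiver) can be approximated in Python with a fixed-size ring buffer and threading.Condition. This is not the ThreadX API, just a sketch of the same idea under stated assumptions: the class name is hypothetical, a full queue simply makes send() return False, and Python cannot force the immediate priority-based context switch an RTOS performs; notify() only makes the blocked receiver runnable.

```python
import threading


class PreallocatedQueue:
    """Fixed-capacity message queue whose slots are allocated once, up front."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity       # no allocation on the send/receive path
        self._head = 0                        # index of the oldest queued message
        self._count = 0
        self._cond = threading.Condition()

    def send(self, msg) -> bool:
        with self._cond:
            if self._count == len(self._slots):
                return False                  # full: the caller decides whether to retry
            tail = (self._head + self._count) % len(self._slots)
            self._slots[tail] = msg
            self._count += 1
            self._cond.notify()               # wake one receiver blocked in receive()
            return True

    def receive(self, timeout=None):
        with self._cond:
            if not self._cond.wait_for(lambda: self._count > 0, timeout):
                return None                   # timed out with nothing queued
            msg = self._slots[self._head]
            self._slots[self._head] = None
            self._head = (self._head + 1) % len(self._slots)
            self._count -= 1
            return msg


if __name__ == "__main__":
    q = PreallocatedQueue(capacity=4)
    receiver = threading.Thread(target=lambda: print("got", q.receive(timeout=1.0)))
    receiver.start()                          # blocks waiting on the empty queue
    q.send("ping")                            # wakes the waiting receiver
    receiver.join()
```

      Keeping the slots pre-allocated means a send is just an index update under a lock plus a wake-up, which is why this style of messaging can have overhead at the noise level.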

  22. not news, just PR by markhahn · · Score: 1

    Oh, come on. Buses have been dead for years (SATA and PCIe are great examples of the prevalence of point-to-point links). There's no reason we can't think of cache lines as packets (bigger than ATM cells were!). How about HyperTransport and QPI?

    1. Re:not news, just PR by mikkelm · · Score: 1

      Everything you do deals with a bus somewhere. They're still hugely relevant, particularly in very dense, very fast electronics.

  23. Google's on it by bill_mcgonigle · · Score: 2

    I can't seem to find the old story or my comment on it, but when Google acquired a 'stealth' startup a year or so ago, the most interesting thing about it was that the principal investigator had a few patents for packet-switched CPUs.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  24. This is not like the internet! by Darinbob · · Score: 1

    Come on, people. Cores share information and suddenly it's just like the internet? Are these journalists' experiences so narrow that they have no other analogy? It's just a fricking bus! There are networks that exist which are not "the internet". Using the term "internet" implies global connectivity. OK, I expect journalists to be ignorant, but please, are Slashdot editors this confused about basic technology as well?

  25. SeaMicro by Sollord · · Score: 1

    Didn't AMD just buy a company that did something similar to this? While not at the chip or core level, it seems kind of related.

  26. Software is the problem by CBravo · · Score: 1

    The problem is not the hardware but the software. The hardware has been parallel for ages, even locally (GPU, GPU-memory, CPU, memory, HDD, DMA - memory processor, ...).

    Software is a different problem across the networked/parallel arena. If you really think about it, an SMS is not much more than 'hello world'. You type it and you see text (no other function, other than transport, which isn't really a function, has been performed), and testing it should be easy. This is not even about parallelism but about communication.

    The best way to create software for networking is to not have to rewrite it for all these new parallel architectures/the internet (which means you compile it for compartmentalized execution). This is, however, pretty hard to do (I don't know of such an implementation). The alternative is that everybody needs to put all the same glue in their software over and over (RMI, OpenMP, ...). We are doing #2.

    By the way, I think there is a big difference between networking, which has the premise that things fail, and local transport of data/code, whose workings are fully specified. Fundamentally different (price).

    --
    nosig today
  27. sounds like Cell by fikx · · Score: 1

    Isn't this a variation on the Cell architecture? Except no one could figure out how to write the OS and compiler to fully realize the goal of programs that could be farmed out by the PowerPC control core (the PPE) to the special processors on one chip, let alone farmed out to multiple Cells over a network.

    --
    AB HOC POSSUM VIDERE DOMUM TUUM
  28. Wrong by SmallFurryCreature · · Score: 1

    It is lolcats all the way down, in a pool of porn with an essence of "Me too" posts.

    Anyway, I think the original poster needs to read up on what the Internet is. It is a network of networks. A number of CPUs networking together is just a network. If you could mix many different systems together, it would be an Internet.

    If you could put an Intel CPU next to an AMD and they would just work together seamlessly, THAT would be an Internet.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Wrong by crutchy · · Score: 1

      I think the original poster needs to read up on what the Internet is

      i think armchair experts are really just wankers with big hats

  29. Great idea... by FithisUX · · Score: 0

    I believe this is the ultimate solution to parallelization. A total realization of actors.

  30. a bit late by Anonymous Coward · · Score: 0

    That'd be all great and applicable to say... a Pentium D. Those processors had cores that, in fact, communicated via the local bus.

  31. Way old idea! by enriquevagu · · Score: 1

    The seminal paper proposing the use of switched/routed interconnection networks on-chip (NoCs) was published by Dally and Towles 11 years ago in DAC'01: Route packets, not wires: On-chip interconnection networks. The idea of associating a router with each core and replicating it in "tiles" is not new either; Tilera was (IIRC) the first company to sell processors based on a tiled design, which was an evolution of the RAW research project. A related research project, TRIPS, replicated functional units on each tile, rather than full cores. Intel has used a tiled design in Polaris, the SCC and MIC (which includes the forthcoming Knights Corner).

    So no, the idea of using routed interconnects is not new at all. In fact, after reading the linked article, it turns out that two-thirds of the text introduces the idea, and the last section details the contributions: two ideas developed by the group of Li-Shiuan Peh, one to improve performance (by using virtual bypassing, a form of routing precomputation) and one to reduce power consumption (using low-swing signaling).

  32. The Simpsons already did it ... by Anonymous Coward · · Score: 0

    Sort of like AMD HyperTransport, then. There are multiple links on each CPU, and they are packet switched. Add in variable-width connections downstream and it's pretty cool. It's actually the best of both: a common bus, packet switched, and the processors had several of them. It is also suitable for off-chip as well as on-chip use. Then Intel did a similar thing with FBDIMM but sort of didn't exploit it.

    Oh, and I am an Inmos fan from way back. The family had variable-width operations, four 5 MHz serial bus connections and a massive matrix switch, not to mention dedicated disk controller and graphics chips connected over the serial bus. It was cool stuff, killed off when SGS-Thomson bought them. They also had a cool language, occam, with intrinsics for timers and interprocessor communication channels, and an extended C for that too. It was very easy to farm tasks out through an array of CPUs. The supported languages handled parallel and serialized operations in the simplest manner ever.

  33. erm... by crutchy · · Score: 1

    cores should instead communicate the same way computers hooked to the Internet do

    apparently never heard of Beowulf clustering

    1. Re:erm... by benthurston27 · · Score: 1

      Imagine a Beowulf cluster of... oh wait, that would just be a Beowulf cluster.

  34. Typical chip six or eight cores? by Lord+Lode · · Score: 1

    Then why do all Intel CPUs, except a very small number of Xeon CPUs, have at most 4 cores? That includes the new Ivy Bridge ones to be released this year, even though Intel already had 4-core chips 5 years ago.

  35. So... by metacell · · Score: 1

    ... now my mother will finally have Internet in her computer!

  36. But now the patents have expired... by Anonymous Coward · · Score: 0

    But now the patents have expired...

    So anyone can implement the solutions.

  37. Deja vu by maroberts · · Score: 1

    I was going to say this seems to be the realisation that the Transputer had the answers decades ago, but it seems many others have said exactly the same thing. I shall resume my nap....

    --

    Donte Alistair Anderson Roberts - hi son!
    Karma: Chameleon

  38. CONGRATULATIONS! by Alex+Belits · · Score: 0

    YOU HAVE INVENTED A BUS! It's time to start working on the first multitasking OS!

    What is it with idiots coming out of the woodwork and presenting old (and often obsolete and abandoned, such as virtualization) technologies as some kind of new development?

    --
    Contrary to the popular belief, there indeed is no God.
    1. Re:CONGRATULATIONS! by Anonymous Coward · · Score: 0

      Network-on-chip design has *absolutely nothing* to do with a shared system/CPU bus. Specialised processing units are already making use of these new types of architectures, e.g. http://techresearch.intel.com/ProjectDetails.aspx?Id=151 The cores are arranged as a 2D mesh, with each core having a 5-port message-passing router. This allows a CPU clocked at ~3 GHz to deliver 1 teraflop of performance while consuming 62 W.
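      Here is a minimal Python sketch of what such a per-core router decides. It uses dimension-ordered (XY) routing, a common textbook scheme for 2D meshes. It assumes five ports per router (north, south, east, west, and a local port to the tile's own core), matching the 5-port description above. The names `next_port` and `route` and the (x, y) coordinate convention are hypothetical. This is not a claim about how Intel's chip actually routes packets.

```python
from typing import Dict, List, Tuple

# (x, y) position of a tile in the mesh; this coordinate convention is an assumption.
Coord = Tuple[int, int]

# How a packet's position changes when it leaves through each mesh port.
# The fifth port, "LOCAL", hands the packet to the tile's own core.
STEP: Dict[str, Coord] = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def next_port(current: Coord, dest: Coord) -> str:
    """Output port chosen by the router at `current` for a packet bound for `dest`.

    XY routing: correct the X coordinate first, then Y, then deliver locally.
    """
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return "E" if dx > cx else "W"
    if cy != dy:
        return "N" if dy > cy else "S"
    return "LOCAL"


def route(src: Coord, dest: Coord) -> List[str]:
    """Simulate hop-by-hop forwarding and return the ports used along the way."""
    ports: List[str] = []
    pos = src
    while True:
        port = next_port(pos, dest)  # each router decides using only pos and dest
        ports.append(port)
        if port == "LOCAL":
            return ports
        step_x, step_y = STEP[port]
        pos = (pos[0] + step_x, pos[1] + step_y)


if __name__ == "__main__":
    # Tile (0, 0) sends a packet to tile (2, 1) on a hypothetical mesh.
    print(route((0, 0), (2, 1)))  # ['E', 'E', 'N', 'LOCAL']
```

      Each router looks only at its own position and the packet's destination, so no global state is needed. Finishing the X dimension before Y is a standard way to avoid routing deadlock on a mesh. The number of hops equals the Manhattan distance between the two tiles.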

    2. Re:CONGRATULATIONS! by Alex+Belits · · Score: 1

      Any modern motherboard has that, PCIe is developed around that, HyperTransport uses it, CPU cache architectures use it.

    3. Re:CONGRATULATIONS! by Anonymous Coward · · Score: 0

      But not mainstream CPUs. So researchers are constantly finding new ways to reduce interconnect latency, increase bandwidth and reduce power consumption. Like anything else in electronics, progress is incremental. Maybe you should actually wait for them to present their paper before bashing them. If the article were technical, maybe 1% of people on Slashdot would understand it.

    4. Re:CONGRATULATIONS! by Alex+Belits · · Score: 1

      But not mainstream CPUs.

      It was in "mainstream CPUs" since the first multicore designs. Better yet, HyperTransport, a CPU-attached bus, implemented routing. As in, rule-based packet forwarding.
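      "Rule-based packet forwarding" can be illustrated in general terms with a hypothetical Python sketch of an address-range forwarding table. Each rule maps a range of addresses to an output link, and the first matching rule wins. The `Rule` and `forward` names, the address ranges and the link names are all made up for illustration. HyperTransport's actual routing mechanism is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical forwarding rule: addresses in [base, limit] go out on `link`."""
    base: int
    limit: int  # inclusive upper bound
    link: str


def forward(rules: List[Rule], address: int, default: Optional[str] = None) -> Optional[str]:
    """Return the output link for `address`; the first matching rule wins."""
    for rule in rules:
        if rule.base <= address <= rule.limit:
            return rule.link
    return default  # no rule matched


if __name__ == "__main__":
    # Hypothetical table splitting the low 2 GiB of address space across two links.
    table = [
        Rule(0x0000_0000, 0x3FFF_FFFF, "link0"),
        Rule(0x4000_0000, 0x7FFF_FFFF, "link1"),
    ]
    print(forward(table, 0x5000_0000))           # link1
    print(forward(table, 0x9000_0000, "local"))  # local
```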

  39. Traffic by bjs555 · · Score: 1

    The idea only works until one of the cores starts sending spam. Hey core, want Vi@gra?

  40. It's NEWS when MIT quotes ancient literature by Theovon · · Score: 1

    The Network on Chip has been around as a concept so long we even have an abbreviation (NoC). Maybe this isn't in commodity products, but basically if you want to do an NoC, you don't have to invent anything yourself. There are several conferences and journals that have been publishing papers on this for decades. But, OH, if a professor from MIT mentions it, it must be something NEW. Sheesh.

    1. Re:It's NEWS when MIT quotes ancient literature by Anonymous Coward · · Score: 0

      A Ferrari has four wheels, a steering wheel, gears and a combustion engine? It must be no different from a Ford Model T!

      Jesus... learn to look beyond superficial similarities. Or do you lack the mental capacity to understand CPU design?

  41. Connection Machines? by cpghost · · Score: 1

    Looks like déjà vu, considering MIT's Connection Machine. While the interconnect will be less regular (not a hypercube), the message passing between cores will have to be routed one way or another, just as with the CM. So how is that news?

    --
    cpghost at Cordula's Web.
  42. Not News by Anonymous Coward · · Score: 0

    Old technology (NoC) being applied by someone famous is not news

  43. NO! by Anonymous Coward · · Score: 0

    NO NO NO YOU RETARDS

    A CPU is not an Internet!
    Go back to school and get some real education.

  44. Ah, Grasshopper. You are learning... by msobkow · · Score: 1

    Sun pegged it right when they said "The Network Is The Computer."

    The specific speed of the network interconnect, the topology of the network fabric, and whether you normally think of it as a network connection are all that distinguish any multi-core system from a distributed cluster. Cloud computing begins to scratch the surface of the implications of this at the cluster/site level, and now it would seem some VLSI gearheads are thinking in the same abstract model at the chip level.

    Once you start thinking of all your compute and storage resources as nodes in a network, you can start applying some very interesting algorithms and research results to the problem of improving throughput and reducing latency within the network of networks.

    But if the network is the computer, I guess that makes a distributed global collection of nodes the Cluster.

    --
    I do not fail; I succeed at finding out what does not work.
  45. We could call it a Transputer! by strangeattraction · · Score: 1
  46. Multi-processor scalable communication by Anonymous Coward · · Score: 0

    This is really starting to FINALLY catch up with the Silicon Graphics ccNUMA, S2MP and Crossbar architectures. See section 3.1, Network Topology, in the following paper:
    http://www.cs.washington.edu/education/courses/cse549/07wi/files/sgiorigin.pdf
    Until you can MASSIVELY improve the backplane speed of the common PC, your bottleneck will be at the backplane. My .02.

  47. Sun Microsystems did it in '93 by chorlian · · Score: 1

    XDBus: a high-performance, consistent, packet-switched VLSI bus

    This paper appears in:
    Compcon Spring '93, Digest of Papers.
    Date of Conference: 22-26 Feb 1993
    Author(s): Sindhu, P. (Xerox Palo Alto Res. Center, CA); Frailong, J.-M.; Gastinel, J.; Cekleov, M.; Yuan, L.; Gunning, B.; Curry, D.
    Page(s): 338-344
    The XDBus is a low-cost, synchronous, packet-switched VLSI bus designed for use in high-performance multiprocessors. The bus provides an efficient coherency protocol which guarantees processors a consistent view of memory in the presence of caches and IO. Low-voltage swing (GTL) CMOS drivers connected to balanced transmission line traces ensure low power as well as high speed for chip, board, and backplane applications. The signaling scheme and coherency protocol work together to promote a high level of system integration, while permitting a wide variety of configurations to be realized. These configurations include small single board systems, multiple bus systems, multiboard backplane systems, and multilevel cache systems. The bus is used in several commercial systems including Sun Microsystems' new SPARCcenter 2000 series.

    --
    David B. Chorlian
  48. Greenarrays GA144 144 cores by Anonymous Coward · · Score: 0

    For a good time, check out the GreenArrays GA-144: 144 Forth computers on one chip. There is no bus; each computer only communicates with its neighbors, as described in the article. http://www.greenarraychips.com/