Slashdot Mirror


MIT Startup Unveils New 64-Core CPU

single-threaded writes "Tilera, a startup out of MIT, has announced that it is shipping a 64-core CPU. Called the TILE64, the CPU is fabbed on a 90nm process and is clocked at anywhere from 600MHz to 900MHz. 'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now, and it's what I'll keep an eye out for as I watch this product make its way in the market. Though there are any number of questions about this product that remain to be answered, one thing is for certain: TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone.'"
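
As a rough sanity check on the figures quoted in the summary, the claimed peak can be divided across the cores and clock range. This is only arithmetic on the numbers above; the variable names are illustrative, and nothing here says how Tilera actually arrives at its peak figure.

```python
# Figures taken from the summary above; everything else is plain arithmetic.
CORES = 64
PEAK_OPS_PER_SEC = 192e9          # claimed 32-bit ops/sec for the whole chip
CLOCKS_HZ = (600e6, 900e6)        # quoted clock range

per_core_ops = PEAK_OPS_PER_SEC / CORES   # ops/sec each core must sustain

def ops_per_cycle(clock_hz: float) -> float:
    """Ops per cycle per core that the claimed peak would imply at this clock."""
    return per_core_ops / clock_hz

for clock in CLOCKS_HZ:
    print(f"{clock / 1e6:.0f} MHz -> {ops_per_cycle(clock):.2f} ops/cycle/core")
```

Taken at face value, the claim implies several operations per cycle on every core across the quoted clock range.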

3 of 213 comments

  1. This was my company's idea in 2001 by John Sokol · · Score: 4, Interesting

    It was called Enumera (www.enumera.com).

    I started to work with Chuck Moore, the author of the FORTH language, on a 7x7 array of very fast, small processors.

    From a talk I gave on February 16, 2001
    (http://www.dnull.com/~sokol/amorp/emtalk.ppt):

    On a chip of this size, a 7x7 array (49 CPUs) with RAM could be
    built. Co-processors could also be added.
    Each CPU would operate at 2400 MIPS; times 49, that is a total of about 117 billion operations per second.
    The power consumption would be about 1 watt (1.8 volts at 500 mA).
    With this level of computing power, new applications that were unthinkable before now become possible.

    Also mentioned earlier on Slashdot:
    http://developers.slashdot.org/comments.pl?sid=138584&threshold=0&commentsort=0&mode=thread&cid=11600799
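
    The talk's headline numbers can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the talk excerpt above.
CORES = 7 * 7            # 7x7 array of CPUs
MIPS_PER_CORE = 2400     # claimed rate per CPU
SUPPLY_VOLTS = 1.8
SUPPLY_AMPS = 0.5        # 500 mA

total_mips = CORES * MIPS_PER_CORE
total_ops_per_sec = total_mips * 1_000_000
power_watts = SUPPLY_VOLTS * SUPPLY_AMPS   # P = V * I

print(f"{CORES} cores x {MIPS_PER_CORE} MIPS = {total_ops_per_sec / 1e9:.1f} billion ops/sec")
print(f"{SUPPLY_VOLTS} V x {SUPPLY_AMPS} A = {power_watts:.1f} W")
```

    So the quoted total is the product rounded down slightly, and the "1 watt" figure is 1.8 V x 0.5 A = 0.9 W rounded up.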

    And earlier here:
    http://www.colorforth.com/ 25x Multicomputer Chip

    This eventually became IntellaSys after Enumera failed.

    IntellaSys CTO Chuck Moore to Present at In-Stat Spring Processor Forum; Scalable Embedded Array Platform for Implementing Asynchronous, Scalable Multicore Solutions Using Elegant VentureForth Programming to Be Discussed in Detail
    http://www.intellasys.net/products/24c18/SEAforth-24A-3.pdf
    http://www.findarticles.com/p/articles/mi_m0EIN/is_2005_Oct_24/ai_n15730157
    http://www.findarticles.com/p/articles/mi_m0EIN/is_2006_May_1/ai_n16135032

    For older info, see the links below; look specifically at the P21 / I21 / F21 chips:

    http://www.enumera.com/chip/
    http://www.ultratechnology.com/ml0.htm
    http://www.ultratechnology.com/f21.html#f21
    http://www.ultratechnology.com/store.htm#stamp
    http://www.ultratechnology.com/cowboys.html#cm

    --
    I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
  2. Re:Instruction Set by ultranova · · Score: 3, Interesting

    You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

    The solution, of course, is to move away from the imperative programming model to a dataflow or functional one. That way the compiler can automatically parallelize the task, instead of the programmer having to do so manually.
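
    A small illustration of the point about dataflow/functional style, as a sketch only. Python does not auto-parallelize anything, so the thread pool below merely stands in for what a parallelizing compiler would do, and `brighten` is a made-up example function. What it shows is that a map over a pure function has no cross-element dependencies, so the elements can be evaluated in any order or concurrently without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel: int) -> int:
    """Pure function: the result depends only on the argument, no shared state."""
    return min(pixel + 40, 255)

def imperative_sum(pixels):
    # Imperative style: every iteration updates `total`, so the loop as
    # written carries a dependency from one iteration to the next.
    total = 0
    for p in pixels:
        total += brighten(p)
    return total

def dataflow_sum(pixels, workers=4):
    # Functional style: the map stage has no dependencies between elements,
    # so a runtime is free to spread it over workers; only the final
    # reduction combines the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(brighten, pixels))

pixels = list(range(0, 256, 5))
assert imperative_sum(pixels) == dataflow_sum(pixels)
```

    Both versions compute the same value; the difference is that in the second one the independence of the per-element work is visible in the structure of the program rather than something the programmer has to prove by hand.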

    --

    Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

  3. Re:Instruction Set by networkBoy · · Score: 3, Interesting

    The best way, IMHO, is to build a two-layer chip - one layer being RAM, the other being the CPU cores

    Both of those require transistors. You cannot stack transistors with any current process technology; physics gets in the way.
    A chip is basically built as follows:

    metal
    poly
    metal
    poly
    Si

    Here the poly is the insulator and the metal is the same as traces on a PCB. Just as you cannot place components in the middle of a PCB, you cannot place transistors on top of the metal; it would require a second silicon layer that you could dope transistors into.
    While there are some technologies (SOI, for example) that may allow this in theory, you start to run into other issues: punching through the insulator in specific areas and with high precision (neither of which is easy), and heat dissipation (transistors are transistors, and switching produces heat, whether it's an ALU or an SRAM). And finally, before someone suggests using the other side of the wafer: how do you connect the two sides? A wafer is *very* thick at the scale we are discussing. It would be like mining a hole through the earth.
    More useful, perhaps, would be distributing L0 cache (register memory) a little more liberally in key areas of the processor, but then addressing gets in the way. In theory, having an MCM (multi-chip module) laid out as Cache - Processor - Cache, so that there is ample L3 cache running at core/4 clock, may help, but costs get prohibitive.

    There is no really good solution to moving data around once you start getting to this kind of density. Eventually wire delay may be the limiting factor in CPU throughput.
    -nB
    --
    whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
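
    On the closing point in comment 3 about moving data around at this density: a rough way to get a feel for it on a mesh-networked part is to count link hops between cores. The sketch below is hypothetical throughout. It assumes the 64 cores sit in an 8x8 grid and that messages use simple XY routing, so the hop count is the Manhattan distance; neither detail comes from the story, and hop count is only a stand-in for the real wire delay.

```python
from itertools import product

SIDE = 8  # assumption: 64 tiles laid out as an 8x8 grid

def hops(a, b):
    """Links crossed between tiles a and b under XY routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

tiles = list(product(range(SIDE), repeat=2))

# Hop counts for every ordered pair of distinct tiles.
distances = [hops(a, b) for a in tiles for b in tiles if a != b]

worst_case = max(distances)
average = sum(distances) / len(distances)

print(f"worst case: {worst_case} hops, average: {average:.2f} hops")
```

    Under those assumptions a corner-to-corner message crosses 14 links and a random pair of tiles is a little over 5 hops apart, which is why data placement matters more as core counts grow.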