Slashdot Mirror


MIT Startup Unveils New 64-Core CPU

single-threaded writes "Tilera, a startup out of MIT, has announced that it is shipping a 64-core CPU. Called the TILE64, the CPU is fabbed on a 90nm process and is clocked at anywhere from 600MHz to 900MHz. 'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now, and it's what I'll keep an eye out for as I watch this product make its way in the market. Though there are any number of questions about this product that remain to be answered, one thing is for certain: TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone.'"
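As a rough sanity check on the quoted peak figure, the arithmetic below divides the claimed 192 billion ops/sec across 64 cores at each end of the quoted clock range. The summary does not say at what clock or issue width the peak was computed, so this is only a back-of-the-envelope sketch; the function name is made up for illustration.

```python
# Figures quoted in the summary above.
CORES = 64
PEAK_OPS_PER_SEC = 192e9      # Tilera's claimed peak, 32-bit ops/sec
CLOCKS_HZ = (600e6, 900e6)    # quoted clock range


def ops_per_core_per_cycle(clock_hz, cores=CORES, peak=PEAK_OPS_PER_SEC):
    """Ops each core would have to retire per cycle to reach the claimed peak."""
    return peak / (cores * clock_hz)


for clock in CLOCKS_HZ:
    rate = ops_per_core_per_cycle(clock)
    print(f"{clock / 1e6:.0f} MHz: {rate:.2f} ops/core/cycle")
```

Either way the claim works out to several operations per core per cycle, so the peak presumably assumes each core issues more than one operation at a time.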

7 of 213 comments

  1. Oblig... by Bentov · · Score: 5, Funny

    No one will ever need more than 64 cores.

  2. Rumored... by SeanMon · · Score: 5, Funny

    It's rumored to be able to run 16 whole instances of Vista simultaneously!*

    *Required 32 GB of RAM not included.

    --
    "Scud Storm!" -- Jeremy of PurePwnage.com
  3. But does it... by niceone · · Score: 5, Informative

    Well, yes, it does run Linux - full SMP 2.6 according to the blurb on their site.

    1. Re:But does it... by Eponymous+Bastard · · Score: 5, Informative

      One thing the blurb doesn't make clear is that this is not a workstation CPU. It's designed for embedded systems and system-on-a-chip applications. They mention video compression as an example.

      If you look at their block diagram this looks more like an FPGA-on-drugs than a CPU.

      The individual blocks are probably programmed with GCC, since it should be trivial to port it to a MIPS-like architecture. I wonder if the interconnect uses a VHDL-type language or if they rely on their weird cache to build efficient shared memory.

      Either way, it looks like you have to keep the architecture in mind while designing your software. I doubt they can build a compiler that can manage the division of labor.

      Unlike a typical multicore design you wouldn't use this to parallelize a multithreaded application or a multiprocess workload. The center processors will have a very different latency characteristic than the edge ones, and you want the parts that interact with the network to be on the points adjacent to the controllers, for example.

      So it should work great for a specially designed system, but not so great as a general-purpose CPU.
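
      To put rough numbers on the center-versus-edge point, here is a minimal sketch. It assumes the 64 cores sit in an 8x8 grid, that a message takes one hop per link, and that routes follow the shortest Manhattan path; none of that is confirmed by the blurb, and the function names are made up for illustration.

```python
def mean_hops_to_other_tiles(row, col, side=8):
    """Average Manhattan hop count from one tile to every other tile."""
    total = sum(
        abs(row - r) + abs(col - c)
        for r in range(side)
        for c in range(side)
    )
    return total / (side * side - 1)


def hops_to_nearest_edge(row, col, side=8):
    """Hops from a tile to the closest edge of the grid (0 for edge tiles)."""
    return min(row, col, side - 1 - row, side - 1 - col)


if __name__ == "__main__":
    for name, (r, c) in {"corner": (0, 0), "center": (3, 3)}.items():
        print(name, round(mean_hops_to_other_tiles(r, c), 2),
              hops_to_nearest_edge(r, c))
```

      Under these assumptions a central tile is closer on average to the other tiles, while an edge tile is closer to whatever sits on the perimeter, which is why placement would matter.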

  4. Re:Instruction Set by dfedfe · · Score: 5, Informative

    FWIW:

    "If you have an application written for any multi-core or single processor architecture that's written to work with Linux, you can take it, compile it and have it running on our chip in minutes," he said. "Now, if you want to ratchet up the performance, we provide libraries and interface mechanisms that customers can use to tune code." from here

  5. Re:Instruction Set by Mex · · Score: 5, Funny

    "I'm due to talk to the head of Tilera's software team, which is actually larger than the company's hardware team."

    He must be a really fat guy!

  6. Re:This was my companys idea in 2001 by John+Sokol · · Score: 5, Informative

    Parallel processors on a single die (chip) are very different from Thinking Machines & Beowulf clusters.

    Up till now there were only 2 types of parallel processing.

    1.) Loosely coupled. Thinking Machines & Beowulf clusters, for example, use this. These are interconnected with Ethernet or some other network medium and send messages back and forth.

    2.) Tightly coupled. This is SMP, NUMA, SNOOPY: basically a shared-memory system where each processor shares the same global memory space.

    Each requires very different programming strategies and is limited to certain types of problems.
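
    The difference in programming strategy can be sketched in a few lines. This is only an illustration: both versions run as threads on one machine, with a queue standing in for the network in the loosely coupled case, and the function names are made up.

```python
import queue
import threading


def sum_shared_memory(chunks):
    """Tightly coupled style: workers update one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # every worker touches the same memory location
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Loosely coupled style: workers share nothing and send results as messages."""
    results = queue.Queue()  # stand-in for a network link

    def worker(chunk):
        results.put(sum(chunk))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results.get() for _ in chunks)
```

    Both give the same answer; what changes is whether the workers coordinate through shared state or through explicit messages.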

    There is also a third form that is lesser known: systolic arrays. An example of this is TimeLogic, and many DOD-type projects.
    This is usually done with a bunch of FPGAs, and the math computations are done as a series of hardware pipelines without any CPU.

    With the parallel core processor it's possible to make it like an SMP (shared memory) type system, but you really get hammered by the memory bottleneck, so after about 4 CPUs you don't really gain much.
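
    A toy model of that bottleneck, purely as a sketch: it assumes a single memory bus and that each CPU needs a fixed quarter of its bandwidth to run at full speed. The quarter is a made-up number chosen to match the "about 4" figure, not a measurement.

```python
def modeled_speedup(cores, bus_share_per_core=0.25):
    """Speedup over one core when every core needs a fixed share of one memory bus.

    Below saturation each extra core adds full speed; once the bus is fully
    used, extra cores just wait on memory and the speedup stays flat.
    """
    saturation_point = 1.0 / bus_share_per_core
    return min(float(cores), saturation_point)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 64):
        print(n, modeled_speedup(n))
```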

    What I had proposed was doing systolic-array-type processing, but with simple but fast CPUs on one chip.
    They would be connected with CPU registers that would pass data directly from one CPU to the next.
    Its design would allow super tight coupling between each processor, so a programming problem wouldn't need to process a buffer at a time but could tackle problems that can't normally be broken up into parallel operations. For example, a bignum math operation like multiplying 2 numbers that are 1024 bits long. Or large FFT, fast DVT, or matrix operations where each CPU could process part of a single operation that must be done serially and cannot be done using traditional parallel processing.
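
    As a sketch of the general idea (not the design actually proposed, whose details aren't given here), the simulation below multiplies two big numbers on a linear systolic array. Each simulated cell holds one 16-bit limb of one operand and talks only to its left neighbor through registers: limbs of the other operand move right at half speed while partial sums move right at full speed. The limb width, register layout and names are all assumptions for illustration, and the final carry propagation is left to Python's big integers.

```python
LIMB_BITS = 16  # assumed limb width; each cell holds one limb of b


def to_limbs(value, bits=LIMB_BITS):
    """Split a non-negative int into little-endian limbs."""
    mask = (1 << bits) - 1
    limbs = []
    while value:
        limbs.append(value & mask)
        value >>= bits
    return limbs or [0]


def systolic_multiply(a, b, bits=LIMB_BITS):
    """Multiply a and b by simulating a linear array of multiply-add cells."""
    a_limbs, b_limbs = to_limbs(a, bits), to_limbs(b, bits)
    m, n = len(a_limbs), len(b_limbs)
    x_now = [0] * n   # limb of a each cell used on the latest tick
    x_prev = [0] * n  # limb it used one tick earlier (the extra delay register)
    y_out = [0] * n   # partial column sum each cell produced on the latest tick
    columns = []
    for tick in range(m + 2 * n - 2):
        new_x, new_y = [0] * n, [0] * n
        for j in range(n):
            if j == 0:
                # Cell 0 is fed one limb of a per tick, then zeros to flush.
                x_in = a_limbs[tick] if tick < m else 0
                y_in = 0
            else:
                # Every other cell reads only its left neighbor's registers.
                x_in = x_prev[j - 1]
                y_in = y_out[j - 1]
            new_x[j] = x_in
            new_y[j] = y_in + b_limbs[j] * x_in
        x_prev, x_now, y_out = x_now, new_x, new_y
        if tick >= n - 1:
            # The last cell now emits one finished column sum per tick.
            columns.append(y_out[-1])
    # Carry propagation is done here by Python's arbitrary-precision ints.
    return sum(c << (bits * k) for k, c in enumerate(columns))
```

    No cell ever sees a whole buffer; each one does a single multiply-add per tick on whatever its neighbor just handed over.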

    Specifically, my interest was in video compression and image processing in real time. This is where DCT, motion vector searches, Huffman coding and other operations that don't parallelize well would really get a boost using this type of processor.

    --
    I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso