MIT Startup Unveils New 64-Core CPU

Posted by ryuzaki0 on Monday August 20, 2007 @08:57AM from the tech-is-neat-but-using-it-is-neater dept.

single-threaded writes "Tilera, a startup out of MIT, has announced that it is shipping a 64-core CPU. Called the TILE64, the CPU is fabbed on a 90nm process and is clocked at anywhere from 600MHz to 900MHz. 'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now, and it's what I'll keep an eye out for as I watch this product make its way in the market. Though there are any number of questions about this product that remain to be answered, one thing is for certain: TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone.'"

I'm ready for it by Anonymous Coward · 2007-08-20 09:19 · Score: 2, Interesting

On my laptop right now:

> ps aux | wc -l
281

Of course not all those processes are in runnable state. On the other hand, many of those processes have multiple threads. A typical Java Swing GUI app may have a dozen threads, for example. A web server process can easily have dozens of runnable threads. Software is going to take a little bit of catching up, but nothing huge.
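The distinction drawn above between processes, threads, and runnable tasks can be checked directly. The following is a minimal sketch, assuming a Linux system with a mounted /proc; the helper names `parse_status` and `summarize` are made up for illustration. Note that `ps aux | wc -l` also counts the header line, and that the `State` field in /proc/<pid>/status describes the process's main thread only, so the runnable count here is a rough lower bound.

```python
import os

def parse_status(text):
    """Extract the state letter and thread count from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    state = fields.get("State", "?")[:1]        # "R (running)" -> "R"
    threads = int(fields.get("Threads", "1"))   # assume 1 if the field is absent
    return state, threads

def summarize(proc_root="/proc"):
    """Return (processes, threads, runnable_processes) for a Linux /proc tree."""
    processes = threads = runnable = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                            # only numeric entries are PIDs
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                state, n = parse_status(f.read())
        except OSError:
            continue                            # process exited during the scan
        processes += 1
        threads += n
        runnable += state == "R"
    return processes, threads, runnable

if __name__ == "__main__":
    if os.path.isdir("/proc/self"):
        print("processes=%d threads=%d runnable=%d" % summarize())
```

On a typical desktop the thread total is several times the process count, while only a handful of tasks are runnable at any instant.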
Re:Instruction Set by SatanicPuppy · 2007-08-20 09:24 · Score: 1, Interesting

You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

This is the same problem we've been working with on clusters forever...How do you tune and load balance the jobs to the point where you're getting the most out of your hardware, and nothing is sitting idle while other parts of the system are running at 100%? What do you do when the task is already reduced to the simplest level and there is no benefit from throwing extra processors at it?

Someone smarter than me is going to have to figure it out...The only way I can think of doing it is grafting more scheduling crap on top of all the processors to break down tasks and assign them to cores, and adding another layer of complexity is almost always a bad idea.
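Both questions above can be made concrete with a toy model. This is only a sketch under stated assumptions: task durations are known numbers, scheduling itself costs nothing, and the function names are hypothetical. It compares a fixed round-robin assignment against a shared work queue, where each task goes to whichever core frees up first, and it adds Amdahl's law (not named in the post, brought in here for illustration) as the usual bound on what extra cores buy when part of a job is serial.

```python
import heapq

def static_makespan(durations, cores):
    """Pre-assign tasks round-robin; finish time is the busiest core's total."""
    loads = [0] * cores
    for i, d in enumerate(durations):
        loads[i % cores] += d
    return max(loads)

def dynamic_makespan(durations, cores):
    """Give each task to whichever core becomes idle first (shared work queue)."""
    free_at = [0] * cores              # min-heap of times at which cores go idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)

def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

if __name__ == "__main__":
    tasks = [8, 1, 1, 1, 8, 1, 1, 1]   # uneven, made-up task durations
    print("static :", static_makespan(tasks, 4))    # 16: one core gets both long tasks
    print("dynamic:", dynamic_makespan(tasks, 4))   # 9
    print("90%% parallel on 64 cores: %.2fx" % amdahl_speedup(0.9, 64))
```

In this model the work queue nearly halves the finish time without any extra hardware, but even a job that is 90% parallel tops out below 9x on 64 cores.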

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
This was my companys idea in 2001 by John+Sokol · 2007-08-20 09:59 · Score: 4, Interesting

It was called Enumera (www.enumera.com).

I started working with Chuck Moore, the author of the Forth language, on a 7x7 array of very fast, small processors.

From a talk I gave on February 16, 2001
From http://www.dnull.com/~sokol/amorp/emtalk.ppt
On a chip this size, a 7x7 array (49 CPUs) with RAM could be built. Co-processors could also be added.
Each CPU would operate at 2400 MIPS; times 49, that is a total of 117 billion operations per second.
The power consumption would be about 1 watt (1.8 volts at 500 mA).
With this level of computing power, applications that were unthinkable before now become possible.

Also mentioned earlier on Slashdot:
http://developers.slashdot.org/comments.pl?sid=138584&threshold=0&commentsort=0&mode=thread&cid=11600799
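As a quick sanity check, the figures quoted from the talk can be worked through alongside the TILE64 numbers from the story summary. This is plain arithmetic on the quoted values; the variable names are illustrative, and the TILE64 per-core figure is simply the claimed peak divided by 64, since the post does not say what clock the peak assumes.

```python
# Figures quoted from the 2001 talk
cores = 49
mips_per_core = 2400
total_ops = cores * mips_per_core * 10**6    # operations per second
volts, amps = 1.8, 0.5
watts = volts * amps                         # P = V * I

# Figures quoted in the story summary for TILE64
tile64_peak_ops = 192 * 10**9                # claimed 32-bit ops/sec
tile64_cores = 64
tile64_ops_per_core = tile64_peak_ops / tile64_cores

print("7x7 array total: %.1f billion ops/sec" % (total_ops / 1e9))          # 117.6
print("7x7 array power: %.1f W" % watts)                                    # 0.9
print("TILE64 per core: %.1f billion ops/sec" % (tile64_ops_per_core / 1e9))  # 3.0
```

So the talk's "117 billion" is 117.6 billion rounded down, and 1.8 V at 500 mA works out to 0.9 W, slightly under the quoted 1 watt.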

And earlier here:
http://www.colorforth.com/ 25x Multicomputer Chip

This eventually became IntellaSys after Enumera failed.
IntellaSys CTO Chuck Moore to Present at In-Stat Spring Processor Forum; Scalable Embedded Array Platform for Implementing Asynchronous, Scalable Multicore Solutions Using Elegant VentureForth Programming to Be Discussed in Detail
http://www.intellasys.net/products/24c18/SEAforth-24A-3.pdf
http://www.findarticles.com/p/articles/mi_m0EIN/is_2005_Oct_24/ai_n15730157
http://www.findarticles.com/p/articles/mi_m0EIN/is_2006_May_1/ai_n16135032

For older info, see the links below.
Look specifically at the P21 / I21 / F21 chips...

http://www.enumera.com/chip/
http://www.ultratechnology.com/ml0.htm
http://www.ultratechnology.com/f21.html#f21
http://www.ultratechnology.com/store.htm#stamp
http://www.ultratechnology.com/cowboys.html#cm

--
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
1. Re:This was my companys idea in 2001 by John+Sokol · 2007-08-20 11:07 · Score: 2, Interesting
  
  I am not really sure what the point is; I guess I am just venting out of frustration. I am also adding some information for anyone interested in similar work I had done, showing that this isn't a new idea.
  
  I put $100,000 in cash and almost 2 years' worth of work into this and got nothing; no one was even interested.
  But then I see a bunch of MIT weenies do it 6 1/2 years later, and they get all kinds of attention for something new and revolutionary.
  
  There is also a real chance they took the idea right off my web site or Slashdot post, or were maybe even present at my talk, and never gave me any credit for the concepts. Their design really looks like it was lifted straight off my paper.
  
  So I guess at least I am exposing some plagiarism.
  
  I mean what the heck is the point of having an incredibly good idea and investing so much time and money into it just to watch someone else profit from it without so much as a thank you.
  
  I was at least trying not to whine and complain in my post and keep it purely informative and provide links to my very similar earlier works.
  
  --
  I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
Re:Instruction Set by ultranova · 2007-08-20 10:02 · Score: 3, Interesting

You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

The solution, of course, is to move away from the imperative programming model to dataflow or functional one. That way the compiler can automatically parallelize the task, instead of the programmer having to do so manually.
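The property this argument relies on can be shown in a few lines. This is a minimal sketch, not an automatically parallelizing compiler: Python does not parallelize anything on its own, and `checksum` and `process` are made-up names. The point is only that when each unit of work is a pure function with no shared mutable state, the sequential `map` can be swapped for a parallel one without touching the logic. `ThreadPoolExecutor` is used to keep the example simple; `ProcessPoolExecutor` offers the same `map` interface and would be the usual choice for CPU-bound work in CPython.

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(block):
    """Pure function: the result depends only on the argument."""
    total = 0
    for byte in block:
        total = (total * 31 + byte) % 65521
    return total

def process(blocks, mapper=map):
    """Apply checksum to every block using any map-like callable."""
    return list(mapper(checksum, blocks))

blocks = [bytes([i] * 1000) for i in range(64)]   # e.g. one block per core

sequential = process(blocks)
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = process(blocks, pool.map)          # same code, different mapper

print(sequential == parallel)
```

Because no block's result depends on another's, the order in which blocks are evaluated does not matter, which is what leaves a compiler or runtime free to distribute them.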

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Let the geeks solve the problem by rbanffy · 2007-08-20 10:21 · Score: 2, Interesting

"'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now"

Build a USD1000 desktop workstation, port Debian Linux to run on it and let the geeks out there adopt it.

There is no better way to explore a device's capabilities than to let the market do it.

I want one for myself. I am tired of the x86 architecture.

--
http://www.dieblinkenlights.com
Re:Instruction Set by networkBoy · 2007-08-20 10:53 · Score: 3, Interesting

The best way, IMHO, is to build a two-layer chip - one layer being RAM, the other being the CPU cores

Both of those require transistors. You cannot stack transistors with any current process technology; physics gets in the way.
A chip is basically built as follows:
metal
poly
metal
poly
Si
Where the poly is the insulator and the metal is the same as traces on a PCB. Just as you cannot place components in the middle of a PCB, you cannot place transistors on top of the metal; it would require a second silicon layer that you could dope transistors into.
While there are some technologies (SOI for example) that may allow this in theory, you start to run into other issues: punching through the insulator in specific areas and with high precision (neither of which is easy), and heat dissipation (transistors are transistors, and switching produces heat, whether it's an ALU or an SRAM). And finally, before someone suggests using the other side of the wafer: how do you connect the two sides? A wafer is *very* thick at the scale we are discussing. It would be like mining a hole through the earth.
More useful, perhaps, would be distributing L0 cache (register memory) a little more liberally in key areas of the processor, but then addressing gets in the way. In theory, having an MCM (multi-chip module) laid out as Cache - Processor - Cache, so there is ample L3 cache running at core/4 clock, may help, but costs get prohibitive.

There is no really good solution to moving data around once you start getting to these kinds of densities. Eventually wire delay may be the limiting factor in CPU throughput.
-nB

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Deep packet inspection by Slavidian · 2007-08-20 11:25 · Score: 2, Interesting

Tilera will succeed because the packet pushers want to be able to do deep packet inspection. Pay close attention to the first three in the apps list from their website:

Unified Threat Management
Network Security Appliances
In-line L4-7 deep packet inspection
Network Monitoring
Digital Video:
Video Conferencing
Video-on-Demand (VoD) Servers
Video surveillance
Media 'Head-End' services

The engineers in charge of this company should be ashamed of themselves. They are creating exactly the type of product that will help the telcos destroy the internet. DPI and UTM are completely at odds with the intentions of networking protocols. Tilera is handing over control of everything that you and I do online to the telcos. Where is Google? They should be diametrically opposed to the success of this company. Buy them up and quash them.