MIT Startup Unveils New 64-Core CPU
single-threaded writes "Tilera, a startup out of MIT, has announced that it is shipping a 64-core CPU. Called the TILE64, the CPU is fabbed on a 90nm process and is clocked at anywhere from 600MHz to 900MHz. 'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now, and it's what I'll keep an eye out for as I watch this product make its way in the market. Though there are any number of questions about this product that remain to be answered, one thing is for certain: TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone.'"
On my laptop right now:
> ps aux | wc -l
281
Of course not all those processes are in runnable state. On the other hand, many of those processes have multiple threads. A typical Java Swing GUI app may have a dozen threads, for example. A web server process can easily have dozens of runnable threads. Software is going to take a little bit of catching up, but nothing huge.
You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.
This is the same problem we've been working with on clusters forever: how do you tune and load balance the jobs to the point where you're getting the most out of your hardware, and nothing is sitting idle while other parts of the system are running at 100%? What do you do when the task is already reduced to the simplest level and there is no benefit from throwing extra processors at it?
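To put a rough number on the "no benefit from extra processors" question, here is a minimal sketch using Amdahl's law, which is my own choice of model and not something Tilera or the article cites. The 90% parallel fraction is an assumed figure for illustration only.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the work that can be spread across cores (0..1).
    cores: number of cores available.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workload: 90% of the job parallelizes, 10% does not.
for cores in (1, 4, 16, 64):
    print(cores, "cores:", round(amdahl_speedup(0.90, cores), 2), "x")
```

Under that assumption the serial 10% dominates quickly, so going from 16 to 64 cores buys much less than the core count suggests.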
Someone smarter than me is going to have to figure it out. The only way I can think of doing it is grafting more scheduling crap on top of all the processors to break down tasks and assign them to cores, and adding another layer of complexity is almost always a bad idea.
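For what it's worth, the "break down tasks and assign them to cores" layer can be sketched in a few lines. This is a greedy longest-task-first heuristic, assuming task costs are known up front (a big assumption); `assign_tasks` and the sample costs are hypothetical names and numbers, not anything from Tilera's tools.

```python
import heapq


def assign_tasks(task_costs, num_cores):
    """Greedy assignment: give each task, largest first, to the least-loaded core."""
    # Min-heap of (current_load, core_index) so the least-loaded core pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]
    for cost in sorted(task_costs, reverse=True):
        load, core = heapq.heappop(heap)
        assignment[core].append(cost)
        heapq.heappush(heap, (load + cost, core))
    loads = [sum(tasks) for tasks in assignment]
    return loads, assignment


# Made-up task costs in arbitrary time units.
loads, assignment = assign_tasks([7, 5, 4, 3, 3, 2], num_cores=3)
print(loads)       # per-core total load
print(assignment)  # which task costs landed on which core
```

Even this toy version shows the catch: it only works when you can estimate costs before running the tasks, which real workloads rarely allow.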
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
It was called Enumera (www.enumera.com).
I started working with Chuck Moore, the author of the FORTH language, on a 7x7 array of very fast small processors.
From a talk I gave on February 16, 2001:
From http://www.dnull.com/~sokol/amorp/emtalk.ppt: On a chip of this size, a 7x7 array (49 CPUs) with RAM could be built. Co-processors could also be added.
Each CPU would operate at 2400 MIPS; times 49, that gives a total of about 117 billion operations per second.
The power consumption would be about 1 watt (1.8 volts at 500 mA).
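As a quick sanity check on those figures, here is the arithmetic spelled out. It assumes one instruction counts as one operation, which is how the talk appears to use the terms.

```python
# Figures quoted from the talk.
cores = 7 * 7            # 7x7 array
mips_per_core = 2400     # million instructions per second per CPU
volts = 1.8
amps = 0.5               # 500 mA

# Assumption: one instruction is counted as one operation.
total_mips = cores * mips_per_core
total_billion_ops = total_mips / 1000
watts = volts * amps

print(cores, "cores")
print(total_mips, "MIPS, or", total_billion_ops, "billion ops/sec")
print(round(watts, 2), "watts")
```

So the totals come out to 117.6 billion operations per second and 0.9 watts, which match the rounded numbers quoted.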
With this level of computing power, new applications that were unthinkable before now become possible. Also mentioned earlier on Slashdot:
http://developers.slashdot.org/comments.pl?sid=13
And earlier here:
http://www.colorforth.com/ 25x Multicomputer Chip
This eventually became IntellaSys after Enumera failed.
"IntellaSys CTO Chuck Moore to Present at In-Stat Spring Processor Forum; Scalable Embedded Array Platform for Implementing Asynchronous, Scalable Multicore Solutions Using Elegant VentureForth Programming to Be Discussed in Detail"
http://www.intellasys.net/products/24c18/SEAforth
http://www.findarticles.com/p/articles/mi_m0EIN/i
Also for older info see:
Specifically look at the P21 / I21/ F21 chips...
http://www.enumera.com/chip/
http://www.ultratechnology.com/ml0.htm
http://www.ultratechnology.com/f21.html#f21
http://www.ultratechnology.com/store.htm#stamp
http://www.ultratechnology.com/cowboys.html#cm
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
The solution, of course, is to move away from the imperative programming model to a dataflow or functional one. That way the compiler can automatically parallelize the task, instead of the programmer having to do so manually.
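A minimal sketch of why the functional style helps, assuming the work can be written as a pure function mapped over independent inputs. Here a library call stands in for the parallelizing compiler, and `square`, `sequential_map` and `parallel_map` are hypothetical names for illustration. Note that CPython threads will not give real CPU parallelism for this kind of work; the point is only that the result cannot depend on how the calls are scheduled.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Pure function: the result depends only on the argument, no shared state."""
    return x * x


def sequential_map(fn, items):
    # One call after another, in order.
    return [fn(item) for item in items]


def parallel_map(fn, items, workers=4):
    # Calls may run in any order on the worker threads, but Executor.map
    # hands the results back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


data = list(range(10))
seq_result = sequential_map(square, data)
par_result = parallel_map(square, data)
print(seq_result == par_result)
```

Because `square` touches no shared state, the runtime is free to split the map however it likes, which is the property an auto-parallelizing compiler would rely on.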
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
"'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now"
Build a USD1000 desktop workstation, port Debian Linux to run on it and let the geeks out there adopt it.
There is no better way to explore a device's capabilities than to let the market do it.
I want one for myself. I am tired of the x86 architecture.
http://www.dieblinkenlights.com
A chip is basically built as follows:
metal
poly
metal
poly
Si
Here the poly is the insulator and the metal is the same as traces on a PCB. Just as you cannot place components in the middle of a PCB, you cannot place transistors on top of the metal; that would require a second silicon layer that you could dope transistors into.
While there are some technologies (SOI for example) that may allow this in theory, you start to run into other issues, like trying to punch through the insulator in specific areas and with high precision (neither of which is easy) and heat dissipation (transistors are transistors, and switching produces heat; it doesn't matter if it's an ALU or an SRAM). And finally, before someone suggests using the other side of the wafer, how do you connect the two sides? A wafer is *very* thick at the scale we are discussing. It would be like mining a hole through the earth.
More useful would perhaps be distributing L0 cache (register memory) a little more liberally in key areas of the processor, but then addressing gets in the way. In theory, having an MCM (multi-chip module) laid out as Cache - Processor - Cache, so there is ample L3 cache running at core/4 clock, may help, but costs get prohibitive.
There is no really good solution to moving data around once you start getting to these kinds of density. Eventually wire delay may be the limiting factor to CPU throughput.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Tilera will succeed because the packet pushers want to be able to do deep packet inspection. Pay close attention to the first three in the apps list from their website:
Unified Threat Management
Network Security Appliances
In-line L4-7 deep packet inspection
Network Monitoring
Digital Video:
Video Conferencing
Video-on-Demand (VoD) Servers
Video surveillance
Media 'Head-End' services
The engineers in charge of this company should be ashamed of themselves. They are creating exactly the type of product that will help the telcos destroy the internet. DPI and UTM are completely at odds with the intentions of networking protocols. Tilera is handing over control of everything that you and I do online to the telcos. Where is Google? They should be diametrically opposed to the success of this company. Buy them up and quash them.