The Father of Multi-Core Chips Talks Shop
pacopico writes "Stanford professor Kunle Olukotun designed the first mainstream multi-core chip, crafting what would become Sun Microsystems's Niagara product. Now, he's heading up Stanford's Pervasive Parallelism Lab, where researchers are looking at systems with hundreds of cores that might power robots, 3-D virtual worlds and insanely big server applications. The Register just interviewed Olukotun about this work and the future of multi-core chips. Weird and interesting stuff."
That strikes me as crackpottery. The stuff that link describes as "nonalgorithmic" is also easily algorithmic, just in a process calculus.
And guess what? Non-kooks in the compsci community are busily working on process calculi and languages or language-facilities built around them.
Multi-core chips will be constrained by, among other things, the memory bandwidth going off-chip. Maybe they need larger caches. Maybe they just need to put all the RAM on the chip itself instead of so many more cores. How about 4GB of RAM at first-level cache speed?
Ultimately, we'll end up with PCs made from SoCs, with direct SATA, USB, Firewire, and DVI interfaces coming out instead of a RAM access bus. By the time they are ready to make 256-core CPUs, software still won't be ready to work well on that. So in the interim, they might as well just do tighter integration (which can also run faster). No more north bridge or south bridge. Just a few capacitors, resistors, and maybe a buffer amp or two, around the big CPU.
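To put the off-chip bandwidth point into rough numbers, here is a minimal sketch. It assumes every core shares one off-chip memory link and that the link is split evenly; the 32 GB/s figure and the function name bandwidth_per_core are made up purely for illustration, not taken from any real chip.

```python
def bandwidth_per_core(total_gb_per_s: float, cores: int) -> float:
    """Off-chip bandwidth each core gets if one shared link is split evenly."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gb_per_s / cores


# Hypothetical total off-chip bandwidth, chosen only for illustration.
TOTAL_GB_PER_S = 32.0

for cores in (1, 4, 16, 64, 256):
    share = bandwidth_per_core(TOTAL_GB_PER_S, cores)
    print(f"{cores:>3} cores: {share:.3f} GB/s per core")
```

Under those assumptions the per-core share shrinks linearly with core count, which is why bigger caches or on-chip RAM start to look attractive.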
About the only thing that won't be practical to put in the CPU for a long time is the power supply. They could even put the disk drive in there (flash SSD).
now we need to go OSS in diesel cars
I can see your point... I can imagine a thing that looks a whole lot like an FPGA whose cells are designed to accept new functional definitions extremely dynamically.
(As you can tell, I don't agree with using the name "non-algorithmic": It's algorithmic by any reasonable theoretical definition. This is why I refer to it as being an extremely fine-grained data flow model.)
However, if you look at modern FPGAs, you will discover that even there, the macrocells are fairly large objects.
I guess that when it comes down to it, the "non-algorithmic" model proposed in the page you cite seems so fine-grained that benefits would be overwhelmed by connectivity issues. By this I mean not simply bandwidth among functional components, but in defining "who talks with whom under what dynamically changing circumstances". Any attempt to discuss fine-grained data flow must face the issue of efficiency in connecting the interacting data and control "elements".
There's the possibly even more interesting question about how many of each sort of functional module should be built.
What do you say to meeting in the middle, and thinking about a system that isn't so fine-grained, while also thinking of "control functions" as being just as movable as the data elements? Here's why I ask: In my opinion, there might well be some very good research work to be done in applying techniques related to functional programming to a system with an extremely large number of simple functional units that know how to move functionality around with the data.
We already have servers for INSANELY HUGE internet apps. It's called a mainframe.
It amazes me to no end how many people still think it's about the CPU. It's about throughput, ok? Can we just get that fucking settled already? I don't give a rat's ass how many damn cores you have running or if they are running at 100 gigahertz; if you are still reading data across a bus, over an ethernet connection, ANYTHING that does not work at CPU speed, then it makes little difference. That damn CPU will be sitting there spinning, waiting for the data to come popping through so it can do something!
Mainframes use 386 chips for I/O controllers and even those sit there and loaf. Talk about a waste of electricity! About .01% of the world's computers need the kind of power that a CPU with more than, say, 4 cores provides. Those that do are rather busy doing insanely complex mathematics, but even then I doubt that the CPU(s), even when running at "100%" utilization, are actually doing the work that they were programmed to do; rather, they are waiting for I/O to a database or RAM and fetching data.
Until someone figures out how to move data in a far, far more efficient manner than we currently understand, these mega-core CPUs, while nice to think about, are simply a waste of time and silicon, with the possible exception of research.
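The throughput argument above can be sketched as a back-of-the-envelope model. It assumes compute time and I/O wait do not overlap at all, which is a simplification; the function names and the example timings are hypothetical.

```python
def cpu_utilization(compute_s: float, io_wait_s: float) -> float:
    """Fraction of wall-clock time spent computing rather than waiting."""
    total = compute_s + io_wait_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return compute_s / total


def overall_speedup(compute_s: float, io_wait_s: float, cpu_factor: float) -> float:
    """Speedup of the whole job when only the compute part gets faster."""
    before = compute_s + io_wait_s
    after = compute_s / cpu_factor + io_wait_s
    return before / after


# Hypothetical job: 1 second of computation, 9 seconds waiting on data.
print(cpu_utilization(1.0, 9.0))
print(overall_speedup(1.0, 9.0, cpu_factor=100.0))
```

With numbers like these, even a CPU that is 100 times faster barely moves the total run time, because the wait for data dominates.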
Hey KID! Yeah you, get the fuck off my lawn!
Right, so you split your computation up into small units that can be efficiently allocated to the many-core array. This allows you to express the parallelism in the program properly, because you're not constrained by the coarse granularity of a thread model. Cool.
But the problem here is how you write the code itself. Purely functional code maps really well onto this model, but nobody wants to retrain all their programmers to use Haskell. We're going to end up with a hybrid C-based language: but what restrictions should exist in it? This depends on what is easy to implement in hardware - because if we wanted to stick with what was easy to implement in software, we'd carry on trying to squeeze a few extra instructions per second out of a conventional CPU architecture.
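As a rough illustration of why side-effect-free code maps well onto many small units, here is a sketch in Python. The names square_sum, split, and parallel_square_sum are hypothetical, and a thread pool merely stands in for the many-core array; in CPython this shows the shape of the decomposition, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def square_sum(chunk):
    """Pure function: the result depends only on the argument, no shared state."""
    return sum(x * x for x in chunk)


def split(data, size):
    """Cut the input into independent work units of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_square_sum(data, chunk_size=4, workers=4):
    """Map the pure function over the units, then combine the partial results."""
    chunks = split(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(square_sum, chunks))
    return sum(partials)


print(parallel_square_sum(list(range(10))))  # prints 285
```

Because no unit touches another unit's data, the scheduler is free to place each one on any core in any order. That freedom is exactly what unrestricted C-style pointer code gives up.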
The biggest restriction turns out to be the "R" in RAM. Most of our programs use memory in an unpredictable way, pulling data from all over the memory space, and this doesn't map well to a many-core architecture. You can put caches at every core, but the cache miss penalty is astronomical, not to mention the problems of keeping all the caches coherent. Random access won't scale; we will need something else, and it will break lots of programs.
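To see why scattered access hurts a per-core cache, here is a toy model. It assumes a tiny fully associative LRU cache with made-up sizes (8 addresses per line, 16 lines); real caches differ in many ways, so treat count_misses as an illustration only.

```python
import random
from collections import OrderedDict


def count_misses(addresses, line_size=8, cache_lines=16):
    """Count misses in a toy fully associative cache with LRU eviction."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


N = 4096
sequential = list(range(N))
rng = random.Random(0)                     # fixed seed so runs are repeatable
scattered = [rng.randrange(N) for _ in range(N)]

print("sequential misses:", count_misses(sequential))
print("scattered misses: ", count_misses(scattered))
```

The sequential walk misses once per cache line, while the scattered walk misses on most accesses, and every one of those misses has to cross the chip or go off it.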
This is going to lead to some really shitty science, because:
I think that the eventual winning architecture will be the one that is easiest to write programs for. But it will have to be so much better at running those programs that it is worth the effort of porting them. So it will have to be a huge improvement, or easy to port to, or some combination of the two. However, those are qualitative properties. Anyone could argue that their architecture is better than another - and they will.
>north
You're an immobile computer, remember?