Panic in Multicore Land
MOBE2001 writes: "There is widespread disagreement among experts on how best to design and program multicore processors, according to the EE Times. Some, like senior AMD fellow Chuck Moore, believe that the industry should move to a new model based on a multiplicity of cores optimized for various tasks. Others disagree on the grounds that heterogeneous processors would be too hard to program. The only emerging consensus seems to be that multicore computing is facing a major crisis. In a recent EE Times article titled 'Multicore puts screws to parallel-programming models', AMD's Chuck Moore is reported to have said that 'the industry is in a little bit of a panic about how to program multicore processors, especially heterogeneous ones.'"
I think "panic" is a bit of an over-reaction. I use a multicore CPU. I write software that runs on it. I'm not panicking.
Follow me
What Mr Moore is saying does have a grain of truth: in key functions, specialized hardware will beat generic hardware. The Amiga proved that in 1985, delivering better graphics than workstations costing tens of thousands of dollars more. The key now is to figure out which specialized parts you can use without driving up the cost or compromising the design ideal of a general-purpose computer.
Karma Whoring for Fun and Profit.
Perhaps "panic" is a little strong. At the same time, programming languages such as Occam, which were built for concurrency from the ground up, seem very attractive now. Perhaps Occam's syntax could be modified into a Python-like syntax to make it more popular.
[Although, personally, I prefer Occam's syntax to C's.]
http://en.wikipedia.org/wiki/Occam_programming_language
I think a thread-aware programming language would be good in our multicore world.
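To make the Occam idea a bit more concrete, here is a rough Python sketch of the CSP style Occam is built on: independent processes that share nothing and talk only over channels. It's an approximation, not Occam semantics: threads stand in for Occam processes, and a one-slot queue.Queue stands in for a channel (real Occam channels are unbuffered and synchronous). The names producer, squarer and run_pipeline are made up for illustration.

```python
import queue
import threading

# CSP-flavoured sketch (assumption: threads approximate Occam processes and a
# size-1 Queue approximates a channel; Occam channels are fully synchronous).


def producer(out_chan: queue.Queue, n: int) -> None:
    """Send 0..n-1 down the channel, then None to mark end of stream."""
    for i in range(n):
        out_chan.put(i)
    out_chan.put(None)


def squarer(in_chan: queue.Queue, out_chan: queue.Queue) -> None:
    """Read values, square them and pass them on; forward the end marker."""
    while True:
        x = in_chan.get()
        if x is None:
            out_chan.put(None)
            return
        out_chan.put(x * x)


def run_pipeline(n: int) -> list[int]:
    a: queue.Queue = queue.Queue(maxsize=1)
    b: queue.Queue = queue.Queue(maxsize=1)
    # Roughly analogous to Occam's PAR: both processes run concurrently.
    workers = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for w in workers:
        w.start()
    results = []
    while (y := b.get()) is not None:
        results.append(y)
    for w in workers:
        w.join()
    return results


print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```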
https://www.youtube.com/c/BrendaEM
You're not the only person using heterogeneous cores, however. In fact, the Cell accounts for only a small minority of them. Most people have a general-purpose core, a parallel stream-processing core that they use for graphics, and an increasing number have another core for cryptographic functions. If you've ever done any programming for mobile devices, you'll know that they have been using even more heterogeneous cores for a long time, because doing so gives better power efficiency.
I am TheRaven on Soylent News
For servers, the real problem is I/O. Disks are slow and network bandwidth is limited (if you solve that, then memory bandwidth is limited ;) ).
For most typical workloads, servers don't have enough I/O to keep 80 cores busy.
If there's enough I/O there's no problem keeping all 80 cores busy.
Imagine a Slashdotted web server with a database backend. If you have enough bandwidth and disk I/O, you'll have enough concurrent connections to keep those 80 cores more than busy.
If you still have spare cores and memory, you can run a few virtual machines.
As for desktops, you could just use Firefox without NoScript; after a few days the machine will be using all 80 CPUs and all its memory just to show Flash ads and other junk.
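A back-of-envelope way to check that claim is Little's law (requests in flight = arrival rate × time each request spends in the system). The figures below are purely hypothetical, not measurements: 10 ms of CPU time per request, and 200 ms total latency once disk and database waits are included.

```python
def saturating_request_rate(cores: int, cpu_seconds_per_request: float) -> float:
    """Requests per second whose CPU work alone would keep every core busy."""
    return cores / cpu_seconds_per_request


def requests_in_flight(rate: float, latency_seconds: float) -> float:
    """Little's law: L = lambda * W."""
    return rate * latency_seconds


# Hypothetical numbers chosen for illustration only.
rate = saturating_request_rate(cores=80, cpu_seconds_per_request=0.010)
in_flight = requests_in_flight(rate, latency_seconds=0.200)
print(f"{rate:.0f} req/s, {in_flight:.0f} concurrent requests")
# prints "8000 req/s, 1600 concurrent requests"
```

Under those assumptions the box needs roughly 8,000 requests per second and about 1,600 concurrent connections before the CPUs, rather than the I/O, become the limit.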
Heterogeneous cores are already in almost every PC I've seen so far this millennium. Anyone with a GPU is running heterogeneous cores. How do we handle it? With the first half of your second sentence: libraries and frameworks. OpenGL, DirectX and the like provide the frameworks we need, while the various manufacturers provide the drivers that keep their hardware compatible with those APIs. We'll soon see (as a result of the Cell) whether the same approach (two or more different libraries for the same processor, one for each of its core types) becomes the norm for other heterogeneous-core systems. I think it will, but it may be overlooked by manufacturers who want to view a processor as a single unit instead of a collection of units. They'll figure it out; these guys aren't MBAs, they're the truly educated. :-D
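Here is a minimal sketch of the libraries-and-frameworks pattern described above, assuming a hypothetical library that exposes one public function and keeps a separate implementation per core type, much as an OpenGL or DirectX driver hides the GPU behind a common API. The core-type names and both backends are invented for illustration; the "stream" backend just computes the same result on the CPU.

```python
from typing import Callable, Dict, List, Set

Kernel = Callable[[List[float], float], List[float]]


def _scale_general(data: List[float], k: float) -> List[float]:
    # Plain implementation for an ordinary general-purpose core.
    return [k * x for x in data]


def _scale_stream(data: List[float], k: float) -> List[float]:
    # Hypothetical stand-in for a vendor library that would hand the work to a
    # stream/vector core; here it simply computes the same result on the CPU.
    return list(map(lambda x: k * x, data))


# One implementation per core type, registered behind a single API.
_BACKENDS: Dict[str, Kernel] = {
    "stream": _scale_stream,
    "general": _scale_general,
}

# Preference order: most specialised core type first.
_PREFERENCE = ("stream", "general")


def scale(data: List[float], k: float, available: Set[str]) -> List[float]:
    """Public API: callers never name a core type; the library picks one."""
    for core_type in _PREFERENCE:
        if core_type in available:
            return _BACKENDS[core_type](data, k)
    raise RuntimeError("no supported core type available")


print(scale([1.0, 2.0, 3.0], 2.0, available={"general"}))  # [2.0, 4.0, 6.0]
```

The application code stays the same whether or not the specialised core is present; only the library's registry changes from one chip to the next.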
Back in 2000 I realized that 50 million transistors' worth of 4004s (the first commercial microprocessor) would outperform a P4 with the same transistor count, built in the same fab and running at the same clock rate. By my estimate it would be over 10x faster. But how would you use such a device?
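As a rough check on the scale of that idea: the Intel 4004 is commonly quoted at about 2,300 transistors, so a 50-million-transistor budget holds on the order of twenty thousand of them. This sketch ignores interconnect, memory and I/O, which would eat into the budget in practice, and makes no attempt to reproduce the 10x estimate.

```python
# Rough core-count estimate, ignoring interconnect, RAM and I/O.
TRANSISTOR_BUDGET = 50_000_000   # the P4-class budget from the post
TRANSISTORS_PER_4004 = 2_300     # commonly cited transistor count of the Intel 4004

cores = TRANSISTOR_BUDGET // TRANSISTORS_PER_4004
print(f"about {cores:,} 4004-class cores")  # about 21,739 4004-class cores
```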
I had been working with a 100-PC cluster of P4-based systems to do real-time H.264 HDTV compression. I spread the compression function across the cluster, with each system working on a small part of the problem, and flowed the data across the CPUs.
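The post doesn't say exactly how the encoder was split up, so this is only one possible decomposition, assumed for illustration: cut each frame into horizontal slices of nearly equal height and hand one slice to each machine. encode_slice is a hypothetical placeholder for the real encoder, and a local thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_rows(height: int, workers: int) -> List[range]:
    """Divide a frame's rows into `workers` contiguous slices whose heights
    differ by at most one row (assumes height >= workers)."""
    base, extra = divmod(height, workers)
    slices: List[range] = []
    start = 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


def encode_slice(rows: range) -> str:
    # Hypothetical placeholder: a real system would run the encoder on
    # these rows on one machine and return the compressed bitstream.
    return f"rows {rows.start}-{rows.stop - 1}"


def encode_frame(height: int, workers: int) -> List[str]:
    # A local thread pool stands in for the cluster; results come back in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_slice, split_rows(height, workers)))


print(encode_frame(1080, 100)[:2])  # ['rows 0-10', 'rows 11-21']
```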
Based on this I wanted to build an array of processors on one chip, but I am not a silicon person, just software, drivers and some basic electronics. So I looked at various FPGA cores: ARM, MIPS, etc. Then I went to a talk given by Chuck Moore, the author of the Forth language (not the AMD fellow of the same name). He had been building his own CPUs for many years using his own custom tools.
I worked with Chuck Moore for about a year in 2001/2002 on creating a massively multicore processor based on Chuck's stack processor.
The idea was to have 49 (7 * 7) small, lightweight but fast processors on one chip instead of 1, 2 or 4 large ones. This would be for tackling a different set of problems than classic CPUs. It wouldn't be for running an OS or word processing, but for multimedia, cryptography and other mathematical problems.
The idea was to flow data across the array of processors.
Each processor would run at 6 GHz, with 64K words of RAM.
21-bit-wide words and bus (based on the F21 processor).
This allows for 4 x 5-bit instructions on a stack processor that only has 32 instructions.
Because it's a stack processor, it is very efficient for its size: with only 16K transistors (4,000 gates),
the F21 at 500 MHz performed about the same as a 500 MHz 486 at JPEG compression and decompression.
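A quick sketch of how four 5-bit opcodes fit into a 21-bit word: 32 = 2^5 possible instructions, and 4 × 5 = 20 bits, which leaves one bit spare. The slot order and the unused top bit are assumptions made here for illustration, not the actual F21 encoding.

```python
OPCODE_BITS = 5                      # 2**5 == 32 possible instructions
SLOTS_PER_WORD = 4                   # four instructions per word
WORD_BITS = 21                       # 4 * 5 = 20 bits used, 1 bit spare here
OPCODE_MASK = (1 << OPCODE_BITS) - 1


def _shift(slot: int) -> int:
    # Assumed layout: slot 0 (executed first) sits in the highest 5 bits.
    return OPCODE_BITS * (SLOTS_PER_WORD - 1 - slot)


def pack(opcodes: list[int]) -> int:
    """Pack up to four 5-bit opcodes into one word; unused slots become 0."""
    if len(opcodes) > SLOTS_PER_WORD:
        raise ValueError("at most four opcodes per word")
    word = 0
    for slot, op in enumerate(opcodes):
        if not 0 <= op <= OPCODE_MASK:
            raise ValueError(f"opcode {op} does not fit in {OPCODE_BITS} bits")
        word |= op << _shift(slot)
    return word


def unpack(word: int) -> list[int]:
    """Recover the four opcodes from a packed word."""
    return [(word >> _shift(slot)) & OPCODE_MASK for slot in range(SLOTS_PER_WORD)]


w = pack([1, 2, 3, 4])
print(w, unpack(w))  # 34916 [1, 2, 3, 4]
```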
With the parallel core design, instead of a common bus or network between the processors, there would be only 4 connections into and out of each processor. These would be 4 registers shared with its 4 neighboring processors, which are laid out in a grid. So each processor would have a north, south, east and west register.
Data would be processed in what's called a systolic array, where each core picks up some data, performs operations on it and passes it along to the next core.
The chips, each with a 7x7 grid of processors, would expose the 28 (4x7) bus lines of the edge processors, so that chips could be tiled into a much larger grid of processors.
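A toy simulation of a one-dimensional systolic chain may help, assuming every core runs one fixed operation and all cores are clocked in lockstep. On each tick, every core takes the value waiting in its input register, applies its operation and writes the result into its neighbour's register, so a new input can enter on every tick. The stage functions are arbitrary examples, and None marks an empty register.

```python
from typing import Callable, List, Optional


def run_systolic(stages: List[Callable[[int], int]], inputs: List[int]) -> List[int]:
    """Clock a 1-D chain of cores in lockstep.

    regs[i] is core i's input register: it holds a value that has already
    passed through stages 0..i-1. On each tick core i applies stages[i] and
    writes the result into core i+1's register (the last core emits it).
    """
    n = len(stages)
    regs: List[Optional[int]] = [None] * n
    outputs: List[int] = []
    feed = list(inputs)
    # Enough ticks for the last input to pass through all n stages.
    for _ in range(len(inputs) + n):
        last = regs[n - 1]
        if last is not None:
            outputs.append(stages[n - 1](last))
        # Walk from the far end backwards so each value moves one hop per tick.
        for i in range(n - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i - 1](prev) if prev is not None else None
        regs[0] = feed.pop(0) if feed else None
    return outputs


print(run_systolic([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], [1, 2, 3]))
# [1, 3, 5]
```

With n cores the first result appears on tick n + 1, and after that one result comes out on every tick, which is where a systolic design gets its throughput.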
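To see where the 28 comes from, here's a small sketch that models a size × size grid in which each core's north, south, east and west ports either connect to a neighbour or, at the chip edge, face off-chip. The (row, col) coordinates with row 0 on the north edge are an assumption made for illustration.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def neighbours(row: int, col: int, size: int) -> Dict[str, Optional[Coord]]:
    """Map each port of core (row, col) to the neighbouring core, or to None
    if that port sits on the chip edge and would be brought out off-chip."""
    result: Dict[str, Optional[Coord]] = {}
    for name, (dr, dc) in DIRECTIONS.items():
        r, c = row + dr, col + dc
        result[name] = (r, c) if 0 <= r < size and 0 <= c < size else None
    return result


def edge_ports(size: int) -> int:
    """Count the ports that face off-chip in a size x size grid."""
    return sum(
        1
        for row in range(size)
        for col in range(size)
        for target in neighbours(row, col, size).values()
        if target is None
    )


print(edge_ports(7))  # 28
```

Corner cores contribute two off-chip ports and other edge cores one, giving 7 per side and 4 × 7 = 28 in total; tiling would wire each chip's east-facing ports to the west-facing ports of the chip beside it, and likewise north to south.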
Each chip could perform around 117 billion instructions per second at 1 watt of power.
Unfortunately I was unable to raise money, partly because I couldn't get any commitment from Chuck.
Below are some links and other miscellaneous information on this project. Sorry it's not better organized.
This was my project.
---------
http://www.enumera.com/chip/
http://www.enumera.com/doc/Enumeradraft061003.htm
http://www.enumera.com/doc/analysis_of_Music_Copyright.html
http://www.enumera.com/doc/emtalk.ppt
--------
This was Jeff Fox's independent website; he worked on the F21 with Chuck.
http://www.ultratechnology.com/ml0.htm
http://www.ultratechnology.com/f21.html#f21
http://www.ultratechnology.com/store.htm#stamp
http://www.ultratechnology.com/cowboys.html#cm
------
http://www.colorforth.com/ 25x Multicomputer Chip
Chuck's site. The 25x page has been pulled down, but it's accessible on archive.org.
http://web.archive.org/web/*/www.colorfo
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso