Grid Processing
c1ay writes "We've all heard the new buzzword 'grid computing' quite a bit in the news recently. Now the EE Times reports that a team of computer architects at the University of Texas plans to develop prototypes of an adaptive, gridlike processor that exploits instruction-level parallelism. The prototypes will include four Trips (Tera-op Reliable Intelligently Adaptive Processing System) processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second. In an age where clusters are becoming more prevalent for parallel computing, I've often wondered where the parallel processor was. How about you?"
I use parallel computing on a cluster, in which I divide up my computational domain into a number of chunks, and each chunk is farmed out to a processor. Communication between the processes is required at the chunk boundaries.
For this case, I see how my code is partitioned, and I also understand (on a general level, at least) what limits the speed: the information passed between the chunks.
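To make the chunk-and-boundary idea concrete, here is a minimal sketch in plain Python. It is not the poster's code: the 1-D domain, the 3-point averaging sweep, and the function names (smooth_serial, split_domain, smooth_chunked) are all assumptions chosen for illustration, and the "processors" are simulated in a single process rather than run on a real cluster.

```python
def smooth_serial(values):
    """One 3-point averaging sweep over a 1-D domain; the two end points stay fixed."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out


def split_domain(values, n_chunks):
    """Divide the domain into n_chunks contiguous pieces of near-equal size.

    Assumes 1 <= n_chunks <= len(values), so no chunk is empty.
    """
    base, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for rank in range(n_chunks):
        size = base + (1 if rank < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def smooth_chunked(values, n_chunks):
    """Same sweep as smooth_serial, but done chunk by chunk (hypothetical 'processors')."""
    chunks = split_domain(values, n_chunks)

    # Communication step: each chunk needs one value from each neighboring chunk.
    # On a real cluster this is where the messages between processes would go.
    halos = []
    for rank in range(n_chunks):
        left = chunks[rank - 1][-1] if rank > 0 else None
        right = chunks[rank + 1][0] if rank < n_chunks - 1 else None
        halos.append((left, right))

    # Compute step: each chunk is now independent of the others.
    result = []
    for chunk, (left, right) in zip(chunks, halos):
        padded = ([] if left is None else [left]) + chunk + ([] if right is None else [right])
        swept = smooth_serial(padded)
        lo = 0 if left is None else 1                    # drop the left ghost point
        hi = len(swept) - (0 if right is None else 1)    # drop the right ghost point
        result.extend(swept[lo:hi])
    return result


if __name__ == "__main__":
    data = [float(i * i) for i in range(10)]
    print(smooth_chunked(data, 3) == smooth_serial(data))
```

In this sketch the only data that crosses a chunk boundary is one value per neighbor per sweep, which is the part that turns into network traffic on a cluster.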
Now, how will this processor do its 'instruction level' parallelization? Will it be great at do loops (one 'do' per processor)? Will it be like a mini vector processor? What will break down the efficiency of the parallelization?
I have found that efficiency in parallelization is very application dependent after about 8-32 processors. Will this break that barrier?
Most importantly, will it kick butt for MY applications?
Scientific and financial computing, especially modelling and simulation, are where parallel computers can make a difference.
Many of the approaches to these problems take the form of a grid of elements that have local and possibly non-local interactions with each other. Each processor gets a subset of the points to work with and has to communicate with the neighboring processor's memory space to get information about neighboring points.
In a cluster, handling the points at the edges (or any non-local effects) requires a network request and possibly a disk request. Compared to local memory access, this is incredibly slow and can temporarily starve the processor.
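How much of the work touches the edges depends on how big each processor's piece is. As a rough illustration only, the sketch below assumes each processor holds a cubic n x n x n block of points and that only the outermost layer of points needs data from a neighbor. Both assumptions, and the function name boundary_fraction, are mine and not from any particular code.

```python
def boundary_fraction(n):
    """Fraction of points in an n x n x n block that lie on its outer surface.

    Assumes a one-point-thick boundary layer; n must be at least 1.
    """
    if n < 1:
        raise ValueError("block side must be at least 1")
    interior = max(n - 2, 0) ** 3      # points with no face on the surface
    return (n ** 3 - interior) / n ** 3


if __name__ == "__main__":
    for side in (100, 50, 10, 4):
        print(side, boundary_fraction(side))
```

Under these assumptions, cutting a fixed-size problem into more and smaller blocks raises the share of points that need remote data, so the slow requests take up a growing part of each step.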
Big iron parallel systems address this by giving more processors access to the same memory and other shared resources, avoiding the costly network requests.
Of course, the current supercomputers (ASCI *, etc.) are all clusters, just with incredibly fast network connections.
-Chris