Supercomputers To Move To Specialization?
lucasw writes "The Japan Earth Simulator outperformed a computer at Los Alamos (previously the world's fastest) by a factor of three while using fewer, more specialized processors and advanced interconnect technology. This spawned multiple government reports that many suspected would ask for more funding in the U.S. for custom supercomputer architectures and less emphasis on clustering commodity hardware. One report released yesterday suggests a balanced approach."
Ordinary off-the-shelf microprocessors don't have enough bandwidth to memory or to other processors to simulate complex problems. NEC's machine uses a vector architecture (SX-6), similar to the kind you see in the Cray X1. Vector processors are a SIMD-style design.
I assume that hard-coding trig functions into the tertiary processors would be advantageous for this. I know it violates the spirit of RISC in general-purpose computing, but for such a large-scale system with so many processors it could be advantageous.
Do HP's Saturn or other such special-purpose processors have hard-coded higher-level functions?
You can't judge a book by the way it wears its hair.
The interconnects are (usually) not commodity parts -- just the servers.
As an example, the first IBM SP "supercomputers" were essentially just common Power workstations bolted into racks, but connected with a custom-made SP switch.
Nevertheless, the Earth Simulator has shown what can be done by designing the entire server from the ground up with the application in mind.
We'll have to see how ASCI Purple performs...
Because some problems don't work on clusters--things like large-scale molecular dynamics simulations with long-range spatial interactions.
Problems that require the nodes to share massive amounts of data (gigabytes per second and up--these problems often have N^2 behavior) don't do so well on a cluster, since they tend to saturate the network. A shared-memory system, like a supercomputer, on the other hand, can provide much better memory access times (top-of-the-line Crays have a peak memory transfer rate of 204 GB per second per node [yes, 204 gigabytes per second]), and since there's only one copy of the memory, the peak bandwidth requirement can often be lower.
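To make the N^2 point concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the function names are made up, each particle is taken to be three 8-byte coordinates (24 bytes), particles are split evenly across nodes, and every node naively fetches every remote particle on every step. Real codes use smarter decompositions, so treat the numbers as a worst-case picture rather than a measurement.

```python
def pair_count(n_particles):
    """Number of distinct particle pairs when every particle interacts with every other."""
    return n_particles * (n_particles - 1) // 2


def remote_bytes_per_step(n_particles, n_nodes, bytes_per_particle=24):
    """Bytes one node must pull over the network per time step.

    Assumes an even split of particles across nodes and that each node
    needs the position of every particle it does not hold locally.
    bytes_per_particle=24 is an assumed value (three 8-byte coordinates).
    """
    if n_nodes <= 0 or n_particles < 0:
        raise ValueError("need n_nodes > 0 and n_particles >= 0")
    local = n_particles // n_nodes
    return (n_particles - local) * bytes_per_particle


def total_network_bytes_per_step(n_particles, n_nodes, bytes_per_particle=24):
    """Aggregate traffic across the whole cluster for one time step."""
    return n_nodes * remote_bytes_per_step(n_particles, n_nodes, bytes_per_particle)


if __name__ == "__main__":
    # Doubling the particle count roughly quadruples the pairwise work,
    # while every node still needs nearly the whole data set each step.
    for n in (1000, 2000, 4000):
        print(n, pair_count(n), total_network_bytes_per_step(n, n_nodes=10))
```

On a shared-memory machine the "remote" fetches in this sketch become ordinary memory reads against a single copy of the data, which is the difference the paragraph above is getting at.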
In short, it all depends on the problem you need to solve. Some problems work very well on clusters, others do not.
-JS
Vanity of vanities, all is vanity...
computers like the Earth Simulator go vastly underutilized for the most part
From first-hand experience, such computers are running jobs almost 24x7. Due to job scheduling details there are times when part of the machine is idle, but this is still a small percentage. These machines are used for a vast array of applications, not just the advertised ones.
Now the utilization as a percentage of theoretical peak is another matter. For some algorithms, 20% of peak performance (IIRC) is considered good (i.e., a particular code might only get 2 TFlops on a machine rated for 10).
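The arithmetic behind that figure, as a tiny Python sketch. The function name is hypothetical, and the 2 and 10 TFlops values are just the example numbers from the paragraph above, not measurements of any particular machine.

```python
def fraction_of_peak(sustained_tflops, peak_tflops):
    """Sustained performance as a fraction of the rated theoretical peak."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return sustained_tflops / peak_tflops


if __name__ == "__main__":
    # Example values from the text: 2 TFlops sustained on a 10 TFlops machine.
    print(f"{fraction_of_peak(2.0, 10.0):.0%} of peak")
```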
There seems to be an impression in some comments that this machine has some sort of special design that's only applicable to climate modeling problems. In fact, this is a vector-based supercomputer, applicable to any problem where you need to perform vector operations (i.e., operating on large arrays of numbers in parallel).
Certain numerical operations can be performed blindingly fast on these types of machines. Each arithmetic processor on this machine has 72 vector registers, each of which can hold 256 elements. Then you can perform operations on all 256 elements of 1 or more registers simultaneously! If the algorithm can keep the vector units fed, they will scream.
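For anyone who hasn't programmed one of these, a loose Python model of how a long array gets processed in register-sized chunks (usually called strip-mining). This is only a sketch: the names are made up, plain Python runs the inner additions one at a time, and the only numbers taken from above are the 72 registers and 256 elements per register. On real hardware each chunk would be a single vector instruction.

```python
VECTOR_LENGTH = 256    # elements per vector register (figure quoted above)
VECTOR_REGISTERS = 72  # registers per arithmetic processor (figure quoted above)

# Total elements the register file can hold at once.
REGISTER_FILE_ELEMENTS = VECTOR_REGISTERS * VECTOR_LENGTH


def vector_add(a, b, vl=VECTOR_LENGTH):
    """Add two equal-length lists in chunks of at most vl elements.

    Returns the result and the number of chunks, which stands in for
    the number of vector add instructions a vector machine would issue.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    chunks = 0
    for start in range(0, len(a), vl):
        # Model one vector instruction: load up to vl elements of each
        # operand into a "register" and add them element-wise.
        va = a[start:start + vl]
        vb = b[start:start + vl]
        out.extend(x + y for x, y in zip(va, vb))
        chunks += 1
    return out, chunks


if __name__ == "__main__":
    a = list(range(1000))
    b = [1] * 1000
    result, issued = vector_add(a, b)
    print(len(result), "elements added in", issued, "vector-sized chunks")
```

The point of the model is the chunk count: the loop overhead is paid once per 256 elements instead of once per element, which only helps if memory can deliver operands fast enough to keep up.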
Since keeping data flowing to the processors is critical to speed, the high-speed interconnects (~12 GB/s) are a must for any problem that is not completely localized. It's all about matching the problem to the hardware. There may well be problems for which a commodity cluster just can't get the job done the way this machine can. Remember that each node of a cluster consumes power, produces heat, and takes up space. The raw cost of hardware is not the only consideration.