Sandia Wants To Build Exaflop Computer
Dan100 brings us an announcement that Sandia and Oak Ridge National Laboratories are setting their sights on an exaflop supercomputer. Researchers from the two laboratories jointly launched the Institute for Advanced Architectures to facilitate development. One of the problems they hope to solve is how to feed each core of each processor enough data so that its cycles aren't wasted.
"The idea behind the institute — under consideration for a year and a half prior to its opening — is 'to close critical gaps between theoretical peak performance and actual performance on current supercomputers,' says Sandia project lead Sudip Dosanjh. 'We believe this can be done by developing novel and innovative computer architectures.' The institute is funded in FY08 by congressional mandate at $7.4 million."
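One common way to reason about the gap between peak and actual performance, and about "feeding each core enough data," is the roofline model. The post doesn't mention it; it is named here only as an illustration. A kernel's speed is capped either by the core's peak compute rate or by memory bandwidth multiplied by the kernel's arithmetic intensity (flops per byte moved). The sketch below is a minimal version of that bound. The per-core peak and bandwidth figures are hypothetical, and `attainable_gflops` is an illustrative helper, not part of any library.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline bound: performance is limited either by the core's peak
    compute rate or by bytes/second from memory times flops per byte."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    # Hypothetical per-core numbers, for illustration only.
    peak = 10.0       # GFLOP/s
    bandwidth = 5.0   # GB/s available to this core
    # daxpy (y = a*x + y) in double precision: 2 flops per element,
    # 24 bytes moved (read x, read y, write y).
    daxpy_intensity = 2 / 24
    # A hypothetical compute-heavy kernel that reuses data from cache.
    dense_intensity = 10.0
    for name, intensity in (("daxpy", daxpy_intensity), ("dense kernel", dense_intensity)):
        perf = attainable_gflops(peak, bandwidth, intensity)
        print(f"{name:>12}: {perf:.2f} GFLOP/s ({perf / peak:.0%} of peak)")
```

Under these assumed numbers, the streaming daxpy kernel reaches only a few percent of peak because the core sits idle waiting for memory. The cache-friendly kernel, by contrast, hits the compute ceiling.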
You don't usually run one program on this type of system. The compute cycles are bid out to researchers, who each get a set number of compute hours. The system is partitioned, and each researcher is given a few nodes to run their codes on, so a system like this could have hundreds of jobs running simultaneously. Also, with the tens of thousands of cores needed to reach this level, node failures and other hardware failures are inevitable. Right now, if a node fails in the middle of a job, all work since the last checkpoint is lost. The chance of a failure disrupting a job goes up sharply the more nodes and cores the job runs on.
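A rough way to see why failures matter more at scale is to assume that each node fails independently, with an exponentially distributed time to failure. This is a simplifying assumption. Under it, failure rates add, so a partition's mean time between failures (MTBF) shrinks in proportion to its node count. The odds that a job finishes without any failure fall off exponentially. The sketch below assumes a hypothetical per-node MTBF of about five years and a 15-minute checkpoint cost. The function names are illustrative. The checkpoint interval uses Young's first-order approximation, sqrt(2 × checkpoint cost × MTBF), a standard rule of thumb rather than anything from the comment above.

```python
import math


def system_mtbf_hours(nodes: int, node_mtbf_hours: float) -> float:
    """MTBF of a partition of identical nodes, assuming independent
    exponential failures: the failure rates add, so MTBF divides by N."""
    return node_mtbf_hours / nodes


def job_survival_probability(nodes: int, node_mtbf_hours: float, job_hours: float) -> float:
    """Probability that no node in the partition fails during the job.

    Under the exponential assumption, one node survives t hours with
    probability exp(-t / MTBF); all N nodes must survive, so the
    exponents add up to exp(-N * t / MTBF)."""
    return math.exp(-nodes * job_hours / node_mtbf_hours)


def young_checkpoint_interval_hours(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the optimal time between
    checkpoints: sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inputs for illustration: ~5-year MTBF per node,
    # a 24-hour job, and 15 minutes to write one checkpoint.
    node_mtbf = 5 * 365 * 24
    job_hours = 24.0
    checkpoint_cost = 0.25
    for n in (100, 1_000, 10_000):
        mtbf = system_mtbf_hours(n, node_mtbf)
        p = job_survival_probability(n, node_mtbf, job_hours)
        interval = young_checkpoint_interval_hours(checkpoint_cost, mtbf)
        print(f"{n:>6} nodes: partition MTBF {mtbf:7.2f} h, "
              f"P(no failure in {job_hours:.0f} h) {p:.3f}, "
              f"checkpoint every ~{interval:.2f} h")
```

With these assumed numbers, a day-long job on 100 nodes usually runs cleanly. On 10,000 nodes it almost never finishes without a failure, which is why the checkpoint interval has to shrink as the job grows.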