SGI Demos 64-Proc Linux Box

Posted by ryuzaki0 on Monday September 9, 2002 @04:54AM from the hardware-to-lust-after dept.

foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
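For readers who have not met the benchmark: TRIAD is the STREAM kernel that computes a[i] = b[i] + q*c[i] over large arrays, and STREAM counts 24 bytes of traffic per element when the values are 8-byte doubles (two reads and one write). The sketch below only illustrates that kernel and the byte accounting. The function names are made up for this example, and a pure-Python loop measures interpreter overhead rather than real memory bandwidth, so the number it reports is not comparable to the figures in the story.

```python
import time

def triad(b, c, scalar):
    """Return a where a[i] = b[i] + scalar * c[i] (the TRIAD kernel)."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]

def triad_bytes(n, word_bytes=8):
    """Bytes STREAM counts for TRIAD: two reads and one write per element."""
    return 3 * word_bytes * n

def measure_gb_per_s(n=1_000_000, scalar=3.0):
    """Time one TRIAD pass and convert it to GB/s (illustrative only)."""
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    a = triad(b, c, scalar)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    assert a[0] == 1.0 + scalar * 2.0  # sanity check on the result
    return triad_bytes(n) / elapsed / 1e9

if __name__ == "__main__":
    print(f"{measure_gb_per_s():.3f} GB/s (interpreter-bound, not a real STREAM result)")
```

The real benchmark is a compiled C or Fortran program that uses arrays much larger than the caches and reports the best of several timed runs.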

What is this good for? by Anonymous Coward · 2002-09-09 04:57 · Score: 3, Insightful

To me, it would seem that being able to push info that fast to and from memory is useful for very few problems these days. I was under the impression that the majority of "super-computing" problems were of the sort that required lots of calculations, not lots of parsing of information in storage.

Am I wrong about what this benchmark means? Or am I missing something basic?
1. Re:What is this good for? by Jhan · 2002-09-09 05:17 · Score: 5, Insightful
  
  Typical super-computing problems are weather prediction, air flow computations and nuclear reaction modelling. Physical models in other words.
  
  Generally, you attack these kinds of problem by partitioning 3-d space into many small cells, and then running relatively simple calculations on every cell. The better the resolution, the better the model.
  
  The thing about three dimensions is, storage space increases with resolution^3... For instance, I believe the weather guys are currently pushing 1 km x 1 km x 100 m resolutions. That means about 3.2e11 cells. If each cell has 1 kB of state, the total memory usage would be about 320 TB.
  
  Super computing problems eat memory like Takeru Kobayashi eats hot dogs. In many (most?) cases the calculations are simple. Hence, bandwidth is King.
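The arithmetic above can be checked in a few lines. This is a rough sketch that takes the comment's own estimates at face value (3.2e11 cells, 1 kB per cell, decimal units) and, purely as an assumption for illustration, applies the 120 GB/s figure from the story to a data set of that size. The helper names are hypothetical.

```python
CELLS = 320_000_000_000                   # 3.2e11 cells, as estimated in the comment
BYTES_PER_CELL = 1_000                    # 1 kB of state per cell (decimal kB)
BANDWIDTH_BYTES_PER_S = 120_000_000_000   # 120 GB/s, the STREAM figure from the story

def total_bytes(cells, bytes_per_cell):
    """Total state held by the model."""
    return cells * bytes_per_cell

def refined_cells(cells, factor):
    """Cell count after shrinking the cell edge by `factor` in all three dimensions."""
    return cells * factor ** 3

def sweep_seconds(nbytes, bandwidth):
    """Time to touch every byte once at a sustained bandwidth."""
    return nbytes / bandwidth

state = total_bytes(CELLS, BYTES_PER_CELL)
print(state / 1e12, "TB of state")
print(refined_cells(CELLS, 2) // CELLS, "times the cells if the resolution doubles")
print(round(sweep_seconds(state, BANDWIDTH_BYTES_PER_S)), "seconds per pass over the state")
```

Doubling the resolution multiplies the cell count by eight, and even at 120 GB/s a single pass over 320 TB would take roughly three quarters of an hour, which is why bandwidth dominates when the per-cell work is simple.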
  
  --
  I choose to remain celibate, like my father and his father before him.
2. Re:What is this good for? by foobar104 · 2002-09-09 05:19 · Score: 4, Insightful
  
  Am I wrong about what this benchmark means? Or am I missing something basic?
  
  With no disrespect intended, I think you might be missing something basic.
  
  Any activity that involves moving data into and out of RAM will benefit from the ability to do it faster. That includes such disparate things as database processing (if you're lucky, you can cache your indices in RAM), media encoding, hell, even compiling. Memory bandwidth is one of the few aspects of computer design that touches just about every application, with the exception of those that are small enough, or sufficiently well optimized, to fit into cache.
3. Re:What is this good for? by stienman · 2002-09-09 05:47 · Score: 3, Insightful
  
  Primarily this is good for marketing, company image, press releases, and selling potential customers on smaller systems.
  
  Chances are good that they will build very few full scale machines. Those that are built go towards data-warehousing, research (atmospheric, oceanic and space science, nuclear modeling, etc.) and to the government. Factoring large numbers is a use, for instance, as it's a problem that can be performed in parallel.
  
  But they will have the ability to say that x, y, and z companies/gov't agencies have our equipment, it can't be exported (so it must be good), and our lower end machines will suit your job until you need an upgrade. In other words, we can be with you for the whole ride and promise application compatibility.
  
  -Adam
Make the demo Open Source! by Lieutenant_Dan · 2002-09-09 05:09 · Score: 2, Insightful

If we could work together (plus Mr Perens who is currently looking for a good cause to lead) we could take the demo to greater heights.

What is to say that the demo's code isn't buggy and shoddy, holding the powerful Itanium processors back?

If we realize the vast potential that the Open Source developer community provides then we can tackle such complex tasks as this Itanium performance measurement.

--
Wearing pants should always be optional.
Re:Well, that's nice, but what about... by foobar104 · 2002-09-09 05:16 · Score: 3, Insightful

Anyone have an educated guess of what the actual score would be?

Zero. Origin servers don't have graphics cards. Which means, unfortunately, the Slashdot community is going to have to try to wrap its collective head around a more meaningful measurement of potential performance.
Re:Historical comparison... by ivan256 · 2002-09-09 05:20 · Score: 3, Insightful

SGI didn't choose to compare this to a C90, the Slashdot submitter did. SGI primarily compared it to the "IBM® eServer p690 and Sun Microsystems Sun Fire".

The part that I really find interesting is that the top three in the list all outperform this by twice as much, the #1 spot being held by a machine that can do over 500GB/sec.

It's still over 12x faster than the quad Itaniums I used to work with, and probably much cheaper than the NEC machines and the Cray...
Impressive memory crossbar by Animats · 2002-09-09 05:43 · Score: 5, Insightful

First of all, the OS doesn't matter for this benchmark. This is a memory-to-memory copying test.

That said, it's an impressive result. And it's done in an unusual way. SGI has a 1.6GB/s channel running through routers connecting the processors and memory. A computer is made up of multiple rackmount "bricks" connected by cables and routers. The "router" is a 2U rackmount device.

Processors and memory reside in rackmount boxes with 4 CPUs and 8 GB (max) of local memory. These boxes interconnect through a single 1.6GB/s link per box, which, in a big system, goes through several layers of routers. So a memory access to another box is routed through what is essentially a fast LAN. All this is cached, of course.

It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck. (Everybody reading the same stuff is OK; it's cached. But writes have to propagate back to the home location of the data.)

Since the whole monster crashes all at once, you don't want to build your web server farm this way. It's for applications that really need all that crunch power in one machine.
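A back-of-the-envelope check, using only numbers quoted in this thread and not verified against SGI's documentation: 64 processors in 4-CPU boxes gives 16 boxes, and 120 GB/s spread over them is 7.5 GB/s per box. The sketch below assumes the prototype really was built from 4-CPU boxes and that the 1.6GB/s link figure applies to it. Both are assumptions, and the function names are made up for the example.

```python
PROCESSORS = 64          # from the story title
CPUS_PER_BOX = 4         # as described in the comment above
LINK_GB_PER_S = 1.6      # single inter-box link, as described above
STREAM_GB_PER_S = 120.0  # "more than 120 GB/s", so this is a lower bound

def boxes(processors, cpus_per_box):
    """Number of CPU/memory boxes in the system."""
    return processors // cpus_per_box

def per_box_bandwidth(total_gb_per_s, n_boxes):
    """Average memory bandwidth each box would have to sustain."""
    return total_gb_per_s / n_boxes

n = boxes(PROCESSORS, CPUS_PER_BOX)
each = per_box_bandwidth(STREAM_GB_PER_S, n)
print(n, "boxes,", each, "GB/s per box,",
      round(each / LINK_GB_PER_S, 1), "times the link rate")
```

If those figures hold, each box sustains several times what its one external link could carry, so nearly all of the benchmark's traffic would have to be between CPUs and memory in the same box. That fits the point about memory layout: the headline number is what you get when the data is placed well.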