Slashdot Mirror


SGI Demos 64-Proc Linux Box

foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
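
For readers unfamiliar with the benchmark, here is a minimal sketch of what TRIAD measures. The kernel is a[i] = b[i] + q * c[i], and STREAM counts two reads and one write per element, so 24 bytes per iteration with 8-byte words. The function names below are illustrative and not taken from the STREAM source, and a pure-Python loop reports a tiny fraction of what a compiled run would.

```python
import time

def triad(a, b, c, q):
    """STREAM TRIAD kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]

def triad_bandwidth_gb_s(n, seconds, bytes_per_word=8):
    """Bandwidth in GB/s, counting two reads and one write per element."""
    return 3 * bytes_per_word * n / seconds / 1e9

n = 100_000  # tiny array for illustration; real runs use arrays far larger than cache
a, b, c = [0.0] * n, [1.0] * n, [2.0] * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start

print(f"{triad_bandwidth_gb_s(n, elapsed):.3f} GB/s (interpreter-bound, not a real STREAM figure)")
```

Taking the headline's 64 processors and the quoted 120 GB/s at face value, that works out to roughly 1.9 GB/s per processor.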

19 of 253 comments

  1. Re:So what is faster than it in the TRIAD? by Durinia · · Score: 3, Informative

    Interesting... Looks like a T932 has got about a 3x performance on it, and the NECs (understandably, since they are the most modern) get like 5x. Still pretty impressive for an MPP machine, I would think. Were you able to find stats on MPP systems (such as the T3E or SP) anywhere?

  2. Re:What is this good for? by Falrick · · Score: 2, Informative

    It's good for, as another poster put it, simulations. Specifically, simulations with lots of tightly coupled entities. If you are simulating, say, 100 different entities, and the action of each of those entities has an effect on all of the other 99 entities, you gain greatly from a massively parallel shared memory environment. Sending state changes through a cluster can kill these kinds of applications.
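
    A rough sketch of why the coupling hurts on a cluster: with 100 entities that each affect the other 99, every step carries 9,900 directed influences, and on a cluster most of them cross a node boundary. The round-robin placement and the function names are assumptions for illustration only.

```python
from itertools import permutations

def coupled_updates_per_step(n_entities):
    """Directed (source, target) influences when every entity affects every other."""
    return n_entities * (n_entities - 1)

def cross_node_updates(n_entities, n_nodes):
    """Influences that cross a node boundary when entities are placed
    round-robin over n_nodes cluster nodes (hypothetical placement)."""
    node_of = [i % n_nodes for i in range(n_entities)]
    return sum(
        1
        for src, dst in permutations(range(n_entities), 2)
        if node_of[src] != node_of[dst]
    )

print(coupled_updates_per_step(100))  # 9900
print(cross_node_updates(100, 1))     # 0    (one shared-memory machine)
print(cross_node_updates(100, 4))     # 7500 (four-node cluster)
```

    On a single shared-memory box none of those updates has to leave the machine; on the four-node split, about three quarters of them would travel over the interconnect each step.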

    --
    something clever
  3. Re:Stock Kernel? by Jobe_br · · Score: 2, Informative

    I, too, was wondering if SGI has produced a patch for this or if it's running a Linus kernel. Chances are, though, it isn't 2.4.x, which is in maintenance mode, but rather the 2.5.x series, which is concentrating on enhancing scalability. Surprising, however, that the 2.5.x line would have gotten such impressive results so early. 2.5.x has only been in the works for a short time now, right?!?

  4. Re:So what is faster than it in the TRIAD? by brejc8 · · Score: 3, Informative

    These results are quite old. The SGI MIPS based machines seem to be much faster.
    512 processor Origin 3000 quoted as 716 GB/sec.
    I have no idea why they are using Itanics for this but it's not because they are better processors.

  5. Re:What would you do with it? by tvalley000 · · Score: 2, Informative

    At a company I worked for in 1997, we used an SGI box of comparable power (well, not _that_ much power) to do real-time rendering of geological reservoir data. Typical data points were about 40MB of data, directly measured from the field of study. The purpose was a "fly through" for geologists to tell where oil could be found.

    Everyone on the team used SGIs (I used an Indigo 2, arguably the slowest box in the office) running IRIX. The Origin system sat two floors below us, with the 3D programmer only having the keyboard, mouse and monitor in his office. It made it difficult when we wanted to run a game of Quake, as everyone could easily sneak up on him.

  6. Re:Stock Kernel? by GigsVT · · Score: 2, Informative

    SGI is actually the driving force behind a lot of work on Linux scalability. SGI submits patches to the kernel, everyone benefits, etc.

    Linux isn't really optimized for a lot of processors, but companies like SGI are working to change that, and contributing a lot to the community in the process.

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  7. STREAM and SGI past history by dprice · · Score: 4, Informative

    It's not surprising that the SGI machine runs STREAM well. Back in the mid-1990s, John McCalpin, who worked for SGI at that time, was a regular contributor to comp.sys.super, and he would frequently brag about the superiority of SGI running STREAM. McCalpin is one of the primary advocates for STREAM. You can optimize a computer architecture to run a particular benchmark well. The question is whether the SGI machine runs a wider variety of real-world problems well.

  8. Re:Historical comparison... by foobar104 · · Score: 3, Informative

    ...interesting that SGI chose the Cray C90 - a system released in *1991* - to compare against. It's nice to know that it's only taken them 10+ years to catch up. :)

    If you read the STREAM TRIAD web site linked above, you'll see that SGI didn't compare itself to the C90 exactly; it just ran a benchmark and published the results. Also in that approximate rank are other machines from NEC and Cray and, further down, Sun.

    But you're right. Cray was way ahead of their time when it came to things like memory bandwidth. I remember a friend (ex-Crayon) telling me once that access to main memory on the T-90 was faster than access to the on-chip cache on the Pentium III. That sounds implausible, though, so he might have been exaggerating.

    I'm curious how the new X1 (nee SV2) does on the STREAM suite.

    The last word I got is that X1 is still in the PCB design phase. It's only running as a simulator right now. So it'll be a while before you see those numbers. ;-)

    (That info is several months old, so I may be wrong.)

  9. Re:What is this good for? by ericman31 · · Score: 5, Informative

    One area where this is meaningful is data warehousing. There are three major competitors in the very large data warehousing environment and one wannabe competitor:

    • NCR Teradata and Worldmark MPP servers
    • IBM DB2 and IBM pSeries clusters (MPP again)
    • Sun SunFire 15K and Sybase IQ Multiplex (SMP)
    • Oracle is trying to compete in this space and not really succeeding. Their model is sort of MPP, based on Oracle Real Application Clusters

    MPP, or massively parallel processing, is the typical solution for very large (generally anything over 3 or 4 terabytes) data warehouses. Sun and Sybase are trying hard to crack the market with their SMP (symmetric multi-processing) solution, which is actually very promising. The major benefit of SMP processing is simplicity: one server to maintain, one OS, no cluster, no cluster interconnect. With Linux potentially pushing into the large SMP space we will have the potential for competition to the MPP data warehouse solutions, which are incredibly expensive to purchase and maintain.

    One of the biggest drawbacks to Linux adoption in the commercial Enterprise space is its lack of SMP scalability. If the SGI platform works out we will start seeing Linux scaling into an arena that will allow for acceptance in the Enterprise.

    --
    In my universe I'm perfectly normal, it's not my fault you don't live in my universe.
  10. Re:impressive w/Linux by CMonk · · Score: 5, Informative

    Given that they list "scalability" as one of the open source projects that they contribute to I would say they are playing nice with the community. (http://oss.sgi.com/projects/).

    They are working hard to get a number of their changes into the official kernel; I imagine this is one of them.

  11. Statics, Benchmarks, and lies... by AtariDatacenter · · Score: 5, Informative

    I think it is pretty interesting that the benchmark that they used measured memory throughput, as opposed to, say, an actual workload. In other words, this is a synthetic benchmark, versus a real-world benchmark. They say, "Look! We can do memory transfers really really fast!"

    Unfortunately, memory transfers are not the world when it comes to large multiprocessor boxes. The overhead comes in when you're trying to synchronize a large number of threads/CPUs to do a large task. For example, an Oracle database.

    Sun has proven that it scales up the tree very well with large numbers of processors. But from my understanding, Linux is more efficient with a low processor count, and less and less efficient with more processors.

    I question its ability to do anything with a real workload. And I'm even more suspicious because they use a benchmark I've never heard of (STREAM TRIAD) to push its superiority on a single-aspect synthetic benchmark.

    Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating. Now, what can the machine actually do well that makes it a real winner?

    1. Re:Statics, Benchmarks, and lies... by foobar104 · · Score: 5, Informative

      Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating.

      You know, before you piss in SGI's Cheerios, you might want to do a little reading. The Origin 3000 architecture, on which this prototype system was based, has no memory bus at all. It uses a fabric of switched multi-gigabyte-per-second interconnects to attach CPUs to RAM and to other CPU nodes.

      CPU benchmarks (like SPEC) are synthetic and irrelevant, because they fit in cache. Virtually no real application fits in cache, and the sort of applications you run on a machine this big deal with data sets on the order of tens or even hundreds of gigabytes. Memory-to-CPU bandwidth is probably the only real indicator of the ability of the system to handle real-world workloads.

      It's also the only thing-- other than the dimensions and the color of the plastics-- that differentiates SGI's big Itanium 2 server from everybody else's big Itanium 2 servers.

  12. it's the other way 'round by halfelven · · Score: 2, Informative

    Actually, it's precisely because of lack of superfast mem-IO machines that many people tried to work around the problem and create algorithms that are CPU-bound.
    In fact, most of the computationally-intensive problems require LOTS of mem-IO.

    And there's one more thing: there's a huge difference between the 64-CPU SGI machine, and a Mosix cluster of 64 1-CPU nodes: the SGI has one single memory space contiguous on the same machine. That means you can actually use a very large matrix to process your data, instead of shoving bits of it over the network back and forth.
    There are entire classes of problems that will be solved orders of magnitude faster on the SGI server than on a network-distributed Mosix cluster (or any other kind of cluster, Beowulf, etc.). That's the advantage of true SMP systems (all CPUs on the same hardware) as opposed to networked clusters.

  13. Re:MIPS is to IA64 as Irix is to Linux? by foobar104 · · Score: 4, Informative

    Anybody else see that as the main reason this is running Linux instead of Irix?

    SGI started working on porting IRIX to the IA-64 architecture back in (I think it was) 1995 or 1996. Not long after, they found that it would be easier and cheaper to get Linux to scale more efficiently and to port some key libraries and services from IRIX than it would be to port all of IRIX over to the new architecture.

    It's all about time and money.

  14. Re:Impressive memory crossbar by foobar104 · · Score: 5, Informative

    It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck.

    Speaking as somebody who's done his share of IRIX programming, I'd say "none at all."

    In some cases, on Origin 2000 hardware with older versions of IRIX, you could see notable performance differences if you went out of your way to place memory in banks adjacent to the running processors. But the Origin 3000 architecture, with its significant reductions in memory latency, and newer versions of IRIX, with their improved page replication algorithms, have made manual memory placement almost obsolete. Almost.

    SGI spent a lot of time and trouble trying to reduce the impact of accessing remote memory. The caching mechanisms and page replication stuff are really well thought-out.

  15. Re:stats? by dohcvtec · · Score: 2, Informative

    One nitpick: IIRC it would be CPU #0 - CPU #63

    --
    -- Never hit a man with glasses. Hit him with a baseball bat.
  16. Re:So what is faster than it in the TRIAD? by Durinia · · Score: 2, Informative

    512 processor Origin 3000 quoted as 716 GB/sec.

    That's a peak speed, not a STREAM speed. Some of these machines (like the NEC SX-6) have peak speeds that are *much* higher. STREAM is an attempt at showing how a system performs on a somewhat more realistic workload.

  17. Re:What is this good for? by littleRedFriend · · Score: 3, Informative

    I work for a company that writes software for those kinds of genomic computations (yes, it runs on Linux, MPI & SMP). We recently did a large computation on the 4th largest supercomputer in the world. The results are freely available.

    Most of these computations are pretty intensive in CPU and memory usage. Network speed and disk speed are less important (although you need lots of storage). I would like to try one of these babies, must be fast.

    --
    IANAL, but imagine a beowulf cluster of in Soviet Russia all your belong are base to us welcoming the new SCO overlords.
  18. Re:What is this good for? by Anonymous Coward · · Score: 3, Informative

    1 km x 1 km x 100 m for Numerical Weather Prediction
    is a bit much for today's (affordable) supers.

    We use a 22 km x 22 km horizontal grid for
    predicting the weather 48 hours ahead over the
    North Atlantic + Europe (406 x 324 cells).

    We use 31 layers in the vertical (from ~30 meters
    thick in the lowest level to ~2 km for the few in
    the stratosphere).

    This is for a so-called "limited area" model. A
    global model such as the model of the European
    Centre uses about half the resolution (40 km)
    over the entire globe.

    Toon Moene.
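
    To put those grid figures in perspective, a back-of-the-envelope sketch (variable names are illustrative): it counts the cells in the 406 x 324 x 31 limited-area grid and the number of horizontal columns the same area would need at 1 km spacing. The vertical is left alone, since the layers quoted above are not uniform and no layer count for a 100 m spacing is given.

```python
NX, NY, NZ = 406, 324, 31  # horizontal cells and vertical layers from the post
DX_KM = 22                 # current horizontal grid spacing in km
TARGET_DX_KM = 1           # the 1 km spacing discussed in the thread

current_columns = NX * NY
current_cells = current_columns * NZ

# Refining the spacing multiplies the cell count along each horizontal axis.
refine = DX_KM // TARGET_DX_KM
columns_at_1km = (NX * refine) * (NY * refine)

print(current_cells)                      # 4077864
print(columns_at_1km)                     # 63667296
print(columns_at_1km // current_columns)  # 484
```

    So going from 22 km to 1 km multiplies the horizontal column count by 484 before any extra vertical layers are added.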