SGI Demos 64-Proc Linux Box
foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
It's not surprising that the SGI machine runs STREAM well. Back in the mid-1990s, John McCalpin, who worked for SGI at that time, was a regular contributor to comp.sys.super, and he would frequently brag about how well SGI machines ran STREAM. McCalpin is one of the primary advocates for STREAM. You can optimize a computer architecture to run a particular benchmark well. The question is whether the SGI machine runs a wider variety of real-world problems well.
One area where this is meaningful is data warehousing. There are three major competitors in the very large data warehousing environment and one wannabe competitor:
- NCR Teradata and Worldmark MPP servers
- IBM DB2 and IBM pSeries clusters (MPP again)
- Sun SunFire 15K and Sybase IQ Multiplex (SMP)
- Oracle is trying to compete in this space and not really succeeding. Their model is sort of MPP, based on Oracle Real Application Clusters
MPP, or massively parallel processing, is the typical solution for very large (generally anything over 3 or 4 terabytes) data warehouses. Sun and Sybase are trying hard to crack the market with their SMP (symmetric multi-processing) solution, which is actually very promising. The major benefit of SMP is simplicity: one server to maintain, one OS, no cluster, no cluster interconnect. With Linux potentially pushing into the large SMP space, we will have the potential for competition to the MPP data warehouse solutions, which are incredibly expensive to purchase and maintain.
One of the biggest drawbacks to Linux adoption in the commercial Enterprise space is its lack of SMP scalability. If the SGI platform works out, we will start seeing Linux scale into an arena that will allow for acceptance in the Enterprise.
In my universe I'm perfectly normal, it's not my fault you don't live in my universe.
Given that they list "scalability" as one of the open source projects that they contribute to I would say they are playing nice with the community. (http://oss.sgi.com/projects/).
They are working hard to get a number of their changes into the official kernel; I imagine this is one of them.
I think it is pretty interesting that the benchmark that they used measured memory throughput, as opposed to, say, an actual workload. In other words, this is a synthetic benchmark, versus a real-world benchmark. They say, "Look! We can do memory transfers really really fast!"
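For anyone who hasn't looked at it: the TRIAD kernel in STREAM is the loop a[i] = b[i] + q*c[i], and the reported bandwidth counts three 8-byte words per element (two reads and one write). Below is a minimal Python sketch of that accounting. It is not the official benchmark, which is distributed as C and Fortran source; the function names and the array size are illustrative assumptions, and an interpreted loop like this measures the interpreter far more than the memory system.

```python
import time
from array import array


def triad(a, b, c, q):
    """STREAM-style TRIAD: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n, word_size=8):
    """Bytes counted for one TRIAD pass: read b, read c, write a."""
    return 3 * word_size * n


def triad_gb_per_s(n, seconds, word_size=8):
    """Sustained bandwidth in GB/s, taking 1 GB as 1e9 bytes."""
    return triad_bytes(n, word_size) / seconds / 1e9


if __name__ == "__main__":
    n = 1_000_000  # illustrative size only; real runs use arrays much larger than cache
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n

    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start

    # This number reflects Python loop overhead, not hardware memory bandwidth.
    print(f"{triad_gb_per_s(n, elapsed):.3f} GB/s")
```

The point of the sketch is only to show what the headline figure counts: bytes moved by a trivially simple loop, divided by elapsed time, with no synchronization between workers beyond starting and stopping together.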
Unfortunately, memory transfers are not the world when it comes to large multiprocessor boxes. The overhead comes in when you're trying to synchronize a large number of threads/CPUs to do a large task. For example, an Oracle database.
Sun has proven that it scales up the tree very well with large numbers of processors. But from my understanding, Linux is more efficient with a low processor count, and less and less efficient with more processors.
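One rough way to put numbers on "less and less efficient with more processors" is Amdahl's law. That is a standard textbook model I am bringing in here, not something SGI or Sun cites, and the serial fraction below is a made-up value for illustration, not a measurement of Linux, Solaris, or any real workload.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Ideal speedup on `cpus` processors when `serial_fraction`
    of the work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


def parallel_efficiency(serial_fraction, cpus):
    """Speedup divided by processor count; 1.0 means perfect scaling."""
    return amdahl_speedup(serial_fraction, cpus) / cpus


if __name__ == "__main__":
    serial_fraction = 0.05  # hypothetical: 5% of the time spent serialized
    for cpus in (1, 2, 4, 8, 16, 32, 64):
        speedup = amdahl_speedup(serial_fraction, cpus)
        efficiency = parallel_efficiency(serial_fraction, cpus)
        print(f"{cpus:3d} CPUs: speedup {speedup:6.2f}, efficiency {efficiency:5.1%}")
```

Under this model, even a small serialized portion (lock contention, for example) eats most of the benefit at 64 processors, which is why a bandwidth-only benchmark says little about a synchronization-heavy workload.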
I question its ability to do anything with a real workload. And I'm even more suspicious because they use a benchmark I've never heard of (STREAM TRIAD) to claim superiority on a single-aspect synthetic benchmark.
Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating. Now, what can the machine actually do well that makes it a real winner?
Anybody else see that as the main reason this is running Linux instead of Irix?
SGI started working on porting IRIX to the IA-64 architecture back in (I think it was) 1995 or 1996. Not long after, they found that it would be easier and cheaper to get Linux to scale more efficiently and to port some key libraries and services from IRIX than it would be to port all of IRIX over to the new architecture.
It's all about time and money.
It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck.
Speaking as somebody who's done his share of IRIX programming, I'd say "none at all."
In some cases, on Origin 2000 hardware with older versions of IRIX, you could see notable performance differences if you went out of your way to place memory in banks adjacent to the running processors. But the Origin 3000 architecture, with its significant reductions in memory latency, and newer versions of IRIX, with their improved page replication algorithms, have made manual memory placement almost obsolete. Almost.
SGI spent a lot of time and trouble trying to reduce the impact of accessing remote memory. The caching mechanisms and page replication stuff are really well thought-out.
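A toy model of why placement matters less as remote latency shrinks: treat average access time as a weighted mix of local and remote accesses. The latency figures below are hypothetical placeholders, not Origin 2000 or Origin 3000 measurements, and the model ignores caches and contention entirely.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Weighted average memory latency for a given share of local accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


def placement_penalty(local_fraction, local_ns, remote_ns):
    """How much slower the average access is than an all-local access."""
    return average_latency_ns(local_fraction, local_ns, remote_ns) / local_ns


if __name__ == "__main__":
    # Hypothetical machines: one with a large remote penalty, one with a small one.
    for name, local_ns, remote_ns in (("large gap", 100.0, 400.0),
                                      ("small gap", 100.0, 150.0)):
        for local_fraction in (0.25, 0.5, 0.9):
            penalty = placement_penalty(local_fraction, local_ns, remote_ns)
            print(f"{name}: {local_fraction:.0%} local -> {penalty:.2f}x local latency")
```

In these terms, page replication can be read as the OS raising the local fraction on your behalf, and lower remote latency as shrinking the gap, so hand placement has less left to win.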