SGI Demos 64-Proc Linux Box
foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
To me, it would seem that being able to push info that fast to and from memory is useful for very few problems these days. I was under the impression that the majority of "super-computing" problems were of the sort that required lots of calculations, not lots of parsing of information in storage.
Am I wrong about what this benchmark means? Or am I missing something basic?
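For what it's worth, the TRIAD part of STREAM is a tiny kernel, a[i] = b[i] + q*c[i], run over arrays too big to sit in cache, with the result reported as bytes moved per second. Here is a minimal Python sketch of the idea. It is not the official STREAM source (that is C and Fortran); the function names, the array size, and the 8-byte word size are my own assumptions for illustration. Pure Python will report a tiny number, so the point is the accounting, not the speed.

```python
import time


def triad(a, b, c, scalar):
    """TRIAD kernel: a[i] = b[i] + scalar * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_size=8):
    """Bytes counted per pass: two reads (b, c) and one write (a) per element."""
    return 3 * word_size * n


def measure(n=100_000, scalar=3.0):
    """Run one TRIAD pass and return (result array, apparent GB/s)."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    return a, triad_bytes(n) / elapsed / 1e9


if __name__ == "__main__":
    _, gbps = measure()
    print(f"apparent TRIAD bandwidth: {gbps:.4f} GB/s")
```

Note there is one multiply and one add per 24 bytes moved, so the kernel is almost all memory traffic. That is why the result says a lot about the memory system and very little about raw floating-point speed.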
Like any good press writeup, it lacks any details that are useful to techies. I want to see a dmesg from this thing, as well as pretty pictures of what's under the hood.
There is no reasonable defense against an idiot with an agenda
:wq
That was my first thought. So it beats a C90, but what is faster?
Found the answer here.
And if you were wondering about a Beowulf cluster of these, the top ten ranking excludes "cluster results".
Linux running at 120 GB/s with 64 processors is impressive for an OS that has been criticized as inefficient when running on more than 8.
I would be very interested to know what version of the kernel they are using.
They also mention the SV1, which is a "low-end" Cray. I'm curious how the new X1 (nee SV2) does on the STREAM suite.
It's good to see that their "scalable linux" work seems to be doing pretty well! I'm sure it was much easier for them to use the IA-64 port of Linux than to port IRIX...
Hmm. I wonder if I can parallelize the app I work on enough to use all those 64 processors? I know my bosses would wet themselves if I did. Of course, I am mainly disk bound. Anyone got a disk system to match?
Stop the brainwash
Does the current 2.4.x series kernel scale to 64 procs effectively, or are they using some "enterprise patch" to fine tune for this particular hardware? I was under the impression that, since most kernel developers don't have access to this kind of ultra-high-end hardware, Linux isn't really optimized for it. Correct me if I'm wrong.
A musician without the RIAA, is like a fish without a bicycle.
SGI make loads of 64 processor machines. And I believe Linux runs fine on multiprocessor MIPS R14000s.
Mouse powered Chips, Open source Processors and Lego
Why can't it run Windows XP?
Ow!... ow, ow, ow, OW! stop throwing rocks at me!
Ok so it was a bad joke....
Do not look at laser with remaining good eye.
If we could work together (plus Mr Perens who is currently looking for a good cause to lead) we could take the demo to greater heights.
What is to say that the demo's code isn't buggy and shoddy, holding the powerful Itanium processors back?
If we realize the vast potential that the Open Source developer community provides then we can tackle such complex tasks as this Itanium performance measurement.
Wearing pants should always be optional.
This sounds very cool, but I would really like more info than this. Plus, it isn't going to be released until next year. Within that time frame there will be the usual delays and then final release to a couple of customers. Don't get me wrong, I think this is cool. Especially the Linux part. This could go a long way to helping Linux scale better on massive machines.
The second thought is: can it be partitioned? This is a rather big machine and goes against the trend I have witnessed to use many smaller machines to accomplish your goal. I'll have to ask some of the guys at Oracle if they've looked at Linux installs of this size, but as far as I know they only make x86 ports right now. So, I wonder what Linux apps would someone run on a system this big? (I know. Insert obligatory Quake, Beowulf and porn server reference here.)
Disclaimer: I work for an SGI competitor. But I have personally installed Linux on every piece of hardware I can get my hands on. Just to play usually, but still. They just pay my mortgage.
_damnit_
It's my job to freeze you. -- Logan's Run
I saw a few comments along the lines of "wowee, powerful!". I'm just curious what somebody'd want with a machine that powerful.
Me, personally, I do lotsa 3D stuff and would love to see what it'd take to bring that machine to its knees. However, I get the impression I'm but one of a few 3D dudes here. So what would you non-3D dudes wanna do with it?
Anyone have an educated guess of what the actual score would be?
Zero. Origin servers don't have graphics cards. Which means, unfortunately, the Slashdot community is going to have to try to wrap its collective head around a more meaningful measurement of potential performance.
from Maddog:
:)
"For those applications that need to scale, SGI has just proven that Linux need not be synonymous with clutter."
cluster? or clutter? a good cluster is not cluttered
-- Who is the bigger fool? The fool or the fool who follows him? --
to meet the system requirements for Doom III.
-R
Stuff that matters: circuitbreakers, vacuum-cleaners coffee makers, calculators generators, matching salt+pepper shakers
....but seriously, what are the applications for boxes like these? I mean - other than uses for Lawrence Livermore labs etc... big ass iron like this seems to only really be useful for 1. nuclear modelling 2. benchmark testing press releases.
I know that someone somewhere is going to use a box like this - but tell me for what real world application will you use it. (serious question - curious. I want to know the real apps these are used for)
It's not surprising that the SGI machine runs STREAM well. Back in the mid-1990's, John McCalpin, who worked for SGI at that time, was a regular contributor to comp.sys.super, and he would frequently brag about the superiority of SGI running STREAM. McCalpin is one of the primary advocates for STREAM. You can optimize a computer architecture to run a particular benchmark well. The question is whether the SGI machine runs a wider variety of real-world problems well.
http://ask.slashdot.org/article.pl?sid=01/10/10/1549223
http://slashdot.org/article.pl?sid=99/06/27/13462
http://slashdot.org/article.pl?sid=01/02/09
EXPENSIVE
Which means, unfortunately, the Slashdot community is going to have to try to wrap its collective head around a more meaningful measurement of potential performance.
So how many LOCs/sec is that?
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
... weather modeling, for one. Here in the US, NCEP (National Centers for Environmental Prediction) runs all the forecast weather models on an IBM-SP (used to run on a Cray C90, I think). In Europe, the ECMWF model is run on a Fujitsu supercomputer, I think.
Models for plasma dynamics and astrophysics are also run on these heavy-duty machines. I'm sure others have had some experience running other things, but I know that the NCEP IBM-SP gets a workout at least 2 times a day running at least four different weather models that have average runtimes around an hour each.
-Jellisky
"Which means, unfortunately, the Slashdot community is going to have to try to wrap its collective head around a more meaningful measurement of potential performance."
Hmm. (rubs chin). Well, there's that ASCII graphics version of Quake for the console- that should work!
graspee
This sure would run a select statement on a database of all of our info pretty damn fast. But who would believe we'd ever adopt any kind of national ID, you know, like driver's licenses, social security cards, membership cards at grocery stores, etc.?
What are we going to do tonight Brain?
...and it's called 64 CPUs.
Perhaps they should update the song
That said, it's an impressive result. And it's done in an unusual way. SGI has a 1.6GB/s channel running through routers connecting the processors and memory. A computer is made up of multiple rackmount "bricks" connected by cables and routers. The "router" is a 2U rackmount device.
Processors and memory reside in rackmount boxes with 4 CPUs and 8 GB (max) of local memory. These boxes interconnect through a single 1.6GB/s link per box, which, in a big system, goes through several layers of routers. So a memory access to another box is routed through what is essentially a fast LAN. All this is cached, of course.
It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck. (Everybody reading the same stuff is OK; it's cached. But writes have to propagate back to the home location of the data.)
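A quick back-of-the-envelope check on those figures, taking this comment's numbers at face value (4 CPUs per box, one 1.6GB/s link per box) together with the 120 GB/s from the story. Treating 120 GB/s as a whole-machine aggregate over 64 CPUs is my assumption, as are the variable names. This is a sketch of the arithmetic, not a description of how SGI measured anything.

```python
# Figures quoted in the thread; nothing here was measured by me.
CPUS = 64               # processors in the demo system (story headline)
CPUS_PER_BRICK = 4      # CPUs per rackmount box, per the comment above
LINK_GBPS = 1.6         # inter-box link bandwidth, per the comment above
MEASURED_GBPS = 120.0   # STREAM TRIAD result quoted in the story

bricks = CPUS // CPUS_PER_BRICK              # 16 boxes
aggregate_link_gbps = bricks * LINK_GBPS     # all box links added together
per_brick_gbps = MEASURED_GBPS / bricks      # bandwidth each box must sustain
per_cpu_gbps = MEASURED_GBPS / CPUS          # bandwidth each CPU must sustain

if __name__ == "__main__":
    print(f"boxes:                 {bricks}")
    print(f"sum of box links:      {aggregate_link_gbps:.1f} GB/s")
    print(f"needed per box:        {per_brick_gbps:.2f} GB/s")
    print(f"needed per CPU:        {per_cpu_gbps:.3f} GB/s")
    print(f"box need / link speed: {per_brick_gbps / LINK_GBPS:.2f}x")
```

If those inputs are right, each box has to sustain 7.5 GB/s, several times what its single link could carry, and all 16 links together add up to only 25.6 GB/s. So a result like this is only possible if nearly all of the traffic stays in each box's local memory, which is exactly the "lay things out well" case.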
Since the whole monster crashes all at once, you don't want to build your web server farm this way. It's for applications that really need all that crunch power in one machine.
I think it is pretty interesting that the benchmark that they used measured memory throughput, as opposed to, say, an actual workload. In other words, this is a synthetic benchmark, versus a real-world benchmark. They say, "Look! We can do memory transfers really really fast!"
Unfortunately, memory transfers are not the world when it comes to large multiprocessor boxes. The overhead comes in when you're trying to synchronize a large number of threads/CPUs to do a large task. For example, an Oracle database.
Sun has proven that it scales up the tree very well with large numbers of processors. But from my understanding, Linux is more efficient with a low processor count, and less and less efficient with more processors.
I question its ability to do anything with a real workload. And I'm even more suspicious because they use a benchmark I've never heard of (STREAM TRIAD) to push its superiority on a single-aspect synthetic benchmark.
Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating. Now, what can the machine actually do well that makes it a real winner?
"Zero. Origin servers don't have graphics cards. Which means, unfortunately, the Slashdot community is going to have to try to wrap its collective head around a more meaningful measurement of potential performance."
;)
NOOOOOOOOOoooooooooo...
The Kruger Dunning explains most post on
Actually, it's precisely because of lack of superfast mem-IO machines that many people tried to work around the problem and create algorithms that are CPU-bound.
In fact, most of the computationally-intensive problems require LOTS of mem-IO.
And there's one more thing: there's a huge difference between the 64-CPU SGI machine, and a Mosix cluster of 64 1-CPU nodes: the SGI has one single memory space contiguous on the same machine. That means you can actually use a very large matrix to process your data, instead of shoving bits of it over the network back and forth.
There are entire classes of problems that will be solved orders of magnitude faster on the SGI server than on a network-distributed Mosix cluster (or any other kind of cluster, Beowulf, etc.). That's the advantage of true SMP systems (all CPUs on the same hardware) as opposed to networked clusters.
This new system is news, but it's hardly groundbreaking news. Back in '99, SGI spun off MIPS and announced they would do commodity systems -- including supercomputers with commodity processors. At that point they had a choice: port IRIX to the Itanium, or teach Linux to scale so they could use it on their supercomputers. It's been no secret that they chose the latter. Or why: it was less expensive, and catered to an established user community.
Note that Itanium/Linux systems are not meant to replace MIPS/Irix systems. Unless they've changed their strategy since I worked there, SGI plans to keep developing Irix systems for another 10 years, at least. Of course, that depends on maintaining loyalty to Irix solutions, and the buzz is that they're having trouble with that.
Anybody else see that as the main reason this is running Linux instead of Irix?
SGI started working on porting IRIX to the IA-64 architecture back in (I think it was) 1995 or 1996. Not long after, they found that it would be easier and cheaper to get Linux to scale more efficiently and to port some key libraries and services from IRIX than it would be to port all of IRIX over to the new architecture.
It's all about time and money.
The whole point with the SGI supercomputers (there are Origin servers running Irix on 1024 processors) is that there's one single copy of the OS running across all those CPUs, and the entire memory is available to all CPUs on the same piece of hardware. That means, any CPU can access any piece of information at the speed of mem-IO, and you can easily create a large matrix (think many tens or hundreds of GB) to keep all your data in one piece.
Networked clusters (Mosix, Beowulf) split the CPU bunch across the network, and the memory is split too. That means there's a huge latency when a CPU wants to access data that happens to be on a different node on the network: the network latency is many times larger than memory latency.
There are problems that simply cannot be solved on networked clusters, precisely because of network latency, while true supercomputers (all CPUs on the same machine) do not have this limitation.
Well, ok, so you can split the matrix across nodes in a Beowulf, but even if you have the same CPU power as the SGI supercomp, you're going to solve the problem several times slower (if not several orders of magnitude slower). Such is the importance of latency.
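To put a rough shape on the latency argument, here is a toy model: the average cost of a memory access is a weighted mix of local and remote latency. Every number below is a hypothetical placeholder I picked to show orders of magnitude, not a measurement of any SGI machine or any real cluster, and the function names are mine.

```python
def average_access_ns(remote_fraction, local_ns, remote_ns):
    """Average access time when remote_fraction of accesses leave the node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def slowdown(remote_fraction, local_ns, remote_ns):
    """How many times slower than the all-local case the average access is."""
    return average_access_ns(remote_fraction, local_ns, remote_ns) / local_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 100.0             # access to memory on the same node
SHARED_REMOTE_NS = 500.0     # access to another node inside one shared-memory box
CLUSTER_REMOTE_NS = 50_000.0 # fetch from another node over a cluster network

if __name__ == "__main__":
    for fraction in (0.01, 0.1, 0.5):
        shared = slowdown(fraction, LOCAL_NS, SHARED_REMOTE_NS)
        cluster = slowdown(fraction, LOCAL_NS, CLUSTER_REMOTE_NS)
        print(f"remote={fraction:.2f}  shared-memory x{shared:.1f}  cluster x{cluster:.1f}")
```

The model ignores bandwidth, caching, and overlap of communication with computation, so it overstates the penalty for well-written cluster codes. It does show why the gap grows quickly once a meaningful fraction of accesses has to leave the node.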
This is why there's no point in clustering this kind of computer: you lose its biggest advantages, a single OS copy and all memory on the same machine.
The Cray C90 came out like in 1990 or 1991, and this new fangled SGI box just barely beats it? wow!
Actually, I would rather imagine a cluster of people with baseball bats chasing down people who post "Imagine a Beowulf cluster of these" to slashdot and beating some sense into them. ;-)
I Am My Own Worst Enemy
"Through its experience and expertise in high-performance computing, SGI will offer customers of the highest quality 64-bit operating environments."
Well, Hmph! The rest of us low-life customers wouldn't want it anyways!
SGI never thought to replace Irix with Windows! That's ridiculous. :-)
Irix can scale up to 1024 CPUs and beyond. Solaris can scale up to 100. Here's Linux, now it's scaling close to 100. How much do you think Windows can scale? 10 CPUs? 20?
SGI's thing was always that it had machines running one single copy of the OS across hundreds (or thousands) of CPUs on the same machine (not in a cluster). You simply cannot do that with Windows, period.
They had some graphics workstations running Windows, but that was on the lowest end of things, and now those systems are not available anymore.
I think the point of the question was to see how a software OpenGL implementation would perform on a 64-proc machine.
Make bigger machines and people will just make bigger problems. Don't laugh or groan, someone WILL come up with some application that could exploit a couple hundred of these monsters.
A Pirate and a Puritan look the same on a balance sheet.
Intel must be pleased. If SGI could manage to sell one of these that would double the number of Itaniums that Intel has managed to flog.
In any case, the poster made it sound like you can just plug Itanium 2s into an Origin 3000 and *bang* you've got a Linux system, which is not correct.
Go Badgers! -- #include "std/disclaimer.h"
Doesn't SGI own Cray? (At least until recently?)