SGI Demos 64-Proc Linux Box
foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
Quake3 timedemo scores? Just how many frames/sec can that bad boy generate, even in software mode?
To me, it would seem that the primary purpose of being able to push info that fast to and from memory is useful for very few problems these days. I was under the impression that the majority of "super-computing" problems were of the sort that required lots of calculations, not lots of parsing of information in storage.
Am I wrong about what this benchmark means? Or am I missing something basic?
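For anyone wondering what the benchmark actually measures: as far as I know, TRIAD is the STREAM kernel a[i] = b[i] + q*c[i], and STREAM counts three 8-byte words moved per element (two reads, one write). Here is a minimal Python sketch of the kernel and the byte accounting. The function names are made up for illustration, and pure Python times the interpreter rather than the memory system, so the GB/s it prints says nothing about real hardware.

```python
import time


def triad(a, b, c, q):
    """STREAM TRIAD kernel: a[i] = b[i] + q * c[i], updated in place."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n, word_bytes=8):
    """Bytes counted per TRIAD pass: two reads and one write per element."""
    return 3 * word_bytes * n


def measure_gb_per_s(n=1_000_000, q=3.0):
    """Time one TRIAD pass over n elements and convert to GB/s.

    Illustrative only: in pure Python the loop overhead dominates.
    """
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, q)
    elapsed = time.perf_counter() - start
    return triad_bytes(n) / elapsed / 1e9


if __name__ == "__main__":
    print(f"TRIAD (pure Python, not representative): {measure_gb_per_s():.3f} GB/s")
```

The point is that the kernel does only two floating-point operations per 24 bytes moved, so the score is set almost entirely by how fast memory can feed the CPUs.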
Nuff said...
Imagine a Beowulf cluster of these. . .
[ducks quickly]
Like any good press writeup, it lacks any details that are useful to techies. I want to see a dmesg from this thing, as well as pretty pictures of what's under the hood.
There is no reasonable defense against an idiot with an agenda
:wq
That was my first thought. So it beats a C90, but what is faster?
Found the answer here.
And if you were wondering about a Beowulf cluster of these, the top ten ranking excludes "cluster results".
We don't even have Itanium 1 products on the CompUSA shelves.
When I worked at IEG (no longer in biz, but at one time THE biggest Internet porn producer on the planet), we used 8 SGI boxes to dish it out over 2 OC3 lines. Of course the real reason they bought them was for the Quake server...
Linux running at 120 GB/s with 64 processors is impressive for an OS that has been criticized as inefficient when running on more than 8.
I would be very interested to know what version of the kernel they are using.
They also mention the SV1, which is a "low-end" Cray. I'm curious how the new X1 (nee SV2) does on the STREAM suite.
It's good to see that their "scalable linux" work seems to be doing pretty well! I'm sure it was much easier for them to use the IA-64 port of Linux than to port IRIX...
Hmm. I wonder if I can parallelize the app I work on enough to use all those 64 processors? I know my bosses would wet themselves if I did. Of course, I am mainly disk bound. Anyone got a disk system to match?
Stop the brainwash
Does the current 2.4.x series kernel scale to 64 procs effectively, or are they using some "enterprise patch" to fine tune for this particular hardware? I was under the impression that since most kernel developers don't have access to this kind of ultra-high end hardware that Linux isn't really optimized for it. Correct me if I'm wrong.
A musician without the RIAA, is like a fish without a bicycle.
SGI make loads of 64 processor machines. And I believe Linux runs fine on multiprocessor MIPS 14000s.
Mouse powered Chips, Open source Processors and Lego
That's not funny, it is redundant, and has been for a couple of years. But so is this...
Why can't it run Windows XP?
Ow!... ow, ow, ow, OW! stop throwing rocks at me!
Ok so it was a bad joke....
Do not look at laser with remaining good eye.
+1 funny. Seriously.
If we could work together (plus Mr Perens who is currently looking for a good cause to lead) we could take the demo to greater heights.
What is to say that the demo's code isn't buggy and shoddy, holding the power of the Itanium processors back?
If we realize the vast potential that the Open Source developer community provides then we can tackle such complex tasks as this Itanium performance measurement.
Wearing pants should always be optional.
This sounds very cool, but I would really like more info than this. Plus, it isn't going to be released until next year. Within that time frame there will be the usual delays and then final release to a couple customers. Don't get me wrong, I think this is cool. Especially the linux part. This could go a long way to helping Linux scale better on massive machines.
The second thought is: can it be partitioned? This is a rather big machine and goes against the trend I have witnessed to use many smaller machines to accomplish your goal. I'll have to ask some of the guys at Oracle if they've looked at Linux installs of this size, but as far as I know they only make x86 ports right now. So, I wonder what linux apps would someone run on a system this big? (I know. Insert obligatory Quake, Beowulf and porn server reference here.)
Disclaimer: I work for an SGI competitor. But I have personally installed Linux on every piece of hardware I can get my hands on. Just to play usually, but still. They just pay my mortgage.
_damnit_
It's my job to freeze you. -- Logan's Run
I saw a few comments along the lines of "wowee, powerful!". I'm just curious what somebody'd want with a machine that powerful.
Me, personally, I do lotsa 3D stuff and would love to see what it'd take to bring that machine to its knees. However, I get the impression I'm but one of a few 3D dudes here. So what would you non-3D dudes wanna do with it?
from Maddog:
:)
"For those applications that need to scale, SGI has just proven that Linux need not be synonymous with clutter."
cluster? or clutter? a good cluster is not cluttered
-- Who is the bigger fool? The fool or the fool who follows him? --
Step 1. Build massively parallel Beowulf system of SGI supercomputers
Step 2. ?????
Step 3. PROFIT!!!
to meet the system requirements for Doom III.
-R
Stuff that matters: circuitbreakers, vacuum-cleaners coffee makers, calculators generators, matching salt+pepper shakers
....but seriously, what are the applications for boxes like these? I mean, other than uses for Lawrence Livermore labs etc... big ass iron like this seems to only really be useful for 1. nuclear modelling 2. benchmark testing press releases.
I know that someone somewhere is going to use a box like this, but tell me for what real world application will you use it. (serious question - curious. I want to know the real apps these are used for)
I don't think he means 1 for every workstation and server. Instead, I think he means JUST 1 of these to act as every workstation and server. Kind of like the first main frames, with them in one room, and have access points around the building (Go go punch cards!)
It's not surprising that the SGI machine runs STREAM well. Back in the mid-1990's, John McCalpin, who worked for SGI at that time, was a regular contributor to comp.sys.super, and he would frequently brag about the superiority of SGI running STREAM. McCalpin is one of the primary advocates for STREAM. You can optimize a computer architecture to run a particular benchmark well. The question is whether the SGI machine runs a wider variety of real-world problems well.
http://ask.slashdot.org/article.pl?sid=01/10/10/15 49223
http://slashdot.org/article.pl?sid=99/06/27/13462
http://slashdot.org/article.pl?sid=01/02/09
Correct: Run Win32 applications.
Nobody will buy IA64/Windows-machines because there is no Win64 software and nobody will make Win64 software because nobody buys IA64/Windows machines...
EXPENSIVE
Well, hopefully they won't be too bad. Carmack has said they have it "running" on Xbox.
... weather modeling, for one. Here in the US, NCEP (National Centers for Environmental Prediction) runs all the forecast weather models on an IBM-SP (used to run on a Cray C90, I think). In Europe, the ECMWF model is run on a Fujitsu supercomputer, I think.
Models for plasma dynamics and astrophysics are also run on these heavy-duty machines. I'm sure others have had some experience running other things, but I know that the NCEP IBM-SP gets a workout at least 2 times a day running at least four different weather models that have average runtimes around an hour each.
-Jellisky
This sure would run a select statement on a database of all of our info pretty damn fast. but, who would believe we'd ever adopt any kind of national id, you know, like drivers licenses, social security cards, membership cards at grocery stores, etc.
What are we going to do tonight Brain?
Anybody else see that as the main reason this is running Linux instead of Irix? There's already a lot of development towards that that they can take advantage of. It just makes more sense than porting over another OS on your own.
My fifth of a dime...
...and it's called 64 CPUs.
Perhaps they should update the song
That said, it's an impressive result. And it's done in an unusual way. SGI has a 1.6GB/s channel running through routers connecting the processors and memory. A computer is made up of multiple rackmount "bricks" connected by cables and routers. The "router" is a 2U rackmount device.
Processors and memory reside in rackmount boxes with 4 CPUs and 8 GB (max) of local memory. These boxes interconnect through a single 1.6GB/s link per box, which, in a big system, goes through several layers of routers. So a memory access to another box is routed through what is essentially a fast LAN. All this is cached, of course.
It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck. (Everybody reading the same stuff is OK; it's cached. But writes have to propagate back to the home location of the data.)
Since the whole monster crashes all at once, you don't want to build your web server farm this way. It's for applications that really need all that crunch power in one machine.
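Taking the figures in this comment at face value (4 CPUs per brick, one 1.6GB/s link per brick) together with the 64 processors and 120 GB/s from the announcement, a quick back-of-the-envelope check shows how much of the TRIAD traffic could possibly be crossing the interconnect. This is only arithmetic on the numbers quoted in this thread, not a model of the real machine, and the variable names are mine.

```python
# All inputs are figures quoted in the story or the comment above.
CPUS = 64             # processors in the demo system
CPUS_PER_BRICK = 4    # CPUs per rackmount compute box, per the comment
LINK_GB_S = 1.6       # one interconnect link per box, per the comment
TRIAD_GB_S = 120.0    # announced STREAM TRIAD result

bricks = CPUS // CPUS_PER_BRICK
aggregate_link_gb_s = bricks * LINK_GB_S       # total capacity leaving all boxes
triad_per_brick_gb_s = TRIAD_GB_S / bricks     # TRIAD traffic each box sustains
triad_per_cpu_gb_s = TRIAD_GB_S / CPUS

print(f"bricks:                 {bricks}")
print(f"aggregate link GB/s:    {aggregate_link_gb_s:.1f}")
print(f"TRIAD GB/s per brick:   {triad_per_brick_gb_s:.2f}")
print(f"TRIAD GB/s per CPU:     {triad_per_cpu_gb_s:.3f}")
```

That works out to 16 bricks with 25.6 GB/s of link capacity between them, against 7.5 GB/s of TRIAD traffic per brick. If the quoted link speed is right, most of that 120 GB/s has to be each brick hitting its own local memory, which is exactly the friendly case for this kind of machine.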
I think it is pretty interesting that the benchmark that they used measured memory throughput, as opposed to, say, an actual workload. In other words, this is a synthetic benchmark, versus a real-world benchmark. They say, "Look! We can do memory transfers really really fast!"
Unfortunately, memory transfers are not the world when it comes to large multiprocessor boxes. The overhead comes in when you're trying to synchronize a large number of threads/CPUs to do a large task. For example, an Oracle database.
Sun has proven that it scales up the tree very well with large numbers of processors. But from my understanding, Linux is more efficient with a low processor count, and less and less efficient with more processors.
I question its ability to do anything with a real workload. And I'm even more suspicious because they use a benchmark I've never heard of (STREAM TRIAD) to push its superiority on a single-aspect synthetic benchmark.
Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating. Now, what can the machine actually do well that makes it a real winner?
A lot of simulation problems in science and engineering (FEA, weather prediction, computational fluid dynamics, electromagnetics, etc) reduce to inverting a *very, very large* matrix for the solution. You may have inverted 3x3 matrices by hand in school, but these matrices might be many millions x many millions in size. Depending on the algorithm, this means *huge* blocks of data are being swapped around.
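To put rough numbers on "very, very large", here is a small sketch of how much memory such a matrix needs. The dense case is plain arithmetic; the sparse case assumes a CSR-style layout with a hypothetical 7 nonzeros per row and 4-byte indices, which are my illustrative choices and not anything stated above.

```python
def dense_matrix_bytes(n, word_bytes=8):
    """Storage for a dense n x n matrix of double-precision values."""
    return n * n * word_bytes


def sparse_matrix_bytes(n, nonzeros_per_row, word_bytes=8, index_bytes=4):
    """Storage for an n x n matrix in a CSR-like layout (assumed here).

    Counts one value and one column index per nonzero, plus n + 1 row pointers.
    """
    nonzeros = n * nonzeros_per_row
    return nonzeros * (word_bytes + index_bytes) + (n + 1) * index_bytes


if __name__ == "__main__":
    n = 1_000_000
    print(f"dense : {dense_matrix_bytes(n) / 1e12:.1f} TB")
    print(f"sparse: {sparse_matrix_bytes(n, 7) / 1e6:.1f} MB (7 nonzeros/row, assumed)")
```

A dense million-by-million matrix is 8 TB, far more than any machine in this story holds, so real codes lean on sparsity or structure. Either way the solver streams through the data repeatedly, which is why memory bandwidth matters so much for these problems.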
Actually, it's precisely because of lack of superfast mem-IO machines that many people tried to work around the problem and create algorithms that are CPU-bound.
In fact, most of the computationally-intensive problems require LOTS of mem-IO.
And there's one more thing: there's a huge difference between the 64-CPU SGI machine, and a Mosix cluster of 64 1-CPU nodes: the SGI has one single memory space contiguous on the same machine. That means you can actually use a very large matrix to process your data, instead of shoving bits of it over the network back and forth.
There are entire classes of problems that will be solved orders of magnitude faster on the SGI server than on a network-distributed Mosix cluster (or any other kind of cluster, Beowulf, etc.). That's the advantage of true SMP systems (all CPUs on the same hardware) as opposed to networked clusters.
This new system is news, but it's hardly groundbreaking news. Back in '99, SGI spun off MIPS and announced they would do commodity systems -- including supercomputers with commodity processors. At that point they had a choice: port IRIX to the Itanium, or teach Linux to scale so they could use it on their supercomputers. It's been no secret that they chose the latter. Or why: it was less expensive, and catered to an established user community.
Note that Itanium/Linux systems are not meant to replace MIPS/Irix systems. Unless they've changed their strategy since I worked there, SGI plans to keep developing Irix systems for another 10 years, at least. Of course, that depends on maintaining loyalty to Irix solutions, and the buzz is that they're having trouble with that.
Hey, it's only money! I'll take two! Call me with pricing info: 323-856-6322
The whole point with the SGI supercomputers (there are Origin servers running Irix on 1024 processors) is that there's one single copy of the OS running across all those CPUs, and the entire memory is available to all CPUs on the same piece of hardware. That means, any CPU can access any piece of information at the speed of mem-IO, and you can easily create a large matrix (think many tens or hundreds of GB) to keep all your data in one piece.
Networked clusters (Mosix, Beowulf) split the CPU bunch across the network, and the memory is split too. That means there's a huge latency when a CPU wants to access data that happens to be on a different node on the network: the network latency is many times larger than memory latency.
There are problems that simply cannot be solved on networked clusters, precisely because of network latency. While true supercomputers (all CPUs on the same machine) do not have this limitation.
Well, ok, so you can split the matrix across nodes in a Beowulf, but even if you have the same CPU power as the SGI supercomp, you're going to solve the problem several times slower (if not several orders of magnitude slower). Such is the importance of latency.
This is why there's no point in clusterising this kind of computers: you lose their biggest advantage: single OS copy, all memory on the same machine.
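The latency argument can be sketched with a toy model: total time spent waiting on memory is the number of local accesses times local latency plus the number of remote accesses times remote latency. The latency values below are placeholders I picked to show the shape of the effect (100 ns local, 1 microsecond for a remote access inside a shared-memory machine, 100 microseconds over a cluster network); they are not measurements of any system mentioned here.

```python
def access_time_s(accesses, remote_fraction, local_latency_s, remote_latency_s):
    """Total wait time when a fraction of accesses go to non-local memory.

    Toy model: accesses are serialized and nothing is cached or overlapped.
    """
    remote = accesses * remote_fraction
    local = accesses - remote
    return local * local_latency_s + remote * remote_latency_s


# Hypothetical latencies, for illustration only.
LOCAL_S = 100e-9            # local memory
SHARED_REMOTE_S = 1e-6      # remote node inside one shared-memory machine
CLUSTER_REMOTE_S = 100e-6   # remote node over a cluster network

if __name__ == "__main__":
    n = 1_000_000
    for frac in (0.01, 0.1, 0.5):
        shared = access_time_s(n, frac, LOCAL_S, SHARED_REMOTE_S)
        cluster = access_time_s(n, frac, LOCAL_S, CLUSTER_REMOTE_S)
        print(f"remote {frac:4.0%}: shared {shared:.3f}s, "
              f"cluster {cluster:.3f}s, ratio {cluster / shared:.0f}x")
```

Even with only a small share of remote accesses, the slow path dominates the cluster's total, which is the "several times to several orders of magnitude" gap described above. Real codes hide some of this with caching and message batching, so treat the ratios as an upper-bound intuition rather than a prediction.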
The Cray C90 came out like in 1990 or 1991, and this new fangled SGI box just barely beats it? wow!
That's funny? It's not even the first beowulf cluster comment on this story. jeez.
Damn. I checked before I posted, but someone submitted after I checked and before I posted. Wouldn't have done it if I knew it wouldn't be the first.
Yeah, it's dumb anyway. But I posted it from work so I got paid for it.
It got quickly modded down after +1 Funny. Later got a +1 Underrated but the moderators drug it back down quickly. I'm in no danger of getting above- or below-average karma anytime soon.
Ooh, just got another +1 Funny on it. But I bet it will be modded down again in another minute.
Just to put something sort of on-topic, software OpenGL *could* be fast enough with this kind of memory bandwidth if the video bandwidth was also high and the processors were fast enough. "Hardware" OpenGL is just a specialized CPU with specialized firmware and high-speed pathways from video memory to the CRT. But I doubt this machine was designed for Quake III or Doom III.
Actually, I would rather imagine a cluster of people with baseball bats chasing down people who post "Imagine a Beowulf cluster of these" to slashdot and beating some sense into them. ;-)
I Am My Own Worst Enemy
"Through its experience and expertise in high-performance computing, SGI will offer customers of the highest quality 64-bit operating environments."
Well, Hmph! The rest of us low-life customers wouldn't want it anyways!
SGI never thought to replace Irix with Windows! That's ridiculous. :-)
Irix can scale up to 1024 CPUs and beyond. Solaris can scale up to 100. Here's Linux, now it's scaling close to 100. How much do you think Windows can scale? 10 CPUs? 20?
SGI's thing was always that it had machines running one single copy of the OS across hundreds (or thousands) of CPUs on the same machine (not in a cluster). You simply cannot do that with Windows, period.
They had some graphics workstations running Windows, but that was on the lowest end of things, and now those systems are not available anymore.
Large machines like this are suited to massively parallel tasks, such as seismic data processing, time-to-depth inversion, etc., on massive quantities of data.
Large jobs can typically still take days to complete, and if an incorrect parameter is entered, then the job may have to be re-run... so speed is important.
These jobs typically do not require extremely fast disk, just a lot of number-crunching capability.
how many FPS it could do in Quake III
I am the proud owner of a SGI Indy running Debian-mips, so I keep a close eye on SGI's linux work. The oss.sgi.com website hadn't been updated for over a year (linux section) and I was worried that they'd kill it. Still, it doesn't seem like they're going to be doing any more work on the ip22 MIPS port. (the one that you and I can actually afford the hardware for.)
"A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
SGI started a downward spiral when it moved to PCs and linux. If they had picked FreeBSD, they might have had a chance.
It's got nothing to do with the number of processors that GNU/Linux can run. This has to do with the fact that this machine scales _linearly_, meaning that as you add CPUs you get a proportional increase in processing power. Solaris and AIX scale beyond 32 processors as well, but in a non-linear fashion. It's an accomplishment that SGI has Linux scaling linearly, not that GNU/Linux scales to 64p.
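One common way to reason about linear versus non-linear scaling is Amdahl's law, which I am bringing in here as a simple model; the comment above does not use it. If a fraction s of the work is serial (lock contention, for example), the speedup on n CPUs is 1 / (s + (1 - s) / n). The serial fractions below are made-up inputs for illustration.

```python
def amdahl_speedup(n_cpus, serial_fraction):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)


def parallel_efficiency(n_cpus, serial_fraction):
    """Speedup divided by CPU count; 1.0 means perfectly linear scaling."""
    return amdahl_speedup(n_cpus, serial_fraction) / n_cpus


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.05):  # hypothetical serial fractions
        print(f"serial {s:4.0%}: speedup on 64 CPUs = {amdahl_speedup(64, s):5.1f}, "
              f"efficiency = {parallel_efficiency(64, s):.2f}")
```

With no serial work the model gives the full 64x; with just 5% serial work it drops to roughly 15x. A bandwidth test like TRIAD has almost no serial or kernel-bound portion, which may be part of why it scales so cleanly.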
Chess Computer! Run chess software on this and take advantage of the low memory latency. Hey, did anyone notice that this is a 64 cpu beast, as in 8x8, 64 squares on a chess board?
You enjoy much the trolling in similar ways!
It is a shame SGI can get all the press whenever they mention Linux, but when they release innovative products that don't have the word "Linux" attached they largely go unnoticed. I admin an Origin 3800 or two and they are definitely nice boxes. SGI should have more of Sun's customers, their boxes are superior. If only they knew how to market... (And they need to ditch the duck, and ditch Linux. Linux people are poor and run crappy hardware. They get all excited and joyful over their pieced-together miserable OS. FreeBSD is nice. IRIX is nice. Linux is for the 14 year olds!) And no, you don't run Doom III on an Origin. It isn't a joke, it isn't funny. You run applications that require large amounts of I/O. Weather modeling and non-clustered porn sites. THE DUCK WILL DIE!
Southeastern Virginia REPRESENT!
Make bigger machines and people will just make bigger problems. Don't laugh or groan, someone WILL come up with some application that could exploit a couple hundred of these monsters.
A Pirate and a Puritan look the same on a balance sheet.
Intel must be pleased. If SGI could manage to sell one of these that would double the number of Itaniums that Intel has managed to flog.
BSD can barely run on 2 CPUs. Linux is lightyears ahead of it.
In any case, the poster made it sound like you can just plug Itanium 2's into an Origin 3000 and *bang* you've got a Linux system which is not correct.
Go Badgers! -- #include "std/disclaimer.h"
Imagine a Beowolf Cluster of THESE!!!
Just out of curiosity, and on an only remotely related note, how many people here have ever actually seen a beowulf cluster? I think they're like nerd unicorns.
There should be a slashdot "whack-a-mole" (whack-a-troll?) game where nerds pop up instead of moles and say things like Imagine a Beowulf cluster of these, and Yeah, but can you run linux on it.
:) "I promote Linux and Ogg Vorbis but use Windows and WMP for my MP3's. But I'm leet, really." <whack!>
There already is. It's here. But you get at least three hammers: the "-1 Troll", "-1 Flamebait" and "-1 Overrated" hammers. You have to sign up and then watch for a few weeks, however, before you can use the hammers, or at least that's how I understand it.
Just out of curiosity, and on an only remotely related note, how many people here have ever actually seen a beowulf cluster? I think they're like nerd unicorns.
I went to the Beowulf site once but didn't inhale^H^H^H^H^H^Hdownload.
Maybe you're right. Maybe The Beowulf Cluster is geekdom's response to all the broken flying car promises of Popular Mechanics.
On second thought, no one here's seen one because it doesn't improve Quake III fps and doesn't run on Windows.
Doesn't SGI own Cray? (At least until recently?)
If you read the release they sneak this in:
"This result, derived from initial internal testing, marks a significant milestone"
This is only a derived milestone!
I've seen one. We installed a small cluster of 3 P166MMX nodes as a course project for my University (Computer Science) while studying Computer Networks. It was a test cluster, in order to implement a bigger one over our laboratories (more than 200 computers running RedHat). The result was that a cluster is really useless, unless your network is REALLY fast (the 3 nodes migrated processes only if they were connected over a 100mb network, with a 10mb network they didn't migrate anything, due to the high cost of migration). Then we realized that, if we wanted to implement a cluster over those three laboratories we should have doubled our internal network, to sustain the high network traffic (believe me: you have never seen "network load" unless you have seen a cluster), so we will probably buy a small bunch of Athlon XPs and we will deploy them as dedicated cluster machines.
And this is only a matter of latency.
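The 10mb versus 100mb observation fits a very rough cost model: moving a process is only worth it if shipping its memory image takes less time than the migration is expected to save. The sketch below is my own simplification with made-up sizes and savings. It ignores latency and protocol overhead, and it is not the heuristic any real cluster software uses.

```python
def transfer_time_s(image_megabytes, link_megabits_per_s):
    """Seconds to ship a process image over a link, ignoring all overhead."""
    return image_megabytes * 8 / link_megabits_per_s


def worth_migrating(image_megabytes, link_megabits_per_s, expected_saving_s):
    """True if the transfer costs less time than the migration should save."""
    return transfer_time_s(image_megabytes, link_megabits_per_s) < expected_saving_s


if __name__ == "__main__":
    image_mb = 16    # hypothetical process image size
    saving_s = 5.0   # hypothetical runtime saved by moving to an idle node
    for link in (10, 100):
        t = transfer_time_s(image_mb, link)
        print(f"{link:3d} Mbit/s: transfer {t:5.2f}s, "
              f"migrate? {worth_migrating(image_mb, link, saving_s)}")
```

With these assumed numbers the 10 Mbit/s link never pays off while the 100 Mbit/s link does, which matches the behaviour described for the three-node test cluster.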
Just cleaning my nerdish horn.
I'm fat, you're ugly. I can get slimmer, and you?
is concerned. :)
Linux's _kernel_ is years behind Solaris and AIX in scalability. What the benchmark shows is that if you had an application where many processes access memory concurrently (not using OS services), the SGI machine may be a good choice. As far as a memory bandwidth benchmark is concerned, you could have any multiprocessor OS on that and get the same performance, even with Big Kernel Lock architectures.
So it seems to me that using Linux was just a way to bootstrap the machine cheaply (yes, I know SGI had invested a lot in Linux (however superior IRIX may be), and this is probably where it's paying off). Good way to get the geeks closer to management excited
As long as it's less than 2500 GBP a CPU (inc chassis); welcome to my new renderfarm.
Everything in one box. Schweet. I'll take 5.
Of course you can guarantee on SGI pricing. Will they have learnt from the 230/320/540 saga?
So maybe I'll stick to 100 dual CPU 1u boxes. Easier to upgrade I suppose as well. I.E. You won't have to change the whole box; you could phase upgrades in.