Slashdot Mirror


SGI to Scale Linux Across 1024 CPUs

im333mfg writes "ComputerWorld has an article up about an upcoming SGI machine being built for the National Center for Supercomputing Applications "that will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors and 3TB of shared memory.""

19 of 360 comments

  1. Re:Solaris by mrm677 · · Score: 4, Interesting

    It seems that if they pull this off, one of the strongholds of Solaris (namely massively parallel computing) will have been conquered by Linux. I wonder how Sun is feeling at the moment?

    Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

    Lame analogy: many people have demonstrated that they can hack their Honda Civic to outperform a Corvette; however, I can walk into a dealership and purchase the latter, which performs quite well without mods.

  2. Similar software available? by Pierce · · Score: 2, Interesting

    With the exception of the NUMA stuff, is there software available to re-create this? I'm not even sure what to search for; would this still be considered a "cluster"?

  3. from MPI to multithreaded ? by InodoroPereyra · · Score: 3, Interesting
    From the article:
    Earlier cluster supercomputers at the NCSA used multiple images of the Linux operating system -- one for each node -- along with dedicated memory allocations for each CPU. What makes this system more powerful for researchers is that all of the memory will be available for the applications and calculations, helping to speed and refine the work being done, Pennington said.

    "The users get one memory image they have to deal with," he said. "This makes programming much easier, and we expect it to give better performance as well."

    So, does anyone have any insight into why/how this matters for programmers? Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications? Or maybe they will still run using MPI on the big shared memory pool, and each process will be sent to the appropriate node by the OS on demand? Thanks!
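To make the two programming styles in the question concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not how NCSA codes are written: threads stand in for cluster processes, a `queue.Queue` stands in for MPI messages, and the function names `sum_message_passing` and `sum_shared_memory` are hypothetical. As far as I know, MPI programs do not generally need to be rewritten for a shared-memory machine, since MPI implementations can pass messages through shared memory; the sketch only shows what changes for code that chooses to share data directly.

```python
import queue
import threading


def sum_message_passing(data, workers=4):
    """Cluster style: each worker copies its slice into private memory,
    then explicitly sends its partial sum back as a message."""
    results = queue.Queue()
    chunk = (len(data) + workers - 1) // workers

    def worker(rank):
        local = list(data[rank * chunk:(rank + 1) * chunk])  # private copy
        results.put(sum(local))  # explicit "send" of the partial result

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(workers))  # explicit "receive"


def sum_shared_memory(data, workers=4):
    """Single-image style: every worker reads the same array in place
    and adds into one shared total, guarded by a lock."""
    total = [0]
    lock = threading.Lock()
    chunk = (len(data) + workers - 1) // workers

    def worker(rank):
        lo = rank * chunk
        hi = min(lo + chunk, len(data))
        partial = sum(data[i] for i in range(lo, hi))  # no copy, no message
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Both functions compute the same sum; the difference is that the first has to distribute data and collect results explicitly, while the second just reads one shared array, which is presumably what "one memory image" buys the programmer.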
  4. Re:What happened to RISC? by Epistax · · Score: 4, Interesting

    RISC and CISC offer no final advantage over the other, so the one that dominated is the one that was here first.

    Quick examples: RISC uses less power because it has less logic? No, it needs to run at a higher frequency to maintain the same speed as a slower CISC.
    RISC is easier to program? Depends on the person. A compiler can take good advantage of large, hardware-optimized instructions.
    RISC easier to develop/manage? I'll say yes for RISC on this one. There's simply less logic on the chip, so fewer logic errors are possible. There's plenty more cache which can break, but broken parts can be fused off.
    RISC is physically smaller? No. RISC needs a higher clock frequency because many more instructions need to be executed. The result of this is that a much larger instruction cache is needed on chip.

    I don't remember every comparison, but it pretty much comes out that neither is better than the other. That being said, RISC is better than x86. Everything is better than x86. However, CISC vs RISC is much harder to judge. Having done x86, 68k, and MIPS, I must say that RISC is a pleasure.

  5. Re:Solaris by kasperd · · Score: 5, Interesting

    Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

    I wouldn't be surprised to see these changes in the 2.8 kernel. "And what will people do until then?" I hear some people ask. I can tell you that right now very few people actually need to scale to 1024 CPUs. And that will probably also be true by the time Linux 2.8.0 is released. AFAIK Linux 2.6 does scale well to 128 CPUs, but I don't have hardware to test it, and neither do any of my friends. So I'd say there is no need to rush this into mainline; the few people that need it can patch their kernels. My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

    --

    Do you care about the security of your wireless mouse?
  6. Du-uh by passthecrackpipe · · Score: 1, Interesting

    As everybody who has read the IBM redbooks about mainframe Linux knows, Sendmail is the service of choice! Of course, you could run Postfix on a decrepit old Pentium-1 and get the same level of performance, but that won't help IBM with their mainframe income, will it?

    --
    People who think they know everything are a great annoyance to those of us who do.
  7. Re:Solaris by Waffle+Iron · · Score: 3, Interesting
    Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

    If someone buys one of these clusters from SGI, then it does scale "out of the box" as far as they're concerned.

  8. Re:Solaris by isorox · · Score: 2, Interesting
    My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

    640 CPUs are enough for anyone? :)


    A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.

    Claims are very difficult to make, and impossible to prove. However, putting a time limit on a claim is easy. 2.8.0 will be released in '05 or '06; maybe we'll all have 1024-CPU boxes in 20 years, but in 20 months?
  9. Re:Ok by djcapelis · · Score: 1, Interesting

    Other than a little issue called price... I don't think so!

    Alright, pass around the hat.

    --
    I touch computers in naughty places
  10. Re:Sun does more than that by Jeff+DeMaagd · · Score: 3, Interesting

    Hot swapping components sounds great, but what if the screwdriver slips out of the finger of the engineer and causes a short?

    The systems I've seen that have hot-swap PCI cards have plastic partitions between the slots to prevent the cards from touching each other when hot swapping them.

    I'm not sure why the hypothetical screwdriver would be in such a tech's hands. Many systems have non-screw means of retaining memory, PCI cards, CPUs and such.

  11. Re:What happened to RISC? by Anonymous Coward · · Score: 1, Interesting

    Once again people are missing the fundamental nature of RISC programming vs. CISC programming. RISC architectures are very much "load-store" machines, where you load data into its registers, operate on it in fairly complex ways, and then store the results. With CISC chips, your operations tend to take the form of modifying or fetching operands or results that are in memory.

    The fears of RISC instruction bloat are unfounded: the instructions are going to be in L1 i-cache 99% of the time, and won't slow anything down.

    What shorter/simpler instructions enable is much shorter pipelines. My G4 does a fused multiply-add op in 7 stages; a P4 does it in 2 passes through a 20-stage pipeline (40 cycles, since the result of the multiply isn't available until the end). The P4 pipeline has to fetch operands from somewhere on the stack and write them back. This means CISC CPUs are more prone to memory bottlenecking in worst-case scenarios (of course, in most cases, the working data set for both archs will be in L1).

    In conclusion, CISC vs. RISC is EASY to tell apart: if it's operating on data in registers and memory simultaneously, it's CISC. If it's loading the working data into an expansive register set, operating on it locally, and then storing it back, it's RISC.

  12. Re:Sun does more than that by ddmau · · Score: 2, Interesting

    Yes... I was an engineer for SGI for over 16 years (laid off about a year ago)... I have hot-swapped modules on a running system many, many times without problems. With the NUMAflex architecture, you have "modules" that house a separate set of CPUs, memory, power supply, etc. You shut the offending module down (after the OS has migrated all processes off of it, on the fly). After parts are replaced, you run a diagnostic off of your laptop/terminal, and bring the module back into the system (the OS "sees" the change on the fly and re-integrates the module). It works extremely well.

  13. Re:Sun and/or IBM zseries hardware by Anonymous Coward · · Score: 1, Interesting

    Anything you want.

    For instance, where I work we have an older S/390 mainframe that runs a database.

    We have: 1. a Win2000 server running the IIS web server and MS SQL, used online to form queries automagically for the mainframe stuff for our customers; 2. a Linux-based firewall; 3. other Linux servers; 4. routers; 5. networks; 6. numerous other sundry Linux machines.

    All this could be replaced by Linux running in a single partition in the mainframe. All the network, all the server.

    So don't be a dipshit. Obviously there are reasons for running Linux in a mainframe, especially WHEN YOU ALREADY OWN ONE FOR DOING SOMETHING ELSE.

    Now ZSeries isn't just a mainframe. It makes a great server. There are different pricing levels, different setups.

    Now go find a big corporate Windows server farm (rarer than you'd think). Now look at the hundreds of Windows servers, hundreds of support personnel, experts, then the rest of the A+ certified service geeks.

    Now delete all that, and replace it with one server, running various things in its many partitions. It's run by 2 admins and some assistants.

    It will be faster, more reliable, and probably much cheaper. However, the benefits go far beyond just eliminating hundreds of redundant personnel and dozens of high-maintenance PC servers running an unreliable OS: you have something that is easy to deal with, supported by a company that will bend over backwards for you, instead of being beholden to the assholes in MS.

    NOW if you don't end up liking it, then you could move to Solaris, or run a server cluster of Linux PCs. And since you're already running Linux, moving to any other Unix platform running any other hardware, or running Linux on commodity hardware, is much, much easier than migrating from Windows in the first place.

  14. Re:Sun and/or IBM zseries hardware by flok · · Score: 2, Interesting

    What happens if both pipelines make the same mistake because the L1-cache feeds them both the same corrupted data?

    --

    www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
  15. Re:Sun does more than that, but SGI always has by kscguru · · Score: 1, Interesting
    Sun very well could support that many CPUs. Sun just doesn't sell hardware that has that many (and therefore won't claim to support that many) - mainly because that kind of hardware is so expensive as to make SPARC look cheap!

    My opinion is that Linux on a 1024-way is a spectacularly stupid idea, introduced more for the sexiness of having a 1024-way machine than for any practical benefits. Linux is simply not designed for scaling that large. And there is a huge difference between an OS designed to scale that large, and an OS hacked up to support something that large, without actually making the appropriate design choices. SGI may know about those choices (and probably better than Sun), but I highly doubt they'd throw them into a GPLed Linux kernel - they still want to sell their own version of Unix!

    I expect (yes, a wild pie-in-the-sky guess) that the advantage of a 1024-way machine over a 512-way machine, both running Linux, is going to be maybe 20-30% performance, far from the 100% the numbers might claim or the 70-80% that might be tolerable. For a supercomputer where that 20-30% is irrelevant because no other machine can crunch the data, cool; for everyone else, two 512-ways running unconnected will be better, cheaper, and faster. [At least, until Linux can scale that large... maybe in 5 years or so?]

    --

    A witty [sig] proves nothing. --Voltaire
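One way to put numbers on the 20-30% guess in comment 15 is Amdahl's law. That is an assumption on my part: the poster does not name a model, and real kernel scaling also depends on lock contention and NUMA effects that this ignores. The sketch below asks how much extra speed doubling the CPU count buys for a given serial (non-parallelizable) fraction; `amdahl_speedup` and `gain_from_doubling` are hypothetical helper names.

```python
def amdahl_speedup(n_cpus, serial_fraction):
    """Amdahl's law: speedup over one CPU when a fixed fraction
    of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)


def gain_from_doubling(n_cpus, serial_fraction):
    """Fractional improvement from going from n_cpus to 2 * n_cpus
    (1.0 would mean the ideal 100% gain)."""
    before = amdahl_speedup(n_cpus, serial_fraction)
    after = amdahl_speedup(2 * n_cpus, serial_fraction)
    return after / before - 1.0


if __name__ == "__main__":
    # Illustrative serial fractions only; none of these are measured values.
    for s in (0.0, 0.001, 0.003, 0.01):
        gain = gain_from_doubling(512, s)
        print(f"serial fraction {s:.3f}: 512 -> 1024 CPUs gains {gain:.0%}")
```

Under this model, a gain of only 20-30% from 512 to 1024 CPUs corresponds to a serial fraction of about 0.3%, so the guess amounts to saying that a very small amount of serialized work is enough to waste most of the second 512 CPUs.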

  16. Not likely - Same Machine for $1k in 14 years. by DanielJH · · Score: 2, Interesting

    The point here is that if performance continues to grow like it is today, they will be selling these machines for $1,000 at Walmart in just 14 years. It will be about the same size as the computer you own now.

    The problem with 1024 CPUs is much more than just the operating system. It is a mess of communication hardware needed to wire everything together. It is about special power feeds and air conditioning, and sometimes floor loading requirements.

    Take a quick look at the end of this PDF. It talks about heat output and the need for 3-phase 240V power coming into this computer. It is not unusual to hire both an electrical and a cooling expert when you talk about installing one of these babies. Not for the home user, and never will be; however, identical compute power is coming in just 14 years, so get ready...
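The 14-year figure can be sanity-checked with a little arithmetic. This is a sketch under a loud assumption: that "the same machine" means one future box matching 1024 of today's CPUs, and that performance grows by steady doubling. The poster does not state either, and the function names are hypothetical.

```python
import math


def required_doubling_period_years(factor, years):
    """How often performance must double to grow by `factor` in `years`."""
    return years / math.log2(factor)


def growth_factor(years, doubling_period_years):
    """Total growth after `years` at one doubling per period."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    period = required_doubling_period_years(1024, 14)
    print(f"1024x in 14 years needs a doubling every {period:.1f} years")
    print(f"doubling every 1.5 years gives {growth_factor(14, 1.5):.0f}x in 14 years")
```

A 1024x gain is ten doublings, so 14 years implies one doubling every 1.4 years, close to the commonly quoted 18-month figure, which would give roughly 645x over the same period. This covers raw compute only and says nothing about the 3TB of memory or the interconnect.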

    1. Re:Not likely - Same Machine for $1k in 14 years. by isorox · · Score: 2, Interesting

      Indeed, we're implementing a 24-bay system at the moment, in a brand new apps room off one of our current ones (which happens to have about 100 bays, most of them overflowing), so, yes, power is a problem, and cooling doubly so. (One apps room is currently responsible for two 24-hour TV channels and barely has a backup AC unit; it may work if we shut down some of the less-essential equipment.)

  17. Location, Location... by jaybird144 · · Score: 1, Interesting

    I wonder where it will be housed...NCSA's new building isn't complete yet. And it doesn't seem like they would install it only to move it a few months later, does it?

  18. Re:Sun does more than that by dsouth · · Score: 2, Interesting
    I don't have a URL, but was involved in several HPC procurements at the time (and knew some insiders at SGI and Cray). The poster is basically correct. The sequence of events was:
    • SGI (at the time still called Silicon Graphics Inc) purchased Cray Research.
    • Well before the purchase, Cray had a hand in developing and marketing Sun's larger machine, the "Super Dragon", sold by Sun as the 64SC, and referred to within Cray as the 64CS -- I'm probably messing up the number, but I do recall the difference in the letter ordering. :-)
    • Prior to the purchase, Cray had completed the design for a new shared memory system based on a high speed switch and single image OS.
    • Prior to the purchase SGI had already completed the design of the first NUMAflex systems, the Origin2000 and Onyx2.
    • So after the purchase, the new merged Cray/SGI had two large SMP/NUMA systems, the Origin line and the Cray-developed line. Since they didn't need two, they sold the Cray design to Sun, where it was marketed as the E10000. They also called the NUMA fabric on the Origin2K "CrayLink" even though Cray had little or nothing to do with its design.
    • For a few years afterwards, there were a few within Cray CF (Chippewa Falls) that were somewhat bitter about SGI's decision to pawn off the E10000 design, pointing out repeatedly that Sun was selling plenty of E10000s...
    If it matters, the HPC procurement I was involved in opted for the SGI, which was probably the correct decision. As unstable as the SGI hardware was, the sites I knew running E10000s for general HPC loads had far worse stability problems (though the E10K's undoubtedly better at running Oracle).

    As far as the E10000 being NUMA or SMP -- depends on how you look at it. The Origin line used a bristled hypercube interconnect topology, so memory on the same node as a CPU was one hop thru the fabric, memory on another node connected to the same router was three hops, and memory on a distant node might be multiple router hops. The E10K (and I think the E15K) used a star topology where memory was either on the same bus as the CPU or was on another bus that had to go through the switch. So the Sun has basically two levels of memory latency, whereas the SGI could have many levels. The SGI is definitely NUMA, the Sun is either SMP or "slightly NUMA", or however you want to parse it.
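The "two levels versus many levels" point can be illustrated with a toy hop-count model. Everything here is a simplifying assumption rather than the real hardware: nodes are numbered integers, two nodes hang off each router, routers form a plain hypercube where the hop count between routers is the number of differing bits in their IDs, and the star is given an arbitrary cost of 2 for going through the switch. Only the "1 hop local, 3 hops via the same router" figures come from the comment above.

```python
def hypercube_hops(node_a, node_b, nodes_per_router=2):
    """Toy bristled-hypercube model: 1 hop for local memory, 3 hops via a
    shared router, plus one hop per router-to-router link beyond that."""
    if node_a == node_b:
        return 1
    router_a = node_a // nodes_per_router
    router_b = node_b // nodes_per_router
    # Routers in a hypercube are linked when their IDs differ in one bit,
    # so the distance between two routers is the count of differing bits.
    router_hops = bin(router_a ^ router_b).count("1")
    return 3 + router_hops


def star_hops(board_a, board_b):
    """Toy star model: memory is either on the local bus or one trip
    through the central switch (the value 2 is an arbitrary stand-in)."""
    return 1 if board_a == board_b else 2


def latency_levels(hop_fn, n_nodes):
    """Distinct hop counts seen from node 0 to every node in the system."""
    return sorted({hop_fn(0, other) for other in range(n_nodes)})


if __name__ == "__main__":
    print("hypercube, 32 nodes:", latency_levels(hypercube_hops, 32))
    print("star, 16 boards:   ", latency_levels(star_hops, 16))
```

In this model the star never shows more than two distinct distances however many boards are added, while the hypercube gains a new level each time the number of routers doubles, which is the sense in which the SGI is "definitely NUMA".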

    If you've never seen it, the tech papers on how the SGI NUMA systems work are worth reading. Build a fast 8-port crossbar chip (the "spyder chip"), then use it to glue CPUs, memory, and peripherals together. Keep a couple ports open, and you can glue the crossbars together in a fabric. Presto, you can now build a system with 200 CPUs or 100 PCI busses. Pretty cool, even if it was expensive, proprietary, and all the rest.