Slashdot Mirror


SGI to Scale Linux Across 1024 CPUs

im333mfg writes "ComputerWorld has an article up about an upcoming SGI Machine, being built for the National Center for Supercomputing Applications, "that will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors and 3TB of shared memory.""

86 of 360 comments

  1. Whoa! by rylin · · Score: 5, Funny

    Sweet, now we'll be able to run Doom3 at highest detail in *SOFTWARE*-rendering mode!

  2. Ok by CableModemSniper · · Score: 5, Funny

    But does it run--crap. I mean what about a Beowulf--doh!
    Damn you SGI!

    --
    Why not fork?
    1. Re:Ok by biounlogical · · Score: 2, Funny

      In Soviet Russia 1024 Itaniums run a single image of you!
      ha!

    2. Re:Ok by jc42 · · Score: 4, Funny

      Hey, any reason we couldn't build, say, 1024 of these things, and make a beowulf cluster of them?

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  3. Longhorn by Anonymous Coward · · Score: 3, Funny

    Yeah, but can it run Longhorn?

    1. Re:Longhorn by arvindn · · Score: 3, Funny

      I hereby nominate this to be the next standard in-joke of slashdot. The previous candidate, evil overlords, never really took off in popularity, leaving us in the pathetic situation that every single bad joke available is soooo 2002! I particularly like "but can it run Longhorn?" because it will be funny until Longhorn is out, which is (hopefully) a long long time from now ;-)

    2. Re:Longhorn by TheScienceKid · · Score: 2, Funny

      You can always rant about Duke Nukem Forever

  4. In other news... by b1t+r0t · · Score: 4, Funny

    Intel's sales figures for Itanic^Hum CPUs more than doubled as a result.

    --
    "Open source is good." - Steve Jobs
    "Open source is evil." - Microsoft
    1. Re:In other news... by levram2 · · Score: 5, Informative

      The limit for Windows Server 2003, Datacenter edition for 64 bit Itaniums is actually 64 processors and 512 GB RAM. http://www.microsoft.com/windowsserver2003/64bit/ipf/datacenter.mspx

    2. Re:In other news... by caluml · · Score: 5, Funny

      We don't care about your actual facts for Windows - here at Slashdot we have FUD, rumour, and downright persistence. I think you will find if you read up on it more closely that 2003 Datacentre can only support up to 2 CPUs, and 256Mb maximum.
      Please stop letting facts get in the way of a good MS bashing session.

      Minister for Dis-Information.

    3. Re:In other news... by killjoe · · Score: 2, Insightful

      Oooh you told him! Way to stick up for MS! They need help from you. They can't counteract FUD by themselves with the billions they spend on advertising, astroturfing, financing lawsuits by SCO, and paying for ADTI studies. Thank god MS has people like you to run to their aid whenever somebody says something bad about windows.

      Still though the fact that linux can scale to 1024 processors while windows can only scale to 64 is enough reason to bash windows isn't it? I mean wasn't bill gates recently bashing linux because it was a "toy" and wouldn't scale?

      --
      evil is as evil does
  5. Solaris by MrWim · · Score: 3, Insightful

    It seems that if they pull this off one of the strongholds of solaris (namely massively parallel computing) will have been conquered by linux. I wonder how sun are feeling at the moment?

    1. Re:Solaris by justins · · Score: 4, Informative

      Solaris is not a leader in supercomputing, never has been.

      http://top500.org/list/2004/06/

      There's no "stronghold" for Sun to lose.

      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
    2. Re:Solaris by mrm677 · · Score: 4, Interesting

      It seems that if they pull this off one of the strongholds of solaris (namely massively parallel computing) will have been conquered by linux. I wonder how sun are feeling at the moment?

      Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      Lame analogy: many people have demonstrated that they can hack their Honda Civic to outperform a Corvette, however I can walk into a dealership and purchase the latter which performs quite well without mods.

    3. Re:Solaris by kasperd · · Score: 5, Interesting

      Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      I wouldn't be surprised to see these changes in the 2.8 kernel. And what will people do until then I hear some people ask. I can tell you that right now it is very few people that actually have the need to scale to 1024 CPUs. And that will probably also be true by the time Linux 2.8.0 is released. AFAIK Linux 2.6 does scale well to 128 CPUs, but I don't have hardware to test it, neither does any of my friends. So I'd say there is no need for a rush to get this in mainstream, the few people that need this can patch their kernels. My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

      --

      Do you care about the security of your wireless mouse?
    4. Re:Solaris by Waffle+Iron · · Score: 3, Interesting
      Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      If someone buys one of these clusters from SGI, then it does scale "out of the box" as far as they're concerned.

    5. Re:Solaris by Nasarius · · Score: 3, Funny
      My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

      640 CPUs are enough for anyone? :)

      --
      LOAD "SIG",8,1
    6. Re:Solaris by isorox · · Score: 2, Interesting
      My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

      640 CPUs are enough for anyone? :)


      A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.

      Claims are very difficult to make, and impossible to prove. However putting a time limit on a claim is easy. 2.8.0 will be released in 05 or 06, maybe we'll all have 1024CPU boxes in 20 years, but in 20 months?
    7. Re:Solaris by timeOday · · Score: 2, Insightful
      A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.
      I've heard that one. I think the guy was right! It was 1943 after all. Somehow we interpret this as, "There will only ever be a market for 5 computers, even if they change so completely that nothing is left of current technology and only the name stays the same."
  6. Re:Will it be done in time for Quake 3? by Gleng · · Score: 2, Funny
    Will it be done in time for Quake 3?

    Hmm, quite possibly.

    --
    "Proudly Posting Without Reading The Article"
  7. Why gaming? by wyldwyrm · · Score: 2, Funny

    Obviously this would be overkill for doom3(altho I'd still like to have it in my apartment as a space heater/server)! Ok, so it would be more than a space heater; I'd have to run my a/c 24/7/365.25, with all my windows open in the winter. But rendering would be sooooo sweet.

    1. Re:Why gaming? by Ari_Haviv · · Score: 2, Informative

      think real-time radiosity

      --
      Join Team Mozilla #38050 Folding@home
  8. Press Release by foobsr · · Score: 3, Informative

    The link to the press release as of July 14.

    CC.

    --
    TaijiQuan (Huang, 5 loosenings)
  9. Re:really fast? by jhunsake · · Score: 4, Funny

    so does this mean KDE and Openoffice will finally run at decent speed?

    No, you're going to need quantum computing for that.

  10. The big question is... by mangu · · Score: 4, Funny

    ...how easy it is to install printer and sound drivers?

    1. Re:The big question is... by carlmenezes · · Score: 4, Funny

      Well on Windows you'd get a message saying...

      "Windows has detected 1024 new sound cards and is installing them..."

      and then the inevitable..

      "Windows needs to restart your computer. Click OK to restart"

      and then on system restart ...

      1024 sound control apps in the system tray! =)

      --
      Find a job you like and you will never work a day in your life.
  11. In other news... by k4_pacific · · Score: 4, Funny

    Microsoft made a statement today reminding everyone that Windows Server 2003 can handle as many as 32 processors, at the same time even.

    When shown the report about Linux running on 1024 processors, Gates purportedly responded, "32 processors ought to be enough for anybody."

    --
    Unknown host pong.
  12. Re:What happened to RISC? by CableModemSniper · · Score: 4, Funny

    They decided it was too RISCy maybe?

    --
    Why not fork?
  13. Re:really fast? by iggymanz · · Score: 3, Funny

    yes, according to the project leader "on this supercomputer, OpenOffice will finally *run* at decent speed, but waiting for the JVM to start up will still be a bitch" As for KDE, he stated "we're still waiting for the qt toolkit to initialize, but we're confident we can be fully logged in before August"

  14. Re:in time for.... by Ari_Haviv · · Score: 2

    you should see the specs for longhorn's minimum install...

    --
    Join Team Mozilla #38050 Folding@home
  15. Re:What happened to RISC? by DAldredge · · Score: 3, Informative

    AMD and Intel happened. What do you think is running your computer right now (assuming it's an x86)? It's a RISC chip that has an x86 translator attached; the core of the chip is RISC.

  16. Re:really fast? by darkjedi521 · · Score: 2, Funny

    They said Itanium cluster, not VAX cluster!

  17. Re:What happened to RISC? by Jeff+DeMaagd · · Score: 2, Informative

    Well, this system is neither RISC nor CISC. Itaniums are VLIW. IIRC, it too does have an x86 translator somewhere, but they work far better with native code.

  18. Sun != scientific computing by vlad_petric · · Score: 4, Informative
    Sun processors execute server workloads (database, app server) very well, but that's pretty much it. The emphasis with such workloads is on the memory system. Boatloads of caches do the job. It's also more effective to have tons of processors that are very simple, than just a couple of them that are complex and powerful.

    Scientific computing means data crunching (floating point). Complex, powerful processors are needed. The "stupider, but more" tradeoff doesn't work anymore. Sun processors have fallen behind in this respect.

    --

    The Raven

  19. It became obsolete by mangu · · Score: 3, Informative

    RISC stands for "reduced instruction set computer". It made sense in the 1980's when the "CISC", complex instruction set computers, took tens or hundreds of clock cycles to execute some instructions. With RISC one had less instructions, but each instruction executed in less clock cycles, resulting in a faster computer. Today, CPU's with full-size instruction sets execute most of them as fast as a RISC CPU does, so there is no need to limit the instruction set anymore. Even such complex instructions as multiplying double-precision floating point numbers are executed in a single clock cycle in a Pentium 4.

    1. Re:It became obsolete by Johan+Veenstra · · Score: 4, Informative

      Actually RISC is a bad name for what it stands for, it should have been SISC (Simplified Instruction Set Computer), since the key difference between the two is the complexity of the instructions and not the quantity.

      A CISC instruction could do things like: take the value in register BP, add 4, get the value from the memory at the address you just computed, add the value in the register AX, and put the result back at the same memory location. Execution would take several clock-ticks.

      To do the same in RISC, you would need several instructions (add 4, get from memory, add ax, store to memory). The execution of the individual instructions would take one tick each, so the sequence would take several. But on average RISC was a bit faster.

      CISC was invented in a time that the memory was small, in the CISC way you could store larger programs in the same amount of memory.

      RISC was invented when memory-size was not limited anymore, and looked to displace CISC in the long run.

      CISC was still around when the memory bandwidth became a limiting factor. And since fewer instructions needed to be fetched from memory, more bandwidth was left for other data traffic. RISC lost some of its speed advantage.

      Modern CISC processors get CISC instructions from memory, chop them up into smaller instructions, and execute those smaller instructions really fast. So in fact they can be seen as RISC processors posing as CISC processors, i.e. the best of both worlds.

      So CISC is a way of compressing RISC instructions, so they take up less memory/bandwidth.
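
      To make the BP/AX example above concrete, here is a toy Python sketch. The register names come from the post; the temporaries T0/T1, the starting values, and the four-step split are illustrative assumptions, not the output of any real decoder.

```python
# Toy model of one CISC-style instruction versus the equivalent
# sequence of simple RISC-like steps. T0/T1 and the initial values
# are made up for illustration.

def cisc_add_mem(regs, mem):
    """One complex instruction: mem[BP + 4] += AX."""
    addr = regs["BP"] + 4
    mem[addr] = mem[addr] + regs["AX"]

def risc_sequence(regs, mem):
    """The same work as four simple steps, one operation each."""
    regs["T0"] = regs["BP"] + 4            # add 4
    regs["T1"] = mem[regs["T0"]]           # get from memory
    regs["T1"] = regs["T1"] + regs["AX"]   # add ax
    mem[regs["T0"]] = regs["T1"]           # store to memory

def run(step):
    """Run either variant on the same starting state; return the memory cell."""
    regs = {"BP": 100, "AX": 7}
    mem = {104: 35}
    step(regs, mem)
    return mem[104]
```

      Both variants leave the same value in memory; the difference is only in how many instructions have to be fetched to get there, which is the compression argument made above.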

    2. Re:It became obsolete by AKAImBatman · · Score: 3, Informative

      With RISC one had less instructions, but each instruction executed in less clock cycles, resulting in a faster computer.

      Technically, RISC chips were supposed to execute all instructions in ONE cycle. This simplified the chip architecture, allowing it to scale up much farther. The downside was that it put the onus on the compiler writer to produce efficient code. (MIPS is a perfect example of this architecture.) All he had to do was make sure that fewer instructions were executed per task, and the code would run faster.

      That is, until the chip designers started introducing superscalar and out-of-order execution. You see, simplifying the chip design provided chip designers with a way to add new optimizations in how instructions were loaded and executed. Unfortunately, this again meant more work for the compiler writer. Now he not only had to optimize the number of instructions, but he also had to optimize the ordering so that multiple instructions could be executed simultaneously or out of order.

  20. Sun does more than that by puppetluva · · Score: 4, Insightful

    Sun hardware has additional, wonderful resiliency features like - allowing cpu's to "fail-over" to other cpus in case of failure. The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" the bad memory and cpus on the fly (or allow swap-in replacements). The engineers can then replace the broken cpus/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN. This lends itself to an environment that can enjoy nearly 100% uptime. Finally, since Sun has been doing the "lots of cpus" thing for many years, their process management and scalability tends to be much better.

    I don't work for Sun, I'm just an SA that deals with both Solaris and Linux boxes. You don't pick sun for just "lots of cpus", you pick it for a very scalable OS and amazing hardware that allows for a very, very solid datacenter. If downtime costs a lot (ie. you lose a lot of money for being down), you should have Sun and/or IBM zseries hardware. Unfortunately those features cost a lot and most times you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.

    1. Re:Sun does more than that by Jeff+DeMaagd · · Score: 3, Interesting

      Hot swapping components sounds great, but what if the screwdriver slips out of the finger of the engineer and causes a short?

      The systems I've seen that have hot-swap PCI cards have plastic partitions between the slots to prevent the cards from touching each other when hot swapping them.

      I'm not sure why the hypothetical screwdriver is in such a tech's hands. Many systems have non-screw means of retaining memory, PCI cards, CPUs and such.

    2. Re:Sun does more than that by ddmau · · Score: 2, Interesting

      Yes... I was an engineer for SGI for over 16 years (laid off about a year ago)....I have hot-swapped modules on a running system many many times without problems. With the NUMAflex architecture, you have "modules" that house a separate set of CPUs, memory, power supply, etc. You shut the offending module down (after the OS has migrated all processes off of it, on the fly). After parts are replaced, you run a diagnostic off of your laptop/terminal, and bring the module back into the system (OS "sees" the change on the fly and re-integrates the module). It works extremely well.

    3. Re:Sun does more than that by dsouth · · Score: 2, Interesting
      I don't have a URL, but was involved in several HPC procurements at the time (and knew some insiders at SGI and Cray). The poster is basically correct. The sequence of events was:
      • SGI (at the time still called Silicon Graphics Inc) purchased Cray Research.
      • Well before the purchase, Cray had a hand in developing and marketing Sun's larger machine, the "Super Dragon", sold by Sun as the 64SC, and referred to within Cray as the 64CS -- I'm probably messing up the number, but I do recall the difference in the letter ordering. :-)
      • Prior to the purchase, Cray had completed the design for a new shared memory system based on a high speed switch and single image OS.
      • Prior to the purchase SGI had already completed the design of the first NUMAflex systems, the Origin2000 and Onyx2.
      • So after the purchase, the new merged Cray/SGI had two large SMP/NUMA systems, the Origin line and the Cray developed line. Since they didn't need two, they sold the Cray design to Sun, where it was marketed as the E10000. They also called the NUMA fabric on the Origin2K "CrayLink" even though Cray had little or nothing to do with its design.
      • For a few years afterwards, there were a few within Cray CF (Chippewa Falls) that were somewhat bitter about SGI's decision to pawn off the E10000 design, pointing out repeatedly that Sun was selling plenty of E10000s...
      If it matters, the HPC procurement I was involved in opted for the SGI, which was probably the correct decision. As unstable as the SGI hardware was, the sites I knew running E10000's for general HPC loads had far worse stability problems (though the E10K's undoubtedly better at running Oracle).

      As far as the E10000 being NUMA or SMP -- depends on how you look at it. The Origin line used a bristled hypercube interconnect topology, so memory on the same node as a CPU was one hop thru the fabric, memory on another node connected to the same router was three hops, on a distant node might be multiple router hops. The E10K (and I think the E15K) used a star topology where memory was either on the same bus as the CPU or was on another bus that had to go through the switch. So the Sun has basically two levels of memory latency, whereas the SGI could have many levels. The SGI is definitely NUMA, the Sun is either SMP or "slightly NUMA", or however you want to parse it.

      If you've never seen it, the tech papers on how the SGI NUMA systems work are worth reading. Build a fast 8-port crossbar chip (the "spyder chip"), then use it to glue CPUs, memory, and peripherals together. Keep a couple ports open, and you can glue the crossbars together in a fabric. Presto, you can now build a system with 200 CPUs or 100 PCI busses. Pretty cool, even if it was expensive, proprietary, and all the rest.

  21. Similar software available? by Pierce · · Score: 2, Interesting

    With the exception of the NUMA stuff, is there software available to re-create this? I'm not even sure what to search for; would this still be considered a "cluster"?

    1. Re:Similar software available? by dwgranth · · Score: 5, Informative

      well, sgi uses/hacks NUMA, spinlocks, etc to make this happen in a more efficient manner. We recently had a SGI rep come and explain their 512CPU architecture at our LUG meeting... and he basically said that SGI has their own implementation of all of the clustering/cpu stacking techs... which they will eventually feed back into the community.. all good stuff.. understandably they will wait for a year or so so they can get their money's worth before they release their changes.

    2. Re:Similar software available? by shaitand · · Score: 2, Insightful

      If they sell you a copy, they've then distributed it and the gpl requires them to license those changes to you under the gpl.

      Any SGI customer can then contribute the changes back to the kernel long before a year is up.

    3. Re:Similar software available? by diegocgteleline.es · · Score: 2, Informative

      SGI publishes their code. It's just that their changes are so radical and "dirty" that they're not useful/maintainable for the rest. Remember, SGI has sold 256 CPU machines with their 2.4 kernel - where 2.4 vanilla doesn't work very well beyond 8 cpus

  22. from MPI to multithreaded ? by InodoroPereyra · · Score: 3, Interesting
    From the article:
    Earlier cluster supercomputers at the NCSA used multiple images of the Linux operating system -- one for each node -- along with dedicated memory allocations for each CPU. What makes this system more powerful for researchers is that all of the memory will be available for the applications and calculations, helping to speed and refine the work being done, Pennington said.

    "The users get one memory image they have to deal with," he said. "This makes programming much easier, and we expect it to give better performance as well."

    So, anyone has any insights as to why/how this matters for the programmers ? Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications ? Or maybe they will still run using MPI on the big shared memory pool, and each process will be sent to the appropriate node by the OS on demand ? Thanks !
    1. Re:from MPI to multithreaded ? by Sangui5 · · Score: 4, Informative

      Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications ?

      NCSA still has plenty of "old" style clusters around. Two of the more aging clusters, Platinum and Titan, are being retired to make room for newer systems like Cobalt. Indeed, the official notice was made just recently--they're going down tomorrow. However, as the retirement notice points out, we still have Tungsten, Copper, and Mercury (TeraGrid). Indeed, Tungsten is number 5 on the Top 500, so it should provide more than enough cycles for any message-passing jobs people require.

      So, anyone has any insights as to why/how this matters for the programmers ?

      What it means is that programming big jobs is easier. You no longer need to learn MPI, or figure out how to structure your job so that individual nodes are relatively loosely-coupled. Also, jobs that have more tightly-coupled parallelism are now possible. The older clusters used high-speed interconnects like Myrinet or Infiniband (NCSA doesn't own any Infiniband AFAIK, but we're looking at it for the next cluster supercomputer). Although they provided really good latency and bandwidth, they aren't as high-performing as shared memory. Also, Myrinet's ability to scale to huge numbers of nodes isn't all that great--Tungsten may have 1280 compute nodes, but a job that uses all 1280 nodes isn't practical. Indeed, until recently the Myrinet didn't work at all, even after partitioning the cluster into smaller subclusters.

      This new shared-memory machine will be more powerful, more convenient, and easier to maintain than the cluster-style supercomputers. Hopefully it will allow better scheduling algorithms than on the clusters too--an appalling number of cycles get thrown away because cluster scheduling is non-preemptive.

      I'd also like to point out some errors in the Computerworld article. NCSA is *currently* storing 940 TB in near-line storage (Legato DiskXtender running on an obscenely big tape library), and growing at 2TB a week. The DiskXtender is licenced for up to 2 petabytes--we're coming close to half of that now. The article therefore vastly understates our storage capacity. On the other hand, I'd like to know where we're hiding all those teraflops of compute--35 TFLOPS after getting 6 TFLOPS from Cobalt sounds more than just a little high. That number smells of the most optimistic peak performance values of all currently connected compute nodes. I.e. - how many single-precision operations could the nodes do if they didn't have to communicate, everything was in L1 cache, we managed to schedule something on all of them, and they were all actually functioning. Realistically, I'd guess that we can clear maybe a quarter of that figure, given machines being down, jobs being non-ideal, etc. etc. etc.
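
      For anyone who wants to check the numbers in the paragraph above, here is the back-of-envelope arithmetic as a small Python sketch. It assumes 1 PB = 1000 TB, that growth stays at a constant 2 TB a week, and that "a quarter of that figure" means exactly 25% of the quoted 35 TFLOPS; none of those assumptions come from NCSA.

```python
# Back-of-envelope check of the storage and compute figures quoted above.
# Assumptions (not from the post): 1 PB = 1000 TB, constant growth rate.

stored_tb = 940                 # near-line storage in use today
growth_tb_per_week = 2          # quoted growth rate
licensed_tb = 2 * 1000          # 2 PB licence, decimal units assumed

fraction_used = stored_tb / licensed_tb
weeks_left = (licensed_tb - stored_tb) / growth_tb_per_week

quoted_peak_tflops = 35         # figure from the article
realistic_tflops = quoted_peak_tflops / 4   # "maybe a quarter of that"

print(f"licence used: {fraction_used:.0%}")
print(f"weeks until the licence is full: {weeks_left:.0f}")
print(f"rough sustained estimate: {realistic_tflops} TFLOPS")
```

      Under those assumptions the licence is a little under half used, which matches the "coming close to half" remark.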

      As a disclaimer, I do work at NCSA, but in Security Research, not High-Performance Computing.

    2. Re:from MPI to multithreaded ? by kscguru · · Score: 3, Informative
      Caveat: I think MPI itself is very recent (standardized only w/in the past few years), before that everyone used custom message-passing libraries.

      It's a tradeoff. MPI is "preferred" because a properly written MPI program will run on both clusters and shared-memory equally fast, because all communication is explicit. It's also much harder to program, because all communication must be made explicit.

      Shared-memory (e.g. pthreads) is easier to program in the first place (since you don't have to think about as many sharing issues) and more portable. However, it is very error-prone - get a little bit off on the cache alignment or contend too much for a lock, and you've lost much of the performance gain. And you can't run it on a cluster without horrible performance loss.
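
      A minimal sketch of the two styles, in Python rather than MPI or pthreads. It uses only the standard library, so the "message passing" here is a queue between threads, which is an assumption standing in for a real MPI send/receive; the function names are made up for illustration.

```python
import queue
import threading

def sum_shared(data, workers=4):
    """Shared-memory style: every thread updates one total under a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)     # do the private work first
        with lock:               # touch shared state once per worker
            total += partial

    threads = [threading.Thread(target=work, args=(data[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def sum_messages(data, workers=4):
    """Message-passing style: workers share nothing and send one result each."""
    inbox = queue.Queue()

    def work(chunk):
        inbox.put(sum(chunk))    # the only communication is this message

    threads = [threading.Thread(target=work, args=(data[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in range(workers))
```

      The shared version is shorter to get wrong: move the lock inside a per-element loop and the threads spend their time contending. The message version makes the single point of communication explicit, which is the property that lets the same structure run across separate nodes.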

      If it's the difference between spending two months writing the shared-memory sim and four months writing the message-passing sim that runs two times faster on cheaper hardware, well, which would you choose? Is the savings in CPU time worth the investment in programmer time?

      Alas, the latencies on a 1024-way machine are pretty bad anyway. If they use the same interconnect as the SGI Origin, it's 300-3000 cycles for each interconnect transaction (depending on distance and number of hops in the transaction). Technically that's low-latency... but drop below 32 processors or so, and the interconnect is a bus with 100 cycle latencies, so those extra processors cause a lot of lost cycles.

      --

      A witty [sig] proves nothing. --Voltaire

  23. HP Overstock by bayerwerke · · Score: 2, Funny

    They bought HP's overstock of them for pennies on the dollar.

  24. 3TB of memory? by gsasha · · Score: 3, Funny

    I wish I had that much disk space...

  25. Re:What happened to RISC? by DAldredge · · Score: 2, Informative

    True. There are at least two different x86 emulators available. There is the HW one that is built in and the newer and faster IA-32 Execution Layer (currently only available for windows).

  26. Re:What happened to RISC? by Epistax · · Score: 4, Interesting

    RISC and CISC offer no final advantage over the other, so the one that dominated is the one that was here first.

    Quick examples: RISC use less power because it has less logic? No, it needs to run at a higher frequency to maintain the same speed as a slower CISC.
    RISC is easier to program? Depends on the person. A compiler can take advantage of large instructions very well which are hardware optimized.
    RISC easier to develop/manage? I'll say yes for RISC on this one. There's simply less logic on the chip so less logical errors possible. There's plenty more cache which can break but broken parts can be fused off.
    RISC is physically smaller? No. RISC needs a higher clock frequency because many more instructions need to be executed. The result of this is that a much larger instruction cache is needed on chip.

    I don't remember every comparison but it pretty much comes out that neither is better than the other. That being said RISC is better than x86. Everything is better than x86. However CISC vs RISC is much harder to judge. Having done x86, 68k, and MIPS I must say that RISC is a pleasure.

  27. Sun and/or IBM zseries hardware by r00t · · Score: 3, Informative
    Linux runs on both of these, with official IBM support on the zSeries. On the IBM hardware, go ahead and swap out CPUs and memory. It's supported, today, with Linux.

    The Sun hardware is more difficult to deal with, since there isn't a virtual machine abstraction. You can't do everything below the OS. Still, Linux 2.6 has hot-plug CPU support that will do the job without help from a virtual machine. Hot-plug memory patches were posted a day or two ago. Again, this is NOT required for hot-plug on the zSeries. IBM whips Sun.

    I'd trust the zSeries hardware far more than Sun's junk. A zSeries CPU has two pipelines running the exact same operations. Results get compared at the end, before committing them to memory. If the results differ, the CPU is taken down without corrupting memory as it dies. This lets the OS continue that app on another CPU without having the app crash.
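
    The compare-before-commit idea can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not how the zSeries hardware is built; the pipeline functions, the injected fault, and the dictionary used as "memory" are all assumptions for illustration.

```python
import operator

class LockstepMismatch(Exception):
    """Raised when the two pipelines disagree on a result."""

def healthy_pipe(op, args):
    return op(*args)

def faulty_pipe(op, args):
    return op(*args) ^ 1         # simulated fault: flips the low bit

def lockstep_execute(op, args, pipe_a, pipe_b):
    """Run the same operation on both pipelines and compare the results."""
    result_a = pipe_a(op, args)
    result_b = pipe_b(op, args)
    if result_a != result_b:
        raise LockstepMismatch(f"{result_a} != {result_b}")
    return result_a

def commit(memory, addr, op, args, pipe_a, pipe_b):
    """Write to memory only if both pipelines agree; report success."""
    try:
        memory[addr] = lockstep_execute(op, args, pipe_a, pipe_b)
        return True
    except LockstepMismatch:
        return False             # memory untouched; caller can retry elsewhere

memory = {}
commit(memory, 0, operator.add, (2, 3), healthy_pipe, healthy_pipe)
commit(memory, 1, operator.add, (2, 3), healthy_pipe, faulty_pipe)
```

    After those two calls only address 0 has been written, because the mismatch on the second call is caught before anything is stored.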

    1. Re:Sun and/or IBM zseries hardware by flok · · Score: 2, Interesting

      What happens if both pipelines make the same mistake because the L1-cache feeds them both the same corrupted data?

      --

      www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
    2. Re:Sun and/or IBM zseries hardware by r00t · · Score: 2, Informative

      There isn't any part of the system without some
      sort of error correction. The cache generally
      has ECC for this. Since L1 is innermost and small,
      it may well be duplicated along with the pipelines,
      but I think they use ECC for that as well.

      This is full-path protection. Cables have ECC
      and/or a protocol with checksums. Disks are RAID.
      Methods of error correction vary by component,
      but nowhere are they missing.

    3. Re:Sun and/or IBM zseries hardware by kasperd · · Score: 2, Informative
      If the results differ, the CPU is taken down without corrupting memory as it dies.

      A few questions:
      • What if an error happens in the comparison unit?
      • What happens to the program that was running on the CPU as it is taken down? (The CPU registers is part of the program state, so you cannot just continue on another CPU).
      --

      Do you care about the security of your wireless mouse?
  28. The solution! by Sidicas · · Score: 5, Funny

    "will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors..."
    "The National Center for Supercomputing Applications will use it for research"


    1. Make a system that generates more heat than a supernova.
    2. Research a solution to global warming.
    3. Profit!

  29. In other Headlines by ShadowRage · · Score: 4, Funny

    SCO gained $715,776

  30. Another thing Sun does well.... by passthecrackpipe · · Score: 4, Insightful
    Cache reduction - ehh cash reduction. One of the prime reasons Sun is losing serious levels of installed base to Linux is not because linux is better, it is because Sun is bloody expensive - outrageously so. And while most customers had to endure the annual fleecing with gritted teeth - due to lack of alternatives - Sun is now being pummeled out of datacenter after datacenter.

    I have replaced Sun hardware/software combos in the core datacenter for many of our customers, and I can tell you that yes - Sun brings some amazing features to the table - most of which are there to serve old technology. Linux on simple CPUs delivers amazing price/performance (depending on the job, we see an average of 3x to 4x performance increase for 25% of the cost). That means that if I were to spend the same, lifecycle-wise, on a Linux cluster as I would on a big Sun box like the 10k or 15k, I'd end up with 12x to 16x the performance of the Sun solution.
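
    The 12x to 16x figure is just the speedup divided by the cost fraction. A small Python sketch of that arithmetic, assuming performance scales linearly with money spent on more cluster nodes (a big assumption for anything that is not embarrassingly parallel):

```python
def same_spend_speedup(speedup, cost_fraction):
    """Performance relative to the baseline if the full baseline budget is spent.

    Assumes linear scaling: buying 1/cost_fraction times as much
    hardware yields 1/cost_fraction times as much performance.
    """
    return speedup / cost_fraction

low = same_spend_speedup(3, 0.25)
high = same_spend_speedup(4, 0.25)
print(f"{low:.0f}x to {high:.0f}x for the same spend")
```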

    The same functionality in terms of cpu and ram (and other hardware) failure is available on the Linux cluster, albeit in less graceful form - the magic spell to invoke goes like this:
    shutdown -h now
    If I have 300 machines crunching my data, I can afford to lose a couple, and can afford to have a few hot standbys.

    Of course, the massively parallel architecture does not work for all applications, and in those cases you would look to use either OpenMOSIX or of course the (relatively expensive) SGI box mentioned in this article.
    --
    People who think they know everything are a great annoyance to those of us who do.
    1. Re:Another thing Sun does well.... by sparkz · · Score: 2, Informative
      If you're lucky enough to have a massively-parallel, read-only application, then go for Linux clusters.

      Read the Sun Blueprints (http://www.sun.com/blueprints/browsesubject.html# cluster) for how a real cluster works - actually caring about data integrity. That is the crux with clustered systems: What happens if one node "goes mad" even though it's no longer a "valid" part of the cluster?
      Look into Sun's dealing with failure-fencing; it's drastic (PANIC a node if it can't be sure it's a cluster member) but it works.

      By contrast, Linux clustering seems to be at the level of "let's share an IP address, we can balance the load". Great for DNS (but - oh, DNS has that built in) or Apache read-only servers (assuming no session management, static-only pages).

      Digital had an excellent cluster package last decade; Sun seem to be getting to that level now. Linux, sorry to say, is years behind.

      --
      Author, Shell Scripting : Expert Re
    2. Re:Another thing Sun does well.... by kscguru · · Score: 2, Informative
      OLTP is the classic anti-cluster workload. Essentially random data access patterns, very large resident data sets with a huge amount of simultaneous (and synchronous) accesses. OLTP means low-latency, and OLTP will die on a cluster. By definition.

      Now sure, some careful planning can take an OLTP system and make it more cleanly distributed, but at that point it isn't OLTP, because all the nasty bits that made it a hard workload are washed out. Running a constantly-changing database (e.g. financial market?) on a cluster is hard; running a mostly static database (e.g. shopping cart?) is easy.

      However, I agree with your point. Very few people need the 32-cpu monster (although there are a few!). Handling transaction volume can be done two ways: buy a big general-purpose machine that can handle the volume, or buy a cheaper cluster that more closely matches the workload. And today, the cluster is the right answer.

      I think the difference between then and now is that before, we didn't know what the workload was supposed to be. In that case, a big general-purpose monster server is the most flexible solution. But now, we know what workload we want, and it's cheaper to design a cluster for that workload.

      --

      A witty [sig] proves nothing. --Voltaire

  31. Wow by Steamhead · · Score: 2, Funny

    Hot damn, this is one server that could survive a slashdotting.

  32. Impressive... by Pantero+Blanco · · Score: 2, Informative

    ...Right on the heels of this too.

  33. Re:Advantages...? by myg · · Score: 4, Informative
    Because a machine like that isn't about running Apache or serving files.

    The purpose of that computer is to solve complex scientific problems such as weather simulations, high-energy particle simulations, protein folding, etc. Many of these simulations involve iterated systems of equations that can take decades to solve on the fastest CPUs we have today.

    The only way to get meaningful results in a meaningful amount of time is to break the problem apart into smaller problems and solve them in parallel.

    Some projects, such as Folding@Home and Find-A-Drug go the distributed computing route -- use many disconnected systems to solve the problem.

    The downside to that approach is that not all problems can be easily broken apart -- and some classes of problems can exist without tight coupling but they lose efficiency. The impressive thing about this particular supercomputer is that it has a single, unified memory image.

    This is very useful for some classes of simulation problems when the entire simulation must be present for each iteration.

  34. RISC overrated by TheLink · · Score: 3, Informative

    It's ok for embedded and other areas (slower CPUs) but with desktop/server CPUs being much faster than memory speeds and remaining so for the foreseeable future, having common and popular instructions being shorter than other instructions is actually an advantage despite the complexity that involves.

    It's like having on-the-fly instruction decompression. e.g. CISC programs tend to be smaller in main memory+cache, and they travel in CISC/"compressed" form taking up less memory bandwidth over the memory/cache buses to the CPU instruction decoder where they are "decompressed" to RISC micro-ops to be executed.

    Look at the mainstream desktop/workstation/server CPUs. Only the SPARC is RISC. IBM POWER/PowerPC is barely RISC[1], some people think it's more CISC than RISC. Itanium isn't RISC. x86 isn't. The rest (Alpha, MIPS, PA-RISC) are either out of the market or on their way out.

    As long as CPUs are fast and much faster than RAM (and cache remaining expensive), it's often worth doing the compression/decompression thing.

    [1] I believe IBM's POWER chips actually decode their "RISC" instructions to simpler instructions, some of their "RISC" instructions are pretty complex- kinda oxymoronic... But as I mentioned, that may not be such a bad thing.

    --
  35. Actually.. by krutadal · · Score: 2, Informative

    Pentium 4 reduces the CISC instructions to a series of RISC-like "microops" that, for the most complex of the bunch, can take hundreds of cycles to complete.

  36. Scalability of applications by xyote · · Score: 2, Insightful

    Well, we know that the kernel can be made to scale but what about the applications? The same issues the kernel had to face, the applications have to face also. For parallel computing you naturally try to avoid too much sharing by "parallelizing" the programs. For applications like databases, you are talking about a lot of sharing of a lot of data. Not all the techniques the Linux kernel used are available to the applications yet.

    1. Re:Scalability of applications by xtp · · Score: 5, Informative

      SGI has had 512 and 1024-cpu MIPS-based systems in operation for more than 5 years. Much work was done on the Irix systems to initialize large parallel computations and provide libraries and compiler support for these configurations. One technique is to provide message-passing libraries that use shared memory. A better technique is to morph (slightly) parallel mesh apps so that each computational mesh node exposes the array elements to be shared with neighbors. No message-passing needed - you push data after a big iteration and then use the (really fast) sync primitives to launch into the next iteration. With shared-nothing clusters (i.e. Beowulf) a computation (and its memory) must be partitioned among the compute nodes. The improvement over a "classical" cluster can be startling, especially with computations that are more communications-bound than compute-bound. This means there is no value in replacing a render farm with a big system. But there are big compute problems, e.g. finite element, for which the shared-nothing cluster is often inadequate.

      With a single memory image system the computation can easily repartition dynamically as the computation proceeds. It's very costly (never say impossible!) to do this on a cluster because you have to physically move memory segments from one machine to another. On the NUMA system you just change a pointer. The hardware is good enough that you don't really have to worry about memory latency.

      And let's not forget io. Folks seem to forget that you can dump any interesting section of the computation to/from the file system with a single io command. On these systems the io bandwidth is limited only by the number of parallel disk channels - a system like the one mentioned in the article can probably sustain a large number of GBytes/sec to the file system.

      Let's not forget page size. The only way you can traverse a few TB of memory without TLB-faulting to death is to have multi-MByte-size pages (because TLB size is limited). SGI allowed a process to map regions of main memory with different page sizes (up to 64 MB I think) at least 10 years ago in order to support large image database and compute apps.
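To put rough numbers on the page-size point, here is a minimal sketch. The 128-entry TLB and the 16 KiB base page size are assumptions picked for illustration, not figures from the article or from SGI, and the article's 3 TB is treated as 3 TiB.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40


def pages_needed(memory_bytes, page_bytes):
    """Number of pages required to map the given amount of memory (rounded up)."""
    return -(-memory_bytes // page_bytes)


def tlb_reach(entries, page_bytes):
    """Amount of memory addressable without a TLB miss."""
    return entries * page_bytes


TLB_ENTRIES = 128   # hypothetical TLB size, for illustration only
MEMORY = 3 * TIB    # shared memory size quoted in the article, read as TiB

for page in (16 * KIB, 64 * MIB):
    print(
        f"page={page // KIB:>6} KiB  "
        f"pages={pages_needed(MEMORY, page):>11,}  "
        f"TLB reach={tlb_reach(TLB_ENTRIES, page) / MIB:>8,.0f} MiB"
    )
```

Under these assumptions the small pages need about 200 million mappings and the TLB covers only 2 MiB at a time, while 64 MiB pages cut that to 49,152 mappings with 8 GiB of reach.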

      When I used to work at SGI (5 years ago) the memory bandwidth at one cpu node was about 800 MBytes/s. My understanding is that the Altix compute nodes now deliver 12 GBytes/s at each memory controller. Although I haven't had a chance to test drive one of these new systems, it sounds like they have gradually been porting well-seasoned Irix algorithms to Linux. It is unlikely that a commodity computer really needs all of this stuff, but I'm looking at a 4-cpu Opteron that could really use many of the memory management improvements.

      g

  37. The real test by Bruha · · Score: 4, Funny

    Fire up apache and then post a link to it here on slashdot. We love a challenge.

  38. Let me clue you in on a few things by justins · · Score: 4, Informative
    You don't pick sun for just "lots of cpus", you pick it for a very scalable OS and amazing hardware that allows for a very, very solid datacenter.

    The UNIX made by SGI (the company making the machine referenced in the article) is more scalable than Solaris. Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

    Sun hardware has additional, wonderful resiliency features like - allowing cpu's to "fail-over" to other cpus in case of failure.

    None of that is unique to Sun.

    Finally, since Sun has been doing the "lots of cpus" thing for many years, their process management and scalability tends to be much better.

    Better than what? And says who? They've never decisively convinced the market that they're better at this than HP, SGI, IBM or Compaq.

    If downtime costs a lot (ie. you lose a lot of money for being down), you should have Sun and/or IBM zseries hardware. Unfortunately those features cost a lot and most times you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.

    In addition to ignoring the other good Unix architectures out there in a dumb way with this comparison, you're also totally missing the point of the article. Linux supercomputing isn't just about cheap clusters anymore. Expensive UNIX machines on one side and cheap Linux clusters on the other is a false dichotomy.
    --
    Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
  39. 1024 cpus and 3 TB memory by Anonymous Coward · · Score: 4, Funny

    That's almost enough to run Emacs!

  40. So, when will Jeff Dike have UML ported to this? by kclittle · · Score: 2, Funny
    1024 physical CPUs running *one* logical host linux image running god knows how many uml instances, each fully independent of the other and seeing 3 TB of memory. The mind boggles! :-)

    --
    Generally, bash is superior to python in those environments where python is not installed.
  41. Re:Will it be done in time for Quake 3? by Ari_Haviv · · Score: 2, Funny

    It's already out...in Japan

    --
    Join Team Mozilla #38050 Folding@home
  42. Wow! by juggaleaux · · Score: 2, Funny

    That much hard drive space rivals my porn collection! :O

  43. Re:it would make 2d place by Sangui5 · · Score: 2, Informative

    35 TFLOPS is the peak performance number sitewide. Cobalt itself should be able to clear between 6 and 7, making it a much more modest 25ish place. There are rumours that a bigger cluster-style machine is in the works, once the issues with Tungsten (NCSA's biggest and #5 in the world) are ironed out.

  44. Re:What happened to RISC? by ArbitraryConstant · · Score: 2, Informative

    "Quick examples: RISC use less power because it has less logic? No, it needs to run at a higher frequency to maintain the same speed as a slower CISC."

    No. This is exactly wrong. G5s are a good example of this. They easily outperform P4s at the same clock speed, and it's the P4 which must run at the higher speed to compensate.

    The overhead of supporting all the various instructions and addressing modes, as well as being able to fit the whole CPU in one die were what made RISC a good choice in the past. Now, that overhead is dwarfed by other parts of the chip, and they're all running weird u-ops internally, so it makes little difference.

    "RISC is easier to program? Depends on the person. A compiler can take advantage of large instructions very well which are hardware optimized."

    Compilers are notorious for not utilizing esoteric opcodes. And when they do, there's almost never a significant performance advantage in doing so.

    For example, none of the code I've ever tested with icc (one of the only compilers that can use weird opcodes on i386) has been more than about 5% faster than "gcc -Os -msse2", and a lot of it has been slower.

    "RISC is physically smaller? No. RISC needs a higher clock frequency because many more instructions need to be executed. The result of this is that a much larger instruction cache is needed on chip."

    RISC does generally need a larger cache, but it does not need a higher frequency.

    "I don't remember every comparison but it pretty much comes out that neither is better than the other. That being said RISC is better than x86. Everything is better than x86. However CISC vs RISC is much harder to judge. Having done x86, 68k, and MIPS I must say that RISC is a pleasure."

    Just use a compiler. Anything with a proper MMU will be good enough.

    --
    I rarely criticize things I don't care about.
  45. Not likely - Same Machine for $1k in 14 years. by DanielJH · · Score: 2, Interesting

    The point here is that if performance continues to grow like it is today, they will be selling these machines for $1,000 at Walmart in just 14 years. It will be about the same size as the computer you own now.

    The problem with 1024 CPUs is much more than just the operating system. It is a mess of communication hardware needed to wire everything together. It is about special power feeds and air conditioning, and sometimes floor loading requirements.

    Take a quick look at the end of this PDF. It talks about heat output and the need for 3 phase 240V power coming into this computer. It is not unusual to hire both an electrical and a cooling expert when you talk about installing one of these babies. Not for the home user, and never will be; however, identical compute power is coming in just 14 years, so get ready...
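The comment does not say what growth rate sits behind the 14-year figure. As a rough sketch, assuming the goal is a 1024x improvement (one chip doing the work of 1,024) and that performance doubles at a steady pace, the implied doubling period can be worked out like this; the function name is illustrative only.

```python
import math


def implied_doubling_years(speedup, years):
    """Doubling period needed to reach `speedup` in `years` under steady exponential growth."""
    return years / math.log2(speedup)


d = implied_doubling_years(1024, 14)
print(f"{d:.2f} years (~{d * 12:.1f} months) per doubling")
```

That works out to 10 doublings in 14 years, or roughly one every 17 months, which is in the neighborhood of the usual 18-month rule of thumb.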

    1. Re:Not likely - Same Machine for $1k in 14 years. by isorox · · Score: 2, Interesting

      Indeed, we're implementing a 24 bay system at the moment, in a brand new apps room off one of our current ones (which happens to have about 100 bays, most of them overflowing), so, yes, power is a problem, and cooling doubly so. (One apps room is currently responsible for two 24 hour tv channels and barely has a backup AC unit; it may work if we shut down some of the less-essential equipment.)

  46. Re:What happened to RISC? by ArbitraryConstant · · Score: 2, Informative

    "I just wanted to point out you mentioned GCC. Sadly GCC is about the worst compiler in existence for performance."

    That was my point. A shitty compiler with moderate optimization settings is very close in performance to one of the top compilers out there.

    "The top compiler is in fact the Intel compiler in part because it knows about unpublished instructions. Have fun reading the code it generates."

    Yes, this was the example I used. The vectorized loops are a bitch to read.

    "On the subject of G5s being faster, there are a whole host of differences between G5's and P4's. You can't just pick one difference and claim that's the reason."

    That's true. However, I never gave a reason for the performance difference, so I'm not sure why you're saying this.

    You said that RISC CPUs needed to run at a higher frequency to get the same performance as a CISC CPU. Since you're wrong, I gave an example to prove you wrong.

    There is basically only one RISC CPU architecture that has the benefit of a really large R&D effort these days, and that's POWER/PowerPC. Itanium is not strictly RISC, and nothing else has the benefit of such a huge R&D effort.

    Thus, the only RISC CPUs that can be fairly compared to x86 are the POWER/PowerPC chips from IBM. The only two x86 CPUs that have a really huge R&D effort behind them are the Athlons from AMD and the Pentiums from Intel.

    They all have relatively similar performance (with advantages going to one or the other in a few niches). PowerPC chips are shipped at similar clock speeds to the Athlons and much lower clock speeds than the Pentiums.

    Therefore, your statement that RISC CPUs need higher clock speeds to get the same performance has been demonstrated to be false in a comparison between the only 3 large chip makers in operation.

    Further comparisons, such as those between Sparc and the VIA C3, which are smaller but significant efforts, show the RISC CPU getting more done per clock cycle, again demonstrating your statement to be false.

    --
    I rarely criticize things I don't care about.
  47. Not sure if your serious but lets explain. by SmallFurryCreature · · Score: 3, Informative

    I will avoid the tech terms (partly because they would confuse you, partly because I don't know them all, but mostly because they ain't needed).

    A single CPU computer can execute ONE instruction at a time. Meaning one program thread running at a time. But wait you say, my OS can run multiple programs at the same time. WRONG. It can't. It is a trick. It is running one program at a time but it is switching the program it is running really fast. There is however a problem with this. When it has switched to a program all the other programs are effectively at the mercy of the program now running INCLUDING the OS. Which is why DOS and Windows and Linux and Mac OS and all the others had "hangups". With an extremely well written OS these hangups (when a program doesn't switch back to the OS) can be avoided but it still remains a case that all the programs and the OS are fighting for time on 1 single cpu.

    So what happens when you add a cpu? Well a lot less switching PLUS if a program for whatever reason does not switch properly the OS can still be run on the other processor. Just making a windows box a dual CPU instantly makes it far more robust. I encountered this myself with an old dell P3 that had a dual board but no dual CPU installed. Before I added a second CPU it was the usual windows crap of hangs and reboots and BSoD. Afterwards it ran as stable as a unix machine. Simple things like opening a complex folder in exploder no longer "froze" the desktop as it could simply run exploder on one CPU and say word or my mp3 player on the other.

    Don't forget too that things like ATA hard drives and CD-ROMs need the cpu to drive them. This takes a lot of long cycles and a lot of waiting, not so much CPU power as just time on the CPU. With a second one to do all the other tasks this makes everything run far smoother.

    So what is better? Running 1 2ghz cpu or 2 1ghz cpu's? Depends. If you are running 1 program thread go with the 1 cpu. It will take all the cpu time but will not need to share it. If however you are running countless small threads go with the 2 or more solution. Threads will have access faster and you will lose less cpu time on the time needed to execute switches.

    Oh yeah that is another problem. Switching between programs takes cpu time as well. It is not unknown for single CPU systems to spend so much time on switching they don't have time to run anything anymore. The old too many running programs problem known from windows but which affects every OS.

    Lastly there is a simple problem. Say you want real power, do you go for a quad 2ghz or a single 8ghz? Answer? It is a trick, no such thing as an 8ghz cpu.

    If you get the chance buy a second hand dual P3 and install windows 2000+ or Linux on it and be amazed. That old system will respond a lot faster under load than your 3ghz monster.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Not sure if your serious but lets explain. by Paul+Jakma · · Score: 2, Informative

      A single CPU computer can execute ONE instruction at a time.

      Incorrect: a modern superscalar CPU can potentially execute several instructions at the same time. The Pentium was the first Intel CPU able (very crudely) to do this; the P6 was 3-way superscalar (iirc - there was an article linked to on /. about it recently), able to retire (i.e. execute) 3 instructions per clock cycle. This implies some kind of pipeline (i.e. the processor must fetch several instructions at the same time from RAM, examine them and decide how to schedule them), which in turn implies that such a CPU at any given time has a whole bunch of instructions in different stages of execution.

      Meaning one program thread running at a time.

      It does not mean that at all.

      Most CPUs only support a single context of execution; however, some CPUs support multiple execution contexts, Intel "HyperThreading" being one example. So a superscalar CPU with multiple execution contexts could have many instructions in several stages of execution from multiple programme contexts at any given point in time.

      When it has switched to a program all the other programs are effectively at the mercy of the program now running INCLUDING the OS. Which is why DOS and Windows and Linux and Mac OS and all the others had "hangups".

      You're describing co-operatively multi-tasking operating systems, which Linux is not. I.e., systems like Windows 3 and MacOS 9 and earlier.

      Linux is a preemptive multi-tasking OS, as is MacOS X, WinNT/2k and (partly) Win9x. Under such a system programmes are given only limited periods of time to run; a programme which does not yield control of the system by itself will be suspended eventually. Typically this is done by the OS setting a hardware timer, upon the expiration of which the hardware forcibly returns control to the OS, whereupon it can elect to give control of the system to another process (setting that timer again, if needs be). On most hardware this is done with a timer interrupt, e.g. IRQ 0 on PC class machines, which fires at a preprogrammed interval (100 Hz on older Linux, 1000 Hz on 2.6, or 1024 Hz for Linux or Digital Unix on Alpha). When the interrupt goes off, the CPU saves the state of the process (as it always does to handle interrupts) and runs the appropriate interrupt vector as installed by the operating system, which can then elect to run another process. That usually requires the OS to save any state of the currently running process which the CPU hadn't saved or which is OS dependent, and then restore the state of another process.

      Anyway, a process on Linux can *NOT* "hang" the system by refusing to yield control. The OS (with help from hardware) will intervene.
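The timer-driven hand-back described above can be sketched as a toy round-robin simulation. This is a simplified model for illustration, not how the Linux scheduler is implemented: the function name, the tick-based accounting and the process names are all made up, and a "tick" simply stands for one timer period.

```python
from collections import deque


def run_preemptive(work, quantum, total_ticks):
    """Toy round-robin scheduler driven by a simulated timer interrupt.

    work: dict of process name -> ticks of work left (None = never finishes
          and never yields voluntarily).
    quantum: ticks a process may run before the "timer" forces a switch.
    Returns the CPU time each process received.
    """
    ready = deque(work)                  # run queue, in insertion order
    remaining = dict(work)
    cpu_time = {name: 0 for name in work}
    ticks = 0
    while ready and ticks < total_ticks:
        name = ready.popleft()
        ran = 0
        # Run until the quantum expires: this is the forced return to the OS.
        while ran < quantum and ticks < total_ticks:
            if remaining[name] is not None:
                if remaining[name] == 0:
                    break                # process finished its work
                remaining[name] -= 1
            cpu_time[name] += 1
            ran += 1
            ticks += 1
        # Anything with work left goes to the back of the queue.
        if remaining[name] is None or remaining[name] > 0:
            ready.append(name)
    return cpu_time


# "hog" never yields, yet "editor" still gets all 30 ticks it needs.
result = run_preemptive({"hog": None, "editor": 30}, quantum=10, total_ticks=100)
print(result)  # {'hog': 70, 'editor': 30}
```

Under a cooperative scheme the same hog would keep the CPU for all 100 ticks, which is the "hangup" behaviour the parent comment describes.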

      still remains a case that all the programs and the OS are fighting for time on 1 single cpu

      This isn't really a good mental image to have of a modern OS. The OS does not "fight for time". The OS only ever runs because:

      1. A process calls the OS to perform some service on that process's behalf.

      E.g., to do work on that process's behalf such as IO (read/write from disk/network/whatever or IPC IO and deliver it to process/destination), or to set up the OS abstractions needed for IO (filehandles typically on Unix) or to interact with OS abstractions, e.g. to list a directory or running processes or send a signal to a process, etc.

      1a. A subset of 1, where a process calls the OS to voluntarily yield control of the CPU it is executing on. The OS potentially can do some housekeeping here before restoring state of another process and allowing it to run.

      2. The hardware directly intervenes and executes OS installed functions, typically in response to an interrupt generated by a timer or other hardware or else some exceptional event (typically a memory fault where a memory address is referenced that does not "exist").

      Operating systems will typically try to do as little as possible work in the latter case and will try defer as much as p

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
  48. In my garage ... by CyBlue · · Score: 3, Funny

    I've been working all weekend to cluster 4 Honda Civics. When I'm done, I expect it to go 280MPH, get 12MPG and 0-60 in under 3 seconds.

    1. Re:In my garage ... by mindfucker · · Score: 2, Funny

      Ha. I can get your Civic to do that without modifying it at all. Just push it off a cliff.

  49. Scalability of sorts by Decaff · · Score: 3, Informative

    The UNIX made by SGI (the company making the machine referenced in the article) is more scalable than Solaris. Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

    Scalability is a complex issue. SGI has put a whole lot of processors together and put a single Linux image on it (so that a single program can use all memory), but this says nothing about how that setup will actually perform for general purpose use. Just because the hardware allows threads on hundreds of processors to make calls into a single Linux kernel, does not mean that there will not be major performance issues if this actually happens.

    There are performance issues with memory even on single processor systems with nominally a single large address space, and a developer may need to put a lot of work into ensuring that data is arranged to make best use of the various levels of cache.

    Many of the multi-processor architectures require even greater care to ensure that the processors are actually used effectively.

    The fact that a single Linux image has been attached to hundreds of processors is no indication of scalability. A certain program may scale well, or not.

  50. Correctable RAM and L2 errors? by AtariDatacenter · · Score: 2, Informative

    Being an administrator of some 24-way boxes, I have to ask a more detailed question about the error handling. Is the L2 cache in the CPUs just ECC'd, Parity, or fully mirrored? You'll find that on a large installation of CPUs, not being fully mirrored on your L2 will cause quite a bit of downtime over the course of a year with that many CPUs. I don't have those Itanium 2 specs. Anyone?

    UPDATE: I looked. Itanium 2's L2 cache is ECC. It'll correct a 1 bit failure, detect and die on a 2 bit failure. Believe it or not, on a large number of CPUs running over a long period of time, it happens more often than you think. It also says it has an L3. No idea on the L3 cache protection method used. Because they don't say, I'd also guess ECC. Wheee! Lots of high speed RAM around the CPU with ECC protection. Well, nobody called this an enterprise solution, so I guess it's okay.

    Also, you're going to have regular issues with soft ECC errors on that many TB of RAM. And then your eventual outright failures that'll bring down the whole image of the OS. (An OS could potentially handle it 'gracefully' by seeing if there is a userspace process on that page and killing/segfaulting it, but that's more of an advanced OS feature.)

    Boy, I'd really hate to be the guy in charge of hardware maintenance on THAT platform.
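To see why rare per-CPU errors add up on a machine this size, here is a small sketch. The 0.1% per-CPU yearly rate is a hypothetical number chosen only for illustration, not a measured Itanium 2 figure, and the CPUs are assumed to fail independently.

```python
def p_any_failure(per_cpu_rate, cpus):
    """Probability that at least one of `cpus` independent CPUs hits the event."""
    return 1.0 - (1.0 - per_cpu_rate) ** cpus


RATE = 0.001  # hypothetical: chance of an uncorrectable cache error per CPU per year

for n in (1, 24, 1024):
    print(f"{n:5d} CPUs: {p_any_failure(RATE, n):.1%} chance of at least one per year")
```

With that assumed rate, a 24-way box sees a fatal error in a given year a couple of percent of the time, while 1,024 CPUs under one OS image would more likely than not see at least one.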

  51. Current State of the Art - 2 TB mem and 256 cpus by random_me · · Score: 2, Informative

    I am happy to say that I have worked, and continue to work on the current state of the art:
    http://www.ccs.ornl.gov/Ram/Ram.html

    A few notes:
    Linux kernel: 2.4.21-sgi240rp04051808_10074
    From df, a 1 TB ram disk:
    none 1023700704 0 1023700704 0% /dev/shm
    From /etc/redhat-release:
    Red Hat Linux Advanced Server release 2.1AS (Derry)

    The machine is actually not nice to work on. It is prone to frequent short freezes (2-15 seconds long; about one every 2-3 minutes, although not evenly spaced out).