Slashdot Mirror


SGI to Scale Linux Across 1024 CPUs

im333mfg writes "ComputerWorld has an article up about an upcoming SGI machine, being built for the National Center for Supercomputing Applications, 'that will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors and 3TB of shared memory.'"

360 comments

  1. Whoa! by rylin · · Score: 5, Funny

    Sweet, now we'll be able to run Doom3 at highest detail in *SOFTWARE*-rendering mode!

    1. Re:Whoa! by Anonymous Coward · · Score: 0

      And you are, what again?

    2. Re:Whoa! by Pharmboy · · Score: 1

      Sweet, now we'll be able to run Doom3 at highest detail in *SOFTWARE*-rendering mode!

      So you're saying this new computer has 1024 CPUs, but doesn't support OpenGL?

      --
      Tequila: It's not just for breakfast anymore!
    3. Re:Whoa! by Thaidog · · Score: 1

      It's SGI... how the FUCK could it not?

      --

      ||| I still can't believe Parkay's not butter.

    4. Re:Whoa! by Anonymous Coward · · Score: 0

      Why is doing OpenGL with CPUs *not* 'supporting' OpenGL?

  2. Ok by CableModemSniper · · Score: 5, Funny

    But does it run--crap. I mean what about a Beowulf--doh!
    Damn you SGI!

    --
    Why not fork?
    1. Re:Ok by JamesD_UK · · Score: 1

      It does, but SCO will charge you $715,776 to license it!

    2. Re:Ok by yomegaman · · Score: 1

      Does it have tabs?

      --
      ...wearing a skin-tight topless leather jumpsuit, with cutaway buttocks and transparent crotch panel.
    3. Re:Ok by biounlogical · · Score: 2, Funny

      In Soviet Russia 1024 Itaniums run a single image of you!
      ha!

    4. Re:Ok by jc42 · · Score: 4, Funny

      Hey, any reason we couldn't build, say, 1024 of these things, and make a beowulf cluster of them?

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    5. Re:Ok by djcapelis · · Score: 1, Interesting

      Other than a little issue called price... I don't think so!

      Alright, pass around the hat.

      --
      I touch computers in naughty places
    6. Re:Ok by Anonymous Coward · · Score: 0

      God I wish I had some mod points. This doesn't deserve 5.

    7. Re:Ok by kooshvt · · Score: 1

      All they need to do now is move it to Japan and pour hot grits on it. I can hear the sound of Slashdot readers' heads exploding.

    8. Re:Ok by Anonymous Coward · · Score: 0

      I've seen one of these in every third story for the last month. It wasn't funny when it was original, it isn't funny now.

    9. Re:Ok by PetoskeyGuy · · Score: 1

      A million Itanium CPUs running all in the same building?

      And people think global warming is caused by clouds....

    10. Re:Ok by turgid · · Score: 1
      Hey, any reason we couldn't build, say, 1024 of these things, and make a beowulf cluster of them?

      If you're paying...

    11. Re:Ok by Anonymous Coward · · Score: 0

      Well, it's for 'weather simulations'; I wonder if that's in the model? Or maybe they are just trying to affect the weather by the amount of heat given off :)

    12. Re:Ok by NuclearDog · · Score: 0

      It's all a conspiracy... they're in it with the weather people...

      They put 1048576 (1024*1024) CPUs in one building, thereby ensuring hot weather, and the weather people will never be wrong again...

      I'm telling you, it's the weather people. It's ALWAYS the weather people.

      --
      This statement is forty-five characters long.
    13. Re:Ok by Anonymous Coward · · Score: 0

      Please install that in Canada; we can handle the temperature increase better than the US, and we need it more, too.

    14. Re:Ok by GuyWithLag · · Score: 1

      Hey, if 1024 Itaniums could run me in near-realtime, I'd pay for them and upload myself, even in Soviet Russia.

      .... and then the power's out...

  3. Longhorn by Anonymous Coward · · Score: 3, Funny

    Yeah, but can it run Longhorn?

    1. Re:Longhorn by cpghost · · Score: 1, Funny

      No, but it will be used to [cross-]compile it!

      --
      cpghost at Cordula's Web.
    2. Re:Longhorn by arvindn · · Score: 3, Funny

      I hereby nominate this to be the next standard in-joke of slashdot. The previous candidate, evil overlords, never really took off in popularity, leaving us in the pathetic situation that every single bad joke available is soooo 2002! I particularly like "but can it run Longhorn?" because it will be funny until Longhorn is out, which is (hopefully) a long long time from now ;-)

    3. Re:Longhorn by EvilTwinSkippy · · Score: 1

      Up until recently I would have said the same thing about Doom 3.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    4. Re:Longhorn by TheScienceKid · · Score: 2, Funny

      You can always rant about Duke Nukem Forever

    5. Re:Longhorn by Soul-Burn666 · · Score: 1

      The answer for "but can it run Duke Nukem Forever?" is always, by definition, "no". Because you need to have the software to be able to run it.

      --
      ^_^
    6. Re:Longhorn by Anonymous Coward · · Score: 0

      Hmmmmmmm, what you're really asking for here is some hot grits down your pants. And now I just have to figure out where Ms. Natalie Portman fits in here. Of course she's naked and petrified, and so is my ... DOH!!!

    7. Re:Longhorn by Anonymous Coward · · Score: 0

      Yeah, but can it run Longhorn? ... In Japan!

    8. Re:Longhorn by ttrafford · · Score: 1

      I for one welcome our new Longhorn overlords.

  4. In other news... by b1t+r0t · · Score: 4, Funny

    Intel's sales figures for Itanic^Hium CPUs more than doubled as a result.

    --
    "Open source is good." - Steve Jobs
    "Open source is evil." - Microsoft
    1. Re:In other news... by levram2 · · Score: 5, Informative

      The limit for Windows Server 2003, Datacenter Edition for 64-bit Itaniums is actually 64 processors and 512 GB RAM. http://www.microsoft.com/windowsserver2003/64bit/ipf/datacenter.mspx

    2. Re:In other news... by caluml · · Score: 5, Funny

      We don't care about your actual facts for Windows - here at Slashdot we have FUD, rumour, and downright persistence. I think you will find if you read up on it more closely that 2003 Datacentre can only support up to 2 CPUs, and 256Mb maximum.
      Please stop letting facts get in the way of a good MS bashing session.

      Minister for Dis-Information.

    3. Re:In other news... by Anonymous Coward · · Score: 0

      Get off it idiot. Clearly the GP was a joke and modded as such.

    4. Re:In other news... by HugeFatty · · Score: 0, Offtopic

      I wouldn't trust their data. I am just finishing up a project for a Pocket PC, which happens to run Windows CE. I use Microsoft Embedded Visual C++ as the compiler for the project, and at one point I was looking into using C++-style exceptions to make my code a bit cleaner. In the help file included with Embedded Visual C++, Microsoft claims that C++-style exceptions are supported, and they give examples of how to do it. These claims are also available online here, and where they explain how to use C++ or structured exception handling, they recommend using C++-style exceptions for portability. However, when I tried to compile my application, I first got a warning that I needed to compile with a flag to enable the stack unwinding semantics. So I did that and tried to recompile, but this time it failed in the linking stage with an error like "error LNK2001: unresolved external symbol "const type_info::`vftable'" (??_7type_info@@6B@)".

      After hours and hours of trying to figure out if I had missed a command line option for the linker or forgot to #include something, I stumbled across an online discussion where someone else was having the same problem. Someone responded with an answer, pointing to a Knowledge Base article which said that C++-style exceptions are NOT supported in Embedded Visual C++, and that the functionality "is by design."

      As you may guess, this is very frustrating: to be told in all the documentation that something is supported, and to have the compiler act like it's supported, but for it not to be supported. It also goes to show that you cannot trust those bastards over at Microsoft when they say that something is supported. Oh, yeah, and you may notice that the invalid documentation has been up since 2000, and the KB article was put up in 2003. This means that the invalid documentation has been up for 4 years now, and only a year ago did they admit that it's not supported (even though the product documentation STILL says that it IS supported).

      --


      I am clearly fatter than you.
    5. Re:In other news... by killjoe · · Score: 2, Insightful

      Oooh, you told him! Way to stick up for MS! They need help from you. They can't counteract FUD by themselves with the billions they spend on advertising, astroturfing, financing lawsuits by SCO, and paying for ADTI studies. Thank god MS has people like you to run to their aid whenever somebody says something bad about Windows.

      Still, though, the fact that Linux can scale to 1024 processors while Windows can only scale to 64 is enough reason to bash Windows, isn't it? I mean, wasn't Bill Gates recently bashing Linux because it was a "toy" and wouldn't scale?

      --
      evil is as evil does
    6. Re:In other news... by Anonymous Coward · · Score: 0

      1984 : Amiga vs Atari
      2004 : linux vs windows

    7. Re:In other news... by Tony · · Score: 1

      So it's only 1/16th the OS as Linux, instead of 1/32nd?

      That's reassuring.

      --
      Microsoft is to software what Budweiser is to beer.
    8. Re:In other news... by Anonymous Coward · · Score: 0

      classic slashdot moderation in action :D how long until this page gets preserved for all eternity again? :)

    9. Re:In other news... by killjoe · · Score: 1

      " 1984 : Amiga vs Atari"

      They both lost. Windows won.

      "2004 : linux vs windows"

      they both lose? Apple wins?

      --
      evil is as evil does
  5. in time for.... by Prof.Phreak · · Score: 1, Funny

    DOOMIII :-)

    --

    "If anything can go wrong, it will." - Murphy

    1. Re:in time for.... by Anonymous Coward · · Score: 0

      Yep, 3TB sounds about right for the Minimum Memory Requirement ;)

    2. Re:in time for.... by Ari_Haviv · · Score: 2

      You should see the specs for Longhorn's minimum install...

      --
      Join Team Mozilla #38050 Folding@home
  6. Solaris by MrWim · · Score: 3, Insightful

    It seems that if they pull this off, one of the strongholds of Solaris (namely massively parallel computing) will have been conquered by Linux. I wonder how Sun are feeling at the moment?

    1. Re:Solaris by 4lex · · Score: 1

      I wonder how Sun are feeling at the moment?

      You can be sure that they feel either good or bad at the moment ;)

      --
      My journal. Mainly about freedom.
    2. Re:Solaris by justins · · Score: 4, Informative

      Solaris is not a leader in supercomputing, never has been.

      http://top500.org/list/2004/06/

      There's no "stronghold" for Sun to lose.

      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
    3. Re:Solaris by mrm677 · · Score: 4, Interesting

      It seems that if they pull this off, one of the strongholds of Solaris (namely massively parallel computing) will have been conquered by Linux. I wonder how Sun are feeling at the moment?

      Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      Lame analogy: many people have demonstrated that they can hack their Honda Civic to outperform a Corvette, however I can walk into a dealership and purchase the latter which performs quite well without mods.

    4. Re:Solaris by Bishop · · Score: 1

      SGI has been playing in this NUMA market ever since they bought Cray about a decade ago. The T3 had a similar number of Alpha processors. The current Origin scales to 1024 MIPS processors. I believe both systems ran IRIX. The T3 may have used UNICOS. The point is that the only thing new here is Linux on 1024 processors. And even then, SGI already has a 256-processor Itanium Linux system.

    5. Re:Solaris by kasperd · · Score: 5, Interesting

      Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      I wouldn't be surprised to see these changes in the 2.8 kernel. "And what will people do until then?" I hear some people ask. I can tell you that right now very few people actually need to scale to 1024 CPUs, and that will probably still be true by the time Linux 2.8.0 is released. AFAIK Linux 2.6 does scale well to 128 CPUs, but I don't have hardware to test it, and neither do any of my friends. So I'd say there is no need to rush this into the mainstream kernel; the few people who need it can patch their kernels. My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

      --

      Do you care about the security of your wireless mouse?
    6. Re:Solaris by Waffle+Iron · · Score: 3, Interesting
      Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      If someone buys one of these clusters from SGI, then it does scale "out of the box" as far as they're concerned.

    7. Re:Solaris by Nasarius · · Score: 3, Funny
      My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

      640 CPUs are enough for anyone? :)

      --
      LOAD "SIG",8,1
    8. Re:Solaris by isorox · · Score: 2, Interesting
      My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

      640 CPUs are enough for anyone? :)


      A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.

      Claims are very difficult to make, and impossible to prove. However putting a time limit on a claim is easy. 2.8.0 will be released in 05 or 06, maybe we'll all have 1024-CPU boxes in 20 years, but in 20 months?
    9. Re:Solaris by diegocgteleline.es · · Score: 1

      Linux _DOES_ run on those machines with vanilla kernels. I guess they throw in their own patches, but there's no doubt the vanilla kernel is good enough: http://marc.theaimsgroup.com/?l=linux-kernel&m=108341362028320&w=2

    10. Re:Solaris by DarkOx · · Score: 1

      I am not sure that analogy makes sense. We are already talking about very custom computer hardware here. That hardware no doubt required a great deal of engineering, and for what it probably cost, you are not going to waste it running something generic designed to support hardware similar to, but not exactly, yours. You will probably pay some software engineers to develop a framework for supporting similar hardware that you and others can benefit from in the future when you build the next supercomputer, and to write more custom, modular pieces of OS-level code to extract the maximum benefit from the custom hardware you just built.

      So really we are already talking about a major in-house development project here. At this point, bringing the talent on board to work out these "hacks" is not an important cost in the investment, so if they can take a chassis like Linux and make it outperform *NIX or whatever else, then that's exactly what someone making such a large investment in supercomputing is going to do.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    11. Re:Solaris by ameoba · · Score: 1

      SGI's had the Altix, which goes up to 256 procs and runs Linux, in their production lineup for a while now. This is a pretty big jump, but in realistic terms, if a 256-way machine isn't doing it, why would a 1024-way machine?

      Sun's biggest unit is only 100 procs.

      ...and they're not that fast.

      --
      my sig's at the bottom of the page.
    12. Re:Solaris by Sponge+Bath · · Score: 1
      I wouldn't be surprised to see these changes in the 2.8 kernel.

      You are correct. In the fullness of time, this kind of scalability will come to Linux.
      But the next kernel version is going to be 3.0 "Erect Mongoose".

    13. Re:Solaris by justins · · Score: 1
      Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      Lame analogy: many people have demonstrated that they can hack their Honda Civic to outperform a Corvette, however I can walk into a dealership and purchase the latter which performs quite well without mods.

      Call it a hunch, but I bet the people buying supercomputers are willing to pay for the "modded" Linux they need in order to run these machines. Maybe it's a stretch but I bet they might even be willing to pay for support!

      You can't possibly be talking about a sales edge, since Sun's sales in HPC are totally anemic.
      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
    14. Re:Solaris by jedidiah · · Score: 1

      This is a total red herring.

      "Vanilla Linux" is completely irrelevant for systems of this kind. Do you even have any idea how much 12- and 32-CPU systems cost? If someone is going to shell out the money for a 1024-CPU machine, then they are likely only going to run SGI's OS on it (whatever that may be).

      Your analogy also sucks.

      The comparable situation would be a Honda dealership selling "conversion civics" much in the same way that a Truck or Van dealership might sell greatly enhanced F150's or E450's.

      SGI is a value added reseller of Linux just as Sun is a value added reseller of BSD.

      No one ignores Sun just because you can't get a generic version of FreeBSD that supports NUMA and 100 cpus.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    15. Re:Solaris by timeOday · · Score: 2, Insightful
      A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.
      I've heard that one. I think the guy was right! It was 1943 after all. Somehow we interpret this as, "There will only ever be a market for 5 computers, even if they change so completely that nothing is left of current technology and only the name stays the same."
    16. Re:Solaris by wwwillem · · Score: 1

      Before you come to the conclusion that this beast is really doing MPS computing and competes with Sun's F15k, let's first see how well it vertically scales using something like Oracle. Something like "N" transactions on a single CPU box, "1000 x N" (I won't bother about the last 24 :-) transactions on SGI's machine. We'll see...

      --
      Browsers shouldn't have a back button!! It's all about going forward...
    17. Re:Solaris by Chatz · · Score: 1

      It's not a cluster!
      It's not a cluster!
      It's not a cluster!

      It's a single system.

      --
      There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
    18. Re:Solaris by mrm677 · · Score: 1

      It matters because in 5 years, your desktop computer may have a microprocessor that is capable of running 16 threads. A higher-powered workstation will have 32+ threads.

      Linux 2.4.x definitely does not scale well past 4-8 (especially for workloads that are kernel or IO intensive). I'm not sure about 2.6.x.

    19. Re:Solaris by Anonymous Coward · · Score: 0

      I predict that in 50 years, there will still only be 1500 such machines worldwide, and that they will cost hundreds of millions of dollars each...Oh, crap...

    20. Re:Solaris by Sxooter · · Score: 1

      Solaris scales to hundreds of processors out-of-the-box.

      BZZZT! But thanks for playing. Solaris may well do better out of the box on hundreds of processors. However, any two installations using hundreds of processors are different enough that you'll have to do plenty of tuning on either kernel, Solaris or Linux, if you want them to run well.

      PostgreSQL needs fair amounts of shared memory, but being process driven will likely scale better under Linux.

      Java virtual machines, being threaded, will likely scale better on Solaris than on Linux.

      They're different tasks that can use hundreds of CPUs for high load, but they each tax the OS differently, and therefore neither one can scale properly "out of the box" because there's simply no such thing.

      Solaris may be a little better tested in this space, and a little simpler to configure, but it's not some magic solution you just drop in place and watch it work either.

      --

      --- It is not the things we do which we regret the most, but the things which we don't do.
    21. Re:Solaris by mrm677 · · Score: 1

      PostgreSQL needs fair amounts of shared memory, but being process driven will likely scale better under Linux.

      You are confusing application and OS scalability. Vanilla Linux will croak on 64 processors. The scheduler and mutexes can't handle the number of real-time clock interrupts and other reasons for invoking the OS. Process A wants to spawn a kernel thread? Aw shucks, I have to wait for processes C-Z to serialize on the spinlock protecting the task structures.
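
      One standard way to see why a single contended lock hurts so much at this scale is Amdahl's law (a textbook model, not something the comment itself cites): if a fraction s of the work must run one CPU at a time, the best possible speedup on N CPUs is 1 / (s + (1 - s) / N). A minimal Python sketch; the 1% serial fraction in the demo is a hypothetical figure, not a measurement of any Linux or Solaris kernel.

```python
def amdahl_speedup(serial_fraction: float, cpus: int) -> float:
    """Ideal speedup on `cpus` processors when `serial_fraction` of the
    work must run one CPU at a time (for example, while holding a single
    global lock). Amdahl's law: 1 / (s + (1 - s) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


if __name__ == "__main__":
    # Hypothetical serial fraction: 1% of the work serializes on one lock.
    for n in (4, 64, 1024):
        print(f"{n:5d} CPUs -> {amdahl_speedup(0.01, n):6.1f}x speedup")
```

      Under that hypothetical 1%, 1024 CPUs give roughly a 91x speedup rather than 1024x, and no CPU count can exceed 1/s = 100x. Shrinking the serialized fraction, for instance by splitting one global lock into many finer-grained ones, is what raises that ceiling.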

    22. Re:Solaris by SuperQ · · Score: 1

      Actually, I run a 16-way IA-64 machine at work. I purchased an EDU license for RHEL 3, put the CD in the box, and did the install. No custom modifications, no compiling.

      We also have some SGI Altix stuff... it's very nice.

    23. Re:Solaris by Citizen+of+Earth · · Score: 1

      Solaris scales to hundreds of processors out-of-the-box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      Well, if I had just plunked down the cash for a 1024-processor box, I'd probably buy a Linux that is a little better tuned for that environment than, say, Fedora Core 2. If SGI can tune a Linux distribution for this environment, Sun will be even deader than it was yesterday.

      BTW, do purists still insist on calling Linux distributions GNU/Linux now that they include applications like Open Office, which probably contains more lines of code than all of the GNU utilities put together? We should call it Sun/GNU/Linux. This has the advantage of being more ironic, since it emphasizes that Sun has made a major charitable contribution to its own demise. (Not that Open Office would have gotten much traction if Sun hadn't done this.)

    24. Re:Solaris by 4of12 · · Score: 1

      Claims are very difficult to make, and impossible to prove. However putting a time limit on a claim is easy. 2.8.0 will be released in 05 or 06

      Oops.

      Don't you mean "2.8.0 will be released in the future, when it's ready, later than originally expected"?

      --
      "Provided by the management for your protection."
    25. Re:Solaris by Sxooter · · Score: 1

      No, I understand that kernel and process scaling are inextricably linked at the hip. A kernel with no processes running on it isn't going to accomplish much.

      --

      --- It is not the things we do which we regret the most, but the things which we don't do.
  7. Re:Will it be done in time for Quake 3? by Gleng · · Score: 2, Funny
    Will it be done in time for Quake 3?

    Hmm, quite possibly.

    --
    "Proudly Posting Without Reading The Article"
  8. What happened to RISC? by d0d0 · · Score: 1

    What happened to RISC...

    1. Re:What happened to RISC? by Anonymous Coward · · Score: 0

      All the companies shy away from and want to minimize it...

    2. Re:What happened to RISC? by CableModemSniper · · Score: 4, Funny

      They decided it was too RISCy maybe?

      --
      Why not fork?
    3. Re:What happened to RISC? by DAldredge · · Score: 3, Informative

      AMD and Intel happened. What do you think is running your computer right now (assuming it's an x86)? It's a RISC chip with an x86 translator attached; the core of the chip is RISC.

    4. Re:What happened to RISC? by Jeff+DeMaagd · · Score: 2, Informative

      Well, this system is neither RISC nor CISC. Itaniums are VLIW. IIRC, it too does have an x86 translator somewhere, but they work far better with native code.

    5. Re:What happened to RISC? by DAldredge · · Score: 2, Informative

      True. There are at least two different x86 emulators available: the hardware one that is built in, and the newer and faster IA-32 Execution Layer (currently only available for Windows).

    6. Re:What happened to RISC? by Epistax · · Score: 4, Interesting

      Neither RISC nor CISC offers a decisive advantage over the other, so the one that dominated is the one that was here first.

      Quick examples: RISC uses less power because it has less logic? No, it needs to run at a higher frequency to maintain the same speed as a slower CISC.
      RISC is easier to program? Depends on the person. A compiler can take advantage of large instructions very well which are hardware optimized.
      RISC easier to develop/manage? I'll say yes for RISC on this one. There's simply less logic on the chip so less logical errors possible. There's plenty more cache which can break but broken parts can be fused off.
      RISC is physically smaller? No. RISC needs a higher clock frequency because many more instructions need to be executed. The result of this is that a much larger instruction cache is needed on chip.

      I don't remember every comparison but it pretty much comes out that neither is better than the other. That being said RISC is better than x86. Everything is better than x86. However CISC vs RISC is much harder to judge. Having done x86, 68k, and MIPS I must say that RISC is a pleasure.
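
      The trade-off argued over in this subthread is usually framed with the standard "iron law" of processor performance (a textbook relation, not stated in the comment itself): execution time T is instruction count times cycles per instruction (CPI) divided by clock rate f. Comparing two designs running the same program, the one needing more instructions wins only if its CPI or clock rate compensates:

```latex
T = \frac{N_{\text{instr}} \times \text{CPI}}{f}
\qquad\Longrightarrow\qquad
T_{\text{RISC}} \le T_{\text{CISC}}
\iff
\frac{N_{\text{RISC}}}{N_{\text{CISC}}}
\le
\frac{\text{CPI}_{\text{CISC}}}{\text{CPI}_{\text{RISC}}}
\cdot
\frac{f_{\text{RISC}}}{f_{\text{CISC}}}
```

      So a higher instruction count can be offset either by a higher clock or by a lower CPI; the relation by itself does not say which design ends up ahead.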

    7. Re:What happened to RISC? by baywulf · · Score: 1

      RISC is still used in new processors these days. The key attributes of RISC processors are a fixed instruction size, few addressing modes, and single-cycle execution. You will see these in most architectures now. Actually, the advantages of RISC and CISC have been mixed in a concept called superscalar. With superscalar there are many execution units, and an instruction scheduler will dispatch instructions out of order if necessary to give better performance while handling dependencies and resource constraints. With this method, it is possible for the instruction scheduler to translate CISC-style instructions into RISC-style micro-operations before processing.
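
      A toy model of the out-of-order dispatch described above: each micro-op names the register it writes and the registers it reads, and each cycle the scheduler issues the oldest ready micro-ops, up to the number of execution units. The simplifications are assumptions of this sketch, not claims about any real CPU: every micro-op takes one cycle, only read-after-write dependencies are tracked (as if register renaming had removed the others), and the MicroOp format is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    dest: str          # register this micro-op writes
    srcs: tuple = ()   # registers it reads


def schedule(program: list, units: int) -> list:
    """Return the cycle in which each micro-op issues.

    Toy model: one-cycle latency, only read-after-write dependencies,
    `units` identical execution units, oldest ready micro-op first.
    """
    if units < 1:
        raise ValueError("need at least one execution unit")

    # deps[i] = indices of the earlier micro-ops that produce i's sources.
    last_writer = {}
    deps = []
    for i, op in enumerate(program):
        deps.append({last_writer[r] for r in op.srcs if r in last_writer})
        last_writer[op.dest] = i

    issued_at = [None] * len(program)
    cycle = 0
    while None in issued_at:
        free = units
        for i in range(len(program)):  # scan oldest first
            if free == 0:
                break
            if issued_at[i] is not None:
                continue
            # Ready only if every producer issued in an earlier cycle.
            if all(issued_at[d] is not None and issued_at[d] < cycle
                   for d in deps[i]):
                issued_at[i] = cycle
                free -= 1
        cycle += 1
    return issued_at


if __name__ == "__main__":
    program = [
        MicroOp("r1"),           # 0: load r1
        MicroOp("r2", ("r1",)),  # 1: r2 = r1 + 1, waits for 0
        MicroOp("r3"),           # 2: load r3, independent
        MicroOp("r4", ("r3",)),  # 3: r4 = r3 * 2, waits for 2
    ]
    print(schedule(program, units=2))  # [0, 1, 0, 1]
```

      With two units, micro-op 2 issues in cycle 0 ahead of micro-op 1, which is still waiting on its source; with a single unit the same program issues strictly in order.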

    8. Re:What happened to RISC? by myg · · Score: 1
      Itanium 2 is certainly not x86. It has an x86 emulator but its native instruction set is more like RISC than CISC.

      The Itanium 2 is based on what Intel calls the EPIC architecture: Explicitly Parallel Instruction Computing. Basically the CPU fetches 128-bit, 3-instruction bundles. The instructions themselves are somewhat simple, and all three are executed simultaneously by parallel execution units.

      Like some RISC architectures, the Itanium 2 instruction set includes predication and register windows (lots of them -- not necessarily the best thing for context switching, though).

      All in all, it's not a bad architecture at all.
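
      For reference, the 128-bit IA-64 bundle mentioned above is laid out, per Intel's published format, as a 5-bit template field in bits 0-4 (which tells the hardware the execution-unit type of each slot and where instruction-group stops fall) followed by three 41-bit instruction slots, so 5 + 3 x 41 = 128. A minimal sketch that splits a bundle, held as a Python integer, into those fields and packs it back; it does not decode the instructions or the template's meaning, and the helper names are made up for illustration.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS  # 5 + 123 = 128


def split_bundle(bundle: int) -> tuple:
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2]).

    Bits 0-4 hold the template; slot i occupies the 41 bits starting
    at bit 5 + 41 * i.
    """
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(3)]
    return template, slots


def pack_bundle(template: int, slots: list) -> int:
    """Inverse of split_bundle: place the template and three slots."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s < (1 << SLOT_BITS) for s in slots):
        raise ValueError("need exactly three 41-bit slots")
    bundle = template
    for i, s in enumerate(slots):
        bundle |= s << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle
```

      The slot values in a round trip such as pack_bundle(0x10, [1, 2, 3]) are arbitrary bit patterns, not real Itanium instructions.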

    9. Re:What happened to RISC? by Anonymous Coward · · Score: 1, Interesting

      Once again people are missing the fundamental nature of RISC programming vs. CISC programming. RISC architectures are very much "load-store" machines, where you load data into its registers, operate on it in fairly complex ways, and then store the results. With CISC chips, your operations tend to take the form of modifying or fetching operands or results that are in memory.

      The fears of RISC instruction bloat are unfounded: the instructions are going to be in L1 i-cache 99% of the time, and won't slow anything down.

      What shorter/simpler instructions enable is much smaller pipelines. My G4 does a fused multiply-add op in 7 stages; a P4 does it in 2 passes through a 20-stage pipeline (40 cycles, since the result of the multiply isn't available until the end). The P4 pipeline has to fetch operands from somewhere on the stack and write them back. This means CISC CPUs are more prone to memory bottlenecking in worst-case scenarios (of course, in most cases, the working data set for both archs will be in L1).

      In conclusion, CISC vs. RISC is EASY to tell apart: if it's operating on data in registers and memory simultaneously, it's CISC. If it's loading the working data into an expansive register set, operating on it locally, and then storing it back, it's RISC.
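
      To make that distinction concrete, here is a toy machine executing the same update, mem[x] += r1, once as a single memory-operand instruction (CISC-style) and once as a load/add/store sequence (RISC-style). The instruction names and tuple format are entirely hypothetical, not any real ISA. Both forms leave memory in the same state; the difference is that in the second form the arithmetic only ever touches registers, which is also roughly the CISC-to-micro-op translation an earlier reply in this thread describes.

```python
def run(program, regs, mem):
    """Execute a tiny hypothetical instruction list, mutating regs and mem.

    ("add_mem", addr, reg)  mem[addr] += regs[reg]         memory operand
    ("load", reg, addr)     regs[reg] = mem[addr]
    ("add", dst, a, b)      regs[dst] = regs[a] + regs[b]  registers only
    ("store", addr, reg)    mem[addr] = regs[reg]
    """
    for op, *args in program:
        if op == "add_mem":
            addr, reg = args
            mem[addr] = mem[addr] + regs[reg]  # read, compute, write in one step
        elif op == "load":
            reg, addr = args
            regs[reg] = mem[addr]
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "store":
            addr, reg = args
            mem[addr] = regs[reg]
        else:
            raise ValueError(f"unknown op {op!r}")


# The same update, mem["x"] += r1, written both ways.
CISC_STYLE = [("add_mem", "x", "r1")]
RISC_STYLE = [
    ("load", "t0", "x"),        # bring the operand into a register
    ("add", "t0", "t0", "r1"),  # arithmetic touches registers only
    ("store", "x", "t0"),       # write the result back
]
```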

    10. Re:What happened to RISC? by Anonymous Coward · · Score: 0

      Point of fact...RISC chips consistently do use less power, are physically smaller, and have higher performance per clock (many more instructions are executed simultaneously).

    11. Re:What happened to RISC? by shaitand · · Score: 1

      "RISC is physically smaller? No. RISC needs a higher clock frequency because many more instructions need to be executed."

      In practice, RISC chips have always been smaller, run cooler, used less power, and been faster at the same clock speed than CISC chips.

      Since a RISC chip executes numerous instructions simultaneously and can even perform out of order execution of instructions it eliminates that advantage of CISC chips.

      The instructions also all execute in a single clock tick.

    12. Re:What happened to RISC? by ArbitraryConstant · · Score: 2, Informative

      "Quick examples: RISC uses less power because it has less logic? No, it needs to run at a higher frequency to maintain the same speed as a slower CISC."

      No. This is exactly wrong. G5s are a good example of this. They easily outperform P4s at the same clock speed, and it's the P4 which must run at the higher speed to compensate.

      The overhead of supporting all the various instructions and addressing modes, as well as being able to fit the whole CPU on one die, were what made RISC a good choice in the past. Now, that overhead is dwarfed by other parts of the chip, and they're all running weird u-ops internally, so it makes little difference.

      "RISC is easier to program? Depends on the person. A compiler can take advantage of large instructions very well which are hardware optimized."

      Compilers are notorious for not utilizing esoteric opcodes. And when they do, there's almost never a significant performance advantage in doing so.

      For example, none of the code I've ever tested with icc (one of the only compilers that can use weird opcodes on i386) has been more than about 5% faster than "gcc -Os -msse2", and a lot of it has been slower.

      "RISC is physically smaller? No. RISC needs a higher clock frequency because many more instructions need to be executed. The result of this is that a much larger instruction cache is needed on chip."

      RISC does generally need a larger cache, but it does not need a higher frequency.

      "I don't remember every comparison but it pretty much comes out that neither is better than the other. That being said RISC is better than x86. Everything is better than x86. However CISC vs RISC is much harder to judge. Having done x86, 68k, and MIPS I must say that RISC is a pleasure."

      Just use a compiler. Anything with a proper MMU will be good enough.

      --
      I rarely criticize things I don't care about.
    13. Re:What happened to RISC? by Epistax · · Score: 1

      CISC chips run instructions simultaneously and out of order as well. Instructions executing on a single clock depends on the instruction. Multiply? Divide? Load? Store? Don't think so.

      RISC chips being faster than a CISC chip running at the same clock? Sorry, but no. That's a completely unrealistic statement.

    14. Re:What happened to RISC? by Epistax · · Score: 1

      I just wanted to point out that you mentioned GCC. Sadly, GCC is about the worst compiler in existence for performance. The top compiler is in fact the Intel compiler, in part because it knows about unpublished instructions. Have fun reading the code it generates.
      On the subject of G5s being faster, there are a whole host of differences between G5s and P4s. You can't just pick one difference and claim that's the reason.

      I do agree with one point you made: compilers don't use every opcode available. In fact, a study was done around the time RISC was getting popular that stated something like 20% of the x86 opcodes had never been used. That's pretty mind-numbing.

    15. Re:What happened to RISC? by ArbitraryConstant · · Score: 2, Informative

      "I just wanted to point out you mentioned GCC. Sadly GCC is about the worst compiler in existence for performance."

      That was my point. A shitty compiler with moderate optimization settings is very close in performance to one of the top compilers out there.

      "The top compiler is in fact the Intel compiler in part because it knows about unpublished instructions. Have fun reading the code it generates."

      Yes, this was the example I used. The vectorized loops are a bitch to read.

      "On the subject of G5s being faster, there are a whole host of differences between G5s and P4s. You can't just pick one difference and claim that's the reason."

      That's true. However, I never gave a reason for the performance difference, so I'm not sure why you're saying this.

      You said that RISC CPUs needed to run at a higher frequency to get the same performance as a CISC CPU. Since you're wrong, I gave an example to prove you wrong.

      There is basically only one RISC CPU architecture that has the benefit of a really large R&D effort these days, and that's POWER/PowerPC. Itanium is not strictly RISC, and nothing else has the benefit of such a huge R&D effort.

      Thus, the only RISC CPUs that can be fairly compared to x86 are the POWER/PowerPC chips from IBM. The only two x86 CPUs that have a really huge R&D effort behind them are the Athlons from AMD and the Pentiums from Intel.

      They all have relatively similar performance (with advantages going to one or the other in a few niches). PowerPC chips are shipped at similar clock speeds to the Athlons and much lower clock speeds than the Pentiums.

      Therefore, your statement that RISC CPUs need higher clock speeds to get the same performance has been demonstrated to be false in a comparison between the only 3 large chip makers in operation.

      Further comparisons, such as those between Sparc and the VIA C3, which are smaller but significant efforts, show the RISC CPU getting more done per clock cycle, again demonstrating your statement to be false.

      --
      I rarely criticize things I don't care about.
    16. Re:What happened to RISC? by Anonymous Coward · · Score: 0

      Ever heard of CPUs running instructions in more than one cycle? A RISC can *of* *course* be faster than a CISC at the same clock speed. Just imagine a RISC doing everything in 1 cycle, and a CISC doing everything in 10 cycles. Then rewrite your statement.
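      The cycles-per-instruction argument above is the usual "iron law" of CPU performance: execution time = instruction count x average cycles per instruction / clock rate. A minimal Python sketch, with made-up instruction counts and CPIs (assumptions for illustration, not measurements of any real chip), shows how a RISC that executes 3x as many instructions can still finish first at the same clock:

```python
def exec_time_seconds(instructions: int, cycles_per_instruction: float, clock_hz: float) -> float:
    """Iron law: time = instruction count * average CPI / clock rate."""
    return instructions * cycles_per_instruction / clock_hz


CLOCK_HZ = 2.0e9  # same (hypothetical) 2 GHz clock for both chips

# Hypothetical workload: the RISC needs 3x as many instructions,
# but averages 1 cycle each; the CISC averages 10 cycles each.
risc = exec_time_seconds(instructions=3_000_000, cycles_per_instruction=1.0, clock_hz=CLOCK_HZ)
cisc = exec_time_seconds(instructions=1_000_000, cycles_per_instruction=10.0, clock_hz=CLOCK_HZ)

print(f"RISC: {risc * 1e3:.2f} ms, CISC: {cisc * 1e3:.2f} ms")  # RISC: 1.50 ms, CISC: 5.00 ms
```

      Real chips land somewhere in between, of course; the point is only that clock speed alone says nothing until you multiply in both the instruction count and the CPI.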

    17. Re:What happened to RISC? by Anonymous Coward · · Score: 0

      This is incorrect, but CISC and RISC have no real meanings anyway. These terms are often misused to describe architectures when they really describe instruction sets.

      All x86 chips, the 8086 included, used a microcode engine to execute the more complex x86 instructions. Many people consider microcode to be the sign of a CISC device when in fact microcode does exactly what you describe above. The P4 and equivalent AMD and IBM devices also introduce another layer of translation, the micro-op. This layer would be considered 'RISC' because it also does exactly what you describe, just not down to the level microcode does. It's almost become like a network stack.

      In the 1970s, many computers could execute arbitrary instruction sets simply by reprogramming the microcode. The concept was created by Maurice Wilkes; you can read about it in the Wikipedia entry for Microcode. Instead of a normal assembler, one had to use a meta-assembler, where you defined your instructions in one file and the actual assembly code in another.

    18. Re:What happened to RISC? by ArbitraryConstant · · Score: 1

      "CISC chips run instructions simultaneously and out of order as well."

      RISC chips can do this as well. In fact, they're generally better at it because they have more registers, and can therefore have more instructions without dependencies between them. For example, the PowerPC 970 can execute up to 5 instructions in one clock cycle if conditions are good. This is more than any CISC chip can do.
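      A toy way to see why independent instructions matter: the sketch below assumes every instruction takes one cycle and that up to `width` ready instructions can issue per cycle. The function name and the dependency-list encoding are hypothetical, and this is not how the PowerPC 970 actually groups and dispatches instructions; it only shows that dependencies, not issue width, set the limit.

```python
from __future__ import annotations


def cycles_to_issue(deps: list[set[int]], width: int) -> int:
    """Cycles needed to issue every instruction in a toy out-of-order model.

    deps[i] is the set of instruction indices whose results instruction i needs.
    Every instruction is assumed to take one cycle, and at most `width`
    ready instructions issue per cycle.
    """
    done: set[int] = set()
    cycles = 0
    while len(done) < len(deps):
        # Ready = not yet issued, and every producer finished in an earlier cycle.
        ready = [i for i, need in enumerate(deps) if i not in done and need <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown instruction index")
        done.update(ready[:width])
        cycles += 1
    return cycles


independent = [set() for _ in range(5)]       # five unrelated operations
chain = [set(), {0}, {1}, {2}, {3}]           # each needs the previous result

print(cycles_to_issue(independent, width=5))  # 1
print(cycles_to_issue(chain, width=5))        # 5
print(cycles_to_issue(independent, width=3))  # 2
```

      A 5-wide machine only pays off when the code actually contains 5 independent operations per cycle; a dependency chain runs at one per cycle no matter how wide the core is.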

      "RISC chips being faster than a CISC chip running at the same clock? Sorry, but no. That's a completely unrealistic statement."

      Not only is it possible, there are shipping processors that do exactly what you're saying is unrealistic.

      CISC processors break things up into one or more internal RISC-like micro-ops. Instructions that require more than one micro-op (an add instruction that stores the results to memory, for example) will require multiple internal ops. Exactly like a RISC chip does things; the RISC chip just doesn't hide what it's doing. This is why they can have similar performance. Or in the case of the P4, which is designed to increase the clock speed for marketing purposes, the RISC processor can have better performance per clock cycle.

      You're completely wrong. Please shut up, read some of the papers on Ars Technica, and stop being wrong.

      --
      I rarely criticize things I don't care about.
  9. Why gaming? by wyldwyrm · · Score: 2, Funny

    Obviously this would be overkill for doom3 (although I'd still like to have it in my apartment as a space heater/server)! OK, so it would be more than a space heater; I'd have to run my A/C 24/7/365.25, with all my windows open in the winter. But rendering would be sooooo sweet.

    1. Re:Why gaming? by Ari_Haviv · · Score: 2, Informative

      think real-time radiosity

      --
      Join Team Mozilla #38050 Folding@home
    2. Re:Why gaming? by Anonymous Coward · · Score: 0

      24/7/365.25

      Nothing personal but this has always bothered me...

      24 hours a day. Ok.
      7 days a week. Ok.
      365.. weeks a 7 year? 365 weeks a half fortyear? What the hell.

    3. Re:Why gaming? by blane.bramble · · Score: 1

      24/7/365

      It's used by marketing-types who don't understand that 24/7 already means every day of the year.

    4. Re:Why gaming? by Anonymous Coward · · Score: 0

      Well, logically it could be reduced to 24. Even 1 (as in second per second)

    5. Re:Why gaming? by Rob+Riggs · · Score: 1
      Obviously this would be overkill for doom3

      In about the same way that a Boeing 747 is overkill as a suburban/city commuting vehicle.

      --
      the growth in cynicism and rebellion has not been without cause
    6. Re:Why gaming? by Anonymous Coward · · Score: 0

      Should be 24/7/52, eh?

    7. Re:Why gaming? by EvilTwinSkippy · · Score: 1
      24/7/365

      It's used by marketing-types who don't understand that 24/7 already means every day of the year.

      The same marketdroids that need to be reminded that free is always 100%.

      BTW, it's 365.2425

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    8. Re:Why gaming? by Beale · · Score: 1

      Great for car-pooling though.

    9. Re:Why gaming? by shaitand · · Score: 1

      Wrong pattern. It goes like this:

      24hrs/day
      7days/week
      365days/yr
      3652days/decade

      We just usually aren't willing to commit to the decade thing.
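      For the record, the Gregorian leap-year rule (every 4th year, except centuries, except every 400th year) gives an average year of 365.2425 days, so a decade is 3652 or 3653 days depending on how many leap years it contains. A quick Python check using the standard `calendar` module (`days_in_span` is just a helper name made up for this sketch):

```python
import calendar
from fractions import Fraction

# Gregorian rule: +1/4 for every 4th year, -1/100 for centuries, +1/400 for every 400th year.
mean_year = 365 + Fraction(1, 4) - Fraction(1, 100) + Fraction(1, 400)
print(float(mean_year))  # 365.2425


def days_in_span(first_year: int, years: int) -> int:
    """Total days in `years` consecutive Gregorian years starting at first_year."""
    return sum(366 if calendar.isleap(y) else 365 for y in range(first_year, first_year + years))


print(days_in_span(2000, 10))  # 3653 (2000, 2004 and 2008 are leap years)
print(days_in_span(2001, 10))  # 3652 (only 2004 and 2008)
```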

    10. Re:Why gaming? by kannibal_klown · · Score: 1

      It depends....

      24/7 means all day, every day of the week. BUT... does it mean holidays? Thanksgiving, Martin Luther King Day, Christmas / Hanukkah / Kwanzaa / etc? Tech support might be offered 24/7, but the employees might take off for the holidays.

      24/7/365 is usually redundant. But if a person says it, I'd hope they really do mean that it's up / operational / available all year round, even if I call on a government-sanctioned holiday.

  10. Press Release by foobsr · · Score: 3, Informative

    The link to the press release as of July 14.

    CC.

    --
    TaijiQuan (Huang, 5 loosenings)
  11. Re:really fast? by jhunsake · · Score: 4, Funny

    so does this mean KDE and Openoffice will finally run at decent speed?

    No, you're going to need quantum computing for that.

  12. The big question is... by mangu · · Score: 4, Funny

    ...how easy it is to install printer and sound drivers?

    1. Re:The big question is... by carlmenezes · · Score: 4, Funny

      Well on Windows you'd get a message saying...

      "Windows has detected 1024 new sound cards and is installing them..."

      and then the inevitable..

      "Windows needs to restart your computer. Click OK to restart"

      and then on system restart ...

      1024 sound control apps in the system tray! =)

      --
      Find a job you like and you will never work a day in your life.
    2. Re:The big question is... by Anonymous Coward · · Score: 0

      Actually, Windows would detect one sound card and tell you to reboot your computer, then repeat that 1023 more times.

    3. Re:The big question is... by Sangui5 · · Score: 1

      No, you get the message

      "Windows has detected a new sound card and is installing it"

      followed by

      "Windows needs to restart your computer. Click OK to restart".

      in a cycle of 1024 reboots.

    4. Re:The big question is... by Anonymous Coward · · Score: 0

      You forgot that Windows would find one sound card per boot...

  13. In other news... by k4_pacific · · Score: 4, Funny

    Microsoft made a statement today reminding everyone that Windows Server 2003 can handle as many as 32 processors, at the same time even.

    When shown the report about Linux running on 1024 processors, Gates purportedly responded, "32 processors ought to be enough for anybody."

    --
    Unknown host pong.
  14. Re:Will it be done in time for Quake 3? by Ari_Haviv · · Score: 1

    you mean Doom 3, right?

    --
    Join Team Mozilla #38050 Folding@home
  15. Re:really fast? by iggymanz · · Score: 3, Funny

    yes, according to the project leader "on this supercomputer, OpenOffice will finally *run* at decent speed, but waiting for the JVM to start up will still be a bitch" As for KDE, he stated "we're still waiting for the Qt toolkit to initialize, but we're confident we can be fully logged in before August"

  16. wow... by jms258 · · Score: 0, Redundant

    Can you imagine a Beowulf cluster of these?

  17. Clusters... by avalys · · Score: 0, Redundant

    I was going to ask why not just build a cluster instead. Then, the Slashdot herd mentality took hold, and this thought crossed my mind:

    "My god, imagine a Beowulf cluster of those!"

    --
    This space intentionally left blank.
    1. Re:Clusters... by ogl_codemonkey · · Score: 0

      Mmm... a Beowulf of Slashdot herd mentalities...

      'I could blow up the whole damn world with that thing.' - RvB

  18. Re:really fast? by darkjedi521 · · Score: 2, Funny

    They said Itanium cluster, not VAX cluster!

  19. Sun != scientific computing by vlad_petric · · Score: 4, Informative
    Sun processors execute server workloads (database, app server) very well, but that's pretty much it. The emphasis with such workloads is on the memory system. Boatloads of caches do the job. It's also more effective to have tons of processors that are very simple, than just a couple of them that are complex and powerful.

    Scientific computing means data crunching (floating point). Complex, powerful processors are needed. The "stupider, but more" tradeoff doesn't work anymore. Sun processors have fallen behind in this respect.

    --

    The Raven

  20. It became obsolete by mangu · · Score: 3, Informative

    RISC stands for "reduced instruction set computer". It made sense in the 1980s, when "CISC" (complex instruction set computer) CPUs took tens or hundreds of clock cycles to execute some instructions. With RISC one had fewer instructions, but each instruction executed in fewer clock cycles, resulting in a faster computer. Today, CPUs with full-size instruction sets execute most instructions as fast as a RISC CPU does, so there is no need to limit the instruction set anymore. Even such complex instructions as multiplying double-precision floating point numbers are executed in a single clock cycle in a Pentium 4.

    1. Re:It became obsolete by Johan+Veenstra · · Score: 4, Informative

      Actually RISC is a bad name for what it stands for; it should have been SISC (Simplified Instruction Set Computer), since the key difference between the two is the complexity of the instructions, not the quantity.

      A CISC instruction could do things like: take the value in register BP, add 4, get the value from the memory at the address you just computed, add the value in the register AX, and put the result back at the same memory location. Execution would take several clock-ticks.

      To do the same in RISC, you would need several instructions (add 4, get from memory, add ax, store to memory). The execution of the individual instructions would take one tick each, so the sequence would take several. But on average RISC was a bit faster.
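      A rough Python model of the two approaches, assuming a made-up machine with a few named registers and a word-addressed memory (the register names `bp`, `ax`, `r1`, `r2` and both helper functions are illustrative only). The CISC version does the read-modify-write as one operation; the RISC version spells out the same four steps listed above, each a simple register or load/store operation. Real RISC ISAs such as MIPS can usually fold the +4 into the load and store offset, so the separate address step is often unnecessary.

```python
# Toy machine: named registers and a word-addressed memory (all hypothetical).
regs = {"bp": 100, "ax": 7, "r1": 0, "r2": 0}
memory = {104: 35}


def cisc_add_mem(regs, memory, base, offset, src):
    """One CISC-style instruction: memory[regs[base] + offset] += regs[src]."""
    addr = regs[base] + offset
    memory[addr] = memory[addr] + regs[src]


def risc_add_mem(regs, memory, base, offset, src):
    """The same effect as four simple steps using scratch registers r1 and r2."""
    regs["r1"] = regs[base] + offset      # 1. compute the address (add 4)
    regs["r2"] = memory[regs["r1"]]       # 2. load the value from memory
    regs["r2"] = regs["r2"] + regs[src]   # 3. add the register (ax)
    memory[regs["r1"]] = regs["r2"]       # 4. store the result back to memory


# Run both on separate copies of the same starting state.
cisc_mem, risc_mem = dict(memory), dict(memory)
cisc_add_mem(dict(regs), cisc_mem, "bp", 4, "ax")
risc_add_mem(dict(regs), risc_mem, "bp", 4, "ax")
print(cisc_mem[104], risc_mem[104])  # 42 42
```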

      CISC was invented at a time when memory was small; the CISC approach let you store larger programs in the same amount of memory.

      RISC was invented when memory size was no longer such a constraint, and looked set to displace CISC in the long run.

      CISC was still around when memory bandwidth became a limiting factor. Since fewer CISC instructions needed to be fetched from memory, more bandwidth was left for other data traffic, and RISC lost some of its speed advantage.

      Modern CISC processors fetch CISC instructions from memory, chop them up into smaller instructions, and execute those smaller instructions really fast. So in fact they can be seen as RISC processors posing as CISC processors, i.e. the best of both worlds.

      So CISC is a way of compressing RISC instructions, so they take up less memory/bandwidth.

    2. Re:It became obsolete by AKAImBatman · · Score: 3, Informative

      With RISC one had fewer instructions, but each instruction executed in fewer clock cycles, resulting in a faster computer.

      Technically, RISC chips were supposed to execute all instructions in ONE cycle. This simplified the chip architecture, allowing it to scale up much farther. The downside was that it put the onus on the compiler writer to produce efficient code. (MIPS is a perfect example of this architecture.) All he had to do was make sure that fewer instructions were executed per task, and the code would run faster.

      That is, until the chip designers started introducing superscalar and out-of-order execution. You see, simplifying the chip design gave chip designers room to add new optimizations in how instructions were loaded and executed. Unfortunately, this again meant more work for the compiler writer. Now he not only had to optimize the number of instructions, but he also had to optimize the ordering so that multiple instructions could be executed simultaneously or out of order.

    3. Re:It became obsolete by Anonymous Coward · · Score: 0

      It made sense in the 1980's? Really? Please name any RISC cpu that was faster than what intel had at the time. Extra points for pricing information.

    4. Re:It became obsolete by Anonymous Coward · · Score: 0

      This is why we need a -1 Wrong modifier.

      Modern x86 CPUs essentially decode the CISC instruction set they use into micro-ops that all take essentially the same amount of time to complete (hmm, what does that remind you of?). They do this because it's far easier to design a fast processor when every instruction looks essentially the same in terms of time to complete. RISC was a good idea. The reason people think it sucks is that all the money went into x86, which nowadays is a RISC CPU masquerading as a CISC CPU.

      Oh and by the way, there are tons of x86 instructions for the p4 that take more than 1 cycle to complete.

    5. Re:It became obsolete by Anonymous Coward · · Score: 0
    6. Re:It became obsolete by Anonymous Coward · · Score: 0
      Quick question: is there some strong relation between RISC vs. CISC and load-store vs. direct memory access? RISC machines I've worked on (like Sparc) all seem to be load-store architectures, whereas CISC machines (like 386) have nutty addressing modes.

      I don't know where PDP-11 fits into all of this. The instruction set is very simple/regular, but the addressing modes are the craziest I've ever seen.

    7. Re:It became obsolete by Anonymous Coward · · Score: 0

      PDP-11 is considered a CISC processor.

    8. Re:It became obsolete by Anonymous Coward · · Score: 0

      You're wrong about the P4 FPU. It's worse than the PIII's. The P4's saving grace is SIMD. http://www.aceshardware.com/Spades/read.php?article_id=15000198

    9. Re:It became obsolete by Anonymous Coward · · Score: 0

      No, actually the idea that every RISC instruction is supposed to complete in one clock cycle is a myth. I don't know where it got started, but I've heard it repeated over and over again (perhaps it even pops up in the literature). The real definition doesn't have any hard and fast rules like that. Read David A. Patterson and John L. Hennessy's book Computer Organization & Design. It's relevant because Patterson led the design and implementation of RISC I, and Hennessy developed MIPS. The key point to realize about RISC architecture is that it is based on the idea that a very small set of machine or assembly instructions is executed the majority of the time. So the idea is that you Reduce your Instruction Set for the chip you're designing, and optimize for that reduced set (often this means getting each instruction to execute in one cycle, but not necessarily). It's more of an architectural design philosophy than some simple rule like "one cycle per instruction". Otherwise it would be called OCPI or something, not RISC.

    10. Re:It became obsolete by Anonymous Coward · · Score: 0
      In the UK at around 1987, Acorn (who previously made the Atom and BBC) came out with the Archimedes - A home-computer with an Acorn ARM 32-bit RISC CPU.

      It was way ahead of its time, and as a result cost about twice as much as an Amiga. The high price, the limited software availability, and the fact that it hardly left the UK meant it did not do very well.

      I don't know what Intel was up to at the time. I think the 80386 was the leading CPU of the time. I would suspect that any 80386 machine clocked high enough to out-pace an Archimedes (if that was possible back then) must have cost a fortune.

    11. Re:It became obsolete by Ernesto+Alvarez · · Score: 1

      Quick question: is there some strong relation between RISC vs. CISC and load-store vs. direct memory access? RISC machines I've worked on (like Sparc) all seem to be load-store architectures, whereas CISC machines (like 386) have nutty addressing modes.


      Yes. The idea behind RISC machines is that they are supposed to be simple hardwired things that run most (if not all) of their instructions in a single CPU cycle.

      As such, they are not able to run such "nutty" addressing modes. For example, a CISC machine doing an indirect+index memory access would be running some microcode inside the CPU that tells it to fetch the pointer, add the index and then fetch the value at the resulting address. That takes a lot of cycles and two memory accesses: it is something complex (microcode) and cannot be done in one cycle (two memory accesses).

      In a RISC CPU, the idea was to make simpler instructions, and move the complexity to the compiler. If you wanted to execute the same thing on a RISC, you would have to execute multiple instructions. The idea is that a compiler might do something better than that, and that those instructions run REALLY fast.

      So, the answer is "nutty" modes = complexity != RISC.

      Check Andy Tanenbaum's "Structured Computer Organization" for more info.
  21. Sun does more than that by puppetluva · · Score: 4, Insightful

    Sun hardware has additional, wonderful resiliency features, like allowing CPUs to "fail over" to other CPUs in case of failure. The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" the bad memory and CPUs on the fly (or allow swap-in replacements). The engineers can then replace the broken CPUs/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN. This lends itself to an environment that can enjoy nearly 100% uptime. Finally, since Sun has been doing the "lots of CPUs" thing for many years, their process management and scalability tend to be much better.

    I don't work for Sun, I'm just an SA who deals with both Solaris and Linux boxes. You don't pick Sun just for "lots of CPUs"; you pick it for a very scalable OS and amazing hardware that allows for a very, very solid datacenter. If downtime costs a lot (i.e., you lose a lot of money for being down), you should have Sun and/or IBM zSeries hardware. Unfortunately those features cost a lot, and most times you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.

    1. Re:Sun does more than that by tlk+nnr · · Score: 1
      The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" the bad memory and cpus on the fly (or allow swap-in replacements). The engineers can then replace the broken cpus/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN.
      Does that happen in real life?
      Hot swapping components sounds great, but what if the screwdriver slips out of the finger of the engineer and causes a short?
      Has anyone actually seen a memory chip or a CPU hot-swapped in a production server? I wouldn't be surprised if this never happened outside of the hardware vendors' labs or for a cool demonstration of the new toy.
    2. Re:Sun does more than that by superpulpsicle · · Score: 1

      While Sun stuff is good, Altix XVM and CXFS blow away Sun's entire lineup: Foundation Suite, the Leadville stack, Solstice DiskSuite. Not to mention Sun Cluster is completely overrated; they have to rely on Veritas Cluster to pull them through. Anyone who really follows SGI can see it is clearly on the rebound in the high-end market. But even if it beats Sun, it still won't beat IBM in this sector.

    3. Re:Sun does more than that by Jeff+DeMaagd · · Score: 3, Interesting

      Hot swapping components sounds great, but what if the screwdriver slips out of the finger of the engineer and causes a short?

      The systems I've seen that have hot-swap PCI cards have plastic partitions between the slots to prevent the cards from touching each other when hot swapping them.

      I'm not sure why the hypothetical screwdriver would be in such a tech's hands. Many systems have screwless means of retaining memory, PCI cards, CPUs and such.

    4. Re:Sun does more than that by Anonymous Coward · · Score: 0

      Mod parent up. I've been down for years and I know how it can affect your bottom line.

      There's a hack for linux somewhere that'll allow you to use bad memory - I dunno if it works on the fly.

    5. Re:Sun does more than that by ddmau · · Score: 1

      Ahh... the SGI Numaflex hardware/software does the same thing (and has for years), i.e., maps out bad memory/CPUs on the fly, never shuts down, etc. (I worked for them for 16+ years.) The 1000+ CPU ORIGIN systems at Los Alamos (and other places) have been up and running for over three years. If a problem comes up, the system migrates processes off of that node and flags the engineer, he replaces parts/modules and fires the module back up, the system reintegrates that module, and away it goes... the system never shuts down.
      It will be interesting to see if they will be able to do this with Linux, but I don't see why not. I have been "away" from SGI for over a year, but I was always impressed with the NUMAFLEX architecture... Fun to watch it finally happening with Linux.

    6. Re:Sun does more than that by ddmau · · Score: 2, Interesting

      Yes... I was an engineer for SGI for over 16 years (laid off about a year ago)... I have hot-swapped modules on a running system many, many times without problems. With the Numaflex architecture, you have "modules" that house a separate set of CPUs, memory, power supply, etc. You shut the offending module down (after the OS has migrated all processes off of it, on the fly). After the parts are replaced, you run a diagnostic from your laptop/terminal and bring the module back into the system (the OS "sees" the change on the fly and reintegrates the module). It works extremely well.

    7. Re:Sun does more than that by BitchKapoor · · Score: 1

      When I worked with those SGI machines at Los Alamos a while back (about 5 years ago), they crashed quite regularly because they were SGI's largest site, and hence running an alpha version of the OS in order to make use of all the features. So the situation has improved?

    8. Re:Sun does more than that by ddmau · · Score: 1

      Yes, they went to the current "flavor" of NUMAFLEX architecture about three years ago. The systems you referred to were probably the original ORIGIN Servers, and you are right, they were a bit of a bear to keep running back then (many days of lost sleep). The newer architecture is much more modular. Los Alamos has quite a few different large SGI systems, so the ones you are familiar with are most likely not even close to the current Altix boxes the article refers to. When I left SGI about a year ago, the new NUMAFLEX systems (including Altix) were very stable, and IRIX was well beyond the "Beta" stage (you are absolutely correct... IRIX was not ready for prime time about five years ago... but SGI was always at the "bleeding edge"... and so was most of our customer base).

    9. Re:Sun does more than that by BitchKapoor · · Score: 1

      Oh, cool. Yeah, I think they were O2ks at that time.

    10. Re:Sun does more than that by jedidiah · · Score: 1

      This is because a Sun NUMA box IS JUST A GLORIFIED CLUSTER. It's a collection of discrete modules that qualify as separate computers in their own right. This is the "magic" involved. This is also not anything that SGI doesn't already have.

      SUN BOUGHT ITS NUMA TECH FROM SGI IN THE FIRST PLACE.

      It would be naive to expect SGI to be lax in this area. This goes for the underlying hardware tech as well as the OS level continuity services.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    11. Re:Sun does more than that by Anonymous Coward · · Score: 0

      Servers such as the Sun Fire 3800 - F15K use UniBoards (CPU Boards with up to 4 CPUs) and I/O Boards (with 6 cPCI or 8 PCI slots).

      You remove the UniBoard or I/O from the domain, it goes into 'hotswap' mode, you walk up to the machine, undo 2 plastic locks and remove the board.

      At no time during the removal and swap out do you have a screwdriver in contact with live components ...

      The hotswap cPCI cards have plastic partitions, plastic guide rails and plastic locks .. my dog could swap them out.

    12. Re:Sun does more than that by timeOday · · Score: 1
      Sun hardware has additional, wonderful resiliency features, like allowing CPUs to "fail over" to other CPUs in case of failure. The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" the bad memory and CPUs on the fly (or allow swap-in replacements).
      Well that's exactly the great thing about linux being chosen for this kind of system - it's a sign of linux gaining these features too!

      If we thought they were just going to install Red Hat on this monster, why would we even care, except for bragging rights? No, the great thing about big Linux adoptions is they lead to infusions of new technology.

    13. Re:Sun does more than that by Decaff · · Score: 1

      SUN BOUGHT ITS NUMA TECH FROM SGI IN THE FIRST PLACE.

      Interesting. I thought Sun always used pure SMP, not NUMA. Do you have a URL to back up this statement?

    14. Re:Sun does more than that by Decaff · · Score: 1

      but I was always impressed with the NUMAFLEX architecture....

      I was under the impression that NUMA-type architectures were actually cheap compromises, providing massive multiprocessing for systems that could not really cope with fully symmetric CPU access to memory. Have things changed, or am I wrong? I'd be interested to know, as I have not done any development on SMP-type machines for years.

    15. Re:Sun does more than that by ddmau · · Score: 1

      Yes, the original ccNUMA architecture was a bit of a kluge... but SGI's implementation is now at its third generation, and is a very elegant design. Here's a link to a whitepaper on the Altix 3000 (Linux based) system that may be more information than you need, but if you take a quick look at the diagrams, it gives you a good understanding of the single shared memory concept they are using now: http://www.sgi.com/pdfs/3474.pdf There are a number of other good sources of information about SGI's implementation of SMP on their web site (www.sgi.com), if you want more detailed information. (I no longer work for SGI, but I still follow their technology.) Great stuff, IMHO.

    16. Re:Sun does more than that by sparkz · · Score: 1
      I saw a Sun E10K on Friday - the roof had leaked water into it... remarkably, it kept running.

      That is a remarkably lucky exception, but in the more likely scenario that CPUs and IO boards had died, DR would still have kept the domains alive and (to the users' eyes) "working" despite what most people would have called a catastrophic failure.

      --
      Author, Shell Scripting : Expert Re
    17. Re:Sun does more than that by dsouth · · Score: 2, Interesting
      I don't have a URL, but was involved in several HPC procurements at the time (and knew some insiders at SGI and Cray). The poster is basically correct. The sequence of events was:
      • SGI (at the time still called Silicon Graphics Inc) purchased Cray Research.
      • Well before the purchase, Cray had a hand in developing and marketing Sun's larger machine, the "Super Dragon", sold by Sun as the 64SC, and referred to within Cray as the 64CS -- I'm probably messing up the number, but I do recall the difference in the letter ordering. :-)
      • Prior to the purchase, Cray had completed the design for a new shared memory system based on a high speed switch and single image OS.
      • Prior to the purchase SGI had already completed the design of the first NUMAflex systems, the Origin2000 and Onyx2.
      • So after the purchase, the new merged Cray/SGI had two large SMP/NUMA systems, the Origin line and the Cray developed line. Since they didn't need two, they sold the Cray design to Sun, where it was marketed as the E10000. They also called the NUMA fabric on the Origin2K "CrayLink" even though Cray had little or nothing to do with its design.
      • For a few years afterwards, there were a few within Cray CF (Chippewa Falls) that were somewhat bitter about SGI's decision to pawn off the E10000 design, pointing out repeatedly that Sun was selling plenty of E10000s...
      If it matters, the HPC procurement I was involved in opted for the SGI, which was probably the correct decision. As unstable as the SGI hardware was, the sites I knew running E10000s for general HPC loads had far worse stability problems (though the E10K was undoubtedly better at running Oracle).

      As far as the E10000 being NUMA or SMP -- it depends on how you look at it. The Origin line used a bristled hypercube interconnect topology, so memory on the same node as a CPU was one hop through the fabric, memory on another node connected to the same router was three hops, and memory on a distant node might be multiple router hops away. The E10K (and I think the E15K) used a star topology where memory was either on the same bus as the CPU or on another bus that had to go through the switch. So the Sun has basically two levels of memory latency, whereas the SGI could have many levels. The SGI is definitely NUMA, the Sun is either SMP or "slightly NUMA", or however you want to parse it.
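      To make the "two levels versus many levels" point concrete, here is a toy hop-count model in Python. It is only a sketch under simplifying assumptions: routers form a plain binary hypercube (the "bristles" of extra nodes per router are ignored), the Sun-style machine is modeled as a single star/crossbar with just "local" and "remote", and hop count stands in for latency. The function names are made up for illustration.

```python
"""Toy hop-count model: hypercube (SGI Origin style) vs. star/crossbar (Sun E10K style)."""


def hypercube_hops(a: int, b: int) -> int:
    """Router hops between routers a and b in a binary hypercube (their Hamming distance)."""
    return bin(a ^ b).count("1")


def star_hops(a: int, b: int) -> int:
    """Star/crossbar: memory is on the local board (0) or one switch crossing away (1)."""
    return 0 if a == b else 1


def latency_tiers(num_routers: int, hops) -> list:
    """Distinct hop counts from router 0 to every router, i.e. the number of latency levels.

    num_routers should be a power of two for the hypercube case.
    """
    return sorted({hops(0, r) for r in range(num_routers)})


if __name__ == "__main__":
    for routers in (4, 16, 64):
        print(f"{routers:3d} routers: hypercube tiers {latency_tiers(routers, hypercube_hops)}, "
              f"star tiers {latency_tiers(routers, star_hops)}")
```

      Doubling the hypercube adds one more tier each time, while the star stays at two, which is the trade-off described above: the star gives flatter latency but is capped by the size of one switch.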

      If you've never seen it, the tech papers on how the SGI NUMA systems work are worth reading. Build a fast 8-port crossbar chip (the "spyder chip"), then use it to glue CPUs, memory, and peripherals together. Keep a couple ports open, and you can glue the crossbars together in a fabric. Presto, you can now build a system with 200 CPUs or 100 PCI busses. Pretty cool, even if it was expensive, proprietary, and all the rest.

    18. Re:Sun does more than that by Decaff · · Score: 1

      Thanks for all the info - I had forgotten the SGI/Cray connection. However, I'm still not sure I see how it could be said that Sun purchased a NUMA-type architecture off SGI - it looks like they purchased the SMP technology?

      The SGI is definitely NUMA, the Sun is either SMP or "slightly NUMA"

      I think this is my point, which is that the Sun (or any) SMP system is more symmetric, and so probably provides more 'scalable' power for general purpose use?

    19. Re:Sun does more than that by Anonymous Coward · · Score: 0

      No, Sun's multiprocessor architecture is not "slightly NUMA" at all, and is definitely not SMP by definition.

      From here, The SunFire memory design has the following characteristics:
      Access type, bandwidth, memory request latency (to get load latency, add the relevant transfer latency from the bottom table as well)
      CPU local, 9.6GB/s, 180ns
      Board local, 6.7GB/s, 193-207ns
      Different board, 2.4GB/s, 333-440ns

      So, in the 3-4 level memory hierarchy in a Sun system, latency is increased by a factor of 3, and bandwidth decreased by a factor of 4.
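      Working the ratios out from the three rows above is a quick Python check that uses nothing beyond the quoted figures. Remote-board bandwidth comes out 4x lower than CPU-local, while request latency rises by roughly 1.85x to 2.4x, so "a factor of 3" is on the high side for these particular numbers. This compares only the request latencies listed; the transfer latencies the comment mentions are not included, so full load-latency ratios could differ.

```python
# (access type, bandwidth in GB/s, (min, max) memory request latency in ns),
# copied from the SunFire figures quoted above.
SUNFIRE = [
    ("CPU local", 9.6, (180, 180)),
    ("Board local", 6.7, (193, 207)),
    ("Different board", 2.4, (333, 440)),
]


def ratios_vs_local(rows):
    """Return (name, bandwidth drop factor, (min, max) latency increase) relative to row 0."""
    _, local_bw, (local_lat, _) = rows[0]
    return [
        (name, local_bw / bw, (lo / local_lat, hi / local_lat))
        for name, bw, (lo, hi) in rows
    ]


for name, bw_drop, (lat_lo, lat_hi) in ratios_vs_local(SUNFIRE):
    print(f"{name:16s} bandwidth / {bw_drop:.1f}   latency x {lat_lo:.2f}-{lat_hi:.2f}")
```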

      SGI's Altix systems are a bit harder to classify like this, because they are not a simple crossbar, and have many nodes.

      Their node-local memory bandwidth is 10.2GB/s, with aggregate interconnect bandwidth per node of 6.4GB/s; node-local latency is 145ns, and maximum latency on a 512-CPU system is around 650ns I think (and IIRC the 1024-CPU topology has the same maximum hop distance between nodes).

    20. Re:Sun does more than that by mt-biker · · Score: 1

      Sun hardware has additional, wonderful resiliency features... engineers can then replace the broken cpus/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN. This lends itself to an environment that can enjoy nearly 100% uptime.

      Can Sun also transparently replace a failed CPU on a machine, WITHOUT HAVING LOST THE PROCESS THAT WAS RUNNING ACROSS ALL 1024 CPUS SIMULTANEOUSLY?

      Thought not.

      Long uptimes look good on the availability report you send to your boss, but when a user's job dies after 100,000 CPU hours he's still not going to be impressed.

    21. Re:Sun does more than that by Decaff · · Score: 1

      definitely not SMP by definition.

      You may be right, but I'm not sure where that gets us as almost everyone else calls Sun's architecture SMP.

    22. Re:Sun does more than that by Anonymous Coward · · Score: 0

      I am right, and it doesn't matter what anyone else calls it. Where does it get us? Well, it shows that Sun's architecture is not true SMP and has similar tradeoffs to SGI's large multiprocessor systems. For an SGI system the size of Sun's systems, latency and bandwidth figures would probably look similar, if not favour SGI.

    23. Re:Sun does more than that by Decaff · · Score: 1

      Well it shows that Sun's architecture is not true SMP, and it has similar tradeoffs as SGI's large multiprocessor systems,

      Firstly, I would argue that the linux kernel itself implies you are wrong about Sun's architecture. From 2.0 the kernel has support for "Sun SMP". Support for NUMA was only introduced into the main kernel source in 2.4.

      I can't see how it has similar tradeoffs, as the architecture is quite different. I also can't see how latency and bandwidth figures would look similar or favour SGI, as it's well-established that SGI-style NUMA is a compromise, working where true SMP would be hard to implement. Also, using that large NUMA system is hard work: the Linux kernel provides routines to allocate threads and memory resources to nodes. On an SMP (like Sun), you don't bother - you make your program scalable by making it fine-grained enough in terms of threads, and you can use random memory.

    24. Re:Sun does more than that by dsouth · · Score: 1
      It depends on what you mean by scalable:

      If you mean consistent access times to memory (aka SMP) then yes, the Sun design is more scalable (though, as mentioned, there is still a latency difference between local memory and remote memory). The downside is that you pay for the switch up-front, and the design (and performance) is limited by the size of a single switch.

      If by scalable you mean the maximum number of processors, maximum amount of memory, maximum number of IO channels, maximum cross-sectional bandwidth, or the expandability of a given system, then the SGI design wins. You add NUMA fabric as you add nodes, so you can start small and grow things. The downside is that it is NUMA, so as the system gets larger so do the latency differences in memory access (though the OS does manage to hide some of that using things like page replication and migration).

      As I alluded to earlier, if I were running Oracle, I'd opt for the Sun; if I were running HPC loads, I'd opt for the SGI. Actually, nowadays I'd opt for a cluster of Opterons, but that wasn't the question. :-)

    25. Re:Sun does more than that by monsted · · Score: 1

      Anything will blow Solstice out of the water. It's horrible.

      I heard some talk about integrating Veritas into the base OS in Solaris 10, but I don't know if that made it through...

    26. Re:Sun does more than that by Anonymous Coward · · Score: 0

      Dude. What part of Non Uniform Memory Access do you not understand? If SGI's Altix is NUMA, then Sun's Enterprise servers are NUMA. No two ways about it.

    27. Re:Sun does more than that by Anonymous Coward · · Score: 0

      Supporting "Sun SMP" means supporting multiple processors on Sun machines. Before NUMA support they are treated as having uniform access to memory even though they don't.

      Secondly, if you think that "you don't bother" doing memory and node affinity on Sun systems, and "can use random memory", then you seem not to understand the full implications of the problem. Sure, you can access remote memory on Sun systems - at triple the latency and a quarter the bandwidth. Nice.
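      On Linux, a common way to get node affinity without rewriting much code is to pin a process (or thread) to a CPU and let the default memory policy place each page on the node of the CPU that first touches it. Below is a minimal Python sketch, assuming Linux (`os.sched_getaffinity` and `os.sched_setaffinity` are not available on macOS or Windows). Explicit per-node placement would go through `mbind(2)` or libnuma instead, which is not shown here, and the helper name is hypothetical.

```python
import os


def pin_to_one_cpu() -> int:
    """Restrict the caller to the lowest-numbered CPU it is already allowed to use.

    Linux-only. Under Linux's default memory policy, a page is normally allocated on
    the NUMA node of the CPU that first touches it, so pinning *before* allocating and
    touching a buffer tends to keep that buffer in node-local memory.
    """
    cpu = min(os.sched_getaffinity(0))  # pid 0 = the caller (on Linux, the calling thread)
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__":
    cpu = pin_to_one_cpu()
    page = os.sysconf("SC_PAGE_SIZE")
    buf = bytearray(16 * 1024 * 1024)   # allocated only after pinning
    for offset in range(0, len(buf), page):
        buf[offset] = 1                 # write one byte per page while on the pinned CPU
    print("pinned to CPU", cpu, "- affinity is now", os.sched_getaffinity(0))
```

      The same pin-then-touch pattern is what makes "random memory" expensive to ignore: skip it, and a thread may end up repeatedly reading pages that live on a remote board.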

    28. Re:Sun does more than that by dsouth · · Score: 1

      I agree with you, but Sun (and a lot of other people) do not -- they refer to the Sun E10K/15K as SMP and the SGI O2K/3K as NUMA. My points were intended to go to the underlying designs (Sun has two tiers of memory latency, SGI has many tiers), not to the nomenclature attached to the Enterprise servers. But feel free to keep beating the dead horse -- I'd be especially grateful and impressed if you could get Sun to change their web pages. :-)

    29. Re:Sun does more than that by JonAnderson · · Score: 1

      Guys, it's called ZFS - maybe you have heard of it...

  22. Similar software available? by Pierce · · Score: 2, Interesting

    With the exception of the NUMA stuff, is there software available to re-create this? I'm not even sure what to search for; would this still be considered a "cluster"?

    1. Re:Similar software available? by dwgranth · · Score: 5, Informative

      Well, SGI uses/hacks NUMA, spinlocks, etc. to make this happen more efficiently. We recently had an SGI rep come and explain their 512-CPU architecture at our LUG meeting, and he basically said that SGI has its own implementation of all of the clustering/CPU-stacking techs, which they will eventually feed back into the community. All good stuff. Understandably, they will wait for a year or so to get their money's worth before they release their changes.

    2. Re: Similar software available? by Anonymous Coward · · Score: 0

      Hmm. I would *LOVE* to see DragonFly BSD on this hardware a year or two from now, and see how it compares to vanilla Linux of the time. IIRC, DragonFly is being redesigned at its core for exactly this sort of scalability, be it via NUMA machines or through clustering.

      The Linux patches seem like such terrible hacks in comparison to the elegance of the essentially lockless, message-passing architecture that DragonFly is adopting.

    3. Re:Similar software available? by shaitand · · Score: 2, Insightful

      If they sell you a copy, they've then distributed it, and the GPL requires them to license those changes to you under the GPL.

      Any SGI customer can then contribute the changes back to the kernel long before a year is up.

    4. Re: Similar software available? by Anonymous Coward · · Score: 0

      "Seem like". It's developed by SGI who have innovated NUMA before. So that gives them at least my trust to look into this. Btw have you read the code?

    5. Re:Similar software available? by diegocgteleline.es · · Score: 2, Informative

      SGI publishes their code. It's just that their changes are so radical and "dirty" that they're not useful/maintainable for the rest. Remember, SGI has sold 256-CPU machines with their 2.4 kernel, whereas vanilla 2.4 doesn't work very well beyond 8 CPUs.

    6. Re:Similar software available? by Anonymous Coward · · Score: 0

      The XFS source code SGI pushed into linux-2.4.x & linux-2.6.x is really "big & dirty", more than 3 megabytes!!!

    7. Re:Similar software available? by Anonymous Coward · · Score: 0

      SGI worked long and hard to cleanup that code to get it in the kernel. It's not easy to get your scalability improvements in.

    8. Re:Similar software available? by cpeterso · · Score: 1

      SGI had the same problem with the Apache group. SGI greatly improved the performance of Apache 1.2, but their patches were so "dirty" that the Apache group rejected them. SGI ignored the Apache group's feedback and whined that their improvements were ignored, when in actuality, SGI ignored the Apache community.

    9. Re:Similar software available? by diegocgteleline.es · · Score: 1

      That's quite possible, but it's not _that_ bad. The vanilla Linux kernel has to work for all architectures, all kinds of machines, etc.; targeting 1024-CPU boxes rules out a lot of targets and makes radical changes much easier. After all, this is the nice thing about OSS: it allows you to do your own stuff. I've heard that the SGI 2.4 kernel is just a different OS, not Linux anymore, and that's understandable, but for 2.6 they seem to want to merge things back - I hope they do well enough. They have indeed merged some changes already: http://www.linuxsymposium.org/2004/view_abstract.php?content_key=147

  23. from MPI to multithreaded ? by InodoroPereyra · · Score: 3, Interesting
    From the article:
    Earlier cluster supercomputers at the NCSA used multiple images of the Linux operating system -- one for each node -- along with dedicated memory allocations for each CPU. What makes this system more powerful for researchers is that all of the memory will be available for the applications and calculations, helping to speed and refine the work being done, Pennington said.

    "The users get one memory image they have to deal with," he said. "This makes programming much easier, and we expect it to give better performance as well."

    So, does anyone have any insights into why/how this matters for programmers? Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications? Or maybe they will still run using MPI on the big shared memory pool, and each process will be sent to the appropriate node by the OS on demand? Thanks!
    1. Re:from MPI to multithreaded ? by xyote · · Score: 1
      I presume it means that the application can directly access the shared data through the NUMA memory rather than using mpi to access the data from whatever node it thinks the data is on. Data coherency gets moved down into the hardware/kernel layer rather than up at the application layer. The communication of the data would be done by a low latency interconnect either way.

      This article is news to me. My impression was that HPC programmers preferred mpi over shared memory multi-threading because they found the former somehow more intuitive and less error prone.

    2. Re:from MPI to multithreaded ? by TheLink · · Score: 1

      Such a system looks like a _huge_ SMP box. So you can run stuff like PostgreSQL, which as of 7.4.x doesn't cluster easily because it requires shared memory between processes.

      Sharing memory between processes running on different machines that are indeed separate machines is not that easy. Often requires fancy hardware and software.

      While the SGI solution also involves fancy hardware and software, I believe a single process gets to have terabytes of memory, which is rather different from the common cluster architecture where a process can't easily have a memory space larger than that of the largest cluster member.

      --
    3. Re:from MPI to multithreaded ? by Sangui5 · · Score: 4, Informative

      Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications ?

      NCSA still has plenty of "old" style clusters around. Two of the more aging clusters, Platinum and Titan, are being retired to make room for newer systems like Cobalt. Indeed, the official notice was made just recently--they're going down tomorrow. However, as the retirement notice points out, we still have Tungsten, Copper, and Mercury (TeraGrid). Indeed, Tungsten is number 5 on the Top 500, so it should provide more than enough cycles for any message-passing jobs people require.

      So, anyone has any insights as to why/how this matters for the programmers ?

      What it means is that programming big jobs is easier. You no longer need to learn MPI, or figure out how to structure your job so that individual nodes are relatively loosely coupled. Also, jobs that have more tightly coupled parallelism are now possible. The older clusters used high-speed interconnects like Myrinet or InfiniBand (NCSA doesn't own any InfiniBand AFAIK, but we're looking at it for the next cluster supercomputer). Although these provide really good latency and bandwidth, they aren't as high-performing as shared memory. Also, Myrinet's ability to scale to huge numbers of nodes isn't all that great--Tungsten may have 1280 compute nodes, but a job that uses all 1280 nodes isn't practical. Indeed, until recently the Myrinet didn't work at all, even after partitioning the cluster into smaller subclusters.

      This new shared-memory machine will be more powerful, more convenient, and easier to maintain than the cluster-style supercomputers. Hopefully it will allow better scheduling algorithms than on the clusters too--an appalling number of cycles get thrown away because cluster scheduling is non-preemptive.

      I'd also like to point out some errors in the Computerworld article. NCSA is *currently* storing 940 TB in near-line storage (Legato DiskXtender running on an obscenely big tape library), and growing at 2TB a week. The DiskXtender is licenced for up to 2 petabytes--we're coming close to half of that now. The article therefore vastly understates our storage capacity. On the other hand, I'd like to know where we're hiding all those teraflops of compute--35 TFLOPS after getting 6 TFLOPS from Cobalt sounds more than just a little high. That number smells of the most optimistic peak performance values of all currently connected compute nodes. I.e. - how many single-precision operations could the nodes do if they didn't have to communicate, everything was in L1 cache, we managed to schedule something on all of them, and they were all actually functioning. Realistically, I'd guess that we can clear maybe a quarter of that figure, given machines being down, jobs being non-ideal, etc. etc. etc.
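      The storage figures are easy to sanity-check. Assuming decimal units (2 PB = 2000 TB) and constant linear growth (real archives rarely grow linearly), 940 TB is 47% of the licence, leaving 530 weeks, roughly ten years, of headroom at 2 TB a week:

```python
def weeks_until_full(used_tb, capacity_tb, growth_tb_per_week):
    """Weeks of headroom left, assuming constant linear growth."""
    return (capacity_tb - used_tb) / growth_tb_per_week


used, capacity, growth = 940, 2000, 2   # TB stored, TB licensed (2 PB, decimal), TB/week
weeks = weeks_until_full(used, capacity, growth)
print(f"{used / capacity:.0%} of the licence used; "
      f"{weeks:.0f} weeks (about {weeks / 52:.1f} years) left at {growth} TB/week")
```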

      As a disclaimer, I do work at NCSA, but in Security Research, not High-Performance Computing.

    4. Re:from MPI to multithreaded ? by PingPongBoy · · Score: 1

      What is easier for programmers, I ask?

      Programmers like transparency. Effort used to break a program into 1024 threads running on 1024 CPUs is one thing I wouldn't want to deal with. The time used to do this plus all the extra code, which obscures bugs, makes it hard for multiprocessor systems to deliver value.

      An OS or even a CPU-chipset combination that can automatically offload work to different processors will allow an existing software base to run faster.

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
    5. Re:from MPI to multithreaded ? by kscguru · · Score: 3, Informative
      Caveat: I think MPI itself is very recent (standardized only within the past few years); before that, everyone used custom message-passing libraries.

      It's a tradeoff. MPI is "preferred" because a properly written MPI program will run on both clusters and shared-memory equally fast, because all communication is explicit. It's also much harder to program, because all communication must be made explicit.

      Shared-memory (e.g. pthreads) is easier to program in the first place (since you don't have to think about as many sharing issues) and more portable. However, it is very error-prone - get a little bit off on the cache alignment or contend too much for a lock, and you've lost much of the performance gain. And it can't run on a cluster without horrible performance loss.
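      Here is a toy Python contrast of the two styles, using threads for both so it runs anywhere. This is not MPI and says nothing about performance (CPython's GIL serializes the arithmetic). The point is where communication shows up in the code: the shared-memory version writes to one shared variable under a lock, while the message-passing workers interact only through explicit put/get calls, loosely mirroring a send followed by a reduce on rank 0. The function names are illustrative.

```python
import queue
import threading


def shared_memory_sum(data, workers=4):
    """Shared-memory style: every thread adds into one shared total under a lock."""
    total = [0]
    lock = threading.Lock()

    def work(chunk):
        partial = sum(chunk)      # private work first...
        with lock:                # ...then implicit "communication": a write to shared state
            total[0] += partial

    threads = [threading.Thread(target=work, args=(data[i::workers],)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(data, workers=4):
    """Message-passing style: workers share nothing and explicitly send results to a root."""
    inbox = queue.Queue()         # the only channel between the workers and the root

    def work(rank, chunk):
        inbox.put((rank, sum(chunk)))   # explicit send, loosely like MPI_Send to rank 0

    threads = [threading.Thread(target=work, args=(r, data[r::workers])) for r in range(workers)]
    for t in threads:
        t.start()
    total = sum(inbox.get()[1] for _ in range(workers))  # explicit receives, like a reduce
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    numbers = list(range(1, 1001))
    print(shared_memory_sum(numbers), message_passing_sum(numbers))  # both 500500
```

      The message-passing version is more verbose even at this size, which is the programmer-time cost described above; in exchange, nothing in it assumes the workers can see each other's memory.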

      If it's the difference between spending two months writing the shared-memory sim and four months writing the message-passing sim that runs two times faster on cheaper hardware, well, which would you choose? Is the savings in CPU time worth the investment in programmer time?

      Alas, the latencies on a 1024-way machine are pretty bad anyway. If they use the same interconnect as the SGI Origin, it's 300-3000 cycles for each interconnect transaction (depending on distance and number of hops in the transaction). Technically that's low-latency... but drop below 32 processors or so, and the interconnect is a bus with 100 cycle latencies, so those extra processors cause a lot of lost cycles.

      --

      A witty [sig] proves nothing. --Voltaire

    6. Re:from MPI to multithreaded ? by joib · · Score: 1


      This new shared-memory machine will be more powerful, more convienient, and easier to maintain than the cluster-style supercomputers.


      As always, it's a tradeoff. Distributed memory machines, like a cluster, are usually vastly cheaper than shared memory systems, and can scale to tens of thousands of nodes. Thus it is economically stupid to run MPI apps on a shared memory computer. That's why most supercomputer centers have both distributed and shared memory machines. There's a place for both architectures.


      Hopefully it will allow better scheduling algorithms than on the clusters too--an appaling number of cycles get thrown away because cluster scheduling is non-preemptive.


      Uh? Gang schedulers exist and are in use on clusters too.

    7. Re:from MPI to multithreaded ? by Sangui5 · · Score: 1

      and can scale to tens of thousands of nodes

      The largest current machines (in terms of # of processors) I can find reference to are ASCI Q, BlueGene/L DD1, and ASCI White (Los Alamos, IBM, and Lawrence Livermore respectively). Each has 8192 processors. ASCI Red at Sandia has 9632, but it is aging (from 1999). Even if those processors were in clusters of 1 processor per node (they aren't--BlueGene isn't even a cluster), that is not tens of thousands. Unless you're talking about some classified machines, tens of thousands is out. The communication interconnects for this scale of machine stop scaling so well past 1000 nodes or so. Many of the very largest clusters are, in production, split into smaller subclusters most of the time.

      Thus it is economically stupid to run MPI apps on a shared memory computer.

      Hence why I pointed out that NCSA still has 3 large cluster machines, including the 3rd fastest in the world (the Earth Simulator and BlueGene aren't traditional clusters in the strictest sense). Also, economy isn't everything in HPC. If it is faster to run an MPI job on a shared-memory machine, then there are times that it will make sense to do so, even though running it on a cluster would be cheaper. Remember, it would be cheapest and most efficient of all to run everything on one processor, except that everything would take forever to finish.

      Gang schedulers exist and are in use on clusters too.

      Gang scheduling has absolutely nothing to do with the lack of preemption. Gang scheduling means that a job will get all of the processors it needs at once. Which is one of the causes of the inefficiency.

      Suppose you have a 100 node cluster. Your job queue has 60 jobs that each want 1 node, 1 that wants 50 nodes, and 1 that wants the whole cluster (arriving in that order). Suppose also that the 50 node job is currently running, and another 50 node job just finished (so you have 50 unused nodes). Finally, assume that more small jobs are being entered into the queue relatively frequently.

      Since the scheduling is non-preemptive, in order to run the 100 node job, the entire cluster must first be idle. Suppose you start allocating the 1 node jobs in the 50 free nodes, since they fit. They each take an unpredictable amount of time to run, so when any one finishes, the others are likely to be still going. Even when the 50 node job finishes, you'll have some number of small jobs running (remember, more are entering the ready queue). You *cannot* preempt them, so the 100 node job cannot run. If you keep scheduling jobs that fit, it is likely that it will never run.

      Suppose you run jobs in strict order of arrival. Then the most likely outcome will be that a few of the 1 node jobs will take much longer than the others. You'll have most (99?) of the nodes free, but one or two straggler jobs still going. That 100 node job can't run, and although the jobs after it would fit, they can't run either. You have to sit with most of your cluster idle.
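      Both scenarios can be reproduced with a small discrete-time simulation in Python. The assumptions are illustrative, not NCSA's: unit time steps, exact run times, a 50-node job started at t=0, the 100-node job arriving at t=1, and a 1-node, 3-step job arriving every step. The "fcfs" policy never lets a job pass a blocked one, and "greedy" starts anything that fits. With these numbers the whole-cluster job starts at t=10 under fcfs (after about half the cluster sat idle for nine steps), while under greedy it never starts within 60 steps, because two or three small jobs are always running.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    duration: int
    arrival: int


def simulate(jobs, total_nodes, policy, horizon):
    """Non-preemptive scheduling in unit time steps; returns {job name: start time}.

    policy "fcfs": strict arrival order, nothing may pass a job that does not fit.
    policy "greedy": scan the queue in order and start every job that fits.
    """
    pending = sorted(jobs, key=lambda j: j.arrival)
    waiting, running, starts = [], [], {}    # running holds (end_time, nodes)
    for t in range(horizon):
        running = [(end, n) for end, n in running if end > t]   # release finished jobs
        while pending and pending[0].arrival <= t:
            waiting.append(pending.pop(0))
        free = total_nodes - sum(n for _, n in running)
        for job in list(waiting):
            if job.nodes <= free:
                waiting.remove(job)
                running.append((t + job.duration, job.nodes))
                starts[job.name] = t
                free -= job.nodes
            elif policy == "fcfs":
                break        # the blocked job holds up everything behind it
            # "greedy": skip the blocked job and keep looking for one that fits
    return starts


def scenario(small_jobs=60):
    """Hypothetical workload loosely following the example above."""
    jobs = [Job("half", 50, 10, 0), Job("whole", 100, 5, 1)]
    jobs += [Job(f"s{k}", 1, 3, k) for k in range(small_jobs)]
    return jobs


if __name__ == "__main__":
    for policy in ("fcfs", "greedy"):
        starts = simulate(scenario(), 100, policy, horizon=60)
        print(policy, "-> whole-cluster job starts at", starts.get("whole", "never (within horizon)"))
```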

      The schedulers in use don't do first come, first served, and make an effort to keep efficiencies high, but it is still the case that large numbers of nodes lie idle when there are jobs that could fit in the idle nodes, in an effort to get the larger jobs through. If it were possible to suspend jobs, aka preempt them, this wouldn't be a problem, but on current clusters that isn't possible (there are some research-grade cluster systems that can, but they have other very bad weak points (e.g. proprietary MPI extensions), and aren't in production use). It also isn't feasible/allowable to share a node among more than one job--for the same reasons that gang scheduling is efficient.

      On large shared memory machines, however, preemptive scheduling is much easier. Whether this particular machine will allow it or not, I don't know. But it is much more likely than on the clusters.

    8. Re:from MPI to multithreaded ? by illtud · · Score: 1

      I'd also like to point out some errors in the Computerworld article. NCSA is *currently* storing 940 TB in near-line storage (Legato DiskXtender running on an obscenely big tape library), and growing at 2TB a week. The DiskXtender is licenced for up to 2 petabytes--we're coming close to half of that now

      Interesting. I used the original OTG product on NT with a smallish (6B) tape library. We weren't that impressed with the software, and the support was terrible (in the UK). I heard that OTG bought Unitree's linux HSM software, and that the Legato product was (eventually) going to be based on this one, rather than the OTG offering. Any idea if this is so? What platform are you running it on? I'd also be interested in the tape library technology - we've moved on to ADIC's LTO Scalar 1000. The AMASS software we're using with that doesn't seem to be that manageable (I'm not running the tape archives myself anymore).

    9. Re:from MPI to multithreaded ? by joib · · Score: 1


      Unless you're talking about some classified machines, tens of thousands is out.


      Blue Gene/L, supposed to be finished by the end of this year, is going to have ~65000 nodes (i.e. ~128000 CPUs). The Cray Red Storm, also currently under construction IIRC, has a wee bit over 10000 nodes (1 CPU per node).


      BlueGene isn't even a cluster


      If you read my post, you'll see that I wrote "distributed memory machines", which the Blue Gene definitely is.


      Also, economy isn't everything in HPC. If it is faster to run an MPI job on a shared-memory machine, then there are times that it will make sense to do so, even though running it on a cluster would be cheaper. Remember, it would be cheapest and most efficient of all to run everything on one processor, except that everything would take forever to finish.


      Well, I was thinking of economy as in "cheapest machine to solve a specific problem within a specific time". There are many cases where a distributed memory machine is the most economical in this sense, and there are other cases where a shared memory machine is more economical, and even cases where a shared memory machine is practically necessary.


      Gang scheduling has absolutely nothing to do with the lack of preemption.


      On the contrary, gang scheduling relies on the ability to preempt jobs.


      Since the scheduling is non-preemptive, in order to run the 100 node job, the entire cluster must first be idle. Suppose you start allocating the 1 node jobs in the 50 free nodes, since they fit. They each take an unpredictable amount of time to run, so when any one finishes, the others are likely to be still going. Even when the 50 node job finishes, you'll have some number of small jobs running (remember, more are entering the ready queue). You *cannot* preempt them, so the 100 node job cannot run. If you keep scheduling jobs that fit, it is likely that it will never run.


      What you're describing is essentially a "backfill" scheduler, a very popular scheduling algorithm. Research has shown that it's actually quite good IF the users can be bothered to specify correct time estimates for their jobs. Usually the users won't, and efficiency suffers as a result.
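      For reference, here is the core decision of EASY-style backfilling as a Python sketch. The queue head gets a reservation at the earliest time enough nodes are expected to free up (the "shadow time"), and a later job may jump ahead only if it fits now and either finishes, by its user-supplied estimate, before that time or uses only nodes the head job won't need. This is a simplified single-decision version (a real scheduler would also shrink the spare-node count as it backfills), and the function names are hypothetical. It shows why estimates matter: inflated estimates push jobs past the shadow time, and they stop qualifying.

```python
def shadow_time(now, free, running, head_nodes):
    """Earliest time the queue head can start, from running jobs' *estimated* end times.

    running: list of (estimated_end_time, nodes).
    Returns (shadow_time, extra_nodes): extra_nodes are the nodes still spare at the
    shadow time once the head job has its reservation.
    """
    if head_nodes <= free:
        return now, free - head_nodes
    available = free
    for end, nodes in sorted(running):
        available += nodes
        if available >= head_nodes:
            return end, available - head_nodes
    raise ValueError("head job needs more nodes than the machine has")


def can_backfill(now, free, running, head_nodes, cand_nodes, cand_estimate):
    """EASY-style rule: a later job may jump the queue only if it fits right now and either
    finishes (by its estimate) before the head's reservation or only uses spare nodes."""
    if cand_nodes > free:
        return False
    shadow, extra = shadow_time(now, free, running, head_nodes)
    return now + cand_estimate <= shadow or cand_nodes <= extra


if __name__ == "__main__":
    running = [(10, 50), (3, 1)]        # the 50-node job and one small job, as (end, nodes)
    print(shadow_time(1, 49, running, 100))             # (10, 0)
    print(can_backfill(1, 49, running, 100, 1, 3))      # True
    print(can_backfill(1, 49, running, 100, 1, 20))     # False
```

      With the numbers from the 100-node example at t=1 (49 free nodes, the 50-node job due to end at t=10, one small job ending at t=3), the big job's reservation is t=10, so a 1-node job estimated at 3 steps can be backfilled but one estimated at 20 steps cannot.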


      Suppose you run jobs in strict order of arrival. Then the most likely outcome will be that a few of the 1 node jobs will take much longer than the others. You'll have most (99?) of the nodes free, but one or two straggler jobs still going. That 100 node job can't run, and although the jobs after it would fit, they can't run either. You have to sit with most of your cluster idle.


      Now that is just plain stupid. I don't think anybody seriously uses such a poor scheduler.


      If it was possble to suspend jobs, aka preempt them, this wouldn't be a problem, but on current clusters, that isn't possible (there are some research-grade cluster systems that can, but they have other very bad weak points (i.e. proprietary MPI extensions), and aren't in production use).


      At least IBM's LoadLeveler scheduler can be configured to use a gang scheduling algorithm (i.e. with preemption). The problem is that it requires the users to recompile their apps with the thread-aware compiler (mpxlf95_r), which in some cases (such as mine. :-( ) produces a pretty serious slowdown.

      Cray reportedly has had gang scheduling (preemptive, as I said before) for the T3 (i.e. a distributed memory machine) since the mid-'90s.

  24. It's an experiment..... by 3seas · · Score: 1

    .... to see just how far you can stretch a bit before the MP loses control....

  25. HP Overstock by bayerwerke · · Score: 2, Funny

    They bought HP's overstock of them for pennies on the dollar.

  26. Fine... by Sunnan · · Score: 1

    But does it play Ogg Vorbis?

  27. 3TB of memory? by gsasha · · Score: 3, Funny

    I wish I had that much disk space...

    1. Re:3TB of memory? by PingPongBoy · · Score: 1

      But you do.

      1 DVD-R > 4 GB
      1000 DVD-R > 4 TB available for $1000

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
  28. Ummm.... by Anonymous Coward · · Score: 0

    A small suggestion - get better hardware, and you won't need those features.

    No charge :-)

    1. Re:Ummm.... by Anonymous Coward · · Score: 1, Insightful

      Ummm, false. We're talking mean time to failure here-- take 10,000 processors with an MTTF of 10,000 days (27 years): what are your chances that one of your processors will fail tomorrow? Or this week/month/year? They don't all last 27 years.
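      The arithmetic behind this, assuming independent failures with exponentially distributed lifetimes (a constant failure rate of 1/MTTF per processor): with 10,000 processors and a 10,000-day MTTF you expect one failure per day, so the chance of at least one failure tomorrow is 1 - e^-1, about 63%, and over a week it is above 99.9%.

```python
import math


def p_any_failure(n_parts, mttf_days, window_days):
    """P(at least one of n independent parts fails within the window), assuming
    exponentially distributed lifetimes, i.e. a constant failure rate of 1/MTTF."""
    expected_failures = n_parts * window_days / mttf_days
    return 1 - math.exp(-expected_failures)


for label, days in (("day", 1), ("week", 7), ("month", 30), ("year", 365)):
    print(f"{label:5s} {p_any_failure(10_000, 10_000, days):.4f}")
```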

  29. Coincidence? by Quirk · · Score: 1, Redundant

    First Doom 3 now this... coincidence? I don't think so.

    --
    "Academicians are more likely to share each other's toothbrush than each other's nomenclature."
    Cohen
    1. Re:Coincidence? by DARKFORCE123 · · Score: 0

      You're right. Doom 3's story is about opening a portal to Hell.

      The guys at the National Center for Supercomputing Applications want to one-up id and use this machine to help open a portal to Hell in the real world, resulting in our Final Doom.

    2. Re:Coincidence? by Beale · · Score: 1

      It's already been done. I mean, where did you think marketing executives come from?

  30. Re:really fast? by Anonymous Coward · · Score: 0

    won't that result in running both KDE and GNOME at the same time, but as soon as you look at it you see FVWM instead?

  31. Sun and/or IBM zseries hardware by r00t · · Score: 3, Informative
    Linux runs on both of these, with official IBM support on the zSeries. On the IBM hardware, go ahead and swap out CPUs and memory. It's supported, today, with Linux.

    The Sun hardware is more difficult to deal with, since there isn't a virtual machine abstraction. You can't do everything below the OS. Still, Linux 2.6 has hot-plug CPU support that will do the job without help from a virtual machine. Hot-plug memory patches were posted a day or two ago. Again, this is NOT required for hot-plug on the zSeries. IBM whips Sun.

    I'd trust the zSeries hardware far more than Sun's junk. A zSeries CPU has two pipelines running the exact same operations. Results get compared at the end, before committing them to memory. If the results differ, the CPU is taken down without corrupting memory as it dies. This lets the OS continue that app on another CPU without having the app crash.
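      Here is a software analogy of that duplicate-and-compare scheme, in Python: run the same operation twice, commit only if both results agree, and on a mismatch take that "CPU" out of service and retry the operation elsewhere. The faulty adder and all names are hypothetical, and the real zSeries check happens in hardware before results reach memory, not by calling a function twice.

```python
import itertools


class LockstepMismatch(Exception):
    """The two executions disagreed, so the result must not be committed."""


def make_faulty_adder(bad_cpu):
    """Hypothetical ALU model: on `bad_cpu`, every second call flips the result's low bit."""
    calls = itertools.count()

    def add(cpu, a, b):
        result = a + b
        if cpu == bad_cpu and next(calls) % 2 == 1:
            result ^= 1              # simulated transient fault
        return result

    return add


def lockstep(op, cpu, *args):
    """Execute op twice on the same 'CPU' and return the result only if both runs agree."""
    first, second = op(cpu, *args), op(cpu, *args)
    if first != second:
        raise LockstepMismatch(f"{cpu}: {first!r} != {second!r}")
    return first


def run_with_failover(op, *args, cpus=("cpu0", "cpu1")):
    """On a mismatch, fence off that CPU and retry the same operation on the next one."""
    for cpu in cpus:
        try:
            return cpu, lockstep(op, cpu, *args)
        except LockstepMismatch:
            continue                 # nothing was committed, so retrying is safe
    raise RuntimeError("every CPU failed its self-check")


if __name__ == "__main__":
    print(run_with_failover(make_faulty_adder("cpu0"), 2, 3))   # ('cpu1', 5)
```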

    1. Re:Sun and/or IBM zseries hardware by christophersaul · · Score: 1

      What applications are you suggesting be run on this hugely expensive mainframe?

    2. Re:Sun and/or IBM zseries hardware by Anonymous Coward · · Score: 1, Interesting

      Anything you want.

      For instance, where I work we have an older S/390 mainframe that runs a database.

      We have:
      1. A Win2000 server running the IIS web server and MS SQL, used online to build queries automagically against the mainframe data for our customers.
      2. A Linux-based firewall.
      3. Other Linux servers.
      4. Routers.
      5. Networks.
      6. Numerous other sundry Linux machines.

      All this could be replaced by Linux running in a single partition on the mainframe. All the network, all the servers.

      So don't be a dipshit. Obviously there are reasons for running Linux on a mainframe, especially WHEN YOU ALREADY OWN ONE FOR DOING SOMETHING ELSE.

      Now, zSeries isn't just a mainframe. It makes a great server. There are different pricing levels, different setups.

      Now go find a big corporate Windows server farm (rarer than you'd think) and look at the hundreds of Windows servers, the hundreds of support personnel and experts, then the rest of the A+ certified service geeks.

      Now delete all that and replace it with one server, running various things in its many partitions, run by 2 admins and some assistants.

      It will be faster, more reliable, and probably much cheaper. But the benefits go far beyond just eliminating hundreds of redundant personnel and dozens of high-maintenance PC servers running an unreliable OS: you have something that is easy to deal with, supported by a company that will bend over backwards for you, instead of being beholden to the assholes at MS.

      NOW, if you don't end up liking it, you could move to Solaris, or run server clusters of Linux PCs. And since you're already running Linux, moving to any other Unix platform on any other hardware, or to Linux on commodity hardware, is much, much easier than migrating from Windows in the first place.

    3. Re:Sun and/or IBM zseries hardware by flok · · Score: 2, Interesting

      What happens if both pipelines make the same mistake because the L1-cache feeds them both the same corrupted data?

      --

      www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
    4. Re:Sun and/or IBM zseries hardware by r00t · · Score: 2, Informative

      There isn't any part of the system without some
      sort of error correction. The cache generally
      has ECC for this. Since L1 is innermost and small,
      it may well be duplicated along with the pipelines,
      but I think they use ECC for that as well.

      This is full-path protection. Cables have ECC
      and/or a protocol with checksums. Disks are RAID.
      Methods of error correction vary by component,
      but nowhere are they missing.
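      To make "the cache generally has ECC" concrete, here is a toy Hamming(7,4) code in Python. It protects 4 data bits with 3 parity bits and can locate and fix any single flipped bit. Real cache and memory ECC typically uses wider SECDED codes over whole words, so this illustrates the principle rather than what any particular CPU implements:

```python
from typing import List

def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

# Flip a bit "in the cache" and recover the original data.
word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1                             # simulate a single-bit upset
print(hamming74_decode(stored) == word)    # True
```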

    5. Re:Sun and/or IBM zseries hardware by kasperd · · Score: 2, Informative
      If the results differ, the CPU is taken down without corrupting memory as it dies.

      A few questions:
      • What if an error happens in the comparison unit?
      • What happens to the program that was running on the CPU as it is taken down? (The CPU registers are part of the program state, so you cannot just continue on another CPU.)
      --

      Do you care about the security of your wireless mouse?
    6. Re:Sun and/or IBM zseries hardware by Anonymous Coward · · Score: 0

      Problem with your argument is that nobody runs Linux as the "native OS" on zSeries. It's actually the VM that provides all of the management stuff.

    7. Re:Sun and/or IBM zseries hardware by r00t · · Score: 1

      Having an error that matters, then also having an
      error in the comparison unit, would amount to
      having two unlikely errors at the exact same moment.

      CPU registers are tough to deal with. I don't know
      what gets done about that. Re-generating the CPU
      registers should be possible as long as memory on
      which they depend has gone unmodified (or can be
      rolled back) since the registers were last saved.
      However it's done, do note that these systems are
      not known for having fast CPUs.

    8. Re:Sun and/or IBM zseries hardware by abdulla · · Score: 1

      From what I understand, this is the point of the virtual machine, you'll still have the information to continue it on another CPU from the point of failure. Feel free to correct me.

    9. Re:Sun and/or IBM zseries hardware by r00t · · Score: 1
      I don't see a problem. I even admitted as such. The VM is pretty much part of the platform; you can't run Solaris on the IBM VM now, can you? The VM provides, and Linux benefits.

      You can run Linux on the bare hardware if you like. The VM is not required.

      You can use the Linux-native CPU hot-plug support. Even VM users do this, adding and removing virtual CPUs from virtual machines. Linux-native hot-plug memory support is a recently posted patch, which would be useful when running without the VM. The Linux-native hot-plug memory support should prove useful on Sun hardware, SGI hardware, and IBM's pSeries and iSeries hardware.

    10. Re:Sun and/or IBM zseries hardware by kasperd · · Score: 1

      Having an error that matters, then also having an error in the comparison unit, would amount to having two unlikely errors at the exact same moment.

      That is not what I had in mind. I'm thinking of a situation where both pipelines work correctly, but an error happens in the comparison unit, such that you end up sending the incorrect data to RAM.

      Re-generating the CPU registers should be possible as long as memory on which they depend has gone unmodified (or can be rolled back) since the registers were last saved.

      Memory doesn't go unmodified for long. How many instructions do programs execute on average between memory writes? (Fewer than 10, I would think.) Rolling back memory contents is not an option if you have threads communicating through shared memory. And normally registers are saved and restored only by the scheduler. But what if an error happened while the scheduler was running?

      I wonder if the consensus impossibility result with a single crash can be adapted to show something about a system of this kind.

      --

      Do you care about the security of your wireless mouse?
    11. Re:Sun and/or IBM zseries hardware by sl3xd · · Score: 1

      If I get your gist, then I would say this: Mainframe processors aren't at all like your 'consumer' processors like G5's and Pentiums. People pay much, MUCH more for these things with the expectation that they will NEVER have problems of ANY kind; a processor as reliable as gravity. A processor for which a warranty is redundant, as the processor will be perfect to begin with.

      But I digress:

      That is not what I had in mind. I'm thinking of a situation where both pipelines work correctly, but an error happens in the comparison unit, such that you end up sending the incorrect data to RAM.

      Even if an error happens in the comparison unit, it isn't going to end up in RAM.

      The reason is simple if you understand how processors work: processors are a collection of logic gates. Gates are strictly one-way; there is no reverse. There are circumstances under which a gate can be forced to operate in reverse, but they usually involve physically destroying the gate, and would quickly result in a system failure anyway.

      A fundamental thing to understand is that the output from the pipeline is strictly 'read only' to the comparator. Another important fact is that it is completely unnecessary to make a copy of the outputs in order to compare them, nor is it necessary to change or 'handle' the input data to compare them.

      In any event, the comparator looks at its two input values, and probably has only one kind of output: a single bit saying whether the values are the same. If they are the same, other circuitry in the CPU takes the same input value the comparator saw -- from the exact same electrical 'pins' -- and passes it along. And most likely only one of the pipeline outputs is actually used; the other, being identical, is simply discarded. (That saves transistors and logic.)

      Here's a crude table to list things out:

      Only one pipeline bad = Error caught
      Only comparator bad = Error caught
      Both pipelines bad(1) = Error caught
      Both pipelines bad(2) = Bad data->memory
      One pipeline bad, comparator bad = Bad data->memory

      ** Both pipelines bad(1) means that the two pipelines do not have an identical defect, and will return a result different from the correct result and each other.
      ** Both pipelines bad(2) means that through some miracle they return the same incorrect result. This can happen because of a design problem -- but if that were the case, it would happen on a lot more than just a couple of processors, and the whole batch would be recalled.

      It's worth noting that the last three are many orders of magnitude less likely to happen than the first two. (With the probability of an identical error in both pipelines being so near to zero as to be a pointless conversation anyway.)

      The worst-case scenario here is that you have an error in one of the pipelines AND an error in the comparator that flags the mismatch as good; the effect is the same as if you had a 'normal' non-verified (Pentium/Athlon, etc.) CPU generating errors.

      The entire point, however, is that it is much, much less likely to have one of these combinations of errors on the same chip (and hence much, much less likely to have bad data written to memory) than it is with a 'normal' unverified pipeline.
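      The table above can be reproduced with a small Python model. Two assumptions are baked in that are mine, not the poster's: a faulty comparator is modeled as one that reports the opposite verdict, and only the first pipeline's output is forwarded to memory (so in the last case it is the forwarded pipeline that is bad):

```python
CORRECT = 42  # the value a fault-free pipeline would produce

def outcome(p1_out: int, p2_out: int, comparator_ok: bool) -> str:
    same = p1_out == p2_out
    verdict = same if comparator_ok else not same   # assumption: a bad comparator inverts its answer
    if not verdict:
        return "error caught"                        # CPU halted, nothing written
    written = p1_out                                 # only one pipeline's result is used
    return "ok" if written == CORRECT else "bad data->memory"

CASES = {
    "only one pipeline bad":            (41, CORRECT, True),
    "only comparator bad":              (CORRECT, CORRECT, False),
    "both pipelines bad(1), different": (40, 41, True),
    "both pipelines bad(2), identical": (41, 41, True),
    "one pipeline bad, comparator bad": (41, CORRECT, False),
}

for name, args in CASES.items():
    print(f"{name:34} -> {outcome(*args)}")
```

      In this model the "only comparator bad" row is a false alarm: the pipelines agree, but the CPU is still halted, which is safe but costs availability.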

      Memory doesn't go unmodified for long time

      In processor terms, it sure does. It goes unmodified for a comparative ice age, even. You've got to come to terms with how painfully, agonizingly, torturously long it takes to get data between the CPU and main memory (not to mention latencies added because of differences in clock speeds between memory and the CPU, etc.). At 2 GHz, it takes more than a few clock ticks just to travel the distance from the CPU to the memory -- to hell with actually changing its contents once it gets there (which takes even longer).
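      To put a number on "agonizingly long": a delay of t nanoseconds on an f GHz CPU costs t x f cycles. The 100 ns main-memory latency below is an assumed ballpark, not a figure from the thread or a measurement of any particular system:

```python
def cycles_for(latency_ns: float, clock_ghz: float) -> float:
    """CPU cycles that elapse during a delay of latency_ns (one cycle lasts 1 / clock_ghz ns)."""
    return latency_ns * clock_ghz

# Assumed ballpark: ~100 ns for a trip to main memory on a 2 GHz CPU.
print(cycles_for(100, 2.0))  # 200.0
```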

      how many instructions does programs execute on average between memory writes? (Less than 10 I would think). Rolling back memo

      --
      -- Sometimes you have to turn the lights off in order to see.
    12. Re:Sun and/or IBM zseries hardware by imperialfrost · · Score: 1

      This is why the POWER5 exists now, and POWER6 will probably replace the zSeries and the iSeries. Then you also get the speed.

    13. Re:Sun and/or IBM zseries hardware by kasperd · · Score: 1

      If they are the same, other circuitry in the CPU takes the same input value the comparator saw -- from the exact same electrical 'pins' -- and passes it along.

      And this is where something can go wrong. If the right data goes to the comparison unit, but an error happens on the other path, the one leading to where the data is actually used, it is not detected.

      --

      Do you care about the security of your wireless mouse?
    14. Re:Sun and/or IBM zseries hardware by sl3xd · · Score: 1

      Um... Not many bad things can happen on a wire only a few nanometers long. If you're worried about that, then there are over a hundred million such points of failure. More importantly, this kind of error is extremely easy to catch, and would be discovered at the chip foundry, before the die is even cut from the silicon wafer.

      One thing to recall is how much time, money, and effort goes into verifying that the die is 'perfect' before it is cut from the silicon wafer. Chip foundries spend billions on equipment to test as much as they can. It's a pretty interesting thing to see done, because as you subdivide the chip, eventually you get to the point where you can test a small portion for every single possible combination of inputs, and ensure things are working properly.
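      The "every single possible combination of inputs" idea is easy to show on a block small enough to enumerate. Here a 4-bit ripple-carry adder built from gate-level full adders is checked against ordinary integer addition for all 512 input combinations. Real foundry testing uses dedicated test equipment and on-chip test logic, not anything like this Python, so take it purely as an illustration:

```python
from typing import Tuple

def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Gate-level full adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add4(x: int, y: int, cin: int = 0) -> int:
    """4-bit ripple-carry adder; returns a 5-bit result (sum plus carry-out)."""
    result, carry = 0, cin
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << 4)

def exhaustive_test() -> Tuple[int, int]:
    """Try every (x, y, carry-in) combination; return (cases checked, failures)."""
    checked = failures = 0
    for x in range(16):
        for y in range(16):
            for cin in (0, 1):
                checked += 1
                if ripple_add4(x, y, cin) != x + y + cin:
                    failures += 1
    return checked, failures

print(exhaustive_test())  # (512, 0)
```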

      It's a time-consuming process, and requires extra transistors to make it happen; even your average Intel processor does this. The difference is in how much verification happens. A mainframe-class processor sees a LOT more such verification.

      It's worth noting that Intel is constantly trying to figure out what to do with the 'dead' transistors on its CPUs. (By 'dead' I mean that they are on the die, but aren't used for anything in the finished product. These transistors are there specifically to make testing of the chip easier while it's still on the silicon wafer.)

      And since these transistors cost money to make, it would be nice for everybody involved if they were used in 'normal' operation, instead of only at the foundry.

      --
      -- Sometimes you have to turn the lights off in order to see.
  32. Re:Will it be done in time for Quake 3? by Chicane-UK · · Score: 0, Troll

    Worst.. FP.. ever!

    --
    "Hey! Unless this is a nude love-in, get the hell off my property!!"
  33. The solution! by Sidicas · · Score: 5, Funny

    "will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors..."
    "The National Center for Supercomputing Applications will use it for research"


    1. Make a system that generates more heat than a supernova.
    2. Research a solution to global warming.
    3. Profit!

    1. Re:The solution! by Ari_Haviv · · Score: 1

      See, you don't know how to do it right.
      Let's try it again:
      1. Make a system that generates more heat than a supernova.
      2. Research a solution to global warming.
      3. ????
      4. Profit!

      --
      Join Team Mozilla #38050 Folding@home
    2. Re:The solution! by Anonymous Coward · · Score: 0

      In this case, step 3 would be "Steal Underwear".

  34. In other Headlines by ShadowRage · · Score: 4, Funny

    SCO gained $715,776

  35. Another thing Sun does well.... by passthecrackpipe · · Score: 4, Insightful
    Cache reduction - ehh cash reduction. One of the prime reasons Sun is losing serious levels of installed base to Linux is not because linux is better, it is because Sun is bloody expensive - outrageously so. And while most customers had to endure the annual fleecing with gritted teeth - due to lack of alternatives - Sun is now being pummeled out of datacenter after datacenter.

    I have replaced Sun hardware/software combos in the core datacenter for many of our customers, and I can tell you that yes - Sun brings some amazing features to the table - most of which are there to serve old technology. Linux on simple CPUs delivers amazing price/performance (depending on the job, we see an average 3x to 4x performance increase for 25% of the cost). That means that if I were to spend the same, lifecycle-wise, on a Linux cluster as I would on a big Sun box like the 10k or 15k, I'd end up with 12x to 16x the performance of the Sun solution.
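    The 12x to 16x figure follows from dividing the speedup by the cost fraction, assuming (as the comment implicitly does) that performance scales linearly with the number of clusters the same budget buys:

```python
def perf_per_equal_spend(speedup: float, cost_fraction: float) -> float:
    """Performance relative to the big box when spending the same amount.

    Assumes linear scaling: the same budget buys 1 / cost_fraction clusters,
    each `speedup` times as fast as the big box.
    """
    return speedup / cost_fraction

for speedup in (3.0, 4.0):
    print(f"{speedup:.0f}x at 25% of the cost -> {perf_per_equal_spend(speedup, 0.25):.0f}x")
```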

    The same functionality in terms of CPU and RAM (and other hardware) failure handling is available on the Linux cluster, albeit in a less graceful form - the magic spell to invoke goes like this:
    shutdown -h now
    If I have 300 machines crunching my data, I can afford to lose a couple, and I can afford to keep a few hot standbys.
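    How many standbys are "a few"? Under the simplifying assumption that nodes fail independently, the chance of more than k of n nodes being down at once is a binomial tail. The 0.5% per-node unavailability and 4 standbys below are hypothetical numbers chosen for illustration:

```python
from math import comb

def p_more_than_k_down(n: int, k: int, u: float) -> float:
    """P(more than k of n nodes down simultaneously), each down independently with probability u."""
    p_at_most_k = sum(comb(n, i) * u**i * (1 - u) ** (n - i) for i in range(k + 1))
    return 1.0 - p_at_most_k

# Hypothetical cluster: 300 nodes, each unavailable 0.5% of the time, 4 hot standbys.
print(f"{p_more_than_k_down(300, 4, 0.005):.4f}")
```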

    Of course, the massively parallel architecture does not work for all applications, and in those cases you would look to use either openMosix or, of course, the (relatively expensive) SGI box mentioned in this article.
    --
    People who think they know everything are a great annoyance to those of us who do.
    1. Re:Another thing Sun does well.... by kscguru · · Score: 1
      That means that if I were to spend the same, lifecycle-wise, on a Linux cluster as I would on a big Sun box like the 10k or 15k, I'd end up with 12x to 16x the performance of the Sun solution.

      And if you are trying to use a 10k or 15k for a job where a simple cluster will suffice, you DESERVE to be fleeced! If the application is simple enough to run on a cluster, where parallelism means high-latency message passing, then you DON'T need big iron to run it. You know this, I know this... yet people out there keep buying 15k's and paying three or four times as much as an equivalently powerful cluster would cost.

      Big iron really is a niche market - they only belong in specific scenarios where the latencies of a cluster are too large and the applications themselves really need shared-memory. A really heavy-duty database qualifies, simulations with large (multi-gigabyte) active datasets qualify, rendering qualifies, OLTP-type apps qualify. Your web server cluster doesn't, your "one giant machine to run everything" really doesn't, and a sufficiently distributed database (like Google or some other well-designed database) doesn't.

      Honestly, to hear someone complaining about how expensive a 15k is, and how nice the replacement cluster is, tells me that some purchaser in the past was stupid. It's like buying an SUV to commute in, and then complaining about the excessive gas consumption. Well, duh!

      --

      A witty [sig] proves nothing. --Voltaire

    2. Re:Another thing Sun does well.... by jedidiah · · Score: 1

      NOPE. "Heavy duty databases" and OLTP run just fine on clusters. If you put any thought into it at all you would quickly realize that OLTP is especially suitable for execution in a cluster environment.

      This is why Sun is in such deep sh*t. No one needs a 32 cpu box to run a monster Oracle instance anymore.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    3. Re:Another thing Sun does well.... by sparkz · · Score: 2, Informative
      If you're lucky enough to have a massively-parallel, read-only application, then go for Linux clusters.

      Read the Sun Blueprints (http://www.sun.com/blueprints/browsesubject.html#cluster) for how a real cluster works - actually caring about data integrity. That is the crux with clustered systems: what happens if one node "goes mad" even though it's no longer a "valid" part of the cluster?
      Look into Sun's dealing with failure-fencing; it's drastic (PANIC a node if it can't be sure it's a cluster member) but it works.
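      The core of that failure-fencing rule can be sketched as a majority-quorum check. This is a generic illustration, not Sun Cluster's actual algorithm (real products also use tie-breakers such as quorum disks), and the function names are hypothetical:

```python
def has_quorum(visible_members: int, cluster_size: int) -> bool:
    """Strict majority: two disjoint partitions can never both hold quorum."""
    return visible_members > cluster_size // 2

def on_membership_change(node: str, visible_members: int, cluster_size: int) -> str:
    if has_quorum(visible_members, cluster_size):
        return f"{node}: keep serving"
    # The node cannot prove it is still a valid cluster member, so it stops
    # (in a real cluster: panics) before it can touch shared data.
    return f"{node}: PANIC (fenced)"

# A 5-node cluster splits 3/2: only the larger side keeps running.
print(on_membership_change("node-a", 3, 5))
print(on_membership_change("node-d", 2, 5))
```

      An even 2/2 split of a 4-node cluster leaves neither side with quorum under this rule, which is exactly why tie-breaker devices exist.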

      By contrast, Linux clustering seems to be at the level of "let's share an IP address, we can balance the load". Great for DNS (but oh, DNS has that built in) or read-only Apache servers (assuming no session management, static-only pages).

      Digital had an excellent cluster package last decade; Sun seem to be getting to that level now. Linux, sorry to say, is years behind.

      --
      Author, Shell Scripting : Expert Re
    4. Re:Another thing Sun does well.... by passthecrackpipe · · Score: 1

      Hmm, maybe you should revisit Linux clustering then - it all depends on what components you use to build your cluster. There are several mechanisms available today that will allow you to deal with the situations you describe - on the systems level, there is STONITH: if you are not sure that a machine is part of the cluster, kill it. Then on the application level there are a variety of solutions that can be utilised.

      Do you honestly believe that the likes of (insert name of very big pharma, rhymes with "ein") or some of the examples in the Top 500 will spend millions of dollars on DNS or read-only Apache supercomputers?

      Scott McNealy must be jumping for joy every time he sees FUD like yours printed - he sees his marketing dollars paying off handsomely. Linux clustering is a mature, solid and reliable solution to a large array of computational problems. There are problems that cannot be solved efficiently with a massively parallel setup, and for those, you would look to the more expensive, but in its own category very competitive, NUMA system like the one designed by SGI. And it still runs Linux.

      --
      People who think they know everything are a great annoyance to those of us who do.
    5. Re:Another thing Sun does well.... by kscguru · · Score: 2, Informative
      OLTP is the classic anti-cluster workload. Essentially random data access patterns, very large resident data sets with a huge amount of simultaneous (and synchronous) accesses. OLTP means low-latency, and OLTP will die on a cluster. By definition.

      Now sure, some careful planning can take an OLTP system and make it more cleanly distributed, but at that point it isn't OLTP, because all the nasty bits that made it a hard workload are washed out. Running a constantly-changing database (e.g. financial market?) on a cluster is hard; running a mostly static database (e.g. shopping cart?) is easy.

      However, I agree with your point. Very few people need the 32-cpu monster (although there are a few!). Handling transaction volume can be done two ways: buy a big general-purpose machine that can handle the volume, or buy a cheaper cluster that more closely matches the workload. And today, the cluster is the right answer.

      I think the difference between then and now is that before, we didn't know what the workload was supposed to be. In that case, a big general-purpose monster server is the most flexible solution. But now, we know what workload we want, and it's cheaper to design a cluster for that workload.

      --

      A witty [sig] proves nothing. --Voltaire

    6. Re:Another thing Sun does well.... by monsted · · Score: 1

      TruCluster is still a very nice product and is still in use in many places. HP has announced that even though Tru64 is going to die, both TruCluster and AdvFS (thank you, oh great developers - excellent FS/VM combo :)) are going to be ported to HP-UX.

  36. Re:really fast? by Anonymous Coward · · Score: 0

    I know this is a joke, but I don't get it - you guys must have slow PCs is all I can say. KDE ain't slow for me.

  37. Re:Will it be done in time for Quake 3? by Forge · · Score: 0

    Will the person who modded the parent "off topic" please put down the crack pipe and post a reply explaining why?

    --
    --= Isn't it surprising how badly I spell ?
  38. Wow by Steamhead · · Score: 2, Funny

    Hot damn, this is one server that could survive a slashdotting.

  39. Du-uh by passthecrackpipe · · Score: 1, Interesting

    As everybody who has read the IBM redbooks about mainframe Linux knows, Sendmail is the service of choice! Of course, you could run Postfix on a decrepit old Pentium-1 and get the same level of performance, but that won't help IBM with their mainframe income, will it?

    --
    People who think they know everything are a great annoyance to those of us who do.
    1. Re:Du-uh by christophersaul · · Score: 1

      On a serious note, I can't think of any app other than Oracle that's of any use beyond the OSS stuff.

      I think it's funny how Sun is either far too expensive and we're being told to run everything on a few old 486s from the back of the office cupboard, or that Linux on a mainframe is the way to go.

    2. Re:Du-uh by Anonymous Coward · · Score: 0

      Lest you miss the important point here...

      You can run Linux on everything from the "few old 486s from the back of the office cupboard" to a mainframe to this fire-breathing monster Itanium CPU cluster with shared memory from SGI!

      I consider that awesome! And I also think that, because of this, Linux will still be running on hardware when everyone has completely forgotten that computers used to be etched on sand!

    3. Re:Du-uh by passthecrackpipe · · Score: 1

      I was only half kidding with the sendmail comment (flamebait indeed, braindead moderators - looks like these days, anybody on slashdot with an account.... oh wait). Read the IBM redbooks on mainframe Linux: Sendmail and Lotus Notes are prime examples pushed by IBM for mainframe use. Rule of thumb when considering the zSeries: if your application needs to push *enormous* amounts of data around *really* fast, but does not need much in the way of processing, especially interactive processing, then the mainframe would be a contender, of sorts. Note, though, that these days there are several other systems and architectures around that can rival the mainframe for data management capacity, uptime and abilities, at (again) a fraction of the cost. Read the IBM redbook on mainframe Linux - it really is an eye opener about how old, decrepit, and *crap* the mainframe really is.

      The best course of action when you find yourself considering mainframes - especially IBM mainframes - is to beat yourself over the head with a large stick, several times. Then think about the iSeries (also runs Linux) or pSeries (also runs Linux, now available in sexy blade format). Or better yet, bust out those old 486s from the back of the office cupboard (a fire hazard, anyway).

      --
      People who think they know everything are a great annoyance to those of us who do.
    4. Re:Du-uh by jedidiah · · Score: 1

      Yes... and Oracle will run on all of them too.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    5. Re:Du-uh by Anonymous Coward · · Score: 0

      And, as we all know Oracle is the pinnacle of computing progress... NOT! /sarcasm

      Quit that! Many computers do many useful things without Oracle.

      Take the long view (gee, as my original post was supposed to suggest) and what do you find? Linux has been, and will continue to be, used on everything from microcontrollers to mainframes to mega-multi-CPU supercomputers. What technology do you think will be adapted for the next wave of computing? Linux, already adapted to run on many platforms regardless of instruction sets or memory organization concerns, OR Windows, so finely tuned to Intel's x86 architecture that it didn't even make the final cut on Intel's Itanium architecture?

      And, given that, what OS is going to underly the next software advance, when database management, such as Oracle represents, becomes obsolete? /rant

      I swear, Slashdot gets dumber every day: dumb, dumber, Slashdot!

  40. Advantages...? by bogaboga · · Score: 1

    Really, I know this is off topic, but what are the advantages of having more CPUs on a computer system? I have been approached by various vendors explaining to me why a system with dual CPUs would be better. Wouldn't I be able to harness the same performance by using more and more RAM or just a more powerful CPU?

    1. Re:Advantages...? by Anonymous Coward · · Score: 1, Funny

      Short answer.... no

    2. Re:Advantages...? by myg · · Score: 4, Informative
      Because a machine like that isn't about running Apache or serving files.

      The purpose of that computer is to solve complex scientific problems such as weather simulations, high-energy particle simulations, protein folding, etc. Many of these simulations involve iterated systems of equations that can take decades to solve on the fastest CPUs we have today.

      The only way to get meaningful results in a meaningful amount of time is to break the problem apart into smaller problems and solve them in parallel.
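      A minimal Python sketch of "break the problem apart and solve the pieces in parallel", using numerical integration (the integral of 4/(1+x^2) over [0, 1] is pi) because its pieces are fully independent. It uses threads so it runs anywhere; because of CPython's global interpreter lock, CPU-bound work like this would need ProcessPoolExecutor (or a language without a GIL) to actually run on several CPUs at once:

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

def f(x: float) -> float:
    return 4.0 / (1.0 + x * x)  # integrates to pi over [0, 1]

def integrate_chunk(chunk: Tuple[float, float, int]) -> float:
    """Midpoint-rule integral of f over one independent sub-interval [a, b]."""
    a, b, steps = chunk
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def parallel_integrate(workers: int = 4, steps_per_chunk: int = 10_000) -> float:
    # Split [0, 1] into one sub-problem per worker.
    edges = [i / workers for i in range(workers + 1)]
    chunks = [(edges[i], edges[i + 1], steps_per_chunk) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(integrate_chunk, chunks))  # solve the pieces concurrently
    return sum(partials)  # combine the partial results

print(parallel_integrate())
```

      Tightly coupled simulations are harder precisely because their pieces are not independent like this: neighbouring chunks have to exchange data on every iteration.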

      Some projects, such as Folding@Home and Find-A-Drug go the distributed computing route -- use many disconnected systems to solve the problem.

      The downside to that approach is that not all problems can be easily broken apart -- and some classes of problems can run without tight coupling, but they lose efficiency. The impressive thing about this particular supercomputer is that it has a single, unified memory image.

      This is very useful for some classes of simulation problems when the entire simulation must be present for each iteration.

    3. Re:Advantages...? by Pantero+Blanco · · Score: 1

      Unless the "more powerful" CPU is several times as fast as the original, no. Two men that can lift 100 pounds can carry a 200-pound load, but one man that can lift 150 pounds can't.

      And RAM will only help to a point (and there's a cap on how much a regular computer can take)... I could deck out a K6 running at 333 MHz with 512 MB of RAM, but it still wouldn't run as fast as a computer with multiple CPUs or one with a newer processor.

    4. Re:Advantages...? by TheHawke · · Score: 1

      Mmm... Give me a system that'll help forge a method to either distort or generate gravity energy that would counteract a planet's own gravitational field... i.e., a repulsorlift drive!

      --
      First rule of holes; When in one, stop digging.
    5. Re:Advantages...? by shaitand · · Score: 1

      Not really.

      Essentially, when you get down to the most basic level, a single-CPU system can only do one thing at a time.

      A dual-cpu system can do two things at a time. A quad can do 4.

      Even if you have a cpu which does do more than one thing at a time, 4 of them will do 4 times as many things at a time.

      Also, you can have your one fast CPU, or you can have 4 of that same very fast CPU. Do the math: if there were a single CPU as fast as 4, wouldn't they just release dual and quad systems using that CPU?

    6. Re:Advantages...? by kannibal_klown · · Score: 1

      For normal applications (not multi-threaded or written to take advantage of multiple CPUs), you get no extra performance for a single app. A quick application thrown together to perform some complex calculation will not run in less time.

      However, while that unoptimized process is running at 100% of CPU1, CPU2 is available to do other things. If you try to run another application (like a game) while a complex calculation is running on CPU1, the OS would usually put the new application on CPU2, so you would not get much slowdown from the first app.

      BUT... applications can be written to take advantage of the extra CPUs. Using multiple threads is one way. For example, if you start distributing complex work across smaller threads, then each thread would be scheduled onto the most available CPU. Then the program would have the 2 CPUs running parts of the application, and you'd get more done quicker.

      There are other ways, but I don't want to write an article.

      Some programs are written to take advantage of multiple CPUs, others aren't. I know QUAKE 3 was written to take advantage, but I don't know if the other games based on the engine also see the benefit. Photoshop is also written to take advantage of multiple CPUs.

      But in the end, as someone else stated, even if the program is written to take advantage of multiple CPUs, the speedup isn't exactly 2x (with a dual-processor system). But it does help.
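      One standard way to put a number on "not exactly 2x" is Amdahl's law (an addition here; the comment doesn't name it): if a fraction p of the work can run in parallel on n CPUs, the speedup is 1 / ((1 - p) + p / n). The 90% parallel fraction below is just an example value:

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Ideal speedup when only parallel_fraction of the run time can be spread over n_cpus."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

for n in (1, 2, 4, 1024):
    print(f"{n:5d} CPUs: {amdahl_speedup(0.9, n):.2f}x")
```

      With 90% parallel code, two CPUs give about 1.82x, and no number of CPUs can push past 10x, because the serial 10% never gets faster.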

  41. it would make 2d place by S3D · · Score: 1, Informative

    According to the Top 500 supercomputers list, 35 TFLOPS would put it into second place, after the Earth Simulator Center (35-40 TFLOPS).

    1. Re:it would make 2d place by S3D · · Score: 1

      Correction: 35 TFLOPS is NCSA's total computing power. This beast itself is 6 TFLOPS - somewhere around 20th to 30th place.

    2. Re:it would make 2d place by Anonymous Coward · · Score: 0
      It has a potential peak performance of more than 6 trillion floating-point operations per second (TFLOPS), which will bring the total computing power at NCSA to more than 35 TFLOPS and disk storage to three-quarters of a petabyte.

      They are saying that all of the combined computing power at NCSA will be over 35 TFLOPS, with this SMP machine having a peak performance of 6 TFLOPS.
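      A back-of-the-envelope check of the 6 TFLOPS peak: peak = CPUs x clock x floating-point operations per cycle. The 1.5 GHz clock and 4 FLOPs per cycle (two fused multiply-adds) used here are assumptions for illustration; the article excerpt quoted above gives neither figure:

```python
def peak_tflops(cpus: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: CPUs x GHz x FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS."""
    return cpus * clock_ghz * flops_per_cycle / 1000.0

print(peak_tflops(1024, 1.5, 4))  # 6.144
```

      That lands just above the "more than 6 TFLOPS" quoted. Peak figures like this are always higher than what a benchmark such as Linpack actually achieves.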

    3. Re:it would make 2d place by Sangui5 · · Score: 2, Informative

      35 TFLOPS is the peak performance number sitewide. Cobalt itself should be able to clear between 6 and 7, putting it at a much more modest 25th place or so. There are rumours that a bigger cluster-style machine is in the works, once the issues with Tungsten (NCSA's biggest and #5 in the world) are ironed out.

  42. Uh oh by dacarr · · Score: 1

    There goes SGI again, violating SCO's copyright! I'm gonna tell Darl McBride on them!

    --
    This sig no verb.
  43. Houston we have a problem.... by Anonymous Coward · · Score: 0

    Why is NCSA buying big iron like this when the National Science Foundation has told them to start steering away from buying machines like this?

    And from SGI for crying out loud! What are these people thinking?

  44. 32 is quite BIG by diegocgteleline.es · · Score: 1

    I don't think many people realize how _HUGE_ a 32-processor box is (I, at least, can't even imagine it). As someone already told you, Win 2k3 supports 64 CPUs, not 32. But I bet they could make it run on, e.g., 128 CPUs without much effort (if any).

    Why don't they do it? Plain and simple: money. That SGI box is a special case because it's some "national" thing. Enterprises don't use 1024, or even 128 or 64 CPUs in most cases: they run clusters of small machines. That's what sells today, and that's where Microsoft has set its sights.

    1. Re:32 is quite BIG by killjoe · · Score: 1

      "The run clusters of small machines, that's what it sells today and that's where Microsoft is putting their eyes."

      yes because it's well known that linux does not cluster very well and neither does the Mac. It's smart of MS to go after a market where the competition hasn't demonstrated that it can outscale MS without even trying.

      Yea right.

      --
      evil is as evil does
    2. Re:32 is quite BIG by Anonymous Coward · · Score: 0

      So? WHO CARES about performance? Microsoft is where it is because of its marketing and its fancy graphical tools. AFAIK Microsoft is preparing a Windows version targeted at clusters.

  45. Impressive... by Pantero+Blanco · · Score: 2, Informative

    ...Right on the heels of this too.

  46. RISC overrated by TheLink · · Score: 3, Informative

    It's OK for embedded and other areas (slower CPUs), but with desktop/server CPUs being much faster than memory and remaining so for the foreseeable future, making the common, popular instructions shorter than the others is actually an advantage, despite the complexity involved.

    It's like having on-the-fly instruction decompression. e.g. CISC programs tend to be smaller in main memory+cache, and they travel in CISC/"compressed" form taking up less memory bandwidth over the memory/cache buses to the CPU instruction decoder where they are "decompressed" to RISC micro-ops to be executed.

    Look at the mainstream desktop/workstation/server CPUs. Only the SPARC is RISC. IBM POWER/PowerPC is barely RISC[1], some people think it's more CISC than RISC. Itanium isn't RISC. x86 isn't. The rest (Alpha, MIPS, PA-RISC) are either out of the market or on their way out.

    As long as CPUs are fast and much faster than RAM (and cache remaining expensive), it's often worth doing the compression/decompression thing.

    [1] I believe IBM's POWER chips actually decode their "RISC" instructions into simpler instructions; some of their "RISC" instructions are pretty complex, which is kinda oxymoronic... But as I mentioned, that may not be such a bad thing.

    --
    1. Re:RISC overrated by Anonymous Coward · · Score: 0

      The POWER chips do indeed use microcode (separate from the micro-ops) for complex instructions, as do the P4, the Opteron and the Athlon.

      My Master's Thesis was on microcode and you're dead right about it saving memory bandwidth.

  47. Actually.. by krutadal · · Score: 2, Informative

    Pentium 4 reduces the CISC instructions to a series of RISC-like "microops" that, for the most complex of the bunch, can take hundreds of cycles to complete.

  48. Scalability of applications by xyote · · Score: 2, Insightful

    Well, we know that the kernel can be made to scale but what about the applications? The same issues the kernel had to face, the applications have to face also. For parallel computing you naturally try to avoid too much sharing by "parallelizing" the programs. For applications like databases, you are talking about a lot of sharing of a lot of data. Not all the techniques the Linux kernel used are available to the applications yet.

    1. Re:Scalability of applications by xtp · · Score: 5, Informative

      SGI has had 512- and 1024-CPU MIPS-based systems in operation for more than 5 years. Much work was done on the Irix systems to initialize large parallel computations and provide libraries and compiler support for these configurations. One technique is to provide message-passing libraries that use shared memory. A better technique is to morph (slightly) parallel mesh apps so that each computational mesh node exposes the array elements to be shared with its neighbors. No message passing needed: you push data after a big iteration and then use the (really fast) sync primitives to launch into the next iteration. With shared-nothing clusters (i.e. Beowulf) a computation (and its memory) must be partitioned among the compute nodes. The improvement over a "classical" cluster can be startling, especially with computations that are more communications-bound than compute-bound. This means there is no value in replacing a render farm with a big system. But there are big compute problems, e.g. finite element, for which the shared-nothing cluster is often inadequate.
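A toy version of the "push boundary data, then sync into the next iteration" pattern, in Python: a 1-D Jacobi smoothing sweep over one shared array, where each thread owns a contiguous slice and reads its neighbours' edge values straight from shared memory, with a barrier as the only coordination. This is an illustrative sketch, not SGI's library; the grid size, thread count and iteration count are arbitrary, and CPython's GIL means it shows the synchronization structure rather than real parallel speedup.

```python
import threading

N, WORKERS, ITERS = 16, 4, 50  # arbitrary sizes for illustration
# Two shared buffers: even iterations read bufs[0] and write bufs[1], odd ones the reverse.
bufs = [[0.0] * N, [0.0] * N]
for b in bufs:
    b[0] = b[-1] = 100.0  # fixed boundary values at both ends
barrier = threading.Barrier(WORKERS)


def owned_range(rank: int) -> range:
    """Contiguous block of interior cells (1..N-2) owned by this worker."""
    interior = N - 2
    lo = 1 + rank * interior // WORKERS
    hi = 1 + (rank + 1) * interior // WORKERS
    return range(lo, hi)


def worker(rank: int) -> None:
    for it in range(ITERS):
        src, dst = bufs[it % 2], bufs[(it + 1) % 2]
        for i in owned_range(rank):
            # Edge cells read a neighbour's values directly from shared memory:
            # nothing is sent, the barrier below is the only coordination.
            dst[i] = 0.5 * (src[i - 1] + src[i + 1])
        barrier.wait()  # everyone finishes this sweep before anyone starts the next


threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = bufs[ITERS % 2]  # buffer written by the final sweep
print([round(x, 1) for x in result])
```

Double buffering is what makes a single barrier per sweep enough: no thread ever writes the buffer that other threads are still reading in the same sweep.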

      With a single memory image system the computation can easily repartition dynamically as the computation proceeds. It's very costly (never say impossible!) to do this on a cluster because you have to physically move memory segments from one machine to another. On the NUMA system you just change a pointer. The hardware is good enough that you don't really have to worry about memory latency.
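On a single-image machine, "repartitioning" can amount to handing each worker a different window onto the same memory. In Python, a `memoryview` slice of an `array.array` is a zero-copy view, which makes a reasonable stand-in for "just change a pointer". The data size, the three hypothetical workers and the uneven split are arbitrary choices for illustration.

```python
from array import array


def partition(buf: memoryview, bounds: list[int]) -> list[memoryview]:
    """Split one shared buffer into per-worker windows without copying any data."""
    return [buf[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


data = array("d", range(12))  # one shared array of doubles
shared = memoryview(data)

# Initial even split across 3 hypothetical workers.
parts = partition(shared, [0, 4, 8, 12])
# Later, rebalance: worker 0 gets less, worker 2 more. Only the bounds change.
parts = partition(shared, [0, 2, 6, 12])

parts[2][0] = -1.0  # a write through a view lands in the shared array itself
print(data[6])
```

On a cluster, the same rebalance would mean shipping elements 2 through 5 between machines instead of recomputing two indices.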

      And let's not forget io. Folks seem to forget that you can dump any interesting section of the computation to/from the file system with a single io command. On these systems the io bandwidth is limited only by the number of parallel disk channels - a system like the one mentioned in the article can probably sustain a large number of GBytes/sec to the file system.

      Let's not forget page size. The only way you can traverse a few TB of memory without TLB-faulting to death is to have multi-MByte pages (because TLB size is limited). SGI allowed a process to map regions of main memory with different page sizes (up to 64 MB, I think) at least 10 years ago in order to support large image database and compute apps.
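The page-size argument is easy to quantify: TLB reach is entries x page size, and the number of pages needed to cover memory is memory / page size. A sketch with a hypothetical 128-entry TLB (not a real Itanium 2 figure) and 16 KB versus 64 MB pages:

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

TLB_ENTRIES = 128  # hypothetical TLB size, for illustration only
MEMORY = 3 * TB    # the machine's shared memory


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes addressable without a TLB miss when every entry maps one page."""
    return entries * page_size


for page in (16 * KB, 64 * MB):
    reach = tlb_reach(TLB_ENTRIES, page)
    pages_needed = MEMORY // page
    print(f"page {page // KB:>6} KB: reach {reach / MB:,.0f} MB, "
          f"{pages_needed:,} pages to map 3 TB")
```

Under those assumptions, 16 KB pages give 2 MB of reach against roughly 200 million pages, while 64 MB pages give 8 GB of reach against 49,152 pages.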

      When I used to work at SGI (5 years ago) the memory bandwidth at one cpu node was about 800 MBytes/s. My understanding is that the Altix compute nodes now deliver 12 GBytes/s at each memory controller. Although I haven't had a chance to test drive one of these new systems, it sounds like they have gradually been porting well-seasoned Irix algorithms to Linux. It is unlikely that a commodity computer really needs all of this stuff, but I'm looking at a 4-cpu Opteron that could really use many of the memory management improvements.

      g

    2. Re:Scalability of applications by Sangui5 · · Score: 1

      The people writing "applications" for this type of machine are used to writing for message-passing cluster architectures. These are jobs that have a *lot* of inherent parallelism already. Mostly math-heavy simulation-type stuff, such as protein folding, weather simulation, chemical simulation, and the like. Things with large matrix multiplies, billions of independent simulation cells, or massive Fourier transforms, all of which are *easy* to parallelize.

      Given that these jobs used to be run on message-passing clusters, writing for a large NUMA shared-memory machine should be easy. At worst, you simulate MPI and still get performance gains. At best, you can use the tighter coupling to improve performance and/or run less inherently parallel jobs.
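Here is what "simulating MPI" on a shared-memory box can look like at its very simplest: each rank gets a mailbox in the shared address space, and send/recv just pass object references through it. This is a hypothetical toy with invented names (`Comm`, `send`, `recv`). It is not the MPI API or any vendor's shared-memory transport, and it uses Python threads with `queue.Queue` as the mailboxes.

```python
import queue
import threading


class Comm:
    """Toy communicator (hypothetical): one FIFO mailbox per (source, destination) pair."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._boxes = {(s, d): queue.Queue() for s in range(size) for d in range(size)}

    def send(self, obj, src: int, dest: int) -> None:
        # No serialization and no network: the object reference is handed over in memory.
        self._boxes[(src, dest)].put(obj)

    def recv(self, src: int, dest: int):
        return self._boxes[(src, dest)].get()  # blocks until a message arrives


def ring(comm: Comm, rank: int, out: list) -> None:
    """Each rank sends its number to the right neighbour and receives from the left."""
    right, left = (rank + 1) % comm.size, (rank - 1) % comm.size
    comm.send(rank, rank, right)
    out[rank] = comm.recv(left, rank)


SIZE = 4
comm, received = Comm(SIZE), [None] * SIZE
threads = [threading.Thread(target=ring, args=(comm, r, received)) for r in range(SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # [3, 0, 1, 2]
```

Because the queues are unbounded, `send` never blocks, so the ring exchange cannot deadlock.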

    3. Re:Scalability of applications by RageEX · · Score: 1

      mod parent up

    4. Re:Scalability of applications by sparkz · · Score: 1
      Let's not forget Oracle RAC, either. Scalable database, but it requires gigabit interconnects. 4 machines will saturate that with the SGA. Doesn't matter what the OS is, Oracle can't scale above 4 machines because the SGA will fill a gig interconnect.

      128 Dell/Linux boxes? Great, but you can't do anything useful with them. Might as well settle for 4 Dell boxes, or spend more cash, get 4 E25Ks, and get a truly reliable cluster.

      Let's not forget MTTF (mean time to failure): the more hardware you add to a system, the more frequently you will experience failures. Adding cheap x86 boxes massively decreases your MTTF. Adding quality hardware also lowers MTTF, but it adds resilience within the box, so overall uptime goes up.
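The usual back-of-the-envelope model behind this claim: if a system fails whenever any one of its independent parts fails, and failures are treated as exponentially distributed, the failure rates add. That gives system MTTF = 1 / sum(1 / MTTF_i), which for N identical parts is MTTF / N. The 50,000-hour per-box figure below is invented for illustration.

```python
def series_mttf(component_mttfs: list[float]) -> float:
    """MTTF of a system that fails when any one independent component fails.

    Assumes constant (exponential) failure rates, so the rates simply add.
    """
    total_rate = sum(1.0 / m for m in component_mttfs)
    return 1.0 / total_rate


BOX_MTTF_HOURS = 50_000.0  # hypothetical per-box MTTF
for n in (4, 64):
    print(n, "boxes:", round(series_mttf([BOX_MTTF_HOURS] * n)), "hours between failures")
```

In-box redundancy does not change how often parts fail. It changes whether a single part failure takes the service down.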

      I don't care if I can replace 4 Sun boxes with 64 Linux boxes for half the price - I want uptime, and Sun can provide that, where Dell/Linux can not.

      --
      Author, Shell Scripting : Expert Re
  49. The Mighty Thor by Anonymous Coward · · Score: 0

    is not impressed by these system specs.

  50. The real test by Bruha · · Score: 4, Funny

    Fire up apache and then post a link to it here on slashdot. We love a challenge.

    1. Re:The real test by kasperd · · Score: 1

      Fire up apache and then post a link to it here on slashdot.

      I don't think servers get slashdotted due to lack of CPU/RAM resources, but rather due to insufficient network bandwidth or bad programming/configuration. A good programmer can make a C64 survive a slashdotting; it might be slow, but it keeps responding. With a 100 Mbit/s internet connection and a page without too many pictures, I think any decent server should survive a slashdotting.
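The bandwidth side is simple arithmetic: page views per second ≈ link bytes per second / bytes per page view. The sketch below uses the 100 Mbit/s link from the post and a hypothetical 50 KB page (HTML plus a few small images), and it ignores TCP/HTTP overhead:

```python
def pages_per_second(link_mbit_s: float, page_kbytes: float) -> float:
    """Upper bound on page views/s a link can carry, ignoring protocol overhead."""
    bytes_per_second = link_mbit_s * 1_000_000 / 8
    return bytes_per_second / (page_kbytes * 1000)


rate = pages_per_second(100, 50)  # 100 Mbit/s link, hypothetical 50 KB page
print(f"{rate:.0f} pages/s, about {rate * 3600:,.0f} per hour")
```

Under these assumptions the link tops out at about 250 pages/s. Halve the page weight and the ceiling doubles, without touching the CPU.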

      --

      Do you care about the security of your wireless mouse?
  51. Let me clue you in on a few things by justins · · Score: 4, Informative
    You don't pick sun for just "lots of cpus", you pick it for a very scalable OS and amazing hardware that allows for a very, very solid datacenter.

    The UNIX made by SGI (the company making the machine referenced in the article) is more scalable than Solaris. Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

    Sun hardware has additional, wonderful resiliency features like - allowing cpu's to "fail-over" to other cpus in case of failure.

    None of that is unique to Sun.

    Finally, since Sun has been doing the "lots of cpus" thing for many years, their process management and scalability tends to be much better.

    Better than what? And says who? They've never decisively convinced the market that they're better at this than HP, SGI, IBM or Compaq.

    If downtime costs a lot (ie. you lose a lot of money for being down), you should have Sun and/or IBM zseries hardware. Unfortunately those features cost a lot and most times you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.

    In addition to ignoring the other good Unix architectures out there in a dumb way with this comparison, you're also totally missing the point of the article. Linux supercomputing isn't just about cheap clusters anymore. Expensive UNIX machines on one side and cheap Linux clusters on the other is a false dichotomy.
    --
    Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
    1. Re:Let me clue you in on a few things by Anonymous Coward · · Score: 0

      In addition to ignoring the other good Unix architectures out there in a dumb way with this comparison, you're also totally missing the point of the article. Linux supercomputing isn't just about cheap clusters anymore. Expensive UNIX machines on one side and cheap Linux clusters on the other is a false dichotomy.

      Now that you've given your point, let's see some supporting evidence.

  52. Will they publish 1024-processor code under GPL? by jokkebk · · Score: 1

    And what is more interesting, will some of the code that obviously needs to be developed for this kind of system be released under GPL?

    After all, in most cases you'd think that the work would be based on GPL'd stuff like the Linux kernel, and therefore the modifications would need to be licensed under the GPL too, making a lot of technology available for Linux.

    --
    http://codeandlife.com
  53. Re:Will it be done in time for Quake 3? by Anonymous Coward · · Score: 0

    But what about Duke Nukem Forever?

  54. WRONG by Anonymous Coward · · Score: 0

    No. Stupid Doom 3 framerate jokes in stories about supercomputers are NOT FUNNY ANYMORE.

    They stopped being funny SEVERAL YEARS AGO.

  55. Re:Sun does more than that, but SGI always has by PenguinOpus · · Score: 1

    I realize you were trying to defend Sun, but in this case the vendor (SGI) has far more experience with large systems than Sun does. At every point over the last 16 years (since SGI announced the original PowerSeries on 10/4/88), SGI has always supported more processors running a single OS than Sun. Those processors were MIPS based, but the Altix architecture is derived from the bricks/bus of the Origin servers.

    The flip-side of this is that SGI has been in decline for several years longer than Sun and may have lost some or all of its edge in this area.

    PowerSeries 140 (4x16Mhz MIPS R2000), 10/4/88.

  56. Re:Will it be done in time for Quake 3? by Anonymous Coward · · Score: 0

    Quake 3 is _already_ out, that's why.

  57. 1024 cpus and 3 TB memory by Anonymous Coward · · Score: 4, Funny

    That's almost enough to run Emacs!

    1. Re:1024 cpus and 3 TB memory by Anonymous Coward · · Score: 0

      But not enough to make vi enjoyable.

    2. Re:1024 cpus and 3 TB memory by isorox · · Score: 1

      That's almost enough to run Emacs!


      Now let's not get ahead of ourselves, but it might be enough to install gentoo before it's obsolete.
  58. So, when will Jeff Dike have UML ported to this? by kclittle · · Score: 2, Funny
    1024 physical CPUs running *one* logical host Linux image running god knows how many UML instances, each fully independent of the others and seeing 3 TB of memory. The mind boggles! :-)

    --
    Generally, bash is superior to python in those environments where python is not installed.
  59. Well it turned out that RISC.... by carlmenezes · · Score: 1

    was just a little too RISCy

    --
    Find a job you like and you will never work a day in your life.
  60. wrong by Anonymous Coward · · Score: 0

    Read the article again. This machine will be around 6 TFLOPS, with the total combined NCSA computing power (counting *all* of their machines) over 35 TFLOPS.

  61. Re:Will it be done in time for Quake 3? by Ari_Haviv · · Score: 2, Funny

    It's already out...in Japan

    --
    Join Team Mozilla #38050 Folding@home
  62. no no no... by Anonymous Coward · · Score: 0

    A beowulf cluster of these .

  63. Re:really fast? by Ari_Haviv · · Score: 1

    yeah my 2.6 ghz Pentium 4 is really showing its age.

    --
    Join Team Mozilla #38050 Folding@home
  64. Re:Will it be done in time for Quake 3? by Anonymous Coward · · Score: 0

    Maybe b/c quake 3 has been out for years.

  65. It depends on the problem domain by Anonymous Coward · · Score: 0

    The choice between using a shared-memory or message-passing model depends entirely on the problem domain.

    If you have a massive computation which can be divided into independent subtasks, it doesn't need either model. Just compute each subtask on a different processor and collect the results.
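The first case needs no shared state and no messages at all. Here is a minimal Python sketch: split the work into independent chunks, farm them out with `concurrent.futures`, and combine the results at the end. The `subtask` function and chunking are hypothetical. A thread pool is used here only to keep the example simple. In CPython the GIL keeps threads from running Python bytecode in parallel, so `ProcessPoolExecutor` is the usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def subtask(chunk: range) -> int:
    """Hypothetical independent unit of work: sum of squares over one chunk."""
    return sum(i * i for i in chunk)


def chunks(n: int, parts: int) -> list[range]:
    """Split 0..n-1 into `parts` contiguous, non-overlapping ranges."""
    return [range(k * n // parts, (k + 1) * n // parts) for k in range(parts)]


N = 100_000
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(subtask, chunks(N, 8)))  # chunks never talk to each other

total = sum(partials)  # the only "coordination" is collecting the results
print(total == sum(i * i for i in range(N)))
```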

    Things get more difficult when the subtasks have to share intermediate results. If this information is primarily needed for coordination between specific processing nodes, then message-passing between them is a good model.

    A shared-memory model is best for representing algorithms which have to share intermediate results across a large or indeterminate population of processors.

  66. You have to start somewhere. by Fuzzums · · Score: 1

    There wouldn't be any multi-processor applications if there weren't an OS that supports them :)

    --
    Privacy is terrorism.
  67. Wow! by juggaleaux · · Score: 2, Funny

    That much hard drive space rivals my porn collection! :O

  68. Doom III... I mean by supmylO · · Score: 1

    I know a lot of people have said this is just in time for Doom III, but I think this'll be good for Duke Nukem Forever [In Production].

  69. Ah, now Gnome will run at a decent speed! by Moderation+abuser · · Score: 1

    Where do I sign up to get my hands on one of these?

    --
    Government of the people, by corporate executives, for corporate profits.
  70. Re:Sun does more than that, but SGI always has by kscguru · · Score: 1, Interesting
    Sun very well could support that many CPUs. Sun just doesn't sell hardware that has that many (and therefore won't claim to support that many), mainly because that kind of hardware is so expensive as to make SPARC look cheap!

    My opinion is that Linux on a 1024-way is a spectacularly stupid idea, introduced more for the sexiness of having a 1024-way machine than for any practical benefits. Linux is simply not designed for scaling that large. And there is a huge difference between an OS designed to scale that large, and an OS hacked up to support something that large, without actually making the appropriate design choices. SGI may know about those choices (and probably better than Sun), but I highly doubt they'd throw them into a GPLed Linux kernel - they still want to sell their own version of Unix!

    I expect (yes, a wild pie-in-the-sky guess) that the advantage of a 1024-way machine over a 512-way machine, both running Linux, is going to be maybe 20-30% performance, far from the 100% the numbers might claim or the 70-80% that might be tolerable. For a supercomputer where that 20-30% is irrelevant because no other machine can crunch the data, cool; for everyone else, two 512-ways running unconnected will be better, cheaper, and faster. [At least, until Linux can scale that large... maybe in 5 years or so?]

    --

    A witty [sig] proves nothing. --Voltaire

  71. Re:really fast? by shaitand · · Score: 1

    KDE isn't slow for me either, but it does reserve all my physical memory causing my system to swap.

    I have a gig of RAM, believe it or not; I don't feel I should ever have to swap, and I greatly prefer to leave my physical RAM free for things other than KDE and its various subcomponents. Gnome doesn't do this, so I'll stick with Gnome.

  72. iSeries and pSeries vs. zSeries by r00t · · Score: 1

    If your iSeries or pSeries CPU gets hit by an
    alpha particle, your data gets corrupted.

    The zSeries protects you with mirrored pipelines.

    Of course, for some apps you could simply run
    every computation twice without hardware helping.
    Who does that though? You can be sure that your
    typical database software wasn't written to do that.

    1. Re:iSeries and pSeries vs. zSeries by Anonymous Coward · · Score: 0

      Um... Alpha particles are typically stopped by something as thin as a piece of paper... I pretty well doubt that they're going to wreak havoc; there's too much in the way. Case. Heatsink. Ceramic... eh.

    2. Re:iSeries and pSeries vs. zSeries by r00t · · Score: 1

      They come from within. Everyday stuff, including
      the CPU packaging material and the silicon itself,
      emits alpha particles.

    3. Re:iSeries and pSeries vs. zSeries by passthecrackpipe · · Score: 1

      My understanding of the pSeries, particularly the POWER5, is that processor lock-stepping / mirrored pipelining is a feature. Not sure about the iSeries though, but for the pSeries I am pretty sure I have heard a few IBM marketing drones blabbing on about....

      --
      People who think they know everything are a great annoyance to those of us who do.
  73. Re:Ok - SCO Joke Alert by TheScienceKid · · Score: 1

    "genesis@kurta:genesis$ bc
    bc 1.06
    Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
    This is free software with ABSOLUTELY NO WARRANTY.
    For details type `warranty'.
    715776/1024
    699"

    Don't be silly... you won't have to pay a totally bogus (Bill!) SCO fee... that joke is, like, so old now, it, like, literally, figuratively, as old as the, like, hills. I thought most excellent (Ted!) SGI had licensed SVR4 in perpetuity regardless?

  74. Are dtrongholds... by TheScienceKid · · Score: 1

    ...a part of dtrace? My goodness... they only just got that and now you say they're losing it?

  75. In related news...two 2k processor... by xyphor · · Score: 1

    The US Army Research Laboratory will be receiving two 2,000+ processor supercomputers running Linux. ARL already has a 256-processor Linux cluster that throws off enough heat to cook the staff dinner... gotta wonder how much these suckers will dissipate.

    1. Re:In related news...two 2k processor... by Anonymous Coward · · Score: 0

      For the U.S. Military? I wonder if they can get over not having MS Windows and floppy drives.

  76. Cray: Linux on 10,000 CPUs by Anonymous Coward · · Score: 0

    1024? So what: http://zdnet.com.com/2100-1103_2-5097398.html

    1. Re:Cray: Linux on 10,000 CPUs by diegocgteleline.es · · Score: 1

      cluster != single machine

  77. Not likely - Same Machine for $1k in 14 years. by DanielJH · · Score: 2, Interesting

    The point here is that if performance continues to grow like it is today, they will be selling these machines for $1,000 at Walmart in just 14 years. It will be about the same size as the computer you own now.

    The problem with 1024 CPUs is much more than just the operating system. It is a mess of communication hardware needed to wire everything together. It is about special power feeds and air conditioning, and sometimes floor-loading requirements.

    Take a quick look at the end of this PDF. It talks about heat output and the need for 3-phase 240V power coming into this computer. It is not unusual to hire both an electrician and a cooling expert when you talk about installing one of these babies. Not for the home user, and never will be; however, identical compute power is coming in just 14 years, so get ready...

    1. Re:Not likely - Same Machine for $1k in 14 years. by isorox · · Score: 2, Interesting

      Indeed, we're implementing a 24-bay system at the moment, in a brand-new apps room off one of our current ones (which happens to have about 100 bays, most of them overflowing), so, yes, power is a problem, and cooling doubly so. One apps room is currently responsible for two 24-hour TV channels and barely has a backup AC unit (it may work if we shut down some of the less-essential equipment).

  78. Not sure if your serious but lets explain. by SmallFurryCreature · · Score: 3, Informative

    I will avoid the tech terms (partly because they would confuse you, partly because I don't know them all, but mostly because they ain't needed).

    A single CPU computer can execute ONE instruction at a time. Meaning one program thread running at a time. But wait, you say, my OS can run multiple programs at the same time. WRONG. It can't. It is a trick. It is running one program at a time, but it is switching the program it is running really fast. There is however a problem with this. When it has switched to a program, all the other programs are effectively at the mercy of the program now running, INCLUDING the OS. Which is why DOS and Windows and Linux and Mac OS and all the others had "hangups". With an extremely well-written OS these hangups (when a program doesn't switch back to the OS) can be avoided, but it still remains the case that all the programs and the OS are fighting for time on one single CPU.

    So what happens when you add a CPU? Well, a lot less switching, PLUS if a program for whatever reason does not switch properly, the OS can still run on the other processor. Just making a Windows box a dual-CPU one instantly makes it far more robust. I encountered this myself with an old Dell P3 that had a dual board but no second CPU installed. Before I added a second CPU it was the usual Windows crap of hangs and reboots and BSoDs. Afterwards it ran as stable as a Unix machine. Simple things like opening a complex folder in Exploder no longer "froze" the desktop, as it could simply run Exploder on one CPU and, say, Word or my MP3 player on the other.

    Don't forget too that things like ATA hard drives and CD-ROMs need the CPU to drive them. This takes a lot of long cycles and a lot of waiting: not so much CPU power as just time on the CPU. With a second CPU to do all the other tasks, everything runs far smoother.

    So what is better? Running one 2 GHz CPU or two 1 GHz CPUs? Depends. If you are running one program thread, go with the one CPU. It will take all the CPU time but will not need to share it. If however you are running countless small threads, go with the two-or-more solution. Threads will get CPU time sooner and you will lose less CPU time to switching.

    Oh yeah, that is another problem. Switching between programs takes CPU time as well. It is not unknown for single-CPU systems to spend so much time switching that they don't have time to run anything anymore. It's the old too-many-running-programs problem known from Windows, but it affects every OS.
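The switching cost can be put into numbers. If every time slice ends in a context switch, the fraction of CPU time lost is switch_cost / (quantum + switch_cost). The sketch below assumes a hypothetical 5 microsecond switch cost (real costs vary by CPU and OS) and leaves out indirect costs such as cache refills.

```python
def switch_overhead(quantum_us: float, switch_cost_us: float) -> float:
    """Fraction of CPU time lost to context switches when every slice ends in a switch."""
    return switch_cost_us / (quantum_us + switch_cost_us)


# Hypothetical 5 us switch cost, with ever shorter time slices.
for q in (10_000, 1_000, 100, 10):
    print(f"quantum {q:>6} us -> {switch_overhead(q, 5):.1%} lost to switching")
```

Overhead stays negligible with long slices and climbs steeply once the slices get close to the switch cost itself.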

    Lastly there is a simple problem. Say you want real power: do you go for a quad 2 GHz or a single 8 GHz? Answer? It's a trick question; there is no such thing as an 8 GHz CPU.

    If you get the chance, buy a second-hand dual P3, install Windows 2000+ or Linux on it and be amazed. That old system will respond a lot faster under load than your 3 GHz monster.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Not sure if your serious but lets explain. by bogaboga · · Score: 1

      You really explained very very well. In my situation, I am just doing the basic office work. Real loads on the system are seen when backing up later at 1130 hours. I even do not know whether the apps are written to take advantage of quad or more CPUs. Any way, thanks a lot.

    2. Re:Not sure if your serious but lets explain. by Anonymous Coward · · Score: 0

      Actually, in an OS that is preemptively multitasked, such as Linux, BSD, the Commercial UNIXes, and MacOS X, a user-space program is at the mercy of the kernel when it comes to CPU time. It cannot hog the CPU; it does not control when the thread of execution is passed back to the kernel (this is known as task-switching and has been possible since the 80386 on intel cpus, and for *much* longer on other architectures).

      DOS had no multitasking whatsoever; this is why it is sometimes called a glorified boot loader. It runs everything in unprotected, one-thread mode. This is because the system it was originally designed for could not handle multitasking.

      Windows (up until the NT line) was what is called cooperatively multitasked. In such a setup, the OS passes a thread of execution to a user space program, and essentially depends on that program voluntarily (cooperatively) passing control back to the kernel. The kernel could take control forcibly on blocking events, such as disk reads, sockets, that sort of thing, but an infinite loop doing completely in-memory execution could hang the system.
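Cooperative multitasking can be modelled with Python generators. Each task runs until it chooses to `yield`, and a simple round-robin loop only gets control back at those points. A task that never yields would stall the loop forever, which is the hang described above. The task names and step counts are made up.

```python
from collections import deque


def task(name: str, steps: int):
    """A well-behaved task: does one step of work, then voluntarily yields control."""
    for i in range(steps):
        yield f"{name}:{i}"


def run_cooperative(tasks) -> list[str]:
    """Round-robin scheduler that can only switch tasks when a task yields."""
    ready, trace = deque(tasks), []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # runs until the task's next `yield`
        except StopIteration:
            continue  # task finished; drop it
        ready.append(current)
    return trace


print(run_cooperative([task("A", 2), task("B", 3)]))
# A task containing `while True: pass` with no `yield` would never return control here.
```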

      This is why Linux never hangs, but Windows often did: in Linux, a badly-written user space program couldn't do anything without the kernel's permission, but in Windows this wasn't true. NT changed all of this.

      Now, when NT-based Windows hangs, it's usually because the kernel (which is really quite stable, to be fair) passes control to a 3rd-party driver *in kernel space*, and that driver has a bug. This is not really Microsoft's fault, and is why Microsoft tries hard to write generic drivers for all its hardware. This is also why in Linux, we want to avoid binary-only drivers at all costs: we can't debug them. Incidentally, this is why folks like Andy Tanenbaum and the HURD group prefer a microkernel architecture, which executes drivers (for example) in a separate space that isn't microkernel space but also isn't user space, preventing crashes. Linux hasn't encountered many problems in this arena yet because it hasn't been adopted wholesale by private hardware manufacturers, but the Nvidia drivers used by so many geeks (and the oopses and panics that have come with them) foreshadow Linux's future if something drastic doesn't change soon.

      Incidentally, back when hardware was less beefy, there were a number of geeky game developers that said that Windows 98SE was actually a better gaming platform than Linux/NT/BSD not because it was a better OS, but because it could take over the system's CPU without having the thread of execution arbitrarily stolen by the kernel, thus allowing smooth, real time rendering of complex 3D landscapes and the like.

      As computers get more powerful, this is less of a concern.

      Further, most OSs (including Linux) have something called a scheduler which determines how much CPU priority a process should get, and these are getting smarter all the time. Games can be given higher priority now. Given too that 98 is no longer in direct competition with Linux for games (NT is), the technical arguments against gaming on Linux no longer hold water, as NT is also a preemptively multitasking kernel. Of course popularity and market share remain an issue :)

      But understand that Windows' current instability is mostly due to binary-only drivers. Linux doesn't have many of these yet. I know Linus has long stood by his choice of a monolithic kernel architecture, but at some point it might be beneficial to create a non-kernel, non-user space where non-free drivers can execute in a sandbox. Otherwise, if Linux gains on the desktop, we will end up just as unstable as Windows when Joe User modprobes his buggy printer driver into the kernel. And just as we laugh sarcastically when MS says it isn't their fault, so will Joe User laugh at us. We need to fix this before it gets out of control.

    3. Re:Not sure if your serious but lets explain. by John+Courtland · · Score: 1

      Apps don't have to be written explicitly for multi-processor systems. The OS (any OS worth a damn, at least) has a scheduler that can dole out available CPU time for tasks. If it sees that it has two processors available, it can keep piling all the load-intensive jobs onto one and keep the UI on the other, so that you don't encounter the lag a burdened system has. Basically, you can imagine programs as "threads of execution," which may have their own "threads of execution." Each thread, if programmed properly (even on a single CPU system!), should be able to execute on its own without causing race conditions, data corruption, etc. Sometimes programmers have to insert blocks to pause the thread until the data is ready from another thread, or change the problem entirely so it fits a threaded model. This might seem like a pain in the ass, and it sometimes is, but the great part about threading is that if you are programming on a multiprocessor box, there is a possibility that a thread your program generates may execute on a separate processor. This really, REALLY speeds up tasks that can be threaded and can be very helpful when a system has a lot to do at once. Here's a good example:

      You have a program that generates reports. You need paper copies of these reports, which may be 100 pages long. In a single-threaded model, this program would act very slowly during printing, or may not act at all. In a multi-threaded model, your printing routine can be sent to a separate thread, and that thread has its own execution time given to it by the system scheduler. The original program's responsiveness won't go down (unless printing eats 100% of your CPU). That's on a single CPU system. On a dual, the printing may get use of the other CPU, even though it's part of the same program. That way, TWO threads can run concurrently, keeping the most important processes as active as possible (usually the UI on a desktop machine; no one likes a laggy mouse).
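The report-printing example might look like this in Python. The slow job runs on a worker thread while the main thread keeps handling simulated UI events. `print_report` and its timing are hypothetical stand-ins for a real print spooler call.

```python
import threading
import time


def print_report(pages: int, done: threading.Event) -> None:
    """Hypothetical slow job: pretend each page takes a little while to print."""
    for _ in range(pages):
        time.sleep(0.01)
    done.set()  # signal the main thread that printing has finished


done = threading.Event()
worker = threading.Thread(target=print_report, args=(20, done), daemon=True)
worker.start()

ui_ticks = 0
while not done.is_set():
    ui_ticks += 1  # the "UI loop" keeps running while printing happens
    time.sleep(0.005)

worker.join()
print(f"UI stayed responsive for {ui_ticks} ticks while printing")
```

The main loop never waits on the printer. It only checks a flag between its own short steps, so the "UI" keeps ticking even on a single CPU.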

      I hope this was helpful.

      --
      Slashdot is proof that Sturgeon's Law applies to mankind.
    4. Re:Not sure if your serious but lets explain. by Paul+Jakma · · Score: 2, Informative

      A single CPU computer can execute ONE instruction at the time.

      Incorrect: a modern superscalar CPU can potentially execute several instructions at the same time. The Pentium was the first Intel CPU able (very crudely) to do this; the P6 was 3-way superscalar (IIRC; there was an article linked to on /. about it recently), able to retire (i.e. complete) 3 instructions per clock cycle. This implies some kind of pipeline (i.e. the processor must fetch several instructions at the same time from memory, examine them and decide how to schedule them), which in turn means that such a CPU at any given time has a whole bunch of instructions in different stages of execution.

      Meaning one program thread running at the time.

      It does not mean that at all.

      Most CPUs only support a single context of execution; however, some CPUs support multiple execution contexts, Intel "HyperThreading" being one example. So a superscalar CPU with multiple execution contexts could have many instructions in several stages of execution from multiple programme contexts at any given point in time.

      When it has switched to a program all the other programs are effectevily at the the mercy of the program now running INCLUDING the OS. Wich is why DOS and Windows and Linux and Mac OS and all the others had "hangups".

      You're describing co-operatively multitasking operating systems, i.e. systems like Windows 3 and MacOS 9 and earlier, which Linux is not.
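      To make the co-operative model concrete, here is a toy sketch using Python generators as "tasks" (polite, hog and cooperative_run are hypothetical names for illustration). Each task runs until it voluntarily yields, so a task that never yields starves everyone else, including the scheduler loop itself. The hog's safety valve exists only so the demo terminates.

```python
from collections import deque

def polite(name, steps, log):
    """A well-behaved task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                      # voluntarily hand the CPU back

def hog(log):
    """A misbehaving task: once started, it never yields."""
    while True:
        log.append("hog")
        if len(log) > 20:          # demo-only safety valve; a real hog has none
            raise RuntimeError("system hung: hog never yielded")
    yield                          # unreachable, but makes this a generator

def cooperative_run(tasks):
    """Round-robin over generator tasks; relies entirely on tasks yielding."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # run the task until its next yield
            ready.append(task)
        except StopIteration:
            pass                   # task finished

log = []
cooperative_run([polite("A", 2, log), polite("B", 2, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1']

log = []
try:
    cooperative_run([polite("A", 3, log), hog(log)])
except RuntimeError as err:
    print(err, log[:3])            # A never gets to run again after the hog starts
```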

      Linux is a preemptive multitasking OS, as are MacOS X, WinNT/2k and (partly) Win9x. Under such a system programmes are given only limited periods of time to run; a programme which does not yield control of the system by itself will be suspended eventually. Typically this is done by the OS setting a hardware timer, upon the expiration of which the hardware forcibly returns control to the OS, whereupon it can elect to give control of the system to another process (setting that timer again, if need be). On most hardware this is done with a timer interrupt, e.g. IRQ 0 on PC class machines, which fires at a preprogrammed rate (100 Hz on older Linux, 1000 Hz on 2.6, or 1024 Hz on Linux or Digital Unix on Alpha). When the interrupt goes off, the CPU saves the state of the process (as it always does to handle interrupts) and runs the appropriate interrupt vector as installed by the operating system, which can then elect to run another process. That usually requires the OS to save any state of the currently running process that the CPU hadn't saved, or that is OS dependent, and then restore the state of another process.

      Anyway, a process on Linux can *NOT* "hang" the system by refusing to yield control. The OS (with help from hardware) will intervene.
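      The preemptive case can be contrasted with a toy simulation. In this model (an assumption for illustration, not how the Linux scheduler is actually implemented) each process is just an amount of CPU time it wants, and the "timer interrupt" caps every run at one quantum, so even a process that would never give up the CPU voluntarily (modelled as infinite demand) gets switched out. The helper tick_ms just converts the HZ values above into a tick length.

```python
from collections import deque

def tick_ms(hz):
    """Length of one timer tick in milliseconds for a given HZ setting."""
    return 1000.0 / hz

def preemptive_run(burst_ms, quantum_ms):
    """Round-robin with a fixed quantum.

    burst_ms maps process name -> CPU time it wants (float('inf') for a
    process that never voluntarily gives up the CPU). Returns the dispatch
    order, stopping once every finite process has finished.
    """
    if quantum_ms <= 0:
        raise ValueError("quantum_ms must be positive")
    remaining = dict(burst_ms)
    ready = deque(remaining)
    trace = []
    while any(t != float("inf") for t in remaining.values()):
        name = ready.popleft()
        trace.append(name)
        # The timer fires after at most one quantum, whatever the process does.
        remaining[name] -= min(quantum_ms, remaining[name])
        if remaining[name] > 0:
            ready.append(name)     # preempted: back of the queue
        else:
            del remaining[name]    # finished
    return trace

print(preemptive_run({"editor": 2, "hog": float("inf"), "shell": 1}, quantum_ms=1))
# ['editor', 'hog', 'shell', 'editor']
print(tick_ms(100), tick_ms(1000))  # 10.0 1.0
```

      Note that the hog still gets its fair share of quanta; it just can't keep the editor and shell from running.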

      still remains a case that all the programs and the OS are fighting for time on 1 single cpu

      This isn't really a good mental image to have of a modern OS. The OS does not "fight for time". The OS only ever runs because:

      1. A process calls the OS to perform some service on that process's behalf.

      E.g. to do I/O on that process's behalf (read/write from disk, network, whatever, or IPC, and deliver the data to the process or destination), to set up the OS abstractions needed for I/O (typically filehandles on Unix), or to interact with OS abstractions, e.g. to list a directory or running processes, send a signal to a process, etc.

      1a. A subset of 1, where a process calls the OS to voluntarily yield control of the CPU it is executing on. The OS potentially can do some housekeeping here before restoring state of another process and allowing it to run.

      2. The hardware directly intervenes and executes OS installed functions, typically in response to an interrupt generated by a timer or other hardware or else some exceptional event (typically a memory fault where a memory address is referenced that does not "exist").

      Operating systems will typically try to do as little work as possible in the latter case and will try to defer as much as p

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
    5. Re:Not sure if your serious but lets explain. by MoralHazard · · Score: 1

      You avoid the tech terms because you don't know them. This is such a broken explanation that it's not even funny. I'll just take the obvious one, and let someone else pick on the rest:

      "A single CPU computer can execute ONE instruction at the time. Meaning one program thread running at the time. But wait you say, my OS can run multiple programs at the same time. WRONG. It can't. It is a trick. It is running one program at the time but it is switching the program it is running really fast. There is however a problem with this. When it has switched to a program all the other programs are effectevily at the the mercy of the program now running INCLUDING the OS. Wich is why DOS and Windows and Linux and Mac OS and all the others had "hangups". With an extremely well written OS these hangups (when a program doesn't switch back to the OS) can be avoided but it still remains a case that all the programs and the OS are fighting for time on 1 single cpu."

      NO. The ability to have multiple processes at once in a "runnable" state, with the CPU switching rapidly between them (called "multitasking"), is NOT the same beast across all those OSs that you just lumped together. MS-DOS was never capable of multitasking; it's a single-user, single-process OS. Mac OS 9 and earlier and Windows 95/98 are multitasking OSs that behave somewhat as you described: one misbehaving program can prevent the kernel from asserting control of the hardware. But every modern OS, including Linux, Windows NT/2K/XP/2K3, the BSDs, and OS X, has a dependable scheduler that will maintain control in the face of any non-kernel process: the kernel determines when the currently-executing process will let go, not the process itself. On Linux and the BSDs, a mechanism that lets an arbitrary user program crash the machine in this fashion is usually considered a bug in its own right. It is possible for user processes to tie up system resources (like hard disks) indefinitely, which can be hella inconvenient, but you can (barring bugs) always get a shell in edgewise to issue a "umount" or "shutdown -h now".

      Also, I have run and maintained Windows 2000/XP on dozens of workstations, for years, and I think that the problems you describe are best attributed to flaky hardware or unstable drivers. There is no reason on a WinNT 5.x kernel why adding a second processor would cause fewer BSODs.

      Also, I'm assuming you're running 2K or XP, because 98/ME can't even use a second processor. That means you would have had to re-install Windows to make use of that second processor at all (to rebuild the hardware abstraction layer). If you did, re-installing Windows probably cleaned up some driver issues (and maybe got rid of some malware you had lying around, too), which in turn explains why your machine is more "stable". If you didn't re-install Windows, or if you're running a non-NT kernel (95/98/ME), you aren't even using your second processor, ever. The OS is ignoring it completely.

      And lastly, be careful with your definition of "stable"--there's a big difference between responsiveness in the UI and problems with the system actually crashing.

  79. Modules? by onyx+pi · · Score: 1

    Internally referred to as "bricks".

    1. Re:Modules? by ddmau · · Score: 1

      Yes, you are correct. The "internal" name for the system when it was being developed (and at least at the time I left) was "Lego".

  80. Re: Btw have you read the code? by Anonymous Coward · · Score: 0

    Yes, I have.

  81. Re:Will they publish 1024-processor code under GPL by noselasd · · Score: 1

    Well, if you look at the kernel changelogs, e.g.
    linux.bkbits.com:8080/linux-2.4 and
    linux.bkbits.com:8080/linux-2.6
    you'll see that SGI has already contributed a lot.

  82. Memory not the only issue by Prof.+Pi · · Score: 1
    CISC was invented in a time that the memory was small, in the CISC way you could store larger programs in the same amount of memory.

    That wasn't the only issue. Compiler technology (especially back-end optimization) was less mature then. CPU architects added lots of support for high-level language constructs (look at all the string ops and BCD support in x86) so that compilers could translate to those ops directly. The alternatives were function calls (with high overheads) or inlining small code fragments to do the complex instructions (which would often not compile very efficiently). David Patterson called this the "semantic gap" (Communications of the ACM, January 1985).

    As we now know, this support was costly. My favorite example was Zilog's Z-80 8-bit CPU, which extended the Intel 8080's instruction set with an indexed addressing mode, designed to make it easier for compilers to generate references to local variables in the stack. Unfortunately, the added instructions were a lot slower. With hand-coding, I was able to speed up some critical leaf functions 5x just by avoiding all the indexed instructions and sticking to the old 8080 set, and by loading the variables I needed (not many) into registers at the start.

    As compiler back-ends started to approach assembly-code programmers in efficiency, the benefits of such high-level instructions lessened.

  83. Another side of Intel: AMD's SUPERCLUSTERS. by Anonymous Coward · · Score: 0
    Why not 16384 nodes, each with 2 Opteron 248s, 4 GiB of DRAM and 5 x 250 GB SATA disks in RAID 5?

    This image can reach 140 TFLOPS, WOW!!!
    This image can use 64 TiB of DRAM, WOW!!!
    This image can use 64TiBx50= 3200 TiB of virtual memory, WOW!!!
    This image can use 16384 TB of HDD, WOW!!!
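    The poster's totals can be checked with a little arithmetic. This sketch assumes a 2.2 GHz clock and 2 double-precision FLOPs per cycle per Opteron 248, and RAID 5 across the 5 disks in each node (one disk's worth of parity); those assumptions are mine, not the poster's. Under them the "140" is a rounded theoretical peak, and the disk figure only comes out to 16384 TB as usable RAID 5 capacity (raw capacity is larger).

```python
# Per-node figures from the post; clock and FLOPs/cycle are assumptions.
NODES = 16384
CPUS_PER_NODE = 2
CLOCK_HZ = 2.2e9          # Opteron 248 clock speed (assumed)
FLOPS_PER_CYCLE = 2       # assumed peak double-precision FLOPs per cycle per CPU
DRAM_GIB_PER_NODE = 4
DISKS_PER_NODE = 5
DISK_GB = 250

def cluster_totals():
    """Return (peak TFLOPS, DRAM in TiB, raw disk TB, usable RAID 5 disk TB)."""
    peak_tflops = NODES * CPUS_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
    dram_tib = NODES * DRAM_GIB_PER_NODE / 1024
    raw_tb = NODES * DISKS_PER_NODE * DISK_GB / 1000
    # RAID 5 across n disks stores n - 1 disks' worth of data (one is parity).
    usable_tb = NODES * (DISKS_PER_NODE - 1) * DISK_GB / 1000
    return peak_tflops, dram_tib, raw_tb, usable_tb

peak, dram, raw, usable = cluster_totals()
print(f"peak ~{peak:.0f} TFLOPS, {dram:.0f} TiB DRAM, "
      f"{raw:.0f} TB raw disk, {usable:.0f} TB usable in RAID 5")
```

    Of course, peak FLOPS on a cluster says nothing about the interconnect, which is exactly where a single-image machine differs.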

    How incredible is this SUPERCLUSTER?

    open4free ©

  84. 1024 processor IRIX in 2002 by green+pizza · · Score: 1

    Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

    Just to clear things up:
    SGI's first 1024p single-image supercomputer was an Origin 3800 running a customized "XXL" IRIX 6.5 kernel. This was in August 2002, almost 2 years ago.
    http://www.sgi.com/newsroom/press_releases/2002/august/nasa.html

    1. Re:1024 processor IRIX in 2002 by Bishop · · Score: 1

      When SGI acquired Cray in 1996, SGI also acquired the T3E running Unicos on 2048 processors. The Origin 2000 ran IRIX on 128 processors in 1997 and later on 512 in 1999, IIRC. ASCI Blue Mountain (1998) was a 6144-processor machine based on the same tech as the Origin 2000. I do not know if this machine runs a single image of IRIX, but I believe it does.

  85. not too difficult by green+pizza · · Score: 1

    ...how easy it is to install printer and sound drivers?

    I don't know about the Altix, but on SGI's Origins you can install a $50 M-Audio Revolution 7.1 PCI card and it'll work right out of the box. It works on any SGI using the IRIX64 kernel. I have one in an old Origin 200 as part of my streaming audio setup. SGI also sells an assortment of audio cards, from consumer-grade to pro, as well as video I/O capture cards.

    As for printer drivers, I'm assuming their Altix setup is similar to their Origin setup, which now uses a commercial CUPS printing subsystem (ESP PrintPro, I think). Plenty of drivers to choose from if you don't already have a nice laser or color laser with built-in PostScript and Ethernet.

  86. Re:really fast? by sigaar · · Score: 1

    " yeah my 2.6 ghz Pentium 4 is really showing its age.
    Join Team Mozilla #38050 Folding@home"

    Maybe if you switch off Folding@home you'll actually be able to experience the power of that CPU.

    With enough memory there's no need for KDE to be slow. My housemate's P-II 300 with 256MB runs KDE quite nicely, even with OpenOffice and Mozilla both open. Definitely no slower than the Windows 2000 GUI.

    --
    sigaar
  87. That is Longhorn's Minimum Requirement Spec by Anonymous Coward · · Score: 0

    And if you want to run Office on top of it, you'll have to double the CPUs and triple the memory.

  88. Re:really fast? by Ari_Haviv · · Score: 1

    hehehe got me there. But I don't fold when responsiveness is crucial.

    --
    Join Team Mozilla #38050 Folding@home
  89. Faster? by Anonymous Coward · · Score: 0

    Don't get me wrong, I understand the reliability advantages of the zSeries hardware. You can set the suckers up for 100% availability, if you have the money. Nothing else comes close, with sysplexing, the hardware error detection and recovery, and the sheer stability of the various mainframe OSes (Linux notwithstanding). But they're not fast machines, and aren't designed to be. They're designed for data processing throughput and reliability.

    And they're also kind of expensive. The maintenance contracts, etc. cost a lot annually, and you need to find and hire increasingly scarce resources to admin the machines. Sometimes, though, that's worth it; it depends on your needs.

  90. In related news... by Garabito · · Score: 1

    The SCO Group (SCOX) sues both Silicon Graphics and the National Center for Supercomputing Applications (NCSA) for copyright infringement of their UNIX Intellectual Property.

  91. In my garage ... by CyBlue · · Score: 3, Funny

    I've been working all weekend to cluster 4 Honda Civics. When I'm done, I expect it to go 280MPH, get 12MPG and 0-60 in under 3 seconds.

    1. Re:In my garage ... by Anonymous Coward · · Score: 0

      I would suggest you read "The Mythical Man-month"...

    2. Re:In my garage ... by mindfucker · · Score: 2, Funny

      Ha. I can get your Civic to do that without modifying it at all. Just push it off a cliff.

  92. Re: few penalties of TLB misses, :P by Anonymous Coward · · Score: 0
    For virtual memory, ? KiB pages are better than ? MiB pages, because MiB pages are much slower to load from the hard disk's swap partition :P

    In my opinion, for a 64-bit system, 16 KiB, 32 KiB and 64 KiB are better page sizes, without a big performance impact. ;)

    open4free ©

  93. Re: few penalties of TLB misses, :P by Anonymous Coward · · Score: 0
    Better fine-grained (KiB) pages than coarse-grained (MiB) pages, so that RAM can be divided up more finely when there are many tiny processes loading quickly ;)
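    The trade-off these two posts are gesturing at can be put in numbers. TLB reach (entries x page size) grows with page size, so bigger pages mean fewer TLB misses; but the memory wasted at the end of each mapping also grows, roughly half a page per mapping under a simple uniform-size assumption. The 128-entry TLB and 1000 small mappings below are hypothetical figures for illustration, not Itanium 2 specifications.

```python
KIB = 1024
MIB = 1024 * KIB

def tlb_reach(entries, page_size):
    """Bytes of address space the TLB can map without a miss."""
    return entries * page_size

def expected_waste(mappings, page_size):
    """Rough internal fragmentation: about half a page lost per mapping."""
    return mappings * page_size // 2

for size, label in [(4 * KIB, "4 KiB"), (16 * KIB, "16 KiB"),
                    (64 * KIB, "64 KiB"), (2 * MIB, "2 MiB")]:
    reach = tlb_reach(128, size)           # hypothetical 128-entry TLB
    waste = expected_waste(1000, size)     # hypothetical 1000 small mappings
    print(f"{label:>6}: reach {reach // KIB:>7} KiB, waste ~{waste // KIB:>7} KiB")
```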

    open4free ©

  94. Scalability of sorts by Decaff · · Score: 3, Informative

    The UNIX made by SGI (the company making the machine referenced in the article) is more scalable than Solaris. Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

    Scalability is a complex issue. SGI has put a whole lot of processors together and put a single Linux image on them (so that a single program can use all the memory), but this says nothing about how that setup will actually perform for general-purpose use. Just because the hardware allows threads on hundreds of processors to make calls into a single Linux kernel does not mean there won't be major performance issues when this actually happens.

    There are performance issues with memory even on single processor systems with nominally a single large address space, and a developer may need to put a lot of work into ensuring that data is arranged to make best use of the various levels of cache.

    Many of the multi-processor architectures require even greater care to ensure that the processors are actually used effectively.

    The fact that a single Linux image has been attached to hundreds of processors is no indication of scalability. A certain program may scale well, or not.
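    One standard way to make the "a certain program may scale well, or not" point concrete is Amdahl's law (my choice of model here, not something the poster cites): if a fraction p of a fixed-size job can run in parallel, the speedup on n processors is 1 / ((1 - p) + p / n). Even a 1% serial portion caps 1024 CPUs at under 100x; larger problem sizes (Gustafson's view) soften this, which is why big numeric simulations are the natural fit.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: speedup of a fixed-size job on n_cpus processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

for p in (0.90, 0.99, 0.999):
    print(f"p={p}: {amdahl_speedup(p, 1024):7.1f}x on 1024 CPUs")
```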

    1. Re:Scalability of sorts by justins · · Score: 1
      The fact that a single Linux image has been attached to hundreds of processors is no indication of scalability. A certain program may scale well, or not.

      Well are you skeptical about Linux, or just skeptical about an SSI that large? I mean, you voice skepticism about Linux, but your objections are general enough to cover any large SSI system.

      I mean, is there any reason to think Linux will perform significantly worse than IRIX in this regard? (I say "significantly" because of course IRIX should perform a bit better just by virtue of being more mature, I would think. But who knows.)
      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
    2. Re:Scalability of sorts by Decaff · · Score: 1

      Well are you skeptical about Linux, or just skeptical about an SSI that large?

      I'm skeptical about any system trying to do this. I think it's only going to work with very careful coding. My feeling is that this isn't really 'scalability'; it's just giving specialised developers in highly specific situations the opportunity to use more CPU power. (As someone interested in numeric work, I think this is a good thing.)

      I mean, is there any reason to think Linux will perform significantly worse than IRIX in this regard? (I say "significantly" because of course IRIX should perform a bit better just by virtue of being more mature,

      I would argue that IRIX (and Solaris) should perform and scale better on multiple CPUs because they are much more mature than Linux in this respect. Linux will catch up in time, but in spite of all the good work that has been put into the kernel, I just can't believe it can yet match the multiprocessing efficiency of some proprietary kernels that have been highly tuned for this for many years.

    3. Re:Scalability of sorts by Anonymous Coward · · Score: 0

      I'm skeptical about any system trying to do this. I think its only going to work with very careful coding. My feeling is that this isn't really 'scalability', its just giving specialised developers in highly specific situation the opportunity to use more CPU power. (As someone interested in numeric work, I think this is a good thing).

      This is the very definition of scalability. SGI are encountering all the same old problems (and a few new ones) on 512 CPU machines: serialisation, lock hold times and frequency, cacheline sharing, etc.

      Their smaller Altix systems (still larger than the largest Sun system) are also used very much for IO, networking, etc. and not just userspace computation (not sure what sort of workloads these largest ones do).

      I would argue that IRIX (and Solaris) should perform and scale better on multiple CPUs because they are very much mature in this respect than Linux. Linux will catch up in time, but in spite of all the good work that has been put into the kernel, I just can't believe it can yet match the multiprocessing efficiency of some proprietary kernels that have been highly tuned for this for many years.

      That has absolutely no bearing on anything. DOS is more mature than Linux ferchistsake. IRIX, sure, that is more scalable than Linux. Solaris? Doubtful. Solaris was traditionally able to do quite well on Sun's big systems, but it is helped along by the fact that they use a brute force, very expensive interconnect (the crossbar switch), which is now showing its age, as Sun cannot push it any further due to prohibitive cost (double your capacity, and you quadruple the size of the crossbar). There is no way Solaris could keep up with Linux on a fancy, modern interconnect like SGI's NUMAflex.
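      The "double your capacity, quadruple the crossbar" point follows from counting switch points: a full n x n crossbar needs one per input/output pair, i.e. n^2. For contrast, the sketch also counts a butterfly network of 2x2 switches, which grows as 2 n log2(n) at the cost of possible blocking. The butterfly is a generic textbook example chosen for illustration; it is not a description of NUMAflex's actual topology.

```python
import math

def crossbar_crosspoints(ports):
    """A full n x n crossbar needs one switch point per input/output pair."""
    return ports * ports

def butterfly_crosspoints(ports):
    """Butterfly of 2x2 switches: log2(n) stages of n/2 switches, 4 points each."""
    stages = int(math.log2(ports))
    if 2 ** stages != ports:
        raise ValueError("ports must be a power of two")
    return stages * (ports // 2) * 4

for n in (16, 32, 64, 128):
    print(f"{n:>4} ports: crossbar {crossbar_crosspoints(n):>6}, "
          f"butterfly {butterfly_crosspoints(n):>5}")
```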

    4. Re:Scalability of sorts by joib · · Score: 1


      The fact that a single Linux image has been attached to hundreds of processors is no indication of scalability. A certain program may scale well, or not.


      The fact that NCSA is forking over a rather sizeable chunk of cash for this machine indicates that at least for their applications, they get good scalability.

    5. Re:Scalability of sorts by Decaff · · Score: 1

      This is the very definition of scalability.

      I really don't think it is, as I think scalability depends on what you are trying to do. For example, I was dealing with both distributed systems and vector architectures in the 80s and 90s; neither was generally 'scalable' unless you stated what the specific problem was (I came across many examples of developers trying to throw all sorts of problems at one or other of these architectures and failing badly). The same applies, I think, to systems which are highly non-uniform in memory architecture. They can be a waste unless you code for them correctly. Something like a well-written Fortran program may be able to achieve great performance, but it may be pointless to try to run a high-load web server.

      Their smaller Altix systems (still larger than the largest Sun system) are also used very much for IO, networking, etc. and not just userspace computation

      Are they? From past experience, and looking at websites now, SGI has always been primarily a provider of technical and numeric workstations and machines: things like video and imaging computation and chemical/biochemical modelling. These tend to be things that NUMA would be appropriate for, as the problems can be divided up into localised spaces. By contrast, Sun has tended to be used for commerce, with less predictable multi-threaded work, where NUMA is likely to be a lot less effective.
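      Why locality matters on NUMA can be shown with a one-line expected-value model: average latency = local fraction x local latency + remote fraction x remote latency. The 200 ns and 600 ns figures are hypothetical, chosen only to illustrate the shape of the trade-off; real Altix or Sun latencies are not claimed here.

```python
def avg_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected memory latency when some accesses are local, the rest remote."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Hypothetical latencies, for illustration only.
LOCAL_NS, REMOTE_NS = 200.0, 600.0

# A well-partitioned simulation keeps most accesses on its own node...
print(avg_latency_ns(0.95, LOCAL_NS, REMOTE_NS))
# ...while shared data spread evenly over 16 nodes is mostly remote.
print(avg_latency_ns(1 / 16, LOCAL_NS, REMOTE_NS))  # 575.0
```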

      That has absolutely no bearing on anything. DOS is more mature than Linux ferchistsake

      What I was implying was that these systems were more mature in terms of tuning for multiprocessing, and I think I put that reasonably clearly.

      There is no way Solaris could keep up with Linux on a fancy, modern interconnect like SGI's NUMAflex.

      I don't see why not, as Solaris is just an operating system - your point is about hardware, and Solaris can run on Intel and AMD, and presumably make use of NUMAflex.

    6. Re:Scalability of sorts by Decaff · · Score: 1

      at least for their applications, they get good scalability.

      But that is exactly my point: it's for their applications. Would you get scalable performance for a web server? C compiling? Doom III?

    7. Re:Scalability of sorts by Anonymous Coward · · Score: 0

      I really don't think it is, as I think scalability depends on what you are trying to do.

      Yeah, of course. In this context, we're talking about the Linux kernel. In the capacity that SGI uses it, it is obviously scalable.

      Their smaller Altix systems (still larger than the largest Sun system) are also used very much for IO, networking, etc. and not just userspace computation

      Are they?


      Yes.

      From past experience, and looking at websites now, SGI has always been a provider of primarily technical and numeric workstations and machines - things like video and imaging computation and chemical/biochemical modelling. These tend to be things that NUMA would be appropriate for, as the problems can be divided up into localised spaces. On the contrary, Sun has tended to be used for commerce, with less predictable multi-threaded work, where NUMA is likely to be a lot less effective.

      Funny. You'd better tell Sun that, because their architecture is NUMA.

      What I was implying was that these systems were more mature in terms of tuning for multiprocessing, and I think I put that reasonably clearly.

      Well, they are more mature in terms of single processing as well, but Linux has been able to beat practically every general-purpose UNIX OS out there on fundamentals like system-call, context-switch and network-packet overhead for quite a while.

      Maturity doesn't mean much. Development, direction, resources, etc. mean a lot.

      There is no way Solaris could keep up with Linux on a fancy, modern interconnect like SGI's NUMAflex.

      I don't see why not, as Solaris is just an operating system - your point is about hardware, and Solaris can run on Intel and AMD, and presumably make use of NUMAflex.


      Well, no, it can't make use of NUMAflex, because it can't run on IA-64 or MIPS...

      It possibly might, but if so, it beats me why Sun doesn't ditch their crappy old interconnect and go for something with a bit more finesse like SGI's stuff.

    8. Re:Scalability of sorts by Decaff · · Score: 1

      Funny. You'd better tell Sun that, because their architecture is NUMA.

      I'm not an expert on this, but companies like Oracle clearly state that Sun is SMP, as are smaller systems like Dell and HP, whereas systems like Sequent (now IBM) and SGI are NUMA.

      but Linux has been able to beat practially every general purpose UNIX OS out there in fundamental things like system call, context switch, network packet, etc overhead for quite a while.

      I don't mean to be annoying, but I'm skeptical. However, I'm prepared to be convinced by statistics, and I would find this very interesting. Do you have any?

    9. Re:Scalability of sorts by Anonymous Coward · · Score: 0

      I'm not an expert on this, but companies like Oracle clearly state that Sun is SMP, as are smaller systems like Dell and HP, whereas systems like Sequent (now IBM) and SGI are NUMA.

      Read Sun's own paper that I linked to on their site, and realise that NUMA means non-uniform memory access. Plain and simple, Sun's architecture doesn't have uniform access to all memory, hence it is NUMA by definition.

      I don't mean to be annoying, but I'm skeptical. However, I'm prepared to be convinced by statistics, and I would find this very interesting. Do you have any?

      Well, you weren't very convinced by Sun's stats telling you their system was NUMA... but here is about where Linux started beating Solaris at single-threaded stuff (benchmarks of different operating systems on the same hardware are very difficult to come by on the web).

      here is one a bit later. So far we have Linux handily beating Solaris even on SPARC hardware.

      Here is one fairly recently (note that Linux and Solaris weren't run on the same machine here, or even the same architecture, but I figure the 700 MHz P3 Xeon with 1 MB cache is roughly equivalent to the 900 MHz UltraSPARC III with 8 MB cache; probably the SPARC even has the edge).

      Also, just think about it: why would everyone be out to beat Linux if it were so slow? Why would Sun be trumpeting that Solaris 10's TCP stack is 30% quicker than Linux's? Why does Microsoft try to rig all these benchmarks to show NT beating Linux?

      Those companies really have the resources, and no legal obstacles, to do comprehensive testing and benchmarking of their products vs Linux, so if they really are faster and wanted to show that, they could just publish comprehensive results from open, reproducible tests. Instead they sneak around sniping here and there when they get the chance...

  95. Double nothing is twice nothing by Anonymous Coward · · Score: 0

    Itanic is too expensive and too late.

    With 1024 CPUs sharing the same memory bank, I would say they are also making an architectural mistake. There will be a memory bus bandwidth problem for sure.

    Beowulf would be a much better choice. And use AMD64.

    1. Re:Double nothing is twice nothing by Anonymous Coward · · Score: 0

      I would guess that SGI has a better idea on how to do super computers than do you.

    2. Re:Double nothing is twice nothing by Anonymous Coward · · Score: 0

      lol you make AMD fanboys look stupid ..wait

    3. Re:Double nothing is twice nothing by Hoser+McMoose · · Score: 1

      Uhh, you do realize that you are totally talking out of your ass, right?

      These SGI systems have TREMENDOUS memory bandwidth. The whole design is basically centered around memory bandwidth. The whole idea of going for this single-image, shared-memory type of system is that it offers much more memory bandwidth.

      A Beowulf cluster, on the other hand, has *terrible* internode bandwidth for memory, limited entirely by your network architecture which is usually orders of magnitude slower than native system bandwidth. For many problems this isn't a big bottleneck and the huge cost savings you get from COTS parts can often more than make up for it (ie who cares if it only scales half as well when you get 4 times as many processors), but that certainly is not always the case. Sometimes Beowulf-style clusters just do NOT scale at all, no matter how many chips you throw at the problem.

  96. This is /almost/ a huge milestone by gazbo · · Score: 1, Insightful
    The real difficulty is getting past the 1024 mark: once you get over 2^10 nodes (a 16-bit field minus 6 status bits), all sorts of assumptions in the multi-CPU scheduling algorithm break, and overflows can occur all over the place.

    Let's hope we hear stories about a 1025 node machine soon!
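    For what it's worth, the kind of overflow being joked about looks like this. Suppose a CPU number is packed into the low 10 bits of a 16-bit word whose top 6 bits hold status flags (a hypothetical layout for illustration, not an actual Linux data structure): IDs 0 through 1023 fit, so 1024 CPUs are fine, but a 1025th CPU (ID 1024) silently wraps to 0.

```python
CPU_BITS = 10
CPU_MASK = (1 << CPU_BITS) - 1        # 0x3FF: room for IDs 0..1023
STATUS_SHIFT = CPU_BITS               # the 6 status bits live above the ID

def pack(cpu_id, status):
    """Pack a CPU ID and 6 status bits into one 16-bit word (hypothetical layout)."""
    return ((status & 0x3F) << STATUS_SHIFT) | (cpu_id & CPU_MASK)

def unpack_cpu(word):
    """Recover the CPU ID from a packed word."""
    return word & CPU_MASK

print(unpack_cpu(pack(1023, 0b101)))  # 1023: the 1024th CPU still fits
print(unpack_cpu(pack(1024, 0b101)))  # 0: the 1025th CPU wraps around
```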

  97. Re:Sun does more than that, but SGI always has by (negative+video) · · Score: 1
    My opinion is that Linux on a 1024-way is a spectacularly stupid idea, introduced more for the sexiness of having a 1024-way machine than for any practical benefits. Linux is simply not designed for scaling that large.
    Not necessarily true. The NCSA's workload will probably be large 3D physics simulations. To scale well for that, you need:
    1. A scheduler that runs out of local data structures as much as possible.
    2. A memory manager that stores the thread's page tables locally.
    3. Forcible processor affinity for threads.
    4. Forcible processor affinity for memory allocation.
    5. Competent use of communication hardware, esp. semaphores.
    All of these problems are either solved or trivially solvable. Not surprising, as the OS's biggest job for this sort of workload is to get the hell out of the way.
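    Item 3, processor affinity, can be requested from user space on Linux. A minimal sketch using Python's os.sched_getaffinity and os.sched_setaffinity (Linux-only wrappers around the sched_setaffinity(2) system call): it pins the caller to one CPU it is already allowed to use, then restores the original mask. Items 2 and 4 (node-local page tables and memory placement) are kernel or libnuma territory and are not shown here.

```python
import os

def pin_to_one_cpu():
    """Pin the caller (pid 0 means 'me') to one allowed CPU; return it and the old mask."""
    allowed = os.sched_getaffinity(0)      # CPUs we may currently run on
    cpu = min(allowed)                     # pick one deterministically
    os.sched_setaffinity(0, {cpu})         # the scheduler will now keep us there
    return cpu, allowed

def restore(allowed):
    """Put back the original CPU mask."""
    os.sched_setaffinity(0, allowed)

if __name__ == "__main__":
    cpu, original = pin_to_one_cpu()
    print(f"pinned to CPU {cpu}; allowed set is now {os.sched_getaffinity(0)}")
    restore(original)
```

    Threads and child processes created after pinning inherit the mask, which is how a job launcher can keep each worker next to its memory.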
  98. Shut the fuck up. by Anonymous Coward · · Score: 0

    That hasn't been funny for a long time. Stop posting.

  99. Re:no jerk. stop using aol. learn english. by Anonymous Coward · · Score: 0

    No, jerk. I'm tired of all these retarded geeks who think they are experts on the English language. Any decent student of the English language will know there's no such thing as a hard and fast rule in the language. Try reading James Joyce. He broke plenty of "rules". There's no ISO or IEEE standard for the English language. There's the MLA, but nobody follows every rule in that thing. It's not like programming, wanker: most of the time the idea is to get your point across as clearly and succinctly (and, if you're lucky, subtly) as possible.

  100. Location, Location... by jaybird144 · · Score: 1, Interesting

    I wonder where it will be housed...NCSA's new building isn't complete yet. And it doesn't seem like they would install it only to move it a few months later, does it?

  101. Correctable RAM and L2 errors? by AtariDatacenter · · Score: 2, Informative

    Being an administrator of some 24-way boxes, I have to ask a more detailed question about the error handling. Is the L2 cache in the CPUs just ECC'd, Parity, or fully mirrored? You'll find that on a large installation of CPUs, not being fully mirrored on your L2 will cause quite a bit of downtime over the course of a year with that many CPUs. I don't have those Itanium 2 specs. Anyone?

    UPDATE: I looked. Itanium 2's L2 cache is ECC. It'll correct a 1-bit failure, and detect and die on a 2-bit failure. Believe it or not, on a large number of CPUs running over a long period of time, it happens more often than you think. It also says it has an L3. No idea on the L3 cache protection method used; because they don't say, I'd also guess ECC. Wheee! Lots of high-speed RAM around the CPU with ECC protection. Well, nobody called this an enterprise solution, so I guess it's okay.

    Also, you're going to have regular issues with soft ECC errors on that many TB of RAM. And then your eventual outright failures that'll bring down the whole image of the OS. (An OS could potentially handle it 'gracefully' by seeing if there is a userspace process on that page and killing/segfaulting it, but that's more of an advanced OS feature.)
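    "Correct a 1-bit failure, detect a 2-bit failure" is the classic SECDED behaviour. Here is a minimal sketch of the textbook construction, a Hamming(7,4) code plus an overall parity bit, protecting 4 data bits. Real cache and DRAM ECC typically protects 64-bit words with 8 check bits using the same idea; I'm not claiming this is exactly the code Itanium 2 uses.

```python
def encode(nibble):
    """Encode 4 data bits as 8 bits: index 0 is overall parity, 1..7 are Hamming(7,4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d[0], d[1], d[2], d[3]  # data at non-power-of-two slots
    c[1] = c[3] ^ c[5] ^ c[7]                        # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]                        # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]                        # covers positions with bit 2 set
    c[0] = sum(c[1:]) % 2                            # makes the whole word even parity
    return c

def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos                          # XOR of positions of set bits
    parity_ok = sum(c) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: assume one; the syndrome names it (0 = parity bit).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but nonzero syndrome: two bits flipped. Detect, don't guess.
        return "uncorrectable", None
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)

word = encode(0b1011)
flipped = list(word)
flipped[5] ^= 1
print(decode(flipped))   # ('corrected', 11)
flipped[2] ^= 1
print(decode(flipped))   # ('uncorrectable', None)
```

    The "die" part is the OS's policy decision on an uncorrectable result; the hardware can only tell you the data is bad.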

    Boy, I'd really hate to be the guy in charge of hardware maintenance on THAT platform.

  102. Re:Sun does more than that, but SGI always has by Decaff · · Score: 1

    My opinion is that Linux on a 1024-way is a spectacularly stupid idea, introduced more for the sexiness of having a 1024-way machine than for any practical benefits.

    I think there are practical benefits, but only for very specialised applications. As far as I understand it, Linux only really scales to that many processors on architectures that aren't truly symmetric, like this SGI machine: unlike true SMP, different CPUs don't have equally fast access to all areas of memory.
    So it's not comparable to true SMP as in systems from Sun and IBM.
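
    A back-of-the-envelope way to see the difference: on a non-uniform machine, the average memory latency depends on how much of a program's data sits near the CPU using it. The sketch below uses made-up latencies purely for illustration (not SGI measurements); on a uniform (SMP-style) machine local and remote latencies are equal, so data placement stops mattering.

```python
def avg_latency_ns(local_fraction, local_ns, remote_ns):
    """Average access latency when local_fraction of accesses hit nearby memory."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# HYPOTHETICAL latencies, for illustration only.
LOCAL_NS, REMOTE_NS = 100.0, 400.0

for frac in (1.0, 0.9, 0.5):
    print(f"{frac:.0%} local -> {avg_latency_ns(frac, LOCAL_NS, REMOTE_NS):.0f} ns")

# Uniform memory: local == remote, so the local fraction has no effect.
print(avg_latency_ns(0.5, 150.0, 150.0))
```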

  103. Current State of the Art - 2 TB mem and 256 cpus by random_me · · Score: 2, Informative

    I am happy to say that I have worked, and continue to work, on the current state of the art:
    http://www.ccs.ornl.gov/Ram/Ram.html

    A few notes:
    Linux kernel: 2.4.21-sgi240rp04051808_10074
    From df, a 1 TB ram disk:
    none 1023700704 0 1023700704 0% /dev/shm
    From /etc/redhat-release:
    Red Hat Linux Advanced Server release 2.1AS (Derry)
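
    For scale: assuming the df figure is in df's default 1K (1024-byte) blocks, as with GNU df, the /dev/shm size above works out to about 1.05 TB in decimal units, or a bit under 1 TiB in binary units. A quick conversion sketch (the helper name is hypothetical):

```python
def describe_kib(kib):
    """Convert a df size in 1024-byte blocks to bytes, decimal TB and binary TiB."""
    size_bytes = kib * 1024
    return size_bytes, size_bytes / 10**12, size_bytes / 2**40

size_bytes, tb, tib = describe_kib(1023700704)  # the /dev/shm "1K-blocks" value above
print(f"{size_bytes:,} bytes = {tb:.3f} TB = {tib:.3f} TiB")
```

    This prints `1,048,269,520,896 bytes = 1.048 TB = 0.953 TiB`.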

    The machine is actually not nice to work on. It is prone to frequent short freezes (2-15 seconds long; about one every 2-3 minutes, although not evenly spaced out).

  104. at last slashdot can have 1 server by shimen · · Score: 1

    At last Slashdot can have 1 server. No more need for 1024 servers!

  105. ok, nobody else said it... by louden+obscure · · Score: 1

    Imagine a Beowulf cluster of these...

    --
    Serenity now, insanity later.
  106. I will ask a dumb question? by rspress · · Score: 1

    Why is this such a big deal?

    How is this different from, say, a G5 cluster?

    1. Re:I will ask a dumb question? by Anonymous Coward · · Score: 0

      If you don't know what single system image shared memory means, then it isn't a big deal for you.

  107. Context switches by Kjella · · Score: 1

    Oh yeah, that is another problem. Switching between programs takes CPU time as well. It is not unknown for single-CPU systems to spend so much time switching that they don't have time to run anything anymore. It's the old too-many-running-programs problem known from Windows, but it affects every OS.

    Like most of the rest of what you said, this is wrong (read the other replies to the parent). A Linux 2.6 kernel does this 1000 times/second, at a cost on the order of 100 cycles each. And that's up 10x from the 2.4 kernel. That works out to 1000 × 100 = 100,000 cycles per second, and 100 kHz / 1 GHz = ~0.01% of the CPU time.
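
    The arithmetic above as a tiny Python helper. The 1000 switches/second, the 10x-lower 2.4 figure, the ~100 cycles per switch and the 1 GHz clock are all the commenter's numbers; real context-switch costs vary with hardware and workload, so treat them as parameters to replace with your own measurements.

```python
def switch_overhead(switches_per_sec, cycles_per_switch, clock_hz):
    """Fraction of CPU time spent on context switches."""
    return switches_per_sec * cycles_per_switch / clock_hz

# Figures from the comment: 2.6 at 1000/s, 2.4 at a tenth of that, ~100 cycles each, 1 GHz CPU.
for label, rate in (("2.6", 1000), ("2.4", 100)):
    frac = switch_overhead(rate, 100, 1e9)
    print(f"Linux {label}: {frac:.4%} of CPU time")
```

    This prints 0.0100% for 2.6 and 0.0010% for 2.4.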

    The only "too many running programs" problem is related to system memory. If you can't keep it all in memory and start thrashing to swap, performance will suck to no end. But that has nothing to do with the CPU.

    Kjella

    --
    Live today, because you never know what tomorrow brings
  108. Theoretical maximum very close by ChrisInSF · · Score: 1

    From what I have read, the Altix machines show an amazing 1:1 relationship between the number of CPUs (N) and the performance gained, a computing Holy Grail of sorts.

    That's why they received 'Best of Show' at LinuxWorld last year. It wasn't for a 'me too' implementation.
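
    For context on why 1:1 scaling counts as a Holy Grail: under Amdahl's law (our addition, not something the commenter cited), any serial fraction in a fixed-size job caps the speedup, so N-fold speedup on N CPUs is the theoretical maximum. A minimal sketch, with the parallel fractions chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Ideal speedup on n_cpus when only parallel_fraction of the work parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

def efficiency(parallel_fraction, n_cpus):
    """Speedup per CPU; 1.0 means the 1:1 scaling described above."""
    return amdahl_speedup(parallel_fraction, n_cpus) / n_cpus

for p in (1.0, 0.999, 0.99):
    s = amdahl_speedup(p, 1024)
    print(f"p={p}: speedup {s:.1f} on 1024 CPUs, efficiency {efficiency(p, 1024):.2f}")
```

    Even with 99% of the work parallel, this model gives only about 91x on 1024 CPUs, which is why near-linear scaling is so rarely seen.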

  109. Let me explain by SmallFurryCreature · · Score: 1

    I've run into this twice. The first time was with Windows 2000 on exactly the same hardware: I first installed with only one CPU and a dummy in the other slot, then managed to get someone to give up his old P3 CPU and reinstalled. I immediately noticed the difference. Before, the system was slow and even unstable (slow meaning just plain slow, with the GUI hanging or not redrawing, and even complete freezes). After the extra CPU was installed, with basically the same software, it became a very nice system indeed. Raw speed wasn't as good as a more modern system, but it always responded and had none of the dreaded Explorer hangs when opening a large directory, or worse, my music skipping.

    More recently I saw it with Windows XP pre-SP1; this time the motherboard changed from a single-P3 Asus board to a dual-P3 board, and the change was remarkable. (Why XP on a P3? My Athlon caught fire; damn, those things get hot.) Previously XP was just the old Windows, hanging at times and just not that good. With the dual board it truly became good: no crashes, freezes, or unexplained reboots.

    Of course these are hardly scientific results, but I noticed the same on Linux. Dual machines just seem better, even when using the same crappy brand of hardware (Asus is hardly a server-grade motherboard maker).

    As I said, I am not an expert; I'm just describing my experiences so that another lay person can understand. I run and maintain single and dual workstations/gaming stations and servers, and the duals just perform better.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

  111. Close, but no cigar by Anonymous Coward · · Score: 0

    60 mph = 60 × 1609 = 96540 metres/hour
    = 26.82 metres/sec
    At 9.8 metres/sec², it would take 26.82 / 9.8 = 2.74 sec to reach this speed from a standstill.

    Allowing for air resistance, it would probably take longer than 3 seconds to get from 0-60 mph if you pushed it off a cliff. Or, if the cliff were less than 36.69 metres high (the free-fall distance v²/2g), it would hit the ground before it reached that speed.
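
    The same numbers as a quick Python check, using the comment's rounded 1609 m per mile and g = 9.8 m/s², and ignoring air resistance as the first calculation does: the time to reach speed v from rest is v/g, and the distance fallen in that time is v²/(2g). Helper names are just for illustration.

```python
G = 9.8                 # m/s^2, as in the comment
METRES_PER_MILE = 1609  # the comment's rounded value

def mph_to_ms(mph):
    """Convert miles per hour to metres per second."""
    return mph * METRES_PER_MILE / 3600

def time_to_speed(v, g=G):
    """Seconds of free fall (no drag) to reach speed v from rest."""
    return v / g

def fall_distance(v, g=G):
    """Metres fallen (no drag) while reaching speed v from rest: v^2 / (2g)."""
    return v * v / (2 * g)

v = mph_to_ms(60)
print(f"{v:.2f} m/s, {time_to_speed(v):.2f} s, {fall_distance(v):.2f} m")
```

    This prints `26.82 m/s, 2.74 s, 36.69 m`, matching the figures above.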

    What would be really impressive would be having sex nine times and getting a baby in one month!