Slashdot Mirror


Cray CTO Says Cray Computers Are Great

Jan Stafford writes "Linux clusters cannot offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."

12 of 338 comments

  1. editor training by Knights+who+say+'INT · · Score: 2, Interesting

    You really shouldn't put commentary in a story title, unless it's an "it's funny, laugh" one.

    Oh, by the way, everyone who has a Slashdot account should go to their preferences and set the "light" layout. You won't suffer with the bad color schemes anymore, and the results are more printer-friendly too.

  2. Re:The issues are progress and long-term usefulnes by Marx_Mrvelous · · Score: 4, Interesting

    There are some limitations to clusters that "supercomputers" don't have. Even if your network were exactly as fast as the internal bus of one of the Cray supercomputers (which I highly doubt), you still have a protocol layer on top of it (TCP/IP, UDP, etc.). This slows it down.

    For some applications, a cluster of slow PCs is OK. But if you want to do really time-intensive computation, you can't beat a good internal bus.
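    One part of that protocol-layer cost can be put in numbers: every TCP/IP packet on Ethernet carries fixed header and framing bytes. This is a minimal sketch, assuming IPv4 and TCP headers without options (20 bytes each), standard Ethernet framing (38 bytes counting preamble, FCS and inter-frame gap) and a 1460-byte MSS. It deliberately ignores the larger cost, which is the per-message software latency of the protocol stack itself.

```python
import math

# Assumed per-packet overheads in bytes (no IP or TCP options).
ETH_FRAMING = 38   # preamble+SFD 8, header 14, FCS 4, inter-frame gap 12
IP_HEADER = 20
TCP_HEADER = 20
MSS = 1460         # max TCP payload per packet with a 1500-byte MTU

def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of bytes on the wire that are application payload."""
    packets = max(1, math.ceil(payload_bytes / MSS))
    overhead = packets * (ETH_FRAMING + IP_HEADER + TCP_HEADER)
    return payload_bytes / (payload_bytes + overhead)

for size in (8, 64, 1460, 1_000_000):
    print(f"{size:>9} B payload: {wire_efficiency(size):.1%} of wire bytes are data")
```

    Small messages pay proportionally far more overhead than large transfers, which is one reason codes that exchange many small messages suffer most on commodity networks.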

    --

    Moderation: Put your hand inside the puppet head!
  3. Maybe "APPLE" will buy another Cray! by callipygian-showsyst · · Score: 2, Interesting
    Remember when Apple bought a Cray? It was mostly for show, so their R&D group could have the blinkenlights.

    However it spawned a popular story about how "Cray designs on Apple and Apple designs on Cray" (see link.)

    And now for the REST of the story:

    Did you know that Macintoshes are designed on PCs!? That's right--PCs running WINDOWS. You see, nobody makes software to burn EPROMs or design printed circuit boards that runs on Mac OS, so the hardware group has a bunch of Windows PCs!

    So now you know the *rest* of the story!

    1. Re:Maybe "APPLE" will buy another Cray! by Thagg · · Score: 4, Interesting

      As usual, there is more to the story. Apple brought my company in on a project back in the mid 80's when they bought the Cray. While we had to sign an NDA in blood, I doubt anybody will mind me talking about it now, almost 20 years later.

      Apple was trying to design a new cpu chip. It would have had vector processing capabilities not all that different from the Cray, so they bought the Cray both to do circuit simulations on the chip and as a model for their own design.

      The chip was going to be a 100 MHz chip (an astonishing speed for the time) with a four-pipeline vector processing unit.

      They considered hiring us (but eventually declined) to develop some kind of 3D desktop for the Mac. The idea was that this would distinguish the Mac further from other computing systems, which wouldn't be able to emulate the interface because they didn't have the horsepower.

      Anyway, that's the Apple-Cray story as I understand it. I'm sure that there is a lot more to the story than I know, of course.

      Thad Beier

      --
      I love Mondays. On a Monday, anything is possible.
  4. Re:*Shock* by lukewarmfusion · · Score: 1, Interesting

    No, the inventors of big supercomputers (couple million dollars a pop) are definitely scared of clustering.

    If you want a Cray supercomputer, you have to buy it from Cray. If you want a Linux cluster, you can buy it (or build it) from anyone.

    I'm sure there are applications for a supercomputer, but I see universities, production studios (Pixar!), and research labs moving toward clusters. The supercomputer companies will do anything it takes to either stop that from happening or to gain in that market.

  5. Re:agreed by jedidiah · · Score: 2, Interesting

    I'm not sure you do either.

    A NUMA machine is just a cluster where the wire is in the form of a bus rather than copper or fibre cabling. The communications protocol for the bus may be better optimized for "supercomputing". However, you can do the same thing with an MPP-optimized network protocol.

    It's all ultimately just wires and protocols.

    The total lack of process migration between nodes in a cluster might actually give clusters an edge over some NUMA implementations.

    Watching a single process dance around a number of bricks in a Sun 15K can be rather entertaining.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  6. Re:He basically said faster communications needed by vidarh · · Score: 2, Interesting

    And in doing so you are essentially building a supercomputer. However, you'd have to keep in mind that it isn't all about total bandwidth; latency also needs to be extremely low. That said, HP is working on open source Single System Image clustering support for Linux on "normal" hardware.
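    A common first-order way to see why latency matters as much as bandwidth is the latency-bandwidth ("alpha-beta") model: transfer time = start-up latency + size / bandwidth. The two interconnect parameter sets below are hypothetical round numbers chosen only to illustrate the point, not measurements of any Cray or commodity product.

```python
def transfer_time_us(size_bytes: float, latency_us: float, bandwidth_gb_s: float) -> float:
    """Alpha-beta model: fixed start-up latency plus size divided by bandwidth."""
    bytes_per_us = bandwidth_gb_s * 1e3  # 1 GB/s = 1e9 B/s = 1e3 B/us
    return latency_us + size_bytes / bytes_per_us

# Hypothetical interconnects: identical bandwidth, very different latency.
fast_fabric = dict(latency_us=2.0, bandwidth_gb_s=2.0)
commodity = dict(latency_us=50.0, bandwidth_gb_s=2.0)

for size in (8, 1_024, 1_048_576):
    a = transfer_time_us(size, **fast_fabric)
    b = transfer_time_us(size, **commodity)
    print(f"{size:>8} B: fabric {a:9.2f} us, commodity {b:9.2f} us, ratio {b / a:5.1f}x")
```

    With equal bandwidth, tiny messages are dominated by latency and differ by more than an order of magnitude, while megabyte transfers barely differ at all.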

  7. Re:The issues are progress and long-term usefulnes by einhverfr · · Score: 3, Interesting
    You might want to read the latest 10-K form from CRAY.

    http://www.sec.gov/Archives/edgar/data/949158/000089102004000325/v96761e10vk.htm

    Here they discuss the limitations of clusters and vector-based supercomputing.

    Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. I'm not really sure what multithreaded means in this context (a microkernel capable of threading itself across many processors, i.e. UNICOS/mk?), but they do a decent job of explaining the whole thing:


    Cray Research pioneered the use of vector systems, from the Cray-1 to the Cray C90 and T90 systems. These systems typically use a moderate number (one to 32) of very fast custom processors in connection with a shared memory. Vector processing has proven to be highly effective for many scientific and engineering application programs which over the years have been written to maximize the number of long vectors. Traditional vector systems do not scale effectively (that is, increase performance by increasing the number of processors) past a limited number of processors. We currently market one classic vector supercomputer, the Cray SX-6 system.

    Massively parallel processing architectures typically link tens, hundreds or thousands of standard or commodity processors to act either on multiple tasks at the same time or together in concert on a single computationally-intensive task. Type T systems connect each processor directly to its own private memory and the programmer must manage the movement of data among memory units and processors. Consequently these systems can be difficult to program. Type C massively parallel systems, unlike low bandwidth clusters, have high bandwidth and low latency interconnect systems and are said to be "tightly coupled" -- the Cray T3E, Red Storm and the OctigaBay product are examples of balanced high bandwidth purpose built systems that employ standard microprocessors.

    The Cray X1 system is revolutionary in that it is the first supercomputer that combines the attributes of both vector and high bandwidth massively parallel systems. The Cray X1 system has up to 64 processors per cabinet and a shared memory. The Cray X1 system can run small problems as a vector processor would or, by focusing many processors on a task, the Cray X1 system operates as a massively parallel system with a system-wide shared memory and a single-system image. The Cray X1 system is designed to provide efficient scalability and high bandwidth to run complex applications at high sustained speeds. The Cray X1E system furthers this architectural design with increased processor speed and capability.

    Our MTA-2 project for NRL is designed to have sustainable high speed, be broadly applicable and easy to program, provide scalability as systems increase in size and have balanced I/O capability. The multithreading processors make the MTA-2 system latency tolerant and, with the system's flat shared memory, able to address data anywhere in the system.
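    The latency-tolerance idea behind the MTA-2 can be illustrated with a toy "barrel processor" model: the processor issues one instruction per cycle from any thread that is ready, and each thread must then wait a fixed memory latency before it can issue again. This is a deliberately simplified sketch, not a model of the real MTA-2 pipeline; the 99-cycle latency is an assumption. (The MTA design provides 128 hardware threads, or "streams", per processor, hence the last value tried.)

```python
def utilization(threads: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which some thread issues an instruction."""
    ready_at = [0] * threads  # earliest cycle each thread may issue again
    busy = 0
    for cycle in range(cycles):
        for t in range(threads):
            if ready_at[t] <= cycle:
                # Issue one instruction, then stall for the memory latency.
                ready_at[t] = cycle + 1 + latency
                busy += 1
                break  # at most one issue per cycle
    return busy / cycles

LATENCY = 99  # hypothetical memory latency in cycles
for n in (1, 10, 50, 100, 128):
    print(f"{n:>3} threads: {utilization(n, LATENCY):.0%} busy")
```

    Utilization works out to roughly min(1, threads / (latency + 1)): with enough threads in flight, the processor never sits idle waiting on memory, which is what "latency tolerant" means here.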
    --

    LedgerSMB: Open source Accounting/ERP
  8. Let's do some bandwidth math... by JBMcB · · Score: 3, Interesting

    From Cray (From XD1 page):
    "A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."

    -So for a dual-Opteron XD1 processor unit, there is 8 GB per second of total bandwidth available (four links at 2 GB per second each).

    Total aggregate PCI bandwidths (Accepted standards):

    PCI32 33MHz = 133MB/s
    PCI32 66MHz = 266MB/s
    PCI64 33MHz = 266MB/s
    PCI64 66MHz = 533MB/s
    PCI-X 133MHz = 1066MB/s
    PCI Express x1 = 200MB/s (usable, per lane)
    PCI Express x16 = 3000MB/s (Usable bandwidth)

    -So for PCI Express x16 we're talking 3GB/second

    A two-way SMP Opteron with two PCI Express x16 slots can do 6GB/second aggregate bandwidth. A couple of InfiniBand links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
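    The arithmetic behind the figures above can be checked in a few lines. For the parallel PCI buses, peak bandwidth is just bus width in bytes times clock; the nominal "33/66/133 MHz" clocks are really 33.33, 66.67 and 133.33 MHz, and the commonly quoted MB/s figures are those products rounded down. The PCI Express x16 value is the comment's own usable-bandwidth figure, not something derived here.

```python
import math

def parallel_bus_mb_s(width_bits: int, clock_mhz: float) -> int:
    """Peak MB/s of a parallel bus: bytes per transfer x transfers per microsecond."""
    return math.floor(width_bits / 8 * clock_mhz)

buses = {
    "PCI32 33MHz": (32, 100 / 3),
    "PCI32 66MHz": (32, 200 / 3),
    "PCI64 33MHz": (64, 100 / 3),
    "PCI64 66MHz": (64, 200 / 3),
    "PCI-X 133MHz": (64, 400 / 3),
}
for name, (width, mhz) in buses.items():
    print(f"{name:<13} {parallel_bus_mb_s(width, mhz):5d} MB/s")

# Figures quoted in the comment, in GB/s.
XD1_LINKS_PER_SMP, XD1_LINK_GB_S = 4, 2
xd1_per_smp = XD1_LINKS_PER_SMP * XD1_LINK_GB_S  # four 2 GB/s links per two-way SMP
PCIE_X16_USABLE = 3.0                             # usable figure from the comment
opteron_two_slots = 2 * PCIE_X16_USABLE
print(f"XD1 per two-way SMP: {xd1_per_smp} GB/s; two PCIe x16 slots: {opteron_two_slots} GB/s")
```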

    --
    My Other Computer Is A Data General Nova III.
  9. Re:Two words: by iPaul · · Score: 4, Interesting

    Not quite true. First off, you get much higher bandwidth between processors using proprietary (NUMA) based interconnects than you can with commodity hardware. Why? Because you can optimize for your situation. Second, you can exploit things like cache coherency between processors (even if they're in different "nodes") and therefore true shared memory. So a 1024-processor SGI Altix, or a 256-processor Cray, is one computer as far as the OS and user-land stuff is concerned.

    There's another advantage Cray has on the SV and X series and that's a vector unit on the processor. That allows you to conduct operations on arrays of numbers at once instead of having to cycle through the numbers in a loop. For example, the dot_product between two small arrays might be accomplished with one or two instructions, as opposed to a loop. Apple's AltiVec is also a vector unit.
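    To make the loop-versus-vector contrast concrete, here is a sketch that mimics how a vectorizing compiler "strip-mines" a dot product for a machine with fixed-length vector registers. The 64-element length matches the vector registers of classic Cray machines; `vector_multiply_add` is a hypothetical stand-in for what would be one or two vector instructions on real hardware, and Python itself of course still executes it as an ordinary loop.

```python
VL = 64  # vector register length (64 elements on classic Cray vector machines)

def vector_multiply_add(acc, a, b):
    """Stand-in for one vector multiply + vector add over a register's worth of data."""
    return [s + x * y for s, x, y in zip(acc, a, b)]

def dot_strip_mined(a, b):
    assert len(a) == len(b)
    acc = [0.0] * VL                    # one vector register of partial sums
    for start in range(0, len(a), VL):  # one "strip" of up to VL elements per step
        chunk_a = a[start:start + VL]
        chunk_b = b[start:start + VL]
        n = len(chunk_a)                # the last strip may be shorter than VL
        acc[:n] = vector_multiply_add(acc[:n], chunk_a, chunk_b)
    return sum(acc)                     # final reduction of the partial sums

def dot_scalar(a, b):
    total = 0.0
    for x, y in zip(a, b):              # one multiply-add per loop iteration
        total += x * y
    return total

a = [float(i) for i in range(1000)]
b = [2.0] * 1000
print(dot_strip_mined(a, b), dot_scalar(a, b))
```

    Because the partial sums are added in a different order, the two versions can differ in the last bits for general floating-point data; here the inputs are small integers, so both give the same result.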

    If you took money out of the picture, it would be easier to deal with a big-honkin' supercomputer like an SGI or Cray rather than a cluster. One computer is easier to manage, and you could always use threads and plain old heap memory (which is much faster than message passing over a network).
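    The difference between the two programming models can be sketched with Python's standard library. In the shared-memory version every thread reads the same list and writes its partial sum into a shared slot; in the message-passing version each worker gets its own slice and must send its result back through a queue, which stands in for a network link. Everything runs in one process here, so this shows the structure of the code, not the performance gap.

```python
import queue
import threading

data = list(range(1_000_000))
WORKERS = 4
chunk = len(data) // WORKERS

# Shared memory: every thread reads `data` and writes its slot in `partial` directly.
partial = [0] * WORKERS

def shared_worker(i):
    partial[i] = sum(data[i * chunk:(i + 1) * chunk])

threads = [threading.Thread(target=shared_worker, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
shared_total = sum(partial)

# Message passing: each worker owns a copy of its slice and sends back a message.
results = queue.Queue()

def mp_worker(i, my_slice):
    results.put((i, sum(my_slice)))

threads = [threading.Thread(target=mp_worker, args=(i, data[i * chunk:(i + 1) * chunk]))
           for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
mp_total = sum(results.get()[1] for _ in range(WORKERS))

print(shared_total, mp_total)
```

    On a real cluster the second pattern means explicitly copying data to each node and back, which is exactly the cost the shared-memory machine avoids.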

    Add money back in, and $500,000 goes a lot farther in raw compute power when you're buying racks of Dells and InfiniBand interconnects. However, depending on the application, you may be faster, slower, or even dog-slow compared to the Cray. If you need the answer today, and the $ is not a factor, go to Cray or SGI with a blank check. If you have to balance cost and time, then a cluster might be better.

    Essentially, it boils down to how much communication you do between nodes. Cray does it orders of magnitude faster than off-the-shelf stuff. If you hardly ever pass messages between nodes, clusters are fast. If you have to pass a lot of messages between nodes, one big computer will trounce lots of little ones.
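    That rule of thumb can be written as a back-of-the-envelope runtime model: compute time shrinks with node count, but every message on the critical path costs a fixed latency. All parameters below (total work, per-node speed, node counts, message latencies) are hypothetical round numbers, not benchmarks of any Cray or cluster, and the model assumes messages are serialized rather than overlapped with computation.

```python
def runtime_s(work_gflop, nodes, gflops_per_node, messages, latency_us):
    """Compute time spread across nodes plus a fixed latency per critical-path message."""
    compute = work_gflop / (nodes * gflops_per_node)
    communicate = messages * latency_us * 1e-6
    return compute + communicate

# Hypothetical machines: the cluster has more, cheaper nodes but slower messaging.
cluster = dict(nodes=256, gflops_per_node=4.0, latency_us=50.0)
big_machine = dict(nodes=64, gflops_per_node=4.0, latency_us=2.0)

WORK = 100_000  # GFLOP of total work in one job
for msgs in (1_000, 100_000, 10_000_000):
    c = runtime_s(WORK, messages=msgs, **cluster)
    s = runtime_s(WORK, messages=msgs, **big_machine)
    winner = "cluster" if c < s else "big machine"
    print(f"{msgs:>10} messages: cluster {c:8.1f} s, big machine {s:8.1f} s -> {winner}")
```

    With few messages the cluster's extra nodes win easily; once message count climbs into the millions, latency dominates and the smaller, tightly coupled machine finishes first.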

    --
    Leave the gun, take the cannoli -- Clemenza, The Godfather
  10. Re:The issues are progress and long-term usefulnes by Tiosman · · Score: 2, Interesting

    I work with it a lot: roughly 3,000 customers, almost half of them in industry (non-academic, non-government).

    You found bugs ? Care to share them ? Hardware failed ? Did you get it replaced ?

    Can you give me the tech support ticket numbers so I can see if your complaints are reasonable (and have been addressed) or are just plain FUD?

  11. Exploiting parallelism vs. efficient computation by billstewart · · Score: 2, Interesting
    If you're trying to run 1024 cases with different starting conditions, then a 1024-processor cluster lets you run them all at once. A supercomputer with the same price as the cluster probably has only 1/10th the raw GFLOPS as the cluster, because supercomputer designs are much more complex and commodity cluster hardware is dirt cheap.
    • So if each cluster CPU can run a single instance of the problem efficiently, it's 10 times as cost-effective to use the cluster.
    • On the other hand, if a single instance of the problem doesn't really fit in a cluster CPU, it might be 1/10th as efficient as the supercomputer CPU, because you're spending more time doing swapping or communications to get the numbers to crunch than you are crunching them, in which case it's a tie with the supercomputer.
    • But on yet another tentacle, if it's 1/100th as efficient to use the cluster CPU as the supercomputer CPU, because you have to spend a LOT more time swapping, then the supercomputer is a big win, 10 times as cost-effective as the cluster.
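    The three cases above reduce to one line of arithmetic: the cluster's cost-effectiveness relative to the supercomputer is its raw FLOPS-per-dollar advantage times the efficiency of a cluster CPU relative to a supercomputer CPU on the problem. A sketch using the comment's own assumed 10:1 raw price-performance ratio:

```python
RAW_ADVANTAGE = 10  # cluster gets ~10x the raw GFLOPS for the same price (the comment's assumption)

def cluster_vs_super(relative_efficiency: float) -> float:
    """Greater than 1: the cluster is more cost-effective; less than 1: the supercomputer is."""
    return RAW_ADVANTAGE * relative_efficiency

for label, eff in (("fits in each cluster CPU", 1.0),
                   ("1/10th as efficient", 0.1),
                   ("1/100th as efficient", 0.01)):
    print(f"{label:<26} cluster/super cost-effectiveness = {cluster_vs_super(eff):g}")
```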

    Back in the mid-80s, my department had a huge VAX 780 with 4 MB of RAM (16KB chips, I think), and we were working on a network simulation system that needed 12-14 MB of RAM to run. I spent a while playing with different versions of 4.1BSD and UNIX System V R2, but fundamentally the machine spent all its time swapping data in and out of disk, and the main performance win was helping the physics jocks who wrote the application get better algorithms, better locality, and good checkpointing, because the computer didn't always stay running for the full week it took to finish a simulation run. A year or two later, we got the budget to buy another 4MB of RAM (in 64KB chips, about $50K IIRC), which helped a bit, and a year or two after that, we got enough budget to buy another 8MB of RAM (maybe 256KB chips? not sure. Also about $50K), and suddenly the application could complete in under an hour instead of a week. RAM really is a couple orders of magnitude faster than disk drives, with a couple more orders of magnitude less latency, so our problem changed from being disk-bound to being CPU-bound.
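    Why more RAM can turn a week into an hour is captured by the standard effective-access-time formula for demand paging: average access time = (1 - p) x memory access time + p x page-fault service time, where p is the fraction of references that fault to disk. The 1-microsecond memory and 25-millisecond fault figures are hypothetical round numbers for illustration, not measurements of that VAX.

```python
def effective_access_us(fault_rate: float, mem_us: float = 1.0, fault_us: float = 25_000.0) -> float:
    """Average time per memory reference under demand paging (hypothetical timings)."""
    return (1 - fault_rate) * mem_us + fault_rate * fault_us

baseline = effective_access_us(0.0)
for p in (0.0, 1e-5, 1e-4, 1e-3):
    t = effective_access_us(p)
    print(f"fault rate {p:g}: {t:8.3f} us per access ({t / baseline:.1f}x the all-in-RAM time)")
```

    Even one fault per thousand references makes every access about 26 times slower on average; once the working set fits in RAM, p drops toward zero and the job is limited by the CPU instead.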

    That speedup not only improved the utilization of the equipment, it made a qualitative difference in the kinds of problems we could address because of the way we could interact with it. That's why people buy supercomputers if they need them - it really can be orders of magnitude faster for some problems. The first year or so, we really had all the RAM that could fit in the double-refrigerator-sized VAX cabinet. Once the denser RAM chips became available, we probably should have spent a bit more manager time beating up on the accounting department, because an extra $50K for hardware could have more than doubled the efficiency of 3-4 physicists, but of course the accounting droids don't think in terms of efficient use of physicists unless it lets you buy half as many of them, which was _not_ the objective here...

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks