Slashdot Mirror


10-TFlop Computer Built from Standard PC Parts

OrangeTide writes "Using PCI host adapters and Xeon processors, engineers at Lawrence Livermore National Labs have achieved 10-TFlops relatively cheaply. More information can be obtained from this article at EETimes." Lately, Linux seems to be the operating system of choice for new supercomputers, and this one's no different. It's cool to see big iron made cheaply.

14 of 247 comments

  1. imagine the future by ryochiji · · Score: 5, Insightful
    From the article:
    >The 1- to 10-teraflops processing range is opening up a revolutionary capability for scientific applications

    In the not-too-distant future, that kind of processing power could very well be available in home PCs. Imagine what that would do to... well, I mean, dang it, what the heck will we do? Game frame rates can only go so high. Even the realism of 3D graphics may have its limits. Oh sure, we'll find something, but it's difficult for us to imagine now...

  2. Does that mean... by haxor.dk · · Score: 2, Insightful

    ...that Apple's glorious supercomputers are obsolete?

    Damn... :/

  3. Re:Connections through PCI bus? by Dr.+Spork · · Score: 3, Insightful
    Wouldn't they have bus timing issues? There must be some extra piece of hardware involved... If that's the case, isn't it basically just a new kind of NIC?

    I don't know much about this type of stuff, but wouldn't it be awesome if they made it work through software? If they did, Linux/Beowulfing would be about to take a huge leap forward.

    Anyway, maybe this is not the sort of thing you can solve with software. Whatever they're using to connect the computers sounds very practical though, not just for supercomputers but for general fast networking. When can we do this at home?

  4. Re:Processing power by sql*kitten · · Score: 4, Insightful

    >Anyways, what I'm trying to point out is that it is actually becoming very convenient to build a supercomputer with lots of PCs that just lie idle. I am not sure if Saddam has heard about cheap Linux systems. But what if he could build a supercomputer cluster?

    Well, it depends. A Linux cluster is a good way to render a movie, because you can easily parallelize that task - send a frame to each node you've got, wait for it to come back, send out the next one, then when you're done composite them into an animation. That's easy, because you can make each task essentially stateless. For example, you don't have to wait for frame 1 to rasterize before you know how to light frame 2.
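
    A minimal Python sketch of that frame-farming pattern, for illustration only. `render_frame` is a hypothetical stand-in for a real renderer, and a thread pool stands in for the cluster's nodes. Each task depends only on its own frame number, so results can be collected without any worker talking to another.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number: int) -> str:
    """Hypothetical renderer: output depends only on this frame's number
    (plus shared, read-only scene data), never on any other frame."""
    return f"frame-{frame_number:04d}"


def render_movie(num_frames: int, num_nodes: int) -> list[str]:
    # Each worker stands in for one node. Frames are independent, so
    # they can be handed out in any order with no inter-node messages.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # map() returns results in submission order, so "compositing"
        # the animation is just keeping the list in frame order.
        return list(pool.map(render_frame, range(num_frames)))


movie = render_movie(num_frames=100, num_nodes=8)
print(movie[:3])  # ['frame-0000', 'frame-0001', 'frame-0002']
```

    Because no task needs another task's result, adding workers adds throughput almost linearly. That is exactly the property the gas simulation below lacks.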

    But in many scientific computations, there is a limit to how far you can subdivide a task. Say you are modelling the movement of a gas in three-dimensional space: you cannot simply partition your space 3x3x3, send it to 27 compute nodes and let each one work alone, because what happens in each partition both influences and is influenced by what happens in the adjacent partitions. If you tried something like this on a cluster designed for rendering movies (or brute-forcing a cipher, or serving web pages), performance would be terrible because of the overhead of communication between nodes. For that, a Single System Image machine has a vast advantage.
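
    Here is a toy sketch that makes the dependency concrete. It assumes a one-dimensional grid and a simple explicit diffusion update, both chosen for illustration and not taken from any real simulation code. Once the grid is split across "nodes", each partition must borrow one edge value from each neighbour on every time step (a halo exchange) before it can update its own edge cells. On a real cluster, each borrow would be a message.

```python
def diffuse_serial(u, alpha=0.1):
    """One explicit diffusion step on the whole grid; end cells held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def split(u, parts):
    """Partition the grid into equal contiguous chunks, one per node."""
    assert len(u) % parts == 0, "toy version: grid must divide evenly"
    size = len(u) // parts
    return [u[k * size:(k + 1) * size] for k in range(parts)]


def diffuse_partitioned(chunks, alpha=0.1):
    """Same step, but each chunk only sees its neighbours' edge cells."""
    last = len(chunks) - 1
    new_chunks = []
    for k, chunk in enumerate(chunks):
        # Halo exchange: one ghost value from each adjacent partition
        # (None at the outer edges of the whole domain).
        left = chunks[k - 1][-1] if k > 0 else None
        right = chunks[k + 1][0] if k < last else None
        new = chunk[:]
        for i in range(len(chunk)):
            lo = chunk[i - 1] if i > 0 else left
            hi = chunk[i + 1] if i < len(chunk) - 1 else right
            if lo is None or hi is None:
                continue  # global boundary cell: held fixed, as in serial
            new[i] = chunk[i] + alpha * (lo - 2 * chunk[i] + hi)
        new_chunks.append(new)
    return new_chunks


grid = [0.0] * 12
grid[6] = 1.0  # a single hot spot
serial, chunks = grid, split(grid, 3)
for _ in range(5):
    serial = diffuse_serial(serial)
    chunks = diffuse_partitioned(chunks)
print([x for c in chunks for x in c] == serial)  # True
```

    The partitioned version cannot take even one step without its neighbours' current edge values. Over a network, that per-step exchange is the communication overhead described above.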

    So the question is (and I don't know, I didn't study nuclear physics beyond A-level), are the significant computational problems associated with the development of nuclear weapons easy to parallelize, or do they require a real supercomputer?

  5. Re:Parallel computing by sql*kitten · · Score: 4, Insightful

    >I can't think of a reason why we shouldn't be getting hyped about these teraflops. We use an 8-node AppleSeed cluster at work and I've seen that thing hump out 4-6 gigaflops of crunching power. It takes as long as a week to run some of our molecular dynamics simulations. If we had 10 teraflops of power in our hands, those simulations could take on the order of minutes instead of days.
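
    A quick check of that estimate, assuming the simulation would scale perfectly with peak FLOPS (which, as other comments point out, real codes rarely do). Going from 4-6 gigaflops to 10 teraflops is roughly a 1,700x to 2,500x jump, so a week-long run would shrink to roughly 4-6 minutes.

```python
def ideal_runtime_minutes(baseline_hours: float, baseline_gflops: float,
                          target_gflops: float) -> float:
    """Runtime on a faster machine if the job scaled perfectly with FLOPS."""
    speedup = target_gflops / baseline_gflops
    return baseline_hours * 60 / speedup


ONE_WEEK_HOURS = 7 * 24
TEN_TFLOPS = 10_000  # expressed in gigaflops

for gflops in (4, 6):
    minutes = ideal_runtime_minutes(ONE_WEEK_HOURS, gflops, TEN_TFLOPS)
    print(f"{gflops} GFLOPS cluster: ~{minutes:.1f} minutes")
```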

    As an aside, I have to wonder whether or not that's a good thing. I have noticed in myself and in almost everyone I've worked with that having massive amounts of CPU at your disposal makes you sloppy. People tend to take a "shotgun" approach: rather than thinking through a problem, they just "try something" until it works. Of course, in some cases CPU really is cheaper than developer time, but in just as many cases it's an excuse for laziness. I see this all the time: people will build an over-complex solution using technologies like J2EE and EJBs when something much simpler and more efficient would suffice. For another example, every Slashbot who has complained about bloat in MS Office knows exactly what I mean.

    Roll on the teraflops, but not before developers have the self-discipline to use them well.

  6. Re:Connections through PCI bus? by tap · · Score: 5, Insightful
    There are chips designed to connect two PCI buses together, called PCI-to-PCI bridges. For instance, my Intel dual-port Ethernet card has one:

    Bus 0, device 12, function 0: PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 3). Master Capable. Latency=64. Min Gnt=4.

    But you can't use this to connect a rack of computers. For one thing, the maximum cable length for connecting two buses would be just a few inches. For putting PCI cards in 1.75"-high 1U rackmount cases, there are PCI risers with a short ribbon cable that plugs into the PCI slot. Even these short cables often cause timing problems: with the riser, a card that would otherwise work in any slot may only work in the first one or two.

    But even if you could cable all the computers together on one giant PCI bus, it would still be a bad idea. A good 24-port gigabit Ethernet switch (~$2000) has a non-blocking switching fabric: supporting full-speed, full-duplex traffic on every port takes 24 ports x 2 Gbit/sec = 48 Gbit/sec, or about 6 GB/sec. 32-bit 33 MHz PCI is only about 132 MB/sec, and even a 64-bit 66 MHz PCI bus manages only about 528 MB/sec, nowhere near enough. There are also more expensive gigabit switches with more ports that have 100 Gbit/sec fabrics. And this is just gigabit Ethernet, the slowest and cheapest of the high-speed interconnects used in modern Beowulf clusters.
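
    The peak-bandwidth arithmetic behind this comparison, as a small Python sketch. A parallel bus moves its width in bytes once per clock, and a gigabit port running full duplex carries 1 Gbit/sec in each direction. These are theoretical peaks (nominal 33/66 MHz clocks, decimal megabytes). PCI arbitration and protocol overhead make sustained throughput lower.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel bus transferring width_bits per clock."""
    return width_bits / 8 * clock_mhz


def full_duplex_mb_s(ports: int, port_mbit_s: float) -> float:
    """Traffic a switch fabric must carry for every port at full duplex."""
    return ports * port_mbit_s * 2 / 8


print(bus_peak_mb_s(32, 33))      # 132.0  (32-bit, 33 MHz PCI)
print(bus_peak_mb_s(64, 66))      # 528.0  (64-bit, 66 MHz PCI)
print(full_duplex_mb_s(1, 1000))  # 250.0  (one gigabit port, both directions)
```

    So even a single full-duplex gigabit port can outrun an entire 32-bit, 33 MHz PCI bus.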

    There are faster ways to connect computers than gigabit Ethernet. The EE Times article is not very technical, but this one has some more information. LLNL has used a very fast and very expensive interconnect called Quadrics. This is probably the fastest way to connect computers in a Beowulf. Vendors like Cray, SGI and IBM have faster things still, but they cost really big bucks. Other ways to connect a Beowulf are the above-mentioned gigabit Ethernet (~$100-$250 a node for up to 24 nodes), Myrinet (~$1400-$2000 a node for up to 128 nodes), and SCI hardware and software (~$1400-$2100 a node). Myrinet uses a switch like gigabit Ethernet, and the largest switch they have has 128 ports. SCI is switchless: each card has one to three cables and is connected into a ring or a 2D or 3D torus.
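
    A sketch of the ring/torus wiring. It assumes one outgoing link per dimension, which is one reading of the one-to-three-cables figure (one for a ring, two for a 2D torus, three for a 3D torus). It also assumes traffic only flows "forward" around each ring. The function names and the dimension-by-dimension hop count are illustrative, not any vendor's API.

```python
def next_hops(coord: tuple, dims: tuple) -> list:
    """Downstream neighbour of a node along each dimension of the torus.

    dims=(8,) is a ring, (4, 4) a 2-D torus, (4, 4, 4) a 3-D torus.
    Each dimension wraps around, so there are no edge nodes.
    """
    links = []
    for axis, size in enumerate(dims):
        nxt = list(coord)
        nxt[axis] = (nxt[axis] + 1) % size  # wrap at the end of the ring
        links.append(tuple(nxt))
    return links


def hop_count(src: tuple, dst: tuple, dims: tuple) -> int:
    """Hops from src to dst when each ring is traversed in one direction only."""
    return sum((d - s) % size for s, d, size in zip(src, dst, dims))


print(next_hops((3, 0), (4, 4)))          # [(0, 0), (3, 1)]
print(hop_count((3, 0), (0, 3), (4, 4)))  # 4
```

    The cabling stays simple and needs no switch, but paths get longer as the torus grows. Under these assumptions, the farthest node is sum(size - 1) hops away.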

  7. Re:Why XEONs? by tap · · Score: 3, Insightful

    If you check current prices, the Xeon isn't much more expensive than the Athlon MP. Pricewatch has the 2.2 GHz Xeon at $245 and the Athlon MP 2200+ at $204, a difference of $41 per CPU. Each of these machines is interconnected with a Quadrics board that probably costs more than $2000, so an extra $80 or so for a node's CPUs isn't much.
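
    The same arithmetic in Python. It assumes dual-processor nodes, which is what an extra "$80 for CPUs" implies, and takes the $2000 Quadrics estimate at face value.

```python
XEON_2_2GHZ = 245         # Pricewatch prices quoted above, USD
ATHLON_MP_2200 = 204
CPUS_PER_NODE = 2         # assumption: dual-processor nodes
QUADRICS_PER_NODE = 2000  # rough figure from the comment ("more than $2000")

extra_per_node = (XEON_2_2GHZ - ATHLON_MP_2200) * CPUS_PER_NODE
share = extra_per_node / QUADRICS_PER_NODE
print(extra_per_node)  # 82
print(f"{share:.1%}")  # 4.1%
```

    That is about 4% of the interconnect cost alone, before counting memory, disks or the rest of the node.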

    Why pick Xeon over AMD at all? Xeon motherboards with chipsets like the Intel E7500 and ServerWorks GC-HE have greater memory bandwidth and PCI bandwidth than those based on the AMD 760MPX. For many problems in scientific computing, memory bandwidth matters more than CPU speed.
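
    One way to see why, sketched with the simple "roofline" model: a loop can run no faster than the CPU's peak, and no faster than memory can deliver its operands. The machine figures below are made-up round numbers for illustration, not measurements of any of these chipsets. The kernel is a STREAM-style triad, `a[i] = b[i] + s * c[i]`, which does 2 floating-point operations per 24 bytes of double-precision traffic (counting bytes the way STREAM does).

```python
def attainable_gflops(peak_gflops: float, mem_gb_s: float,
                      flops: float, bytes_moved: float) -> float:
    """Roofline bound: capped by the CPU peak or by memory bandwidth."""
    return min(peak_gflops, mem_gb_s * flops / bytes_moved)


# Triad on doubles: 2 flops (one multiply, one add) per element;
# 24 bytes per element (read b[i], read c[i], write a[i]).
TRIAD_FLOPS, TRIAD_BYTES = 2, 24

# Hypothetical machines: doubling peak FLOPS changes nothing here,
# while doubling memory bandwidth doubles the achievable rate.
for peak, bandwidth in [(4.0, 3.0), (8.0, 3.0), (4.0, 6.0)]:
    rate = attainable_gflops(peak, bandwidth, TRIAD_FLOPS, TRIAD_BYTES)
    print(f"peak {peak} GFLOPS, {bandwidth} GB/s -> {rate:.2f} GFLOPS")
```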

  8. Re:yeah, well.. by Kibo · · Score: 2, Insightful

    Interesting observation. So whatever happened to DEC anyway? Oh yeah.... It's not always the sick or weak, sometimes it's the unaware that end up being prey.

    --
    --Jimmy has fancy plans; and pants to match.
  9. This is not "Big Iron" by sdeath · · Score: 5, Insightful

    The title says it all. Big Iron is _engineered_. No matter how big or how spiffy a Beowulf cluster is, it's still just a bunch of PC motherboards kludged together with a bunch of network cards. There is a reason Crays are expensive - they are _worth it_ from a performance standpoint, because not every problem lends itself easily to the solution of a Beowulf cluster. Some problems require the exchange of a lot of data between a lot of nodes, and a little math will show that it won't take much data interchange to saturate even a GigE switch. Adding more machines is not going to help; craftily designing and overengineering the network _might_, but by the time you get this whole damned thing glued together well enough to approximate a Cray's performance, you'll have spent enough to have just flat-out bought a Cray in the first place.

    As others have noted, while this thing may have a theoretical peak performance of 10 TFLOPS, I'm willing to bet that number goes down like Monica Lewinsky on Quaaludes when you feed this magical supercomputer a problem that's _not_ suitable for distributed.net (i.e. one where computations on one node are dependent on computations on another node, like fluid-dynamics problems, turbulence, etc.)

    Yeah, it's interesting as a curiosity, but this is by no means spectacular. Beowulf is good for what it's good for, which is a "poor-man's supercomputer" that works well for coarsely-parallel problems that don't require a lot of internode communication. It's not the Philosopher's Stone, folks.

    -SD

    --
    I am Chaos. I am alive, and I tell you that you are Free. -Eris
  10. But it's only 32-bit by Anonymous Coward · · Score: 2, Insightful

    Which means there's a 4 GB hard limit (2^32 bytes) on the address space a single process can use, and less in practice because the kernel reserves part of it. There is also a performance hit if a node has more than 4 GB of RAM, since reaching that memory requires PAE. Of course, with 10 TB, there's some room to spare, but if a calculation is not very parallelizable, you're still limited to the speed of one node.
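
    That last point is Amdahl's law. If a fraction p of the work parallelizes perfectly and the rest runs on one node, N nodes give a speedup of 1 / ((1 - p) + p / N). That can never exceed 1 / (1 - p). A small sketch follows; the 1,000-node figure is an arbitrary illustration, not the size of the LLNL machine.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: the serial part runs on one node; the rest splits evenly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


for p in (0.5, 0.9, 0.99):
    print(f"p={p}: {amdahl_speedup(p, 1000):.1f}x on 1000 nodes, "
          f"at most {1 / (1 - p):.0f}x on any number")
```

    Even with 99% of the work parallel, 1,000 nodes deliver only about a 91x speedup.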

  11. Networks and parallel algorithms are the key by dsfd · · Score: 3, Insightful

    The distributed-memory Crays (T3D, T3E) are just the same: boards and network cards. The processors they use are not faster than the latest generation of PC processors. The difference is the NICs, which have about 10 times more bandwidth and one tenth the latency of standard Fast Ethernet cards.

    That is the difference. As you say, for certain problems, this means that the whole machine is about 10 times faster than a Beowulf.

    However, if and when conventional NICs become fast enough, especially in terms of latency, both systems can be equivalent again. In the meantime, a lot of people are trying to develop parallel algorithms that minimize the number and size of messages, so that cheap PCs can be used as supercomputers.
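
    The usual back-of-the-envelope model for why fewer, larger messages help: sending n bytes costs a fixed latency plus n divided by bandwidth. The figures below are assumptions for illustration. Fast Ethernet runs at its nominal 12.5 MB/sec with an assumed 100-microsecond latency, and the hypothetical NIC is 10 times better on both counts, as described above.

```python
def message_time_us(size_bytes: float, latency_us: float,
                    bandwidth_mb_s: float) -> float:
    """Latency + size/bandwidth cost of one message, in microseconds.

    1 MB/s (10**6 bytes/s) is exactly 1 byte per microsecond.
    """
    return latency_us + size_bytes / bandwidth_mb_s


FAST_ETHERNET = dict(latency_us=100.0, bandwidth_mb_s=12.5)  # assumed latency
FAST_NIC = dict(latency_us=10.0, bandwidth_mb_s=125.0)       # hypothetical

for name, nic in [("Fast Ethernet", FAST_ETHERNET), ("10x NIC", FAST_NIC)]:
    many_small = 1000 * message_time_us(8, **nic)  # 1000 separate 8-byte values
    one_batch = message_time_us(8 * 1000, **nic)   # same data, one message
    print(f"{name}: {many_small:.0f} us as 1000 messages, {one_batch:.0f} us batched")
```

    Batching wins by two orders of magnitude on either network. On the slow one, it is the difference between about a tenth of a second and under a millisecond per exchange.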

  12. Re:Processing power by FuzzyDaddy · · Score: 5, Insightful
    >So the question is (and I don't know, I didn't study nuclear physics beyond A-level), are the significant computational problems associated with the development of nuclear weapons easy to parallelize, or do they require a real supercomputer [sgi.com]?

    I believe the calculations needed are massive finite element calculations. And I would imagine that things happen quickly enough in a nuclear explosion that there's a lot of significant stuff going on over a time period much shorter than it takes for any change to move from one side of the simulated device to the other.

    As an analogy, suppose you wanted to simulate a large number of gravitating bodies. You would break the problem up into sections. Even though each body acts on every other, bodies beyond a certain distance can be treated by their average force. So you can simulate bodies that are near each other on the same node, and have the nodes communicate to pass along information about the "average" field. It requires some communication between nodes, but a large amount of work can be done on individual nodes.
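
    A simplified, serial sketch of that idea: a basic cell-based monopole approximation, a much cruder relative of tree codes such as Barnes-Hut. The grid, the cell size and the units (G = 1) are all illustrative assumptions. Bodies in a target's own and adjacent cells are summed exactly, while each farther cell is replaced by a single pseudo-body at its center of mass. On a cluster, each node would own some cells and would only need to publish each cell's total mass and center of mass.

```python
import math
from collections import defaultdict


def accel(target, bodies):
    """Exact gravitational acceleration at target = (x, y), with G = 1.

    bodies is a list of (x, y, mass); a body at the target point is skipped.
    """
    ax = ay = 0.0
    for x, y, m in bodies:
        dx, dy = x - target[0], y - target[1]
        r2 = dx * dx + dy * dy
        if r2 == 0.0:
            continue
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        ax += m * dx * inv_r3
        ay += m * dy * inv_r3
    return ax, ay


def cell_of(x, y, size):
    return (math.floor(x / size), math.floor(y / size))


def accel_cells(target, bodies, cell_size):
    """Near cells summed exactly; each far cell collapsed to one pseudo-body."""
    cells = defaultdict(list)
    for b in bodies:
        cells[cell_of(b[0], b[1], cell_size)].append(b)
    hx, hy = cell_of(target[0], target[1], cell_size)
    sources = []
    for (cx, cy), members in cells.items():
        if max(abs(cx - hx), abs(cy - hy)) <= 1:
            sources.extend(members)  # own or adjacent cell: exact
        else:
            mass = sum(m for _, _, m in members)
            comx = sum(x * m for x, _, m in members) / mass
            comy = sum(y * m for _, y, m in members) / mass
            sources.append((comx, comy, mass))  # the "average" field
    return accel(target, sources)


# One nearby body plus a tight, distant cluster of ten.
bodies = [(1.5, 0.5, 1.0)] + [(50 + 0.05 * i, 50 + 0.03 * i, 1.0) for i in range(10)]
print(accel((0.5, 0.5), bodies))
print(accel_cells((0.5, 0.5), bodies, cell_size=1.0))
```

    The two results agree closely, because the distant cluster is small compared with its distance from the target.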

    Or for your gas example, if you broke the problem up into boxes, you would have to "hand off" a particle as it passed from one box to another, and perhaps pass along information about forces close to the box boundaries. But if a lot is happening within a single box (like, say, chemical reactions), you can still get a big benefit out of parallelization.
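
    A toy sketch of the hand-off step. It assumes cubic boxes on a periodic (wrap-around) domain and straight-line motion; the box layout and function names are hypothetical. Each box advances its own particles independently, and only particles that cross a box boundary generate traffic to another box's owner.

```python
from collections import defaultdict


def box_of(pos, box_size, boxes_per_side):
    """Index (ix, iy, iz) of the box containing pos, on a periodic domain."""
    return tuple(int(c // box_size) % boxes_per_side for c in pos)


def step_and_hand_off(owned, dt, box_size, boxes_per_side):
    """Move every particle one time step; regroup them by their new box.

    owned maps box index -> list of (position, velocity) tuples.
    Returns the regrouped dict and the number of particles handed off.
    """
    domain = box_size * boxes_per_side
    regrouped = defaultdict(list)
    handed_off = 0
    for box, particles in owned.items():
        for pos, vel in particles:
            new_pos = tuple((p + v * dt) % domain for p, v in zip(pos, vel))
            dest = box_of(new_pos, box_size, boxes_per_side)
            if dest != box:
                handed_off += 1  # on a cluster: a message to dest's node
            regrouped[dest].append((new_pos, vel))
    return dict(regrouped), handed_off


owned = {(0, 0, 0): [((0.9, 0.5, 0.5), (1.0, 0.0, 0.0)),   # crosses into x-neighbour
                     ((0.5, 0.5, 0.5), (0.0, 0.0, 0.0))]}  # stays put
owned, moved = step_and_hand_off(owned, dt=0.2, box_size=1.0, boxes_per_side=3)
print(moved, sorted(owned))  # 1 [(0, 0, 0), (1, 0, 0)]
```

    Communication scales with the number of boundary crossings, not with the amount of work done inside each box.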

    Also, if designing nuclear bombs is anything like designing microwave components, you would have several simulations going at the same time, to try different variations on one design. Or you would design several subparts and have them running at the same time.

    In short, I think that the problem very much lends itself to parallel computing.

    --
    It's not wasting time, I'm educating myself.
  13. Re:Howlingly funny? by Anonymous Coward · · Score: 1, Insightful

    Why not both?

    There are benefits to combining various clustering methods.

  14. Re:Processing power by LWATCDR · · Score: 5, Insightful

    Since the first atomic bomb was made in 1944-45 and worked the first time, all you need is a computer equal to what they had in 1944.
    Making a small, portable nuke is harder.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.