Compaq To Build DEC Beowulf Supercomputer
Tower writes: "Compaq Computer (Digital) and the Pittsburgh Supercomputing Center have won a $36 million contract to build a 2,728-processor supercomputer using 1.1 GHz EV68 processors in a 682 node Beowulf setup. Check it out here." This is a different machine than this one: That one was supposed to be used to calculate nuclear explosions, this one will be used by the National Science Foundation to work on biophysics, global climate change, astrophysics and materials science, according to the article.
http://www.globalfilesystem.org
Very cool technology. I have been following this for quite a while and it shows tremendous promise for solving all kinds of disk scalability problems.
Yeah, I believe the AI department at our university does something similar with their workstations when they have lots of calculations to do.
:)
:)
They've got a tonne of Ultra 10 and a few Ultra 60 machines, and as I understand it they just start idle-priority threads in the background of everyone's machine.
However, I'm sure they run down to play with the supercomputers on our campus when they get bored.
Mmmm, if you like big computers, look at this, but it looks far better in real life.
Well, the Höchstleistungsrechner in Bayern, which is 5th on the top 500 list (and stands about 50m from where I'm typing this :) uses up 600kW. They had to reinforce the floor above it before they could install the cooling systems, before they could install the computer itself.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
According to their website... "PSC operates five supercomputing-class machines: a 512 processor Cray T3E, two eight-processor Cray J90s, a four-processor Alphaserver 8400 5/300 system, and an Intel cluster with 10 4-processor compute nodes."
This page provides a description of the work researchers plan to do with the new supercomputer.
The center is a joint venture between Carnegie Mellon, the University of Pittsburgh, and the old Westinghouse Electric company.
It's also interesting to note that the PSC & CMU formed the NCNE Gigapop that provides the internet to CMU, PITT, WVU, and Penn State.
A lot of these problems, like climate modelling, can be worked on by partitioning the problem into cells. You just need to fix up the edges on each iteration, though. Independent systems joined together, particularly with a low-latency interconnect, fit this sort of problem space well.
Obviously, there are some problems where the dependencies between the data sets are nil, and for those commodity Intel/Athlon/Alpha Linux boxes are ideal. There are still more where they are cost-efficient ;)
Supercomputing facilities are best equipped with a mixture of these. For some jobs a steamroller is better than a Porsche. When you've got a specific requirement, and lots of money is involved, off the shelf components are not always the best bet.
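To make the "partition into cells, fix up at the edges" idea concrete, here is a minimal Python sketch. It is not anything PSC or Compaq actually runs: it assumes a toy 1-D diffusion problem, and the names `step`, `run_single`, `run_partitioned` and the 0.25 coefficient are made up for illustration. Each strip needs only one edge value from each neighbour per iteration, which is the traffic a low-latency interconnect would carry.

```python
def step(cells, left, right):
    """One explicit diffusion step on a strip, given one edge value from each neighbour."""
    padded = [left] + cells + [right]
    return [
        padded[i] + 0.25 * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def run_single(grid, iterations):
    """Reference run: the whole grid lives on one machine."""
    for _ in range(iterations):
        # Zero-gradient boundaries: the outer edges see a copy of themselves.
        grid = step(grid, grid[0], grid[-1])
    return grid


def run_partitioned(grid, parts, iterations):
    """Same computation, with the grid split into strips that swap edge values."""
    size = len(grid) // parts
    chunks = [grid[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(grid[(parts - 1) * size:])  # last strip takes any remainder
    for _ in range(iterations):
        # "Fix up at the edges": collect every strip's halo values from the
        # old state before any strip is updated.
        halos = []
        for p, chunk in enumerate(chunks):
            left = chunks[p - 1][-1] if p > 0 else chunk[0]
            right = chunks[p + 1][0] if p < parts - 1 else chunk[-1]
            halos.append((left, right))
        # On a cluster each strip would be updated on its own node.
        chunks = [step(c, l, r) for c, (l, r) in zip(chunks, halos)]
    return [x for chunk in chunks for x in chunk]


grid = [0.0] * 8 + [100.0] * 8
same = run_partitioned(grid, 4, 50) == run_single(grid, 50)
print("partitioned run matches single run:", same)
```

The strips here are updated one after another in a single process; on a real cluster the halo values would be sent between nodes with something like MPI, and that exchange is where the interconnect latency shows up.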
> they surely lack the memory bandwidth that makes traditional mainframes and supercomputers so powerful.
Yes, but these aren't Beowulf clusters. Quadrics hardware is not some cheap and cheerful solution like switched Gigabit Ethernet ;)
> I dunno -- seems to me like the author is saying that it really is a Beowulf cluster.
I took it to be a "Beowulf clone" or a "Beowulf-style cluster". AFAIK (please correct me!), "Beowulf" refers specifically to a GPL'd Linux kernel hack, and thus any "Beowulf cluster" would be a Linux cluster. But I would assume it would be more or less straightforward to implement on Unices, at least for parties who have the source code, in which case I would call it a "Beowulf type cluster", or give it a new name altogether. But perhaps the term has been generalized; I think it has already generalized once from referring to "the" Beowulf cluster (the original one), to referring to all clusters built with the same kernel patch.
OTOH, there was a [epithet of your choice for a moron here] on the Beowulf mailing list for a while, who was adamant that his NT cluster was a "Beowulf" system. I never figured out why he even subscribed, since any exchange of information there would be completely irrelevant to his situation. Shows the importance of bragging rights in the IT world, I suppose.
--
Sheesh, evil *and* a jerk. -- Jade
From the Beowulf FAQ: The more general term is NOW, Network of Workstations, which includes Beowulf, Beowulf-like systems on non-open OSes, and perhaps other types of cluster as well.
So strictly speaking, this is not a Beowulf. Of course the meaning of the term may be drifting, as with "hacker" and "cracker". (Languages do that.)
Just for the moment, assuming that you've got an ideal application and totally ignoring all other factors, what is the cheapest MIPS source?
Would you get more MIPS/buck out of massive piles of $5 microcontrollers, or out of, say, K6 500 MHz chips with cheap MOBOs?
Again, just totally ignoring all other factors, no matter how silly you think that is.
Personally, I'd like to hijack a top-of-the-line fab and put grids of hundreds of little computers, each with a few K of memory, on dies that would normally be used for one microprocessor. I don't know what I'd do with them, but I'm sure I'd find some cool app like massive neural nets.
Ahhh... to set up a massive pile of millions of parallel processors that could start from "I think therefore I am" and get all the way up to deducing the existence of rice pudding and income tax before I hook up the data banks...
---
Despite rumors to the contrary, I am not a turnip.
I always wonder how much power goes into this kind of beast.
:-)
Let's try to estimate it: 682 systems, each containing 4 processors. I guess that each will need a 300 W power supply. So that makes about 205 kW just for the computers (when working at full speed only, OK)!
At 110 V this thing would draw 1860 amperes, not something you'd like to try at home (imagine the electricity bill).
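For anyone who wants to redo the estimate with their own guesses, here is the same back-of-the-envelope arithmetic as a short Python sketch. The 300 W per node is the guess from this post, not a published figure, and cooling, disks and the interconnect are ignored.

```python
NODES = 682            # four-processor systems, from the article
WATTS_PER_NODE = 300   # guess from the post, not a measured value
MAINS_VOLTS = 110

total_watts = NODES * WATTS_PER_NODE
total_kw = total_watts / 1000
current_amps = total_watts / MAINS_VOLTS

print(f"{total_kw:.1f} kW, {current_amps:.0f} A")  # prints "204.6 kW, 1860 A"
```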
Every expression is true, for a given value of 'true'
Ok, that one is faster... (>770 MB/s internode using MPI, no mention of latency). But it doesn't qualify as a Beowulf-style machine; it is all specialized Hitachi stuff.
This Compaq is in my opinion 'Beowulf-style': it uses standard 8-way SMP machines using PCI network cards and fast switches for interconnection. For this, the QSW products still look impressive to me.
Yes, I work for Compaq. No, I don't speak for them.
> If you want massively parallel systems then I would honestly think that something like processtree would be a good solution since you can rent a phenomenal block of cpu time.
Well, obviously these machines are something in between the extremes you mention, and there are applications for which this is sort of a sweet spot.
I have used an application for which this type of machine is excellent: molecular dynamics simulations.
The usual strategy for this type of software is to partition your system by giving every processor a share of the atoms. Then you calculate forces and motions etc. for each part over a short time period, and then compare them. Many forces extend to neighbouring parts, and atoms can move to other parts, so quite a lot of communication between the nodes is necessary. After exchanging this info, each node can compute the next timestep. This works quite well if most interactions between atoms are relatively short-range.
This type of app is excellently suited to a large cluster. It is naturally suited to message passing, so programming it using MPI is easy. If you partition the system well, the memory use of one node is quite small and largely fits in cache. IO between nodes has to happen quite often, so latency is a problem. So processtree is obviously not an option.
These simulations scale quite well to larger molecular systems. Unfortunately, many researchers don't want more atoms in their systems; they want the simulation of their small system done faster. This kind of scaling is bad: if you end up with only a few atoms per node, the communication overhead bogs it down.
FYI, here are some old benchmarks of the software I used (gromacs). Although this software is considered to scale excellently, a 64-node machine is only 32 times as fast as a single-node machine...
Sorry if all this is incomprehensible, I guess I want to say too much too fast...
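To put a number on the "64 nodes, only 32 times as fast" figure, here is a small Python sketch. It uses Amdahl's law as a stand-in model, which is an assumption on my part: real communication overhead grows with node count instead of behaving like a fixed serial fraction, so treat the extrapolation as a rough indication only. The function names are made up for illustration.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / nodes


def amdahl_serial_fraction(speedup, nodes):
    """Invert Amdahl's law S = 1 / (f + (1 - f) / N) to get the serial fraction f."""
    return (nodes / speedup - 1) / (nodes - 1)


def amdahl_speedup(serial_fraction, nodes):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1 / (serial_fraction + (1 - serial_fraction) / nodes)


eff = parallel_efficiency(32, 64)        # 0.5, i.e. half the nodes' work is lost
f = amdahl_serial_fraction(32, 64)       # 1/63, about 1.6% non-parallel work
s128 = amdahl_speedup(f, 128)            # what doubling the nodes would buy

print(f"efficiency {eff:.0%}, serial fraction {f:.4f}, 128-node speedup {s128:.1f}")
```

Under this model a serial share of under 2% is already enough to halve the efficiency at 64 nodes, and doubling to 128 nodes would only lift the speedup from 32 to about 42.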
Yes, exactly:
Seti units coming up sir!
Real supercomputers solve problems that require massive communication between the nodes. So pretty much everything depends on the "switches" they'll use to connect the nodes, and there's no real information about those at all in the article. At least, they seem to be custom-built by a company that specializes in such things.
If you read the C|Net page carefully you will see it says the machines are to be 4-CPU Compaq boxes running Tru64 Unix.
The writer did mention Beowulf, but only to say that it was similar.
__
Conclusions are easy to jump to. Just be prepared to jump again...
I have to wonder what the point is in massive Beowulf clusters like these. Sure, they are fast and give you more MIPS than Flanders next door, but they surely lack the memory bandwidth that makes traditional mainframes and supercomputers so powerful.
:)
If you want massively parallel systems then I would honestly think that something like processtree would be a good solution since you can rent a phenomenal block of cpu time.
Each of these 682 nodes will be running Compaq's Tru64 Unix, which is capable of sharing a single file system
Wow, if only home computers could share disks like that!!! This actually makes me think that the nodes are operating as independent computers rather than part of a whole... but hey, I'm probably wrong.
These machines are basically MPI boxes: they run an optimized MPI implementation (not on top of TCP/IP) that takes advantage of the special features of the underlying switch, such as reflective memory, where memory writes on one node automatically appear on all other nodes, hardware broadcasts to all nodes, etc.
Alastair McKinstry,
AlphaServer SC Engineering (who make these machines)
Compaq.
Anyone who believes exponential growth can go on forever in a finite world is either a madman or an economist
The article was vague with the 'souped-up beowulf'. These AlphaServer SC machines are not just connected by fast ethernet; they share a Quadrics switch that provides ~200 MB/s bandwidth with 5 µs latency per node.
Alastair McKinstry
AlphaServer SC Engineering, Compaq.
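To show what those two figures mean for message passing, here is a rough Python sketch of the usual "latency plus size over bandwidth" cost model. The model itself is a simplification and the function names are made up; only the ~200 MB/s and 5 µs numbers come from the post above, with MB taken as 10**6 bytes.

```python
LATENCY_S = 5e-6          # 5 microseconds per message, from the post
BANDWIDTH_BPS = 200e6     # ~200 MB/s per node, taking 1 MB = 10**6 bytes


def transfer_time(nbytes):
    """Simple linear cost model: fixed start-up latency plus streaming time."""
    return LATENCY_S + nbytes / BANDWIDTH_BPS


def effective_bandwidth(nbytes):
    """Bytes per second actually achieved once latency is included."""
    return nbytes / transfer_time(nbytes)


# Message size at which half the peak bandwidth is reached.
half_size = LATENCY_S * BANDWIDTH_BPS

for n in (8, 1_000, 1_000_000):
    mb_per_s = effective_bandwidth(n) / 1e6
    print(f"{n:>9} bytes: {transfer_time(n) * 1e6:8.2f} us, {mb_per_s:7.2f} MB/s")
```

Under this model a message of about 1000 bytes only reaches half the peak rate, so codes that exchange many small messages are limited by the 5 µs latency far more than by the 200 MB/s bandwidth.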
Nooo! Say it ain't so. I put one in for you.
Don't try to do real web site development with Mozilla and manual hacking. Get Dreamweaver.
Well, those machines are most commonly employed to solve numerical problems (as in: huge systems of differential equations). For that kind of work, High Performance Fortran can be used. HPF basically consists of extensions to Fortran that allow you to explicitly divide data (i.e. parts of matrices) between nodes and still use standard operations on it. The compiler takes care of the inter-node communication, and if you divided the data wisely, there hopefully won't be too much of it.
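As a rough illustration of the data-distribution idea, here is a Python sketch of a block distribution of matrix rows over nodes. This is not HPF syntax and not what an HPF compiler generates; `block_ranges` and `distributed_matvec` are hypothetical names, and the "nodes" are simulated by an ordinary loop.

```python
def block_ranges(n_items, n_nodes):
    """Split range(n_items) into contiguous blocks, one per node.

    Block sizes differ by at most one item.
    """
    base, extra = divmod(n_items, n_nodes)
    ranges, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def distributed_matvec(matrix, vector, n_nodes):
    """Matrix-vector product where each node computes only the rows it owns."""
    result = [0.0] * len(matrix)
    for lo, hi in block_ranges(len(matrix), n_nodes):
        # On a real machine this inner loop would run on the owning node,
        # and only the vector would need to be communicated.
        for row in range(lo, hi):
            result[row] = sum(a * x for a, x in zip(matrix[row], vector))
    return result


print(block_ranges(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(distributed_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], 2))  # [3, 7, 11]
```

Dividing by rows keeps each node's work independent for this operation; a poorly chosen distribution would force the nodes to exchange matrix elements on every step.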
The PSC has a release here
I was involved with the Pittsburgh Supercomputing Center in high school. We were given a grant for processing time, something like $40,000, to compute the heat loss of my community due to improper insulation. Admittedly, I was on the fringe of the group, but I know they have been using massively parallel systems for a while. They also had an Internet connection, which is where I first used Lynx.
At that time they had a T3D and a "DEC supercluster" which was IIRC 256 Digital Alpha computers. They had some other supercomputers but I can't remember what they were. The supercluster was later upgraded to 512 processors. It seems that this is the same thing, updated and built by Compaq (who bought Digital).
fear is the mind killer
Compaq is also in bed with and the University of Western Ontario building a 48-processor Beowulf (Alpha + Linux). Compaq seems to be all hot and bothered about supercomputing as of late.
;)
Now the *exciting news* is that they are teaming up with up to three other universities to build a "Beowulf of Beowulfs" (think four of these babies connected together through *very* fast network connections, so you can submit your job and "it" would decide, if there's too much going on at Western, to queue part of your job up at another university), thus creating a Beowulf of Beowulfs.
Baldric, the student-run Beowulf, is also (read: hopefully) going to be a part of this with our donation of 50-some nodes (just off the truck) from Sprint Canada. (OK, that was a blatant plug.)
Here's a copy of the Carnegie Mellon press release on the topic. This is from the CMU 8 1/2 x 11 News, which is also posted on the CMU bboards. It includes some info not in the other articles, so I figure I'll post it here:
(The "8 1/2 x 11 News" is published each week by the Department of Public Relations. The newsletter is available on the official.cmu-news and cmu.misc.news bulletin boards.)
NSF Awards $45 Million to Supercomputing Center for "Terascale" Computing
The Pittsburgh Supercomputing Center (PSC) has been awarded
$45 million from the National Science Foundation to provide "terascale"
computing capability for U.S. researchers in all science and engineering
disciplines. Through this award, PSC will collaborate with Compaq Computer
Corporation to create a new, extremely powerful system for the use of
scientists and engineers nationwide.
Terascale refers to computational power beyond a "teraflop" -- a trillion
calculations per second. While several terascale systems have been
developed for classified research at national laboratories, the PSC system
will be the most powerful to date designed as an open resource for
scientists attacking a wide range of problems. In this respect, it fills a
gap in U.S. research capability -- highlighted in a 1999 report to
President Clinton -- and will facilitate progress in many areas of
significant social impact, such as the structure and dynamics of proteins
useful in drug design, storm-scale weather forecasting, earthquake
modeling, and modeling of global climate change.
The three-year award, effective Oct. 1, is based on PSC's proposal to
provide a system, installed and available for use in 2001, with peak
performance exceeding six teraflops. To achieve this, PSC and Compaq
proposed a system architecture, based on existing or soon to be available
components, optimized to the computational requirements posed by a wide
range of research applications and which, at this level of performance,
pushes beyond simple evolution of existing technology.
The brain of the proposed six teraflop system will be an interconnected
network of Compaq AlphaServers, 682 of them, each of which itself contains
four Compaq Alpha microprocessors. Existing terascale systems rely on other
processors, but extensive testing by PSC and others indicates that the
Alpha processor offers superior performance over a range of applications.
Development of this system will draw on a history of collaboration between
PSC and Compaq, and represents an extension of PSC's history of success at
installing untried, new systems -- resolving the myriad of unanticipated
hardware and software glitches that come up -- and turning them over
rapidly to the scientific community as productive research tools.
The PSC terascale system, to be located at the Westinghouse Energy Center,
Monroeville, will be a component of NSF's Partnerships for Advanced
Computational Infrastructure (PACI) program, supplementing other
computational resources available to U.S. scientists and engineers.
"The PSC has -- with its partners at Carnegie Mellon University, the
University of Pittsburgh and Westinghouse -- an excellent record of
installing innovative, high-performance systems and operating them to
maximize research productivity," said NSF director Rita Colwell.
"We're pleased that NSF's terascale initiative gives us this opportunity to
use PSC's proven capability in high-performance computing, communications
and informatics in support of the national research effort," said PSC
scientific directors Michael Levine and Ralph Roskies in a joint statement.
"Working in partnership with Compaq, we'll create a system that enables
U.S. researchers to attack the most computationally challenging problems in
engineering and science."
"Compaq is looking forward to working with the National Science Foundation
and the Pittsburgh Supercomputing Center and we are committed to the
success of the terascale initiative," said Michael Capellas, Compaq's
president and CEO. "With our AlphaServer systems and Tru64 UNIX, we are
providing the technology infrastructure for some of the most advanced
computing projects in the world. This is further proof of Compaq's
leadership in high-performance computing and our commitment to help open
new frontiers in science and technology."
Development and implementation of the terascale system, including software
and networking, will draw on fundamental research in computer science. A
significant strength of PSC is its tri-partite affiliation with
Westinghouse and with Carnegie Mellon University and the University of
Pittsburgh and the pooled computing-related expertise of faculty and staff
at both universities.
"This award, which comes as the culmination of a national competition,
recognizes PSC's leadership in high-performance computing and
communications," said Jared L. Cohon, president of Carnegie Mellon. "And it
provides another key building block for our region's technology future,
enhancing our international stature in the development and application of
advanced computing technology."
"A gap exists between the computing resources available to the classified
world and the open scientific community," said Mark Nordenberg, chancellor
of the University of Pittsburgh. "It is ideal that PSC, a world leader in
acquiring and deploying early the most powerful computers for science and
engineering, can contribute to filling this gap. This award also
demonstrates the unique scientific strengths that exist in Pittsburgh when
its major research universities partner with each other and with leaders in
industry."
"Today's terascale award is one more in a long list of PSC's major
achievements," said Charlie Pryor, president and CEO of Westinghouse
Electric Company. "Westinghouse is proud of PSC's contribution to the
nation's scientific community and is pleased to have been associated with
PSC since its inception."
Under the proposal, PSC will by the end of this year install an initial
system with a peak performance of 0.4 teraflops. The six teraflop system,
which will use faster Compaq Alpha microprocessors not yet available, will
evolve from this system. The four-processor AlphaServers use
high-bandwidth, low-latency interconnect technology developed by Compaq
through a U.S. Department of Energy advanced technology program.
The Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon
University and the University of Pittsburgh together with the Westinghouse
Electric Company. It was established in 1986 and is supported by several
federal agencies, the Commonwealth of Pennsylvania and private industry.
# # #
An artist's rendition of PSC's terascale system and examples of potential
research applications are available at:
http://www.psc.edu/publicinfo/tcs
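A quick sanity check on how the "six teraflops" in the release lines up with the processor count, as a Python sketch. The 1.1 GHz clock comes from the story summary, not the release, and the two floating-point operations per cycle is an assumption chosen because it makes the numbers agree; neither is confirmed by the release itself.

```python
NODES = 682               # AlphaServers, from the release
CPUS_PER_NODE = 4         # Alpha processors per AlphaServer, from the release
CLOCK_HZ = 1.1e9          # 1.1 GHz, from the story summary (not the release)
FLOPS_PER_CYCLE = 2       # assumption: two floating-point results per clock

processors = NODES * CPUS_PER_NODE
peak_flops = processors * CLOCK_HZ * FLOPS_PER_CYCLE
peak_tflops = peak_flops / 1e12

# Working backwards from the quoted six teraflops instead:
per_cpu_gflops = 6e12 / processors / 1e9

print(f"{processors} processors, peak {peak_tflops:.2f} teraflops")
print(f"6 teraflops needs {per_cpu_gflops:.2f} gigaflops per processor")
```

With these assumptions the total comes out at just over six teraflops, consistent with "peak performance exceeding six teraflops", and it is a theoretical peak, not a measured application speed.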
Can you imagine... a beowulf cluster of these?
I don't know about you, but I wouldn't trust Compaq building /my/ supercomputer.
hrmmmmm...
Ever get the impression that your life would make a good sitcom?
Ever follow this to its logical conclusion: that your life is a sitcom?
"I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
Take a look at this - it's the tech specs for Sun's E10000 Starfire server. Not quite in the supercomputer league, and yet it has a memory bandwidth of 102.4 Gbit/s and a latency of less than 500 ns.
I keep hearing about these projects, and the means by which the nodes of these machines are connected, but what I really want to know is how these clusters are programmed. More to the point, how are data and process parallelism implemented (or not) when you are talking about a high-complexity environment and a fairly low level of abstraction?
I write software for MPP & large scale SMP machines, but I use tools like Ab Initio or Torrent Orchestrate to abstract away much of the complexity for traffic control, checkpointing, hash partitioning data, etc. In my cursory examination of PVM and the MPI implementation, they seem pretty primitive, and the code must be a nightmare to implement properly, much less maintain.
Is anyone working on a GNU componentized approach similar to the commercial packages I mentioned earlier to take care of this? Is anyone interested in doing this? This could be a pretty cool project.
The other reservation I have when I look at the whole beowulf architecture is the node latency issue. Unless you have highly partitioned code, with independent processes, these machines are gigantic toasters, spending most of their lives waiting for IO. A well designed, partitioned app should be CPU bound. Most of the business apps I develop don't exhibit these (well partitioned) characteristics all the way through the process. It makes me wonder how effective these machines really are.
~Religion is O.K., as long as it gets you laid.