"Cplant" Parallel Computing Tool
SEWilco writes "Sandia National Laboratories has released its "Cplant" massively parallel processing software. This is related to the software used in their ASCI Red supercomputer, and eliminates several scalability problems to allow hundreds of nodes for algorithms which can't be parallelized for Beowulf-type clusters. This is now number 2 on the TOP500 supercomputer list. The press release refers to "licensing terms", but the license is the GPL.
We discussed this in a Linux clusters discussion and several earlier reports as ASCI Red grew."
A standalone non-clustered one of these.
In order to simulate new weapons configurations, it takes an awful lot of computing power. Just try to imagine all the factors that have to be tracked and taken into account in order to produce an accurate and thorough simulation. Simulated tests have a lot of advantages, obvious (no radiation) and non-obvious (costs).
You've been reading YRO too much. Trust me. The government has a lot better uses to put its supercomputers to than breaking our SSH and PGP keys--like big guns and bombs for laying waste to the known world!
"Doubt your doubts and believe your beliefs." -- Switchfoot, Ode to Chin
The Scyld Beowulf software is very nice for quickly setting up small to medium size clusters where users use the whole cluster more-or-less serially. IMHO, it doesn't fare quite so well for production oriented shops like Sandia, where things like accounting and scheduling become important. The Scyld software also has very limited support for Myrinet, which is a very nice (and very fast) interconnect for clusters.
You also need to remember that the Cplant stuff was specifically designed to emulate the user environment of the ASCI Red machine, which inherited its environment from Sandia's Paragon. That was done presumably to keep the retraining of Sandia's user base to a minimum. The Scyld software has no such requirements.
(Disclaimer: One of my coworkers used to work on Cplant, and we've borrowed some of Cplant's ideas [though not any of the software] for the clusters we have at OSC.)
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Both Linux and GNU bash are licensed under the GPL as "free software." Stallman has stated that free software stems from "Freedom Zero", namely "the freedom to run the program for any purpose, any way you like."
To my knowledge, use restrictions would violate both the GPL and Open Source Initiative's Open Source Definition.
I've never worked in one of the supercomputer-happy departments at Sandia, but here's a few applications I've talked with others about:
Nuclear simulation: This is the big one. With popular opinion and world politics the way it is, it's likely we won't set off another thermonuclear detonation for a very long time. Unfortunately, we have a few thousand warheads that are aging and decaying, and we want to be sure (and make everyone else sure) that our final deterrent isn't turning into duds under our noses. This is pretty much the sole official justification for the national labs' supercomputing programs.
More nuclear simulation: After New Mexico's devastating summer fires last year, they stepped up research on the effects of fire on stored warheads (no, they won't go nuclear, but cleanup could still be awful). Simulating something that turbulent isn't easy, but it'll be nice to know if there are any further precautions Los Alamos needs to take.
Computational Fluid Dynamics - refining supercomputer code to cut down on the need for even more expensive wind tunnel time. Military and civilian uses: the two I saw were hypersonic parachute unfolding for bombers and drag-reducing plastic attachments for big rig trucks.
Impact testing - this is one of the big commercial apps of supercomputers; I don't know how much of it they're doing at Sandia right now. You can make vehicles a lot more crash safe cheaply if you can virtually destroy them (and refine their frame designs) hundreds of times before actually mangling hardware.
As for crypto breaking... no. For example, the Teraflops has 9 or 10,000 processors (just upgraded to 3xx MHz Xeons, I'm told, since those are the fastest things that could be massaged into the old PPro sockets). That's on the order of how many distributed net computers brute forced 64 bit encryption... so for 128 bit encryption you'd just need about 18 quintillion (2^64) times as many Teraflops supercomputers. Your PGP key is infinitely more likely to be snagged by some hacker's trojan and keylogger than it is by a government supercomputer.
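To put a number on the key-size argument above, here is a minimal sketch. It assumes brute-force cost is simply proportional to the size of the key space, with no cryptanalytic shortcuts; the function name is made up for illustration.

```python
def keyspace_ratio(small_bits: int, large_bits: int) -> int:
    """How many times larger the bigger key space is (pure brute force assumed)."""
    return 2 ** (large_bits - small_bits)


# Going from 64-bit to 128-bit keys multiplies the search space by 2^64.
ratio = keyspace_ratio(64, 128)
print(f"{ratio:,}")  # 18,446,744,073,709,551,616
```

In other words, whatever hardware it took to exhaust a 64-bit key space, you would need roughly 1.8 x 10^19 times as much of it (or as much time) for 128 bits.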
eCos is probably the best bet. And I think it's an open-source kernel, too (could be wrong, though).
However, you are also missing the benefits of COTS (commercial off-the-shelf). Although I doubt bombs use too many COTS parts, it is cheaper to use a known-working thing than it is to build something custom. Even if a bomb doesn't need megs of RAM (Linux does _not_ require an HD, just a flash card), it might be worth it to go ahead and use it to save the expense of writing a custom solution.
Engineering and the Ultimate
Heh, I've seen and worked with 215. Pretty non-awe-inducing group of 96 dual Pentium III boxes using a Myrinet interconnect. You can find some pictures at the Kepler homepage. It's amazing what good lighting and a good photographer can do :-) That aside, the system itself is pretty cool (and damn fast, even if there's some contention among its users for computing time).
They do nuclear weapons detonation simulations and modelling. They use the computing model as a substitute for REAL testing.
It depends on what you mean by "simulating nuclear explosions". Sandia is using ASCI Red to model the effects of a nuclear blast (like how far away from a blast of X kilotons do the buildings get knocked down), but they are legally prohibited from modeling the actual reactions that occur within a nuclear device, the part we call the "physics package." Los Alamos and Livermore are using their ASCI machines to do these calculations and we need all the horsepower we can get to model the phenomena accurately enough to certify our aging stockpile without nuclear testing.
Learn to spell: nickel, missile, lose, solely, amendment, speech, kernel, probably, ridiculous, deity, hierarchy, versus
Something you might find interesting: at one time the world's largest repository of free and open-source software was at the (then) Army Ballistic Research Laboratory, open to anyone who could FTP there. It was an important resource during the 1980's when the free software community, a community that included the late Mike Muuss of the BRL, was taking shape.
This is only one example of many from that era. (I hope it's not too trivial to point out that the Internet itself originated with the "War Pigs.") Had the GPL included an anti-military clause, there is a good chance that much of GNU would not exist -- if the movement had happened at all. Don't forget, the "War Pigs" paid for Stallman's ARPANET connection (via MIT, which was on the ARPANET by virtue of being a major military contractor).
I'm not attempting to justify the military, here, just pointing out that blindly excluding them may not be the best of ideas...
Nuclear testing's been mentioned. Other massive computing efforts include theoretical protein chemistry, astronomy, particle accelerator analysis, weather simulation, materials research (more theoretical chemistry). Getting good excited state properties of molecular systems scales as the 12th power of your basis set size. Doubling the size of the system you're looking at increases computational effort by a factor of about 4000 (2^12 = 4096). It's like, real easy to chew up unlimited amounts of computer power in computational science.
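The 12th-power scaling claim above is easy to check with a few lines. This is only a sketch of the arithmetic, taking the exponent of 12 from the comment as given; the function name is invented here.

```python
def cost_growth(size_factor, exponent=12):
    """Factor by which the work grows when the problem size grows by size_factor."""
    return size_factor ** exponent


print(cost_growth(2))    # 4096: doubling the system costs about 4000x the work
print(cost_growth(1.1))  # even a 10% larger system roughly triples the work
```

With scaling that steep, a machine a thousand times faster buys you a system less than twice as large.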
Although this sounds good for Linux, now in the number 2 most powerful computer in the world. Another sign that Linux is on the rise and not "dead"
I don't see a reference to Linux in the description of this supercomputer. I see the following link to the specs which describes the OS as:
The operating system used for the Service, I/O, and System Partitions is Intel's distributed version of UNIX (POSIX 1003.1 and XPG3, AT&T System V.3 and 4.3 BSD Reno VFS) developed for the Paragon XP/S Supercomputer. The Paragon OS presents a single system image to the user. This means that users see the system as a single UNIX machine despite the fact that the operating system is running on a distributed collection of nodes.
As much as I like to push Linux (I use it as my desktop) it just isn't correct to say it is in the #2 in the top 500 list.
Yes, lots of machines do modeling and analysis well. The problem is that the questions the analysts are most interested in require more and more computing resources. When someone is testing a nuclear weapons simulation, with over 100 million degrees of freedom, it can take the entire resources of a Cplant or ASCI Red machine a couple of days just for the first seconds of the event. Speaking from my experience, I've never heard of any of the processing power of either the Cplant or ASCI Red machine being used for something as mundane and non-engineering-related as cracking codes to see if Aunt Bessie thinks that Communist men are cuter than Democratic men. These people have much more fun things to do.
This is related to what is running on ASCI Red. The pages suggest that this is not exactly what is running on ASCI Red.
The press release actually mentions that it is necessary to agree to some licensing terms before downloading. This unnamed license turns out to be the GPL, which many of us know.
pretty well, the only problem is the bouncing cards at the end fly by WAY too fast.
--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Sure, TCP/IP will be slower than Myrinet, but that's a speed problem, not a scalability problem, right? You can have an ethernet switch which does the same function as a Myrinet switch, just slower. Again, it's speed, not scalability.
If clusters were limited to "thousands of nodes", I promise you nobody would notice.
You're talking about clusters of uniprocessors. I don't research this area myself, but I have been told that clusters of 4-way SMP machines saturate the memory bus before the network. That's because you have four CPUs and a Myrinet card all competing for the same bus, not to mention other DMA devices. Sure, if you use uniprocessors in your cluster, then the network becomes the bottleneck. No big insight there. And, it has nothing to do with TCP/IP.
You mean to say the TCP/IP scaling problems in clusters are the same as in the Internet? I think not. Look, I just think a claim like "TCP/IP has fundamental scaling problems" could have been phrased better, because it seems ridiculous at first glance. I'm sure their claim is valid, whatever it is, but clearly there is ample evidence that TCP/IP itself scales to far larger systems than any cluster.
I like where they say TCP/IP has inherent scalability limitations. Have they heard of the Internet?
The original poster wasn't just talking about using the code "in bombs." The primary thing the Sandia people are doing with this code is simulating nuclear explosions. The code doesn't have to actually be in the bomb to have helped kill the people. It could be used on the supercomputer that ran the simulations that allowed the designing of the bomb.
Also, I worked in defense contracting for Lockheed for the Aegis defense system. Of course the guided missiles (like Tomahawks) don't run anything even remotely similar to what we would call an "OS". There aren't different levels of apps running on top of each other; there's just dedicated hardware circuitry designed to do what the missile needs to do. Running something like Linux or even QNX would be utterly asinine.
With about 30 machines dedicated to "research" surely one of those already performs the job well. FYI the government of the USA has 19 classified machines as well, which are most likely NSA and military machines. All of it can't be for so-called nuclear research. At least in my eyes.
Want Root?
I always wondered what big brother does with these super computers, the FAQ says little about what tasks these perform, and I doubt you would need that much supercomputing for research.
So the question is, just what is Sandia doing with this? Making super comps to crack codes perhaps for the NSA? Aside from that maybe some sole company should look into recovering the hundreds of obsolete PC's that are being tossed and create a super comp to test with and perhaps create the ultimate crypto algorithm. (Yes I know slightly off topic)
Does anyone have any idea as to what these machines are truly doing?
is it now THE GPL, as in THE Ohio State University?
Supreme executive power derives from a mandate from the masses, not from some farcical aquatic ceremony.
That being said, as far as this hardware - a commodity cluster system - goes, it seems that this is a pretty decent set of tools and optimizations.
Now that they've been "un-bought", Cray gets to put its name back on the list as an independent company.
As a side note, SGI sold the Cray division because it was "unprofitable" and a fiscal liability. Yet, Cray Inc. made a profit last quarter, and SGI has lost about $2/share for the last several quarters in a row, and just laid off another 1/3 of their workforce. Oh, and Cray's stock price is higher. Go fig. :)
And, I quote: "This is related to the software used in their ASCI Red supercomputer, and eliminates several scalability problems to allow hundreds of nodes for algorithms which can't be parallelized for Beowulf-type clusters." This is a pretty big over-statement. From exploring their site, it seems pretty clear that, while they made a few scalability enhancements (like cutting out the TCP/IP stuff, etc), their main goal was to make large commodity cluster systems (Beowulf or not) more usable. They made a lot of good progress in this area by porting over several tools from their learning experience with ASCI Red. I also found it funny that their "commodity machine" had a custom-made Myrinet switch. I think it must be hard to resist the "if we don't have it, we'll build it" mentality of a National Lab. Very cool. Oh, and I'm not sure when the source was put up, but from what I can tell, the site hasn't been updated in almost a year.
I'm afraid that I'm not really following you.
If you are saying that the word "the" in front of GPL is unnecessary, I would disagree.
Read both of the following, aloud:
"This software is licensed under GNU Public License."
"This software is licensed under the GNU Public License."
-Peter
The press release refers to "licensing terms", but the license is the GPL.
What in the world does this mean? The GPL was a license and had several terms last time I checked.
-Peter
Why bother?
Use ATLAS (http://www.netlib.org, platform self tuning BLAS and LAPACK) and FFTW (run-time algorithm optimized Fourier transforms).
Both are portable and both approach or beat the performance of proprietary hand-tuned assembly written libraries.
But don't take my word for it. MATLAB (http://www.mathworks.com) now uses the ATLAS implementation of LAPACK / BLAS and MIT's FFTW in their computational core.
I've used the ASCI Red BLAS and FFT stuff. I think the reason that it is not freely distributed is that it was developed in collaboration with Intel employees. However, ASCI Red libraries always had the disclaimer to the effect that if you had a compelling reason to have the source something could be worked out.
Check out how FFTW works. It is one of the few things I've seen that I would actually consider clever. Basically, FFTW designs an algorithm at run-time which is optimal for your cache size, register file depth, memory bandwidth and transform type; power-of-two sizes are not required. What really impressed me is that FFTW's codelet generator stumbled across a couple of hitherto unknown algorithms with reduced flops for computing strange sized FFTs.
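The "pick the algorithm at run-time by measuring" idea described above can be shown with a toy. This is emphatically not FFTW's planner or API, just a sketch of empirical tuning in plain Python: time a couple of candidate transforms on the machine at hand and keep whichever is fastest. All names here (`dft_naive`, `fft_radix2`, `plan`) are invented for illustration.

```python
import cmath
import timeit


def dft_naive(x):
    """Direct O(n^2) discrete Fourier transform; works for any length."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft_radix2(x):
    """Recursive even/odd split; falls back to dft_naive when the length is odd."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        return dft_naive(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + twiddled[k] for k in range(half)] +
            [even[k] - twiddled[k] for k in range(half)])


CANDIDATES = {"naive": dft_naive, "radix2": fft_radix2}


def plan(n, repeats=3):
    """Time every candidate on a sample of length n and return the fastest one."""
    sample = [complex(i % 7, 0) for i in range(n)]
    timings = {}
    for name, func in CANDIDATES.items():
        timings[name] = min(timeit.repeat(lambda: func(sample),
                                          number=1, repeat=repeats))
    return CANDIDATES[min(timings, key=timings.get)]


transform = plan(64)                    # "planning" happens once, up front
result = transform([1, 0, 0, 0] * 16)   # the chosen routine is then reused
```

The real libraries search a far larger space of generated code variants, but the principle of measuring on the target machine rather than guessing is the same.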
ATLAS is pretty clever too. For kicks, run the installation and watch it tune the kernels. The routines for portably diagnosing FPU register size, FPU MAC performance and cache sizes are useful to have around.
Kevin
528 Pentium III-800 computers, running Red Hat Linux (well, usually it is somewhat fewer, since a couple are out of order at any time). Just normal mini-towers in long rows of shelves. And a huge network switch. Normally, the info can be found here, but since Murphy lives, it seems like our webserver is down right now, another picture.
Well, back to abusing this machine.
Some of the particular issues surrounding Sandia's Cplant project were the subject of a previous story on Slashdot.
AFAICT, the upshot is all the tweaking that must be done to coax higher performance on numerically intensive codes with that many processors.
As many in the numerical simulation community already know, message-passing codes abuse a network in a way that web browsers do not; demanding lower latency and higher bandwidth than can be provided by plain ole 10/100 Mb Ethernet (at least for large numbers of high SPECfp processors with any reasonable memory speed.)
The existence of Linux open source code facilitates the creation of their Portals layer that sits underneath MPI and above the Myrinet hardware on these Alpha machines.
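The latency-versus-bandwidth point above is often written as a simple cost model: time per message = fixed latency + size / bandwidth. Here is a small sketch of that model. The network figures are assumed, round numbers chosen only to show the shape of the trade-off; they are not measurements of Ethernet, Myrinet, or any Cplant hardware.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated seconds to deliver one message: fixed latency plus size over bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Illustrative, assumed parameters only (not benchmarks of real hardware).
NETWORKS = {
    "commodity Ethernet (assumed)": {"latency_s": 100e-6, "bandwidth_bytes_per_s": 12.5e6},
    "fast interconnect (assumed)": {"latency_s": 10e-6, "bandwidth_bytes_per_s": 125e6},
}

for size in (64, 1_000_000):
    for name, params in NETWORKS.items():
        t = transfer_time(size, **params)
        print(f"{size:>9} bytes over {name}: {t * 1e6:.1f} microseconds")
```

For the small messages typical of tightly coupled codes, the latency term dominates, which is why shaving protocol overhead matters more there than raw link speed.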
"Provided by the management for your protection."
This is great stuff. This is similar to what the commercial applications Ab Initio and Torrent Orchestrate do. What this software does is provide a standardized, consistent worldview of the all the resources in your parallel system. It should allow you to partition data, pass out processes to nodes, and handle internode communication between them transparently.
:-)
This is an important software release, because it is a step away from hand rolled, low level message passing, toward a standardized means of communication between nodes at a much higher level of abstraction. Think of it this way: You don't want to have to write all of the control logic for processes that are divvied out to the nodes when you are writing an application. Instead, you provide base classes of behaviour, distribute them to all of the nodes, and then inherit and instantiate specialized behaviors for _EACH JOB_ from a control partition.
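One way to picture the "base class per job" pattern described above is sketched below. Nothing here is taken from the Cplant code; the class and function names are hypothetical, and the "dispatch" just runs each partition locally in sequence as a stand-in for real node communication.

```python
from abc import ABC, abstractmethod


class NodeTask(ABC):
    """Hypothetical base behaviour that would be shipped to every node."""

    @abstractmethod
    def run(self, partition):
        """Process one partition of the data and return a partial result."""


class SumTask(NodeTask):
    """A job-specific behaviour: each node sums its own slice."""

    def run(self, partition):
        return sum(partition)


def partition_data(data, nodes):
    """Deal the items out round-robin, one slice per node."""
    return [data[i::nodes] for i in range(nodes)]


def control_partition(task, data, nodes):
    """Stand-in for the control partition: hand each slice to the task in turn."""
    return [task.run(part) for part in partition_data(data, nodes)]


partials = control_partition(SumTask(), list(range(10)), nodes=3)
print(sum(partials))  # 45
```

The application author only writes `SumTask`; partitioning and dispatch live in the shared layer.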
This provides a nice level of abstraction for the programmer. It also puts Linux MPP systems in the same class as your IBM SP/2, NCR/Teradata, and Clustered Solaris systems, among others. I think that I will be doing some work on enhancing this software!
Oh, and yes, I do professional parallel programming for a living.
Cheers and kudos to Sandia for releasing this as GPL!!!
~Religion is O.K., as long as it gets you laid.
Not Found
The requested URL /cplant/doc/man/yod.html was not found on this server.
Treatment, not tyranny. End the drug war and free our American POWs.
See my user info for links.
Will he post any changes? Will the Russkies? Will the Chinese Communists? Enquiring minds want to know. :)
Unfortunately, the trend in software development has been to use the capabilities of the hardware to mask the lousy performance of the software. It's often cheaper to upgrade your hardware than it is to pay a team of programmers for 6 months to make the software more efficient. The attitude in business is all too often that of "slap something together and release it, so you can get on to the next project". Most internally-developed software (and a large percentage of commercial apps) is in perpetual beta. There's never time to do it right, but there's always time to do it over.
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
see subject
Linux is great, but let's get our facts straight...
Just because an application uses a cluster does not automatically mean that it's running on a stack of commodity PCs running Linux in a Beowulf-style cluster interconnected via GigE or Myrinet. It also doesn't automatically mean that the application isn't capable of running well on a single large machine.
For example, NOAA recently put together a cluster for computing weather models for the upcoming hurricane season. Their cluster is actually 8 machines, each a 128 CPU SGI Origin 3800 running IRIX 6.5. The 8 machines are interconnected through a thick mesh of GSN (gigabyte system network, a modern version of HiPPI that can transfer 800 megabytes/sec per link). The messaging protocols used are a mixture of shmem, OpenMP, and MPI.
Linux is great and all, but ASCI Red uses Intel's Paragon OS, a derivative of Unix.
Linux isn't running on the second-fastest supercomputer either, Paragon OS is.
Heh, I can just see a good trivia question...
Who is the "Cray woman on the upper right-hand side of most cray.com pages?"
http://www.cray.com/products/index.html
Cray, Inc. is much more alive than their former owner, SGI...
Lots of new products and they're even making a profit.
http://www.cray.com/products/systems.
Nice variety of systems, from their own SV1/SV1ex/SV2 machines, to Linux clusters, to maspar Alphas, to NEC vector-based machines, and more.
Let's see, ALPHAs. 'nuff said.
Alphas aren't anywhere near dead, as many people have said they are, neither is cray.
High Performance Message Passing: In order to support application-level communication, such as MPI, as well as system-level communication, such as that which occurs between the compute node daemons and the launcher, a flexible, high-performance data movement layer is needed. Much of the work on the Intel MPP machines focused on providing a communication layer that could deliver the highest possible percentage of network resources to these applications. The result of this work is Portals, the data movement layer supported on the Intel TFLOPS machine.
don't know why... but does anyone remember in Pulp Fiction when the black guy (Jules) is threatening that guy with the gun and says "English, motherf*cker, do you speak it?" and then almost blows his brains out? Sometimes that's how i feel.
Moon Macrosystems. Sun's biggest competitor.
With all of the talk about Beowulf clusters of this and that, I'm surprised that Intel has only one appearance on the Top500 Supercomputers list.
MayorQ
How well does this scale in contrast to Unicos/mk?
--
"You have been charged by the War Crimes Tribunal with genocide, crimes against humanity, unprovoked aggression against peaceful neighbors ... and violating the terms of a license for free software!"
"I was just following orders! But I was ordered to recompile my kernel!"
here! Enjoy!
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
I wonder how well Solitaire runs on one of these.
as it allows the creation of super computers based upon a clustered set of smaller computers?
-- If no truths are spoken then no lies can hide --
Will the x86-optimised ASCI Red BLAS, FFT and Extended Precision libraries also be open-sourced and licensed under the GPL instead of the binary-only releases to-date?
Scroogle
NSA, NASA, Sandia, etc.
Users of Cplant wanted to checkpoint their computation every 10 minutes or so to graphically observe the progress of the program. This is a very common technique in large-scale computation. Unfortunately, it took more than an hour to do that on the incarnation of the hardware that had six OC3s for external communication. And that was before the machine grew bigger. Commodity machines tend to have commodity io. Get what you pay for (sometimes). I thought Cplant did a nice job of applying two-level structure to an otherwise flat sea of cluster nodes. Think of it as worker bees and team leader bees. A rack of workers would have a team leader responsible for that rack. There's a 1024p SGI machine at NASA Ames. You can run your program through the compiler, get an a.out, run it, get all of those puppies honking at the same time while file io can be instantly visible to any or all of them. File io could easily be in the GBytes/sec range to/from a single file descriptor and single file. Wake me up when a cluster can approximate that functionality (compile+go, unified io) albeit if not at similar performance levels.
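A back-of-the-envelope check on the checkpoint numbers above: the sketch below computes how much data six OC-3 links could move in a given time. It uses the nominal OC-3 line rate of 155.52 Mbit/s and ignores framing and protocol overhead, so real throughput would be lower; the function name is invented here, and the actual checkpoint sizes on Cplant are not stated in the comment.

```python
OC3_BITS_PER_S = 155.52e6  # nominal OC-3 line rate; usable payload rate is lower


def max_bytes_moved(links, seconds, bits_per_s=OC3_BITS_PER_S):
    """Upper bound on bytes that `links` parallel links can carry in `seconds`."""
    return links * bits_per_s / 8 * seconds


ten_minutes = max_bytes_moved(6, 10 * 60)
one_hour = max_bytes_moved(6, 60 * 60)
print(f"10 minutes: {ten_minutes / 1e9:.1f} GB at most")
print(f"1 hour:     {one_hour / 1e9:.1f} GB at most")
```

At line rate the six links top out around 117 MB/s in aggregate, so a checkpoint would have to stay under roughly 70 GB to fit inside a 10-minute interval, and even that leaves no time for computation.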
Now watch all the posts on Beowulf clusters come in!
Slashdot Hypocrisy at work?
With the PS2 Linux Kit this could result in some interesting games.
Cool! The T3E is kind of dull-looking, compared to the 'classic' Crays, but the SV1 looks great, and clusters!
(Go on, someone say it)
"What are we going to do tonight, Bill?"
www.lucernesys.com Horizon: Calendar-based personal finance
Was anyone else surprised at the number of Crays on the list? I thought they had been obsoleted years ago, when SGI bought Cray. Obviously not.
"What are we going to do tonight, Bill?"
www.lucernesys.com Horizon: Calendar-based personal finance
Linux is running on the second fastest supercomputer (via clusters of parallel computers) in the world. Call Guinness.
----
Just because a bunch of people believe or do something stupid, doesn't make it any less stupid.
If they're going to drop a guided bomb, they're certainly not going to use Linux. What does a bomb need with a Bash prompt anyways? All it needs is some guidance sensors, some basic logic and *maybe* a way to communicate to the mothership about its current situation. This is customized and hardware specific code for the current device. For cost reasons, they probably want to keep this on a small integrated chip, not an internal HD with megs of memory and an Open Source Kernel. It really doesn't make sense to use Linux.
You might be right. Unfortunately, this is also the government. They have the money and the resources to build customized chips. They certainly wouldn't want your precious money to go to waste by not using it, would they? The economics of non-profit governments. Spend as much money as possible so that you can claim that you have insufficient funding (so you can get more). Ironic, isn't it.
First rule of life. Don't state the obvious. Of course the Bourne Again Shell is not Linux. The kernel is Linux, nothing else. Like duh. Next topic, it could probably be used for a smart bomb. I'm sure you could put MS-DOS on an integrated chip, add some additional digital logic, and design a bomb that would have an accuracy of 95%+. The question is, why? The government would have to write most of the logic itself and whatever is included in the Linux kernel is probably just unneeded extras. It could save money in the long run but the government is not about saving money, it is about keeping the people employed and educated so that they can squeeze more money out of them in the long run. There is a balance of interest here. Bigger government with more people employed by it or a smaller government with a smaller budget and fewer people employed. Which one they choose all depends on the current status of the economy and who's in power.
Secondly, there is a problem with the GPL license. Since no one "owns" the license, there is no one to sue them. Sure, if some big organization breaks the license, many people of the community could get together enough funds to hire a good lawyer and sue their ass, but that is based on the power of the people to cooperate. There is no central organization to control the power, which could result in chaos when people break the license, money is short, or no one wants to sue them. You're not going to spend a million dollars to sue the little guy because he broke the license agreement; it is uneconomical. This is a type of federalism and it can, when times are tough, plain suck.
Third of all, GPL only works as an End User License agreement if everybody cooperates. If only a few are willing to cooperate while others do their thing, it breaks up. Fortunately, so far, groups have cooperated and it has worked. We may not be so lucky forever.