TeraGrid v. Distributed Computing

Posted by timothy on Sunday July 18, 2004 @10:14AM from the lots-of-pieces dept.

Nevyan writes "After three years of development and nearly a hundred million dollars, the TeraGrid has been running at or above most people's expectations for such a daunting project. On January 23, 2004, the system came online and provided 4.5 teraflops of computing power to scientists across the country. However, the waiting list for TeraGrid is long, including a bidding process through the National Science Foundation's (NSF) Partnerships for Advanced Computational Infrastructure (PACI), and many scientists with little funding but bright ideas are being left behind. While the list of supercomputer sites and peak power is growing, how is the world of Distributed Computing faring?"

23 of 124 comments

Grid and Distributed computing by Anonymous Coward · 2004-07-18 10:32 · Score: 5, Informative

Important to remember that the Grid is a _kind_ of distributed computing. But the main thing about The Grid (like The Internet; The Grid is basically TeraGrid in the US + the European Data Grid) is that it is suitable for handing off parallel jobs with high intercommunication needs (i.e. MPI jobs). Not necessarily because these jobs can run across different nodes of the grid (though they can, with MPI/Nexus or whatever it's called), but because each "node" in the Grid network is a HUGE MOFO LINUX CLUSTER or similar. The grid gives lots of physicists access to parallel processing resources that would otherwise be sitting idle.

What /.ers generally mean by distributed computing is a bit different: most apps there are "embarrassingly parallel" ones you can just farm out. They don't need to chatter to each other, just process some data and send it back to Central.
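
The "embarrassingly parallel" farm-out pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not any real project's client: `process_work_unit` and `farm_out` are hypothetical names, a sum of squares stands in for real work, and a local thread pool stands in for the remote volunteer machines (for CPU-bound Python code, a process pool or separate hosts would be needed for a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def process_work_unit(unit):
    """Hypothetical stand-in for real science: each unit is processed on its
    own, with no messages exchanged between workers."""
    start, stop = unit
    return sum(i * i for i in range(start, stop))


def farm_out(n, chunk, workers=4):
    # "Central" cuts the job into independent work units...
    units = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    # ...hands them out, and only ever collects the answers back.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_work_unit, units))
    return sum(partials)


print(farm_out(1_000_000, 50_000))
```

The only communication is a work unit going out and one number coming back, which is why this style tolerates slow, high-latency links.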
The Google Compute Project by BoneThugND · 2004-07-18 10:33 · Score: 5, Informative

Google's distributed OS has been discussed a lot on Slashdot, but it is more than just a search algorithm on their own servers:

Google Compute is a feature of the Google Toolbar that enables your computer to help solve challenging scientific problems when it would otherwise be idle. When you enable Google Compute, your computer will download a small piece of a large research project and perform calculations on it that will then be included with the calculations performed by thousands of other computers doing the same thing. This process is known as distributed computing.

The first beneficiary of this effort is Folding@home, a non-profit academic research project at Stanford University that is trying to understand the structure of proteins so they can develop better treatments for a number of illnesses. In the future Google Compute may allow you to also donate your computing time to other carefully selected worthwhile endeavors, including projects to improve Google and its services.

- The Google Compute Project
1. Re:The Google Compute Project by billstr78 · 2004-07-18 10:38 · Score: 3, Informative
  
  Hmmm. I think you are confusing the distributed OS additions they've made to Linux for their own clusters with the idle process harvesting of thier Google Toolbar.
  
  The distributed OS and filesystem in their own clusters are far more advanced than a SETI@Home parallel work distribution algorithm. This OS/FS and projects like it are where the grid's heritage lies. There are many problems unique to the grid, but none of it could exist without the distributed-system problems first solved in local-area clusters.
TeraGrid doesn't use "Public" computers by StateOfTheUnion · 2004-07-18 10:38 · Score: 2, Informative

Distributed computing has its uses, but remember: the public will only be willing to help you as long as they feel like they're contributing to something worthwhile.

Uh, I'm not sure what this has to do with the TeraGrid... The TeraGrid is a distributed computing system, but it does not use the "public's" computers. It uses university and computing center machines across the USA (e.g. NCSA, Argonne National Labs, Purdue, etc.).
1. Re:TeraGrid doesn't use "Public" computers by deanj · 2004-07-18 13:32 · Score: 2, Informative
  
  Well, that's an interesting definition of "public" that doesn't appear to be anyone else's.
  
  Despite that fact, you are correct... most of the work that's run on those things is done by people that aren't part of the supercomputer centers, or the ANL. (There are a few "chief scientists" that DO run their work there, so I wouldn't say it's the 99.99-whatever% that another poster did).
  
  It's NOT available for the general public's use, though. Even if you work at those places, that doesn't give you ANY certainty that you'll get to run anything on it... it's likely you WON'T, unless you are running tests or something like that.
Re:My Personal Vision by jpr1nd · 2004-07-18 10:38 · Score: 5, Informative

The BOINC platform (which SETI@home is switching over to) has the ability to divide work between projects, as you suggest. Though I'm not really sure that there are very many other projects running on it.
Re:Look, this is really very simple by billstr78 · 2004-07-18 10:47 · Score: 4, Informative

As noted in earlier comments, the TeraGrid's individual nodes _are_ NUMA clusters. This allows large, non-parallel computations to be run without individual service-level agreements, login coordination, and scheduling issues gumming up the process. The TeraGrid is an effort to remove the administrative nightmares keeping most clusters from being fully utilized and most small-time scientists' work from being completed.
Mac OS X users! by arc.light · 2004-07-18 10:58 · Score: 1, Informative

Charles Parnot of Stanford University is looking for your spare CPU cycles for his distributed XGrid@Stanford project.
Re:Did I read that right? by nevyan · 2004-07-18 11:08 · Score: 2, Informative

NSF award in August 2001 of $53 million for initial funding of four sites: NCSA, SDSC, CACR, ANL.

Pittsburgh Supercomputing Center joined in when NSF announced supplementary funding of $35 million.

$10 million was supplied by NSF in September 2003 adding ORNL, Purdue, Indiana U., and TACC.

Total: $98,000,000.00 roughly.
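
The running total above can be checked with a tiny tally of the three quoted awards (in millions of US dollars; the dictionary keys are just descriptive labels taken from the comment):

```python
# Award figures as quoted in the comment, in millions of US dollars.
awards_musd = {
    "Aug 2001: NCSA, SDSC, CACR, ANL": 53,
    "Supplementary funding (PSC joins)": 35,
    "Sep 2003: ORNL, Purdue, Indiana U., TACC": 10,
}

total_musd = sum(awards_musd.values())
print(f"Total: ${total_musd} million")
```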

What does government spending on the TeraGrid give you? 4.5 Teraflops distributed...

Nice.
Misinformed by Seanasy · 2004-07-18 11:17 · Score: 3, Informative

...and many scientists with little funding but bright ideas are being left behind.

Care to cite a source?

When you apply to the PACI program you get a grant of Service Units -- i.e. time on the computers. You don't need huge amounts of funding. The requirements state that you need to be a researcher at a U.S. institution. It also helps if you can show that you actually need and can use that kind of computing power.

And, please, distributed computing and supercomputing are not synonymous in terms of what problems they address. Distributed computing cannot replace supercomputers in every case. DC is good for a limited set of problems.

Lastly, an example of Teragrid research: Ketchup on the Grid with Joysticks.
Re:Did I read that right? by Anonymous Coward · 2004-07-18 11:28 · Score: 2, Informative

PSC has 6+ TF alone in the TCS cluster, plus a giant NUMA system, let alone the four DTF sites with 3+ TF *each*. Plus, they all have at least a 30 Gb/s uplink to each other.

Yes, the money was spent somewhere, and not on toilet seats.
Wolfgrid by admiralfrijole · 2004-07-18 11:34 · Score: 5, Informative

Wolfgrid, the NCSU Community Supercomputer, is coming along nicely.
It is based on Apple's XGrid and uses volunteers from the Mac community here at NCSU, as well as some of the lab Macs. Soon we will hopefully have official Linux and Windows clients, maybe even Solaris, to run on more of the computers around campus.
There is even a really nice web interface that shows the active nodes and their status, as well as the aggregate power of the two clusters.
It's really nice: anyone who is part of the grid can just fire up the controller and submit a job. I am part of the lower-power grid, since my TiBook is only a 667 MHz machine, but I was able to connect and run the Mandelbrot Set thing that comes with XGrid at a level equal to around 7 or 8 GHz.
There are some screenshots here

--
e to the pi i plus one equals zero
Meta-programs for distributed computing by giveuptheghost · 2004-07-18 11:51 · Score: 3, Informative

This is exactly why there are BOINC, distributed.net, Grid.org, etc., which serve multiple projects to one user-installed program: instead of projects having to compete for real-time resources, they run work units one after the other, and users can pick and choose which projects are run. This should prevent said burnout for most users.
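
A simplified sketch of the "one client, many projects" idea: the user picks which projects to run, and the client takes work units from them in turn. The project names and the plain round-robin policy are assumptions for illustration only; real clients such as BOINC let users weight projects by resource share rather than strictly alternating.

```python
from collections import deque


def run_work_units(queues, chosen):
    """Yield (project, unit) pairs, taking one unit from each chosen project
    in turn until every chosen project's queue is empty.

    Note: this consumes the queues passed in."""
    pending = deque(p for p in chosen if queues.get(p))
    while pending:
        project = pending.popleft()
        unit = queues[project].popleft()
        yield project, unit
        # Re-queue the project only if it still has work left.
        if queues[project]:
            pending.append(project)


# Hypothetical projects and work units.
queues = {
    "project_a": deque(["a1", "a2", "a3"]),
    "project_b": deque(["b1"]),
    "project_c": deque(["c1", "c2"]),
}
order = list(run_work_units(queues, chosen=["project_a", "project_b"]))
print(order)
```

Because the user's choice simply filters the queues, a project the user did not pick (here `project_c`) never gets any cycles.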
Re:Did I read that right? by Seanasy · 2004-07-18 12:17 · Score: 3, Informative

One project on the Teragrid used 17 Teraflops. The poster hasn't done his/her research on the Teragrid.
Re:Did I read that right? by Seanasy · 2004-07-18 12:52 · Score: 3, Informative
- PSC: 6 TFlops
- NCSA + Caltech + ANL + SDSC: 15 TFlops
I would encourage you to visit www.teragrid.org and read more than the front page to learn what the Teragrid is.
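
Adding up the two figures listed above and comparing them with the 4.5 teraflops quoted in the story, using only the numbers in this thread (presumably peak ratings, not measured throughput):

```python
# Figures as listed in the comment (teraflops).
peak_tflops = {"PSC": 6.0, "NCSA + Caltech + ANL + SDSC": 15.0}
story_figure_tflops = 4.5  # the number quoted in the story summary

total = sum(peak_tflops.values())
ratio = total / story_figure_tflops
print(f"{total:.1f} TF total, {ratio:.1f}x the story's figure")
```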
Re:Recommend good cause to donate my free cycles t by antispam_ben · 2004-07-18 13:55 · Score: 4, Informative

There's this one, it's probably what you want:

http://www.stanford.edu/group/pandegroup/folding/

But I'm quite selfish (and actually interested in primes, or at least I know more about them than I do about proteins), and there are entities offering big prize money for big primes, so if one of my machines finds one, I'll get big bucks:

http://mersenne.org

--
Tag lost or not installed.
Re:My Personal Vision by Corporal Dan · 2004-07-18 14:30 · Score: 5, Informative

From ClimatePrediction.net:

Hi, we are still rolling along with BOINC, hoping for an alpha test by the end of the month, beta in July, and hopefully a release in August when David Anderson from SETI/BOINC will be visiting us for a few weeks.

We threw together a simple sign-up page to be contacted (just once or twice when we're ready for beta testers), so if you want to try out the Windows, Linux, or Mac versions of CPDN please signup here!

http://climateprediction.net/misc/beta.php
I call BS by Prof. Pi · 2004-07-18 14:34 · Score: 4, Informative

True Distributed Computing is the way to go and shows positive results. Now we just need to tinker with it some more!
It's too bad that whoever modded this Insightful doesn't know much about parallel applications.
DC is fine and very cost-effective for its niche of applications, which are those that are "embarrassingly parallel." This is (somewhat circularly) defined as being very easy to parallelize on a DC machine. What characterizes these apps is very little communication between different tasks, which works for DC because the high network latency doesn't get in the way.
I'd love to see you try to put Conjugate Gradient (CG) on a distributed system. It involves large matrix-vector multiplies that inherently require lots of vector fragments passing between the processors. CG is one of the 8 NAS Parallel Benchmarks, and if you look at Beowulf papers that use NAS, you'll see that they often leave out CG because performance is so bad. If it's low on a Beowulf, where the network is presumed to be local and dedicated, it will totally suck on anything with a typical high-latency/low-bandwidth network.
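
To see where that communication comes from, here is a textbook serial conjugate gradient solver in Python/NumPy (a sketch, not the NAS CG benchmark code). The comments mark the steps that would need data from other processors if the rows of `A` were split across machines: the matrix-vector product needs pieces of `p` held elsewhere, while each dot product is a global reduction. Both happen on every iteration.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    n = b.shape[0]
    if max_iter is None:
        max_iter = 10 * n
    x = np.zeros(n)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    if np.sqrt(rs_old) < tol:
        return x
    for _ in range(max_iter):
        # Matrix-vector product: with A split by rows across machines,
        # each machine needs (most of) p from the others, every iteration.
        Ap = A @ p
        # Dot products: each is a global reduction over all machines.
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))
```

On a volunteer network each of those per-iteration exchanges pays the full round-trip latency, which is exactly the cost that embarrassingly parallel jobs avoid.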
Re:Recommend good cause to donate my free cycles t by scoser · 2004-07-19 00:59 · Score: 2, Informative

Since someone has already posted the Aspenleaf list of projects, I'd like to point out my personal favorite, Find-a-Drug. It has actually returned positive anti-cancer and anti-AIDS results that have been lab-tested and verified by the National Institutes of Health. If that doesn't have immediate benefit to mankind written all over it, I don't know what does.
This isn't Grid Computing. by hal2814 · 2004-07-19 01:23 · Score: 2, Informative

"The "Grid" portion of the TeraGrid reflects the idea of harnessing and using distributed computers, data storage systems, networks, and other resources as if they were a single massive system." (from the TeraGrid FAQ)

It looks like TeraGrid is latching onto a catchword in order to boost awareness of their system. What they are describing here is not Grid computing at all. Grid computing was designed to take advantage of all the dead cycles that computers typically have. The idea is that someone might have a large group of computers that do not take full advantage of their computational cycles (like a large lab used for reading e-mail and browsing the Internet). With Grid computing you would take these computers (not some Itanium cluster like TeraGrid is doing) and distribute work across these nodes that can be performed during otherwise dead cycles. (I have no sources immediately available, but check out Grid computing through the ACM or something and you'll see plenty of info on what Grid computing really is.)

This is what Seti@home does. It takes underutilized machines and runs computations on them. TeraGrid on the other hand, takes large clusters of otherwise unused machines and lays an abstraction over them that makes them look like one large supercomputer. This is nothing more than a distribution strategy. It looks like a nice distribution system that has the potential to scale well, but it's not Grid computing and it's nothing new.
1. Re:This isn't Grid Computing. by deanj · 2004-07-19 03:54 · Score: 2, Informative
  
  Whoa..you're so off-base on this it's not funny.
  
  The TeraGrid people (ANL, Ian Foster, etc) are the ones that coined the term "Grid" in the first place!
  
  You might not like their use of that term, but since they're the ones that came up with it in the first place, they're more right than you are.
Teragrid runs a lot of Linux by haruchai · 2004-07-19 02:08 · Score: 4, Informative

It's an all-*nix environment, presently totalling around 4200 CPUs, of which 96 (in a single cluster) are AIX 5.2, 3128 (WOW!!) are on Tru64 (in 2 clusters), and the rest, distributed across 5 clusters, run some form of Linux.
Two of the clusters have a second phase which together will add 316 CPUs on Linux.

As of October 1 of this year, 5 clusters at 3 sites will be added with the OS / CPU breakdown as follows:
Linux: 1800 CPUs in 3 clusters
AIX 5.1: 320 CPUs in 1 cluster
Solaris 9: 256 CPUs in 1 cluster
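
The "rest" in the breakdown above can be worked out from the quoted figures. A quick tally, treating the "around 4200" total as exact (so the derived Linux counts are approximate too):

```python
# Figures from the comment; the 4200 total was given as "around" 4200.
current_total = 4200
aix_now, tru64_now = 96, 3128
linux_now = current_total - aix_now - tru64_now  # the "rest", in 5 Linux clusters
linux_phase2 = 316  # second phase of two existing clusters

october = {"Linux": 1800, "AIX 5.1": 320, "Solaris 9": 256}

print("Linux now (approx.):", linux_now)
print("Linux after phase two (approx.):", linux_now + linux_phase2)
print("CPUs added on October 1:", sum(october.values()))
```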

That's an awful lot of Unix and a buttload of Tru64 and Linux

--
Pain is merely failure leaving the body
Re:Distributed Computing by ChaosDiscord · 2004-07-19 08:22 · Score: 2, Informative

The problem with using distributed computing for everything is that the number of people willing to let others use processing power on their computer is not infinite.

Off-topic. Teragrid is a dedicated distributed computing system. Various research centers are purchasing dedicated clusters to participate. For example, instead of three universities each purchasing a large cluster which will sometimes be idle; each will purchase a slightly smaller cluster and use each other's resources when available. In the particular case of high-energy physics, multiple sites were already collaborating to distribute the monstrous amounts of processing needed. Teragrid attempts to simplify this so that instead of human beings meeting and hand-distributing work, one person can simply run the "process-todays-events.sh" script and know that computers around the world are working on it.

The holy grail in grid computing is to be able to purchase compute time much like you purchase electricity. You don't build a power plant next door just because you need power for your manufacturing company. Why should you purchase, maintain, and upgrade a cluster for your, say, DNA sequencing? Why not just purchase computing power from a dedicated company? Need several thousand hours of compute time in a rush (say, because you're up against a deadline)? Rent it! (That's an oversimplification; there are obviously reasons you do many things in house. Don't take it as a complete argument, take it as the elevator summary.)

--
Search 2010 Gen Con events