Slashdot Mirror


TeraGrid v. Distributed Computing

Nevyan writes "After three years of development and nearly a hundred million dollars, the TeraGrid has been running at or above most people's expectations for such a daunting project. On January 23, 2004 the system came online and provided 4.5 teraflops of computing power to scientists across the country. However, the waiting list for TeraGrid is long, including a bidding process through the National Science Foundation's (NSF's) Partnerships for Advanced Computational Infrastructure (PACI), and many scientists with little funding but bright ideas are being left behind. While the list of supercomputer sites and peak power is growing, how is the world of Distributed Computing faring?"

22 of 124 comments

  1. Distributed Computing by Iesus_Christus · · Score: 5, Insightful

    The problem with using distributed computing for everything is that the number of people willing to let others use processing power on their computer is not infinite. It is a very large number, but eventually everyone who wants to/knows how to help out their favorite cause will have something already installed. In addition, the more useful endeavors that use distributed computing, the fewer users you will get for each, and only the 'interesting' projects will get many users. Who wants to use their computing power to analyze some boring old physics experiment when you could be finding aliens or curing cancer?

    Distributed computing has its uses, but remember: the public will only be willing to help you as long as they feel like they're contributing to something worthwhile.

    1. Re:Distributed Computing by samael · · Score: 5, Insightful

      Now the public can choose which problems it wants solved

      Jesus, there's a horrible thought. I've met the public (and seen its choice in TV). I'd rather have monkeys choose.

    2. Re:Distributed Computing by hunterx11 · · Score: 5, Insightful
      Mathematics and science are neither arbitrary inventions nor entirely self-evident discoveries. They are our attempts to understand and categorize the universe in the most objective manner possible. I would even argue that language is a much more abstract type of categorization. Despite mutually unintelligible differences in languages, all languages are used to describe the same reality.

      What I'm trying to say is that the semantics of how we describe the universe may be arbitrary, but the universe is objectively describable.

      --
      English is easier said than done.
    3. Re:Distributed Computing by Alkonaut · · Score: 5, Interesting
      Why aren't websites sponsored by applets doing distributed tasks? I'm thinking mainly websites with huge numbers of visitors, where visitors tend to stay long enough to do any meaningful work. (Like GMail for example). Most people don't use more than 5% of computing power when surfing the web, and an applet is safe and easily distributed.

      Personally, I'd much rather have an applet using 10% of my CPU power instead of an annoying flash banner (which probably itself uses 10% CPU...).

      Obviously someone has to pay for internet content, and that to me would be the least intrusive way. Popup blockers will be ineffective by the end of the year. Ads will be inside the site content. Or worse still, the popup window is the main window, while the actual content is spawned as a "pop-under", meaning that if you have a popup stopper, all you get is the ad window...

      CPU cycles are the perfect internet currency. Everyone who visits a website has them.

  2. Looks good to me by DruidBob · · Score: 4, Interesting

    There have been big projects like SETI@home, Great Internet Mersenne Prime Search, RC5-64 and many others.

    There are some like the Casino-21 http://www.climate-dynamics.rl.ac.uk/ and Evolution-at-Home http://www.evolutionary-research.org/ too.

    It's becoming easier to create the required code for distributed projects, and it most certainly has become easier to actually get them distributed.

  3. Grid and Distributed computing by Anonymous Coward · · Score: 5, Informative

    Important to remember that the Grid is a _kind_ of distributed computing. But the main thing about The Grid (like The Internet, The Grid is basically TeraGrid in the US + European Data Grid) is that it is suitable for handing off parallel jobs with high intercommunication needs (i.e. MPI jobs). Not necessarily because these jobs can run across different nodes of the grid (though they can with MPI/Nexus or whatever it's called), but because each "node" in the Grid network is a HUGE MOFO LINUX CLUSTER or similar. For their parallel processing jobs, the grid gives lots of physicists access to computing resources that would otherwise be sitting idle.

    What /.ers generally mean by distributed computing is a bit different - most apps there are "embarrassingly parallel" ones you can just farm out. They don't need to chatter to each other, just process some data and send it back to Central.
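
    A minimal sketch of that "farm out" pattern, assuming a made-up workload (counting primes in disjoint ranges) and a local thread pool standing in for volunteer machines. The names `count_primes`, `make_work_units` and `farm_out` are illustrative only and do not come from any of the projects mentioned. The point is that no work unit ever needs anything from another one.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    """Process one self-contained work unit: count primes in [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division is slow but keeps the unit free of shared state.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def make_work_units(limit, unit_size):
    """Split [0, limit) into ranges that can be processed in any order."""
    return [(lo, min(lo + unit_size, limit)) for lo in range(0, limit, unit_size)]


def farm_out(units, workers=4):
    """Hand units to whichever worker is free; workers never talk to each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, units))


units = make_work_units(10_000, 1_000)
partial_counts = farm_out(units)      # results "sent back to Central"
total = sum(partial_counts)           # the only step that combines anything
```

    Because the units are independent, a slow or vanished worker only delays its own unit, which is why this style tolerates ordinary home connections.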

  4. The Google Compute Project by BoneThugND · · Score: 5, Informative

    Google's distributed OS has been discussed a lot on Slashdot, but it is more than just a search algorithm on their own servers:

    Google Compute is a feature of the Google Toolbar that enables your computer to help solve challenging scientific problems when it would otherwise be idle. When you enable Google Compute, your computer will download a small piece of a large research project and perform calculations on it that will then be included with the calculations performed by thousands of other computers doing the same thing. This process is known as distributed computing.

    The first beneficiary of this effort is Folding@home, a non-profit academic research project at Stanford University that is trying to understand the structure of proteins so they can develop better treatments for a number of illnesses. In the future Google Compute may allow you to also donate your computing time to other carefully selected worthwhile endeavors, including projects to improve Google and its services.

    - The Google Compute Project

  5. Re:My Personal Vision by jpr1nd · · Score: 5, Informative

    The BOINC platform (that seti@home is switching over to) has the ability to divide work between projects as you suggest. Though I'm not really sure that there are very many other projects running on it.

  6. Payment for Work Units by Iesus_Christus · · Score: 5, Interesting

    The idea of payment for work units is interesting. While it would certainly provide incentive for participating in distributed computing projects, I can see two problems with it already:

    1) Getting the money to pay people. One advantage of distributed computing is that you don't have to pay for time on an expensive cluster. That advantage disappears when you pay distributed computing users. Of course, it may still turn out to be cheaper, and there may be users willing to participate for free.

    2) Botnets and profit. We all know of spammers using zombies to peddle goods, and of script kiddies using them to DDoS. What if some enterprising but immoral person decided to use the computing power of his zombies to profit off of the distributed computing payments? With enough zombies, he could easily make a good amount of money off of other people's computers.

    1. Re:Payment for Work Units by Caseylite · · Score: 5, Interesting

      Another way to pay people would be to offer incentives such as allowing me to write off your process time (wear and tear on my system) as a charitable donation to your non-profit group. ~Casey

  7. Look, this is really very simple by bersl2 · · Score: 4, Insightful

    If you can divide your problem into very many independent subproblems, clustering or distributed computing will work well. If not, your best bet is a true supercomputer.

    So: SETI@Home splits up its scans into sections, none of which depends on any other; therefore, a distributed solution is efficient. However, the Earth Simulator deals with chaotic systems (or so I would assume), which do not independently parallelize; this is where having hundreds of processors and terabytes of RAM and using something like NUMA is greatly more efficient.

    In short: use the right tool for the job.
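
    To illustrate the second case, here is a rough sketch under stated assumptions: a toy 1-D diffusion grid stands in for a tightly coupled simulation (it is not what the Earth Simulator actually runs), and the functions `diffuse`, `run_whole` and `run_chunked` are invented for this example. Splitting the grid across workers gives the same answer, but only if every worker receives its neighbours' edge values on every single step.

```python
def diffuse(cells, left, right):
    """One explicit diffusion step; left/right are the values just outside the cells."""
    padded = [left] + cells + [right]
    return [padded[i] + 0.25 * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]


def run_whole(cells, steps):
    """Reference run on one machine that holds the entire grid."""
    for _ in range(steps):
        cells = diffuse(cells, cells[0], cells[-1])
    return cells


def run_chunked(cells, steps, chunks):
    """Same run split across `chunks` workers, counting boundary exchanges."""
    size = len(cells) // chunks          # assumes len(cells) is divisible by chunks
    parts = [cells[k * size:(k + 1) * size] for k in range(chunks)]
    exchanges = 0
    for _ in range(steps):
        updated = []
        for k, part in enumerate(parts):
            # Edge values must come from the neighbouring worker's previous step.
            left = parts[k - 1][-1] if k > 0 else part[0]
            right = parts[k + 1][0] if k < chunks - 1 else part[-1]
            exchanges += (k > 0) + (k < chunks - 1)
            updated.append(diffuse(part, left, right))
        parts = updated
    return [v for part in parts for v in part], exchanges


grid = [0.0] * 8 + [100.0] * 8
whole = run_whole(grid, steps=50)
chunked, exchanges = run_chunked(grid, steps=50, chunks=4)
```

    The exchange count grows with the number of steps, so every step pays the network latency at least once. That is cheap on a shared-memory machine and expensive over the public internet.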

    1. Re:Look, this is really very simple by billstr78 · · Score: 4, Informative

      As noted in earlier comments, the TeraGrid's individual nodes _are_ NUMA clusters. This allows large, non-parallel computations to be run without individual service level agreements, login coordination and scheduling issues gumming up the process. The TeraGrid is an effort to remove the administrative nightmares that keep most clusters from being fully utilized and most small-time scientists' work from being completed.

  8. Re:My Personal Vision by billstr78 · · Score: 4, Interesting

    IBM is already making that vision a reality.

    They are in the beta stages of a massive computation-cycles-for-hire program that will allow organizations without the funding for an entire cluster to purchase cycles provided by a large IBM Power cluster.

    It will allow for a computation cycle market to eventually arise, much like the wheat, corn or gold markets. Companies will compete to provide cheaper cycles, and small-time scientists around the world will be able to have their computation-intensive problems solved at a fraction of today's cost.

  9. Access and Denial by nevyan · · Score: 4, Insightful

    The problem with large projects like TeraGrid, EarthSimulator and other supercomputer sites is that the underfunded _brilliant_ ideas are left behind by those who can afford to pay for or build these centers and sites.

    While TeraGrid is a powerful tool it is one that thousands of scientists and laboratories are standing in line to use. Meanwhile Distributed Computing is available, cheap and relatively quick.

    While it may look good on your project to say you used an IBM BlueGENE or DeepComp 6800, is it really worth the extra cost and waiting in line for your chance to use it?

    True Distributed Computing is the way to go and shows positive results. Now we just need to tinker with it some more!

    1. Re:Access and Denial by Seanasy · · Score: 4, Interesting
      The problem with large projects like TeraGrid, EarthSimulator and other supercomputer sites is that the underfunded _brilliant_ ideas are left behind by those who can afford to pay for or build these centers and sites.

      What are you talking about? These are publicly funded resources. You apply to the NSF for time on these machines. If you're at a U.S. institution and you have a real need for supercomputing you can get time on these machines.

      While TeraGrid is a powerful tool it is one that thousands of scientists and laboratories are standing in line to use. Meanwhile Distributed Computing is available, cheap and relatively quick.

      And Distributed Computing can't even begin to solve some of the problems that supercomputers are designed to address.

      While it may look good on your project to say you used an IBM BlueGENE or DeepComp 6800, is it really worth the extra cost and waiting in line for your chance to use it?

      Yes. When you want to simulate every molecule of a protein in a water solution (~17000 atoms worth) you need a supercomputer. DC can't do it.

      True Distributed Computing is the way to go and shows positive results. Now we just need to tinker with it some more!

      DC is neither a religion nor a panacea.

  10. Why the versus? by Anonymous Coward · · Score: 5, Insightful

    I don't understand why we are asking how a hammer is doing compared to a screwdriver. They are different computational models, and the titles "TeraGrid" and "Distributed Computing" are at best architectural descriptions. They have specific application domains and are used to solve different types of problems. One deals with non-discrete data and experimental calculations (TeraGrid); the other focuses on discrete chunks of data that are filtered or rendered and are neither time- nor message-dependent (Distributed Computing, as defined by Nevyan's reference). You have two tools in your tool chest. What makes one better than the other? They tackle completely different jobs. Both will be successful. They need not be in competition.

  11. Re:Did I read that right? by Detritus · · Score: 4, Insightful

    FLOPS are easy; low-latency, high-bandwidth communication paths are hard.

    --
    Mea navis aericumbens anguillis abundat
  12. Wolfgrid by admiralfrijole · · Score: 5, Informative
    Wolfgrid, the NCSU Community Supercomputer, is coming along nicely.

    It is based on Apple's XGrid, and uses volunteers from the Mac community here at NCSU, as well as some of the lab Macs. Soon we will hopefully have official Linux and Windows clients, maybe even Solaris, to run on more of the computers around campus.

    There is even a really nice web interface that shows the active nodes and their status, as well as the aggregate power of the two clusters.

    It's really nice: anyone who is part of the grid can just fire up the controller and submit a job. I am a part of the lower power grid since my TiBook is only a 667, but I was able to connect up and do the Mandelbrot Set thing that comes with XGrid at a level equal to around 7 or 8 GHz.

    There are some screenshots here

    --
    e to the pi i plus one equals zero
  13. Re:Recommend good cause to donate my free cycles t by antispam_ben · · Score: 4, Informative

    There's this one, it's probably what you want:

    http://www.stanford.edu/group/pandegroup/folding/

    But I'm quite selfish (and actually interested in primes, or at least know more about them than I do about proteins), and there are entities offering big prize money for big primes, and if one of my machines finds one, I'll get big bucks:

    http://mersenne.org

    --
    Tag lost or not installed.
  14. Re:My Personal Vision by Corporal Dan · · Score: 5, Informative
    From ClimatePrediction.net:
    Hi, we are still rolling along with BOINC, hoping for an alpha test by the end of the month, beta in July, and hopefully a release in August when David Anderson from SETI/BOINC will be visiting us for a few weeks.

    We threw together a simple sign-up page to be contacted (just once or twice when we're ready for beta testers), so if you want to try out the Windows, Linux, or Mac versions of CPDN please sign up here!

    http://climateprediction.net/misc/beta.php
  15. I call BS by Prof. Pi · · Score: 4, Informative
    True Distributed Computing is the way to go and shows positive results. Now we just need to tinker with it some more!

    It's too bad that whoever modded this Insightful doesn't know much about parallel applications.

    DC is fine and very cost-effective for its niche of applications, which is those that are "embarrassingly parallel." This is (somewhat circularly) defined as being very easy to parallelize on a DC machine. What characterizes these apps is very little communication between different tasks, which works for DC because the high network latency doesn't get in the way.

    I'd love to see you try to put Conjugate Gradient (CG) on a distributed system. It involves large matrix-vector multiplies that inherently require lots of vector fragments passing between the processors. CG is one of the 8 NAS Parallel Benchmarks, and if you look at Beowulf papers that use NAS, you'll see that they often leave out CG because performance is so bad. If performance is low on a Beowulf, where the network is presumed to be local and dedicated, it will totally suck on anything with a typical high-latency/low-bandwidth network.
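
    For readers who have not seen CG, a bare-bones serial sketch follows. It is a textbook formulation written for this illustration, not the NAS benchmark code, and the `global_ops` counter and `communication_seconds` helper are assumptions added here. The counter marks the operations that, on a machine with distributed memory, would each need data from every processor.

```python
def matvec(A, x):
    """Dense matrix-vector product; in parallel, each processor needs fragments of x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product; in parallel this is a global reduction across all processors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A, counting global operations."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x starting at zero
    p = list(r)            # search direction
    rs_old = dot(r, r)
    iterations = 0
    global_ops = 0         # operations that would need network traffic
    while iterations < max_iter and rs_old ** 0.5 > tol:
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        iterations += 1
        global_ops += 3    # one matvec plus two dot products per iteration
    return x, iterations, global_ops


def communication_seconds(global_ops, latency_seconds):
    """Rough lower bound on waiting time, at one network round per global operation."""
    return global_ops * latency_seconds


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations, global_ops = conjugate_gradient(A, b)
```

    Each iteration cannot start until the previous one's reductions have finished, so the per-iteration latency is paid serially. Feeding `communication_seconds` a hypothetical cluster latency of tens of microseconds versus an internet latency of tens of milliseconds shows a gap of about three orders of magnitude before any bandwidth cost is counted.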

  16. Teragrid runs a lot of Linux by haruchai · · Score: 4, Informative

    It's an all-*nix environment presently totalling around 4200 CPUs, of which 96 (in a single cluster) run AIX 5.2, 3128 (WOW!!) run Tru64 (in 2 clusters) and the rest, distributed in 5 clusters, run some form of Linux.
    Two of the clusters have a second phase which together will add 316 CPUs on Linux.

    As of October 1 of this year, 5 clusters at 3 sites will be added with the OS / CPU breakdown as follows:
    Linux : 1800 CPUs in 3 clusters
    AIX 5.1 : 320 in 1 cluster
    Solaris 9 : 256 in 1 cluster

    That's an awful lot of Unix and a buttload of Tru64 and Linux
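
    Working the numbers above through, as a quick sketch: the 4200 total is only "around" that figure, so everything derived from it is approximate, and adding the second-phase CPUs before the October clusters is an assumption made here for the sake of the sum.

```python
# Current breakdown, taken from the figures quoted above.
total_now = 4200                      # approximate, per the post
aix_now = 96
tru64_now = 3128
linux_now = total_now - aix_now - tru64_now   # "the rest"

# Planned additions.
phase_two_linux = 316
october = {"Linux": 1800, "AIX 5.1": 320, "Solaris 9": 256}

linux_after = linux_now + phase_two_linux + october["Linux"]
total_after = total_now + phase_two_linux + sum(october.values())
linux_share_after = linux_after / total_after

print(f"Linux now: {linux_now} CPUs")
print(f"Linux after additions: {linux_after} of {total_after} CPUs "
      f"({linux_share_after:.0%})")
```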

    --
    Pain is merely failure leaving the body