TeraGrid v. Distributed Computing
Nevyan writes "After three years of development and nearly a hundred million dollars, the TeraGrid has been running at or above most people's expectations for such a daunting project. On January 23, 2004 the system came online and provided 4.5 teraflops of computing power to scientists across the country. However, the waiting list for TeraGrid is long, including a bidding process through the National Science Foundation's (NSF's) Partnerships for Advanced Computational Infrastructure (PACI), and many scientists with little funding but bright ideas are being left behind. While the list of supercomputer sites and peak power is growing, how is the world of Distributed Computing faring?"
Is for a different kind of distributed computing client, one that allows you to sign up for different kinds of research programs. For example, you could say "donate half my spare time to AIDS research, and 1/4 to math research, and 1/4 to SETI research". Also integrate a method of possible payment for work units completed (and a checking process to remove cheaters) and I think you will have an increase in efficiency in the entire way that we treat computers. Maybe instead of everyone shelling out thousands for top-of-the-line computers whose peak output they only need 5% of the time, they shell out a lot less for a networked computer that buys time from other people's machines. Clearly this wouldn't work in all applications (particularly those requiring low latency), but with improving network connections I think this is a possible future.
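To make the "half to one project, a quarter each to two others" idea concrete, here is a minimal sketch of how a client might split its spare time. It is only an illustration: the project names, the `next_project` and `schedule` functions, and the notion of equal-sized "slots" are all assumptions, not how any existing client works.

```python
def next_project(shares, done):
    """Return the project that is furthest behind its target share.

    shares: project name -> fraction of spare time the user wants to donate
    done:   project name -> number of time slots already given to it
    (Both dictionaries and their keys are hypothetical.)
    """
    total = sum(done.values()) + 1  # include the slot about to be handed out
    return max(shares, key=lambda p: shares[p] * total - done[p])


def schedule(shares, slots):
    """Hand out `slots` equal time slots according to the user's shares."""
    done = {p: 0 for p in shares}
    order = []
    for _ in range(slots):
        project = next_project(shares, done)
        done[project] += 1
        order.append(project)
    return order, done


shares = {"aids": 0.5, "math": 0.25, "seti": 0.25}
order, done = schedule(shares, 8)
print(order)
print(done)
```

Picking whichever project is furthest behind its share keeps the split close to the requested ratio even over short runs, which a purely random choice would not guarantee.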
I submitted this story last night, and it didn't get posted.
There have been big projects like SETI@home, Great Internet Mersenne Prime Search, RC5-64 and many others.
There are some like the Casino-21 http://www.climate-dynamics.rl.ac.uk/ and Evolution-at-Home http://www.evolutionary-research.org/ too.
It's becoming easier to create the required code for distributed projects, and it most certainly has become easier to actually get them distributed.
The idea of payment for work units is interesting. While it would certainly provide incentive for participating in distributed computing projects, I can see two problems with it already:
1) Getting the money to pay people. One advantage of distributed computing is that you don't have to pay for time on an expensive cluster. That advantage disappears when you pay distributed computing users. Of course, it may still turn out to be cheaper, and there may be users willing to participate for free.
2) Botnets and profit. We all know of spammers using zombies to peddle goods, and of script kiddies using them to DDoS. What if some enterprising but immoral person decided to use the computing power of his zombies to profit off of the distributed computing payments? With enough zombies, he could easily make a good amount of money off of other people's computers.
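If work units were paid for, the "checking process to remove cheaters" suggested earlier in the thread could be done by redundancy: hand the same work unit to several clients and pay only those whose answer matches the majority. The sketch below is one hedged way to express that; the `validate` function, the client ids, and the `quorum` parameter are made up for illustration.

```python
from collections import Counter


def validate(results, quorum=2):
    """Decide which answer to accept for one work unit, and whom to pay.

    results: client id -> answer that client reported (hypothetical data)
    quorum:  minimum number of matching answers needed to accept a result

    Returns (accepted_answer, clients_to_pay). If no answer has both a
    strict majority and at least `quorum` votes, nothing is accepted.
    """
    if not results:
        return None, set()
    answer, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum or votes * 2 <= len(results):
        return None, set()
    paid = {client for client, r in results.items() if r == answer}
    return answer, paid


print(validate({"alice": 42, "bob": 42, "mallory": 7}))
print(validate({"alice": 1, "bob": 2}))
```

Note that this only catches clients returning bogus results. It does nothing about problem 2 above, since a zombie machine doing honest work on its owner's behalf would pass the check.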
Only if they are investigating cannibalism. The purpose of science is the advancement of knowledge. Service to humanity, if it happens, is incidental.
What are you talking about? These are publicly funded resources. You apply to the NSF for time on these machines. If you're at a U.S. institution and you have a real need for supercomputing, you can get time on these machines.
And Distributed Computing can't even begin to solve some of the problems that supercomputers are designed to address.
Yes. When you want to simulate every molecule of a protein in a water solution (~17000 atoms worth) you need a supercomputer. DC can't do it.
DC is neither a religion nor a panacea.
There isn't a single Grid. Grid is a concept, a way of working, not an actual physical infrastructure. In fact Grids can be ephemeral and dynamic, based on the related concept of Virtual Organisations (VO).
There are various collections of machines which have been designed to facilitate Grid computing (instances of Grids), TeraGrid being one of them. Some systems or Grids are suitable for some types of jobs, some for others. As you rightly note, for the likes of MPI you need relatively closely coupled nodes.
Essentially Grid is aimed at being a way to link services (compute, data, visualisation, etc) into a cohesive whole such that ultimately you can have a fire-and-forget interface, with your work going to the place that best suits it, based on additional restrictions such as security, how much you are willing to pay for the results, and how long you are prepared to wait for them.
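As a rough illustration of "your work going to the place that best suits it", here is a toy matchmaker that filters resources by the three restrictions mentioned (security, price, waiting time) and then picks the cheapest. Everything in it is hypothetical: the `Resource` and `Job` fields, the `pick_resource` function, and the sample numbers are invented for the sketch and do not correspond to any real Grid middleware.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cost_per_hour: float    # what the site charges (made-up units)
    est_wait_hours: float   # estimated queue wait before the job starts
    secure: bool            # whether the site meets the job's security needs


@dataclass
class Job:
    hours: float            # estimated run time
    max_cost: float         # most the user is willing to pay in total
    max_wait_hours: float   # longest the user is prepared to wait in the queue
    needs_secure: bool


def pick_resource(job, resources):
    """Return the cheapest resource that satisfies every restriction, or None."""
    eligible = [
        r for r in resources
        if (r.secure or not job.needs_secure)
        and r.cost_per_hour * job.hours <= job.max_cost
        and r.est_wait_hours <= job.max_wait_hours
    ]
    return min(eligible, key=lambda r: r.cost_per_hour * job.hours, default=None)


sites = [
    Resource("supercomputer", cost_per_hour=50.0, est_wait_hours=72.0, secure=True),
    Resource("campus-cluster", cost_per_hour=5.0, est_wait_hours=4.0, secure=True),
    Resource("volunteer-pool", cost_per_hour=0.0, est_wait_hours=1.0, secure=False),
]
job = Job(hours=10.0, max_cost=100.0, max_wait_hours=24.0, needs_secure=True)
choice = pick_resource(job, sites)
print(choice.name if choice else "no suitable resource")
```

A real scheduler would also have to weigh whether the job needs closely coupled nodes, which this sketch ignores.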
The back-end processing can include ad-hoc conglomerations of machines that some see as traditional distributed computing (e.g. machines running BOINC based clients and so on).
Currently much work is being done on the top-level wrappers that allow the groups of machines to be abstracted using web services and various transactional models based on web services (see www.gridforum.org), services built into multi-component workflows, and so on.
AaronGTurner
Personally, I'd much rather have an applet using 10% of my CPU power instead of an annoying Flash banner (which probably itself uses 10% CPU...).
Obviously someone has to pay for internet content, and that to me would be the least intrusive way. Popup-blockers will be inefficient by the end of the year. Ads will be inside the site content. Or worse still, the popup window is the main window, while the actual content is spawned as "pop under" meaning that if you have a popup stopper, all you get is the ad window...
CPU cycles are the perfect internet currency. Everyone who visits a website has them.
Actually, as an experimental and theoretical biophysicist-in-training who knows about proteins, I'd say the folding project is only marginally more useful than the prime number search. Most biology research projects, especially computational ones, have to be sold on the basis of potential benefits to human medicine. Such advertising does not actually mean that medical benefits exist.
While there's much to learn from studies of protein folding, there's very little medical importance to purely theoretical simulations. Since the delusion that we'll be able to replace laboratory research with really big computers is attractive to people who know nothing about biology, the impact of this type of research gets vastly overstated.
On the other hand, Folding@Home has already yielded far more interesting results (if not exactly "useful" outside of the world of biophysics) than SETI@Home probably ever will, so go for it.
As other people have said whether there is "The
Grid" or Grids is like "The Internet" vs multiple
IP-protocol networks, including the private ones.
However, for practical purposes there is one "The
Grid" which will probably evolve into The Grid
without the quotes, and that is the worldwide
LHC Computing Grid, currently spread across North
America, Europe and northwest Pacific Rim.
Through EGEE (in Europe) and Open Science Grid
(in the US) LCG technology will spread out into
the wider scientific and research community.
Here's one of LCG's current monitoring maps,
showing the geographical spread of sites which
are part of the production service today:
http://goc.grid-support.ac.uk/gppmonWorld/gppmon_maps/lcg2.html
AM
You're right on here. And what limits the potential even further is the effect that the *cause* might have on the altruism of the user. I for one would think differently about what I do with my spare clock cycles depending on who the end user is. Consider a pure research, potentially common-good project versus a *pure research* project funded by A. N. Other Megacorp. It's still an interesting idea though - BOINC would seem to me to be the next step in allowing different actions (programmed to a certain extent) to be carried out in a distributed fashion.
----- Every day we get up and make the choice that the thing we are doing is the most valuable use of our time. -----
Jesus, there's a horrible thought. I've met the public (and seen its choice in TV). I'd rather have monkeys choose.
You might be right about those monkeys. In Holland, we have the Beursgorilla (http://www.beursgorilla.nl/). This gorilla decides what stock to buy or sell based on the bananas presented to him. He proves to be better at "advising" than most of the other "real" and expensive advisors.
For me, the DistributedComputingGorilla might decide what project will run on my computer.
""The "Grid" portion of the TeraGrid reflects the idea of harnessing and using distributed computers, data storage systems, networks, and other resources as if they were a single massive system." (from the TeraGrid FAQ)
It looks like TeraGrid is latching onto a catchword in order to boost awareness of their system. What they are describing here is not Grid computing at all."
No, they are right and you are wrong.
Using spare cycles is one thing you can
do with Grid technology, but it is not the
essential quality of Grids. "Grid" was coined by
Ian Foster et al by analogy with electricity power
grids. You plug into the wall and "it just works."
Here is Ian's discussion of what is and is not
a Grid from a couple of years ago:
http://www.gridtoday.com/02/0722/100136.html
AM