Distributed Computing Economics
machaut writes "In a ClusterComputing.org article, Jim Gray, director of Microsoft's Bay Area Research Lab, provides an interesting economic analysis for building distributed systems. When do you choose a grid over a cluster or a supercomputer?
When does it pay off to move a task to the data vs moving the data to the task? He takes current hardware and networking costs into account to answer those questions."
How much does it cost to keep hundreds of regular computers (with all their extra peripherals) crunching away vs. a specially designed computer or set of computers?
Also, as long as people are still allowed to decide what runs on their own computer, you will have to convince them that they should help you with your distributed computing task.
SETI@Home worked so well because people want to know the answer. People are interested in the results. If you tried to do a distributed apple browning application nobody would download it.
i don't like my old sig.
And how is this different from how you or I act?
I don't know about you, but I make GPL software. I give it away for free, and therefore I give time and money to the community, partly to pursue a certain idea of the computer industry I desire.
In a way, it's just like people who run the SETI@Home client: they don't do it just "to get a free screensaver", as that Microsoft guy narrowly thinks; they also do it because they want to feel part of a greater, more noble effort than just getting rich quick.
When was the last time Microsoft gave anything away as open source or for free that didn't serve one of their short-, medium- or long-term plans? I mean, it's okay, they're there to make money and they admit it, and there's nothing wrong with this goal as long as they try to achieve it morally and legally, but why should it be the same for everybody?
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
I'm sorry, I don't follow your maths here...
There are 2678400 seconds in a month (assuming 31 days...), so at 1 megabit per second that makes 2678400 megabits transmittable in a month, or 334800 megabytes. Each dollar of your $100 buys 3348 MB, which is 3.3 GB - same order of magnitude as the author suggests...
Perhaps you meant 2678400Mb per month.
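For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes what the numbers above imply: a link that runs flat out at 1 megabit per second for a 31-day month and costs $100 for that month. The function name and parameters are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def monthly_transfer(link_mbps=1.0, days=31, monthly_cost_usd=100.0):
    """Return (megabits, megabytes, megabytes per dollar) for a saturated link.

    Assumptions (not measured figures): the link is busy 100% of the time,
    and 1 megabyte = 8 megabits.
    """
    seconds = days * SECONDS_PER_DAY
    megabits = link_mbps * seconds
    megabytes = megabits / 8
    return megabits, megabytes, megabytes / monthly_cost_usd


megabits, megabytes, mb_per_dollar = monthly_transfer()
print(f"{megabits:.0f} Mb per month")
print(f"{megabytes:.0f} MB per month")
print(f"{mb_per_dollar:.0f} MB per dollar")
```

Change `days`, `link_mbps` or `monthly_cost_usd` to see how sensitive the per-dollar figure is; a real link is rarely saturated all month, so treat the result as an upper bound.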
A few years back when Grid computing was all the rage we sat down with some investment partners and worked out the figures. We came pretty much to the same conclusion. The "average" commercial supercomputing application (pharma, oil drilling, simulation) would not benefit from "free" cycles on the network.
Essentially, any commercial computation valuable enough to require that amount of effort can justify purchasing a hundred-thousand-node Beowulf cluster and running it locally. The reduction in network costs, the advantages of total control and tight security more than pay for the difference in computing cost.
Non-commercial computations such as SETI will benefit from grid computing, and we expect to see more efforts along those lines (RSA, Mersenne, Stanford DNA). But remember, we were thinking about starting a business, and none of those pay for the services, so we moved on.
I think it's probably safer to say that seti@home has a huge surplus of computational power, and uses it to verify each result (though it's not strictly necessary to do so). With only one data source (Arecibo) you can only produce data so quickly, and once you have enough computational power to do the analysis in real time, any extra is just surplus that can be used to verify. They did, however, later add some extra analysis to the data to take better advantage of the huge surplus of computing power they have.
The important point, though, is that for seti@home each individual workunit, while important, isn't critical to the whole project. If a small percentage of workunits aren't computed perfectly it's not catastrophic. In other words, there's a certain amount of tolerance for inaccuracy. For a project like the OGR (Optimum Golomb Ruler) by distributed.net, each workunit must be calculated perfectly, as the goal is to prove which ruler is the optimum one. If a workunit isn't verified you haven't really proven anything, since it's possible (and probably likely) that hardware failure produced an inaccurate result somewhere in the millions of workunits calculated. (Or perhaps a modified client produced inaccurate results.) Other distributed computing tasks have different amounts of tolerance for inaccurate results.
Your underlying point is a good one though. For some projects the need for integrity of the results is very high, so larger computing power may be necessary to verify each result.
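To make the verification idea concrete, here is a minimal sketch of redundant checking: send the same workunit to several clients and accept a result only when enough of them agree. This is not the actual seti@home or distributed.net scheme; the function name and the `quorum` parameter are hypothetical.

```python
from collections import Counter


def verify_workunit(results, quorum=2):
    """Return the agreed result if at least `quorum` returned results match.

    `results` is a list of hashable results sent back by independent clients
    for the same workunit. Returns None when there is no agreement, meaning
    the workunit should be handed out again.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# One faulty (or tampered) client is outvoted by two that agree.
print(verify_workunit([42, 42, 17]))
# Two clients disagree, so nothing can be accepted yet.
print(verify_workunit([42, 17]))
```

A project that tolerates some inaccuracy could run with `quorum=1` and spot-check, while a proof-style project would want a higher quorum on every workunit, at the cost of that many times the computing power.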
AccountKiller