Distributed Computing Economics
machaut writes "In a ClusterComputing.org article, Jim Gray, director of Microsoft's Bay Area Research Lab, provides an interesting economic analysis for building distributed systems. When do you choose a grid over a cluster or a supercomputer?
When does it pay off to move a task to the data vs moving the data to the task? He takes current hardware and networking costs into account to answer those questions."
Ungodly numbers of "Beo-Wolf" cluster jokes arriving now!
The preceding post was not a Slashvertisement.
When do you choose a grid over a cluster or a supercomputer?
When you have a really high-paying job where you are paid to make such decisions.
I was happy that Gray covered SETI@Home, as I think the nature of SETI is akin to where certain aspects of distributed computing may go in the future. However, I argue that he left some key parts of SETI economics at the door, most notably data integrity and security. As I understand it, *over half* of SETI's processing power, bandwidth, and so forth is used to verify data integrity, because it uses untrusted hosts to do its calculations.
This doesn't make SETI a poor supercomputer, but it does change the economics of it. An economic model of computing resources that treats work done by untrusted hosts as carrying different overhead than work done by trusted hosts would be a much more useful metric to think in terms of.
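A rough sketch of what such a model might look like, in Python. The function name and the flat "verification fraction" are my own assumptions, not anything from Gray's article; the $1 per 8 CPU-hours price is the figure quoted elsewhere in this thread, and the 0.5 fraction is just the unverified "over half" estimate above taken at its lower bound.

```python
def cost_per_useful_hour(raw_cost_per_hour: float, verification_fraction: float) -> float:
    """Cost of one hour of useful work when a fraction of capacity is spent on verification.

    Hypothetical model: if a fraction f of all cycles only re-checks results,
    then just (1 - f) of each paid hour produces new work.
    """
    if not 0.0 <= verification_fraction < 1.0:
        raise ValueError("verification_fraction must be in [0, 1)")
    return raw_cost_per_hour / (1.0 - verification_fraction)


# $1 buys 8 hours of CPU time (figure quoted in this thread).
RAW_COST_PER_HOUR = 1.0 / 8.0

# Trusted hosts: assume no cycles are spent re-checking results.
trusted = cost_per_useful_hour(RAW_COST_PER_HOUR, 0.0)

# Untrusted hosts: assume half of all cycles go to verification (unverified estimate).
untrusted = cost_per_useful_hour(RAW_COST_PER_HOUR, 0.5)

print(f"trusted:   ${trusted:.3f} per useful CPU-hour")
print(f"untrusted: ${untrusted:.3f} per useful CPU-hour")
```

Under those assumptions the untrusted pool costs twice as much per useful hour. Whether that matters depends on who is paying for the cycles; donated CPU time is free to the project, so the price here is only illustrative.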
Wow, what a world. $1 will now buy:
1 GB sent over the WAN
10 Tops (tera-CPU instructions)
8 hours of cpu time
1 GB disk space
10 M database accesses
10 TB of disk bandwidth
1 large beverage
1 of everything in the $1 store
1 unlimited phonecall from some 10-10-### phone company.
5 packets of Kool-Aid
10 packets of generic Kool-Aid
2 cans of coke
When I was a child, data was expensive, and food was cheap...
The author points out: "The ideal mobile task is stateless (needs no database or database access), has a tiny network input and output, and has huge computational demand."
"And of course, SETI@Home is a good example: it computes for 12 hours on half a megabyte of input."
So projects that fit this model should save money compared with supercomputers. But few projects do fit it: the author mentions web and data processing, data loading, and CFD, i.e. anything that "generates a continuous and voluminous output stream", as economically infeasible. So, car companies really do need those supercomputers to virtually crash their cars. =)
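To put numbers on why SETI@Home fits the model, here is a minimal Python sketch using only the prices quoted in this thread ($1 per GB over the WAN, $1 per 8 CPU-hours). The function name is made up for illustration, and I am assuming 1 GB = 1000 MB to keep the arithmetic simple.

```python
# Prices quoted in this thread: $1 buys 1 GB over the WAN or 8 hours of CPU time.
WAN_DOLLARS_PER_GB = 1.0
CPU_DOLLARS_PER_HOUR = 1.0 / 8.0


def shipping_vs_compute(input_mb: float, cpu_hours: float) -> tuple[float, float]:
    """Return (network_cost, compute_cost) in dollars for one work unit.

    Hypothetical helper: assumes 1 GB = 1000 MB and ignores output traffic.
    """
    network_cost = (input_mb / 1000.0) * WAN_DOLLARS_PER_GB
    compute_cost = cpu_hours * CPU_DOLLARS_PER_HOUR
    return network_cost, compute_cost


# SETI@Home work unit as described in the article: half a megabyte in, 12 hours of CPU.
seti_network, seti_compute = shipping_vs_compute(input_mb=0.5, cpu_hours=12)

# A made-up data-heavy task for contrast: 1 GB in, one minute of CPU.
bulk_network, bulk_compute = shipping_vs_compute(input_mb=1000, cpu_hours=1 / 60)

print(f"SETI: network ${seti_network:.4f}, compute ${seti_compute:.2f}")
print(f"Bulk: network ${bulk_network:.2f}, compute ${bulk_compute:.4f}")
```

For the SETI work unit the compute is worth about 3000 times what it costs to ship the input, which is why moving the task makes sense. For the made-up bulk task the ratio flips, and shipping the data costs far more than the computation is worth.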
It's a nice piece of analysis. Someone could have done it 8 years ago when Java came out; the facts are not significantly different. (The values are different, of course, but the ratios involved are pretty similar. I did some thinking along these lines back then, and then again in 2000 when considering working for a "hot P2P company" that an old acquaintance of mine was running.)
My thinking went something like this: There are only a few "niche" applications which need more compute power and which people pay for (distributed rendering, CFD, FEA, maybe a couple of others). Maybe you could build that into a 10-30 million dollar business if you overcame a zillion obstacles, but it didn't look like a billion or multi-billion dollar business. The applications for which people buy beefy servers, and which have a monetary payback, are mostly database applications. For those, you need to move the entire database near to the number-crunching PC, and that's not really feasible, given the cost of transporting gigabytes of data and the unlikelihood that the PC's hard disk can store all the giga/terabytes of information potentially relevant to the computation. Not to mention the security problem.
And Jim Gray's analysis lays out in more precise economic terms why it doesn't make sense. I like how he even calculated the relative merits of a Beowulf-like cluster of PCs versus P2P, which I never really analyzed (I lumped them together as basically similar).
That said, has anybody even made a stab at designing or implementing a relational database with a P2P architecture? I know that there's Oracle Cluster Server, but I'm thinking of something more low-end and more distributed.
--LP
We only look at the cost of SETI from our perspective here on earth...but if you ever consider the enormous cost space aliens have to incur to make their secret communications appear as background noise, then I think more people would oppose the project.