Distributed Computing Economics
machaut writes "In a ClusterComputing.org article, Jim Gray, director of Microsoft's Bay Area Research Lab, provides an interesting economic analysis for building distributed systems. When do you choose a grid over a cluster or a supercomputer?
When does it pay off to move a task to the data vs moving the data to the task? He takes current hardware and networking costs into account to answer those questions."
Ungodly numbers of "Beo-Wolf" cluster jokes arriving now!
The preceding post was not a Slashvertisement.
How much does it cost to keep hundreds of regular computers (with all their extra peripherals) crunching away vs. a specially designed computer/set of computers.
When do you choose a grid over a cluster or a supercomputer?
When you have a really high-paying job where you are paid to make such decisions.
.. have already figured it out - let other willing users pay the power bill, bandwidth cost, etc. and crunch the data in their spare time. Seems to be working well for seti@home, etc.
Of course, if you are working with sensitive data (military stuff, major trade secrets, etc.) your security/privacy needs will outweigh the costs involved with doing it all in house.
Don't blame me, I voted for Kodos
Just imagine how much SETI@Home would be out if they had to pay for all of that...
Interesting thought, but what is the difference between this and the age-old concept of the cost/benefit relationship? I'm not trolling; it seems that it is just that concept with a tech twist.
xao
http://TheHillforum.hopto.org
Oh, I see somebody beat me to the Seti@home first... oh well... off to cracking my measly 5 units a day.
I was happy that Gray covered SETI@Home, as I think the nature of SETI is akin to where certain aspects of distributed computing may go in the future. However, I argue that he left some key parts of SETI economics at the door, most notably data integrity and security. As I understand it, *over half* of SETI's processing power, bandwidth, and so forth is used to verify data integrity, as it's using untrusted hosts to do its calculations.
This doesn't make SETI a poor supercomputer, but it does change the economics of it. An economic model of computing resources that treats work done by untrusted hosts as carrying different overhead than work done by trusted hosts would be a much more useful metric to think in terms of.
The article states that SETI@home has a whopping 54 teraflops of computing power. This is an unfathomable number of CPU cycles, and guess what, it is all used FOR FREE! This is a great example of how a community of users is willing to sacrifice something (unused CPU cycles and small amounts of bandwidth) to meet some great future goal (contact with extraterrestrials). Did I mention it is FREE? I wonder how much money researchers could save or use in a better fashion if they all used distributed computing instead of expensive (super)computers. Some have already done this, and I know that there are distributed efforts currently processing ways to fight cancer and AIDS.
Canadian Cynic, canadian politics is less boring than you
Greedy overcharging telcos make grid computing over the internet more expensive than traditional supercomputing, unless you can get people to pay for you (SETI).
Beep beep.
All the work must (should) be double-checked to make sure everything is correct.
Cats: All your base are belong to us.
Captain: Take off every sig !!
Microsoft and IBM tout web services as a new computing model - Internet-scale distributed computing.
.NET message passing. Remember folks, Microsoft never does anything without a reason, and certainly never does anything for the good of anybody else but themselves.
They observe that the current Internet is designed for people interacting with computers. Traffic on the future Internet will be dominated by computer-to-computer interactions.
And that explains why Microsoft has suddenly declared war on spam : they have to free bandwidth for their own
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Give us back our "extract to here" option NOW! You can shove your HIG up micheal de incasa's ass! Fix split pane in Nautilus too!
Quit yer bitchin' in the wrong place. The correct place would be in a GNOME forum, not /.
If it means that much to you, either learn to program, or pay someone to fix it for you.
to Microsoft Bay Area Research Facility
Conclusions
Put the computation near the data.
My own general take on all this is that Moore's Law for CPU/data costs vs. time will beat the decrease in network latency costs vs. time, and we can generally expect to see communications protocols become more "intelligent" to compensate for this barrier that cannot be overcome. Bandwidth will be relatively cheap, but the cost of building up and tearing down a connection will remain high enough to discourage multi-exchange handshaking (i.e., the UDP model vs. the TCP model).
"Provided by the management for your protection."
The last time I read an article from the Microsoft Research guys was in Communications of the ACM. The article was about media center computers (in the article, named Mbox) and digital consumer product consolidation/standardization. Of course there was no mention of Apple and just a brief acknowledgement of TiVo.
As a strange coincidence, HP and others announced their media center PCs shortly afterwards, followed by Microsoft releasing XBox Live.
Now the same Microsoft researcher is talking about grids and clusters? Expect a Microsoft cluster package soon...
Hey, I enjoyed that. Bash it if you will, but the breakdown analysis of the costs was quite interesting. It wasn't MS-fanboy propaganda.
From the article :
...
1 Mbps WAN link $100/month
From this we conclude that one dollar equates to:
$1=
1 GB sent over the WAN
Oh yeah? 1 Mbps for a month == 2,678,400 Mb per month == 334,800 MB per month. 334,800 MB / 100 == 3.348 GB
From this I conclude that one dollar equates to 3.348 gigabytes sent over the WAN.
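For anyone who wants to check the units, here is a rough sketch of the arithmetic. It assumes a 31-day month, a link that is saturated the whole time, and decimal units (1 GB = 10^9 bytes); the function name is made up for illustration. Under those assumptions a dollar moves a few gigabytes, not terabytes, so the article's round figure of 1 GB per dollar is within a small factor of the saturated-link number.

```python
# Unit check for "1 Mbps WAN link at $100/month" (figures quoted from the article).
# Assumptions: 31-day month, 100% link utilization, 1 GB = 10**9 bytes.

SECONDS_PER_MONTH = 31 * 24 * 60 * 60   # 2,678,400 seconds
LINK_BITS_PER_SECOND = 1_000_000        # 1 Mbps
MONTHLY_COST_DOLLARS = 100


def gigabytes_per_dollar(bits_per_second, seconds, dollars):
    """Return how many gigabytes one dollar moves over a fully used link."""
    total_bytes = bits_per_second * seconds / 8   # bits -> bytes
    return total_bytes / 1e9 / dollars            # bytes -> GB, then per dollar


full_rate = gigabytes_per_dollar(
    LINK_BITS_PER_SECOND, SECONDS_PER_MONTH, MONTHLY_COST_DOLLARS
)
print(f"{full_rate:.3f} GB per dollar at full utilization")
```

A real link is rarely saturated around the clock, which pushes the figure down toward the article's 1 GB per dollar.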
Gosh that was scary. I can restart xMule now
Just last year we were comparing sending data over the wire with the time it would take to overnight it in a package. We worked out that it was faster, and wouldn't clog up our line, to burn DVDs and send them through an international package service than to send the data over the T1s. I think for all but the largest businesses this is probably still true for larger (gigabytes) amounts of data. Network costs are too high to be putting data far from where it is to be used. Whether CEOs realize it or not, this has a great effect on the ways businesses with multiple locations structure their company and work together.
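A back-of-the-envelope version of that comparison, as a sketch only. The T1 line rate of 1.544 Mbps is standard; the 24-hour courier time and the function names are assumptions for illustration, and protocol overhead and disc-burning time are ignored.

```python
# Compare pushing data over T1 lines with handing it to a courier.
# Assumptions: full use of the line, no protocol overhead, 1 GB = 10**9 bytes.

T1_BITS_PER_SECOND = 1_544_000  # standard T1 line rate


def transfer_hours(gigabytes, bits_per_second=T1_BITS_PER_SECOND, lines=1):
    """Hours needed to send `gigabytes` over `lines` parallel links."""
    total_bits = gigabytes * 1e9 * 8
    return total_bits / (bits_per_second * lines) / 3600


def faster_to_ship(gigabytes, courier_hours, lines=1):
    """True when the courier beats the wire for this amount of data."""
    return transfer_hours(gigabytes, lines=lines) > courier_hours


for size_gb in (1, 10, 100):
    hours = transfer_hours(size_gb)
    verdict = "ship it" if faster_to_ship(size_gb, courier_hours=24) else "send it"
    print(f"{size_gb:>4} GB over one T1: {hours:7.1f} h -> {verdict}")
```

With these assumptions the crossover for a single T1 against an overnight package sits somewhere between 10 GB and 100 GB; adding lines or a slower courier moves it.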
Yes, there are plenty of systems available for distributed computing tasks.
Yes, there are plenty of free CPU cycles.
But bear in mind that the rate of growth of entrants to the Internet is not growing exponentially as it once was; if anything, current trends are flattening.
Whereas, if both science and industry fully embrace this mode of problem solving in the next few years, one has to wonder how many apps it will take to render it ineffectual (1,000? 10,000?), or will we be able to go to this well forever?
Wow, what a world. $1 will now buy:
1 GB sent over the WAN
10 Tops tera-CPU instructions
8 hours of cpu time
1 GB disk space
10 M database accesses
10 TB of disk bandwidth
1 large beverage
1 of everything in the $1 store
1 unlimited phonecall from some 10-10-### phone company.
5 packets of Kool-Aid
10 packets of generic Kool-Aid
2 cans of coke
When I was a child, data was expensive, and food was cheap...
This is an old maxim of design of any multi-tiered system. The reason is this: computation is largely about selecting and filtering data, before sending the results on to further tiers. This selection and filtering process requires many times more bandwidth towards the data source than it does towards the client layers.
This only stops being true when there is no significant data, i.e. when the computation creates the data, as in the author's examples of render farms.
Ceci n'est pas une signature
You are so funny!!!!!! hahahahha!
When will this affect you in some way? No harm, no foul.
OMG! Wau!
It's a nice piece of analysis. Someone could have done it 8 years ago when Java came out; the facts are not significantly different. (The values are different, of course, but the ratios involved are pretty similar.) I did some thinking along these lines back then, and again in 2000 when considering working for a "hot P2P company" that an old acquaintance of mine was running.
My thinking went something like this: There are only a few, "niche" applications which need more compute power and which people pay for (distributed rendering, CFD, FEA, maybe a couple others). Maybe you could build that into a 10-30 million dollar business if you overcame a zillion obstacles but it didn't look like a billion or multi-billion dollar business. The applications for which people buy beefy servers, and which have a monetary payback, are mostly database applications. For those, you need to move the entire database near to the number-crunching PC, and that's not really feasible due to the cost of transporting Gigabytes of data or the unlikelihood that the PC's hard disk can store all the giga/terabytes of information potentially relevant for the computation. Not to mention the security problem.
And Jim Gray's analysis lays out in more precise economic terms why it doesn't make sense. I like how he even calculated the relative merits of a Beowulf-like cluster of PCs versus P2P which I never really analyzed (I lumped them together as basically similar.)
That said, has anybody even made a stab at designing or implementing a relational database with a P2P architecture? I know that there's Oracle Cluster Server, but I'm thinking of something more low-end and more distributed.
--LP
Of course you would find it an okay joke, you're too busy sucking off Taco to know what good humor is. Buy a clue, slashbot.
The recurrent theme of this analysis is that "On Demand" computing is only economical for very CPU-intensive (100,000 instructions per byte or a CPU-day per gigabyte of network traffic) applications.
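To see where a threshold like that comes from, here is a rough sketch that uses only two of the per-dollar figures quoted from the article in this thread ($1 buys 1 GB over the WAN, or 10 tera-CPU instructions). The function names are invented for illustration, and this is not Gray's own calculation. With those two figures the break-even works out to 10,000 instructions per byte, so the 100,000 figure quoted above leaves the network at under a tenth of the total cost.

```python
# Break-even between shipping bytes and burning CPU, from the quoted $1 figures.
# Assumption: costs scale linearly, with no fixed fees or latency effects.

INSTRUCTIONS_PER_DOLLAR = 10e12  # "10 Tops": 10 tera-CPU instructions per $1
WAN_BYTES_PER_DOLLAR = 1e9       # 1 GB sent over the WAN per $1


def break_even_instructions_per_byte():
    """Instructions per byte at which CPU cost equals network cost."""
    return INSTRUCTIONS_PER_DOLLAR / WAN_BYTES_PER_DOLLAR


def network_cost_share(instructions_per_byte):
    """Fraction of a job's total cost spent on moving its data."""
    compute_cost = instructions_per_byte / INSTRUCTIONS_PER_DOLLAR  # $ per byte
    network_cost = 1 / WAN_BYTES_PER_DOLLAR                         # $ per byte
    return network_cost / (network_cost + compute_cost)


print(f"break-even: {break_even_instructions_per_byte():,.0f} instructions/byte")
for intensity in (100, 10_000, 100_000):
    share = network_cost_share(intensity)
    print(f"{intensity:>7,} instr/byte -> network is {share:.0%} of cost")
```

Jobs far below the break-even are dominated by the network bill, which is the case for most database-style workloads discussed here.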
This must be considered an attack on IBM's fairly visible On-Demand Computing campaign.
Beowulf clusters have completely different networking economics. [...] That is why rendering farms and BLAST search engines are routinely built using Beowulf clusters.
This reminds me of those Microsoft-funded TCO reports. They concede that Linux has cost advantages in a very specific field (webhosting; Beowulf clusters), because anyone intuitively knows it's true. For all the rest: use Microsoft stuff. That's what they are saying.
-------
Warning: Slashdot may contain traces of nuts.
disstricked.
well, it's definitely going to cost a lot more to use the infactdead payper liesense bugwear distributed buy the felonious kingdumb of Godless stock markup FraUDs, than to do it right, but that's how you learn.
I'm sorry if I lessened the impact of your Beo-Wolf cluster joke by posting mine. I really am.
Hey everybody, I know of an A/C that needs a group hug!
A few years back when Grid computing was all the rage we sat down with some investment partners and worked out the figures. We came pretty much to the same conclusion. The "average" commercial supercomputing application (pharma, oil drilling, simulation) would not benefit from "free" cycles on the network.
Essentially, any commercial computation valuable enough to require that amount of effort can justify purchasing a hundred-thousand-node Beowulf cluster and running it locally. The reduction in network costs and the advantages of total control and tight security more than pay for the difference in computing cost.
Non-commercial computations such as SETI will benefit from grid computing, and we expect to see more efforts along those lines (RSA, Mersenne, Stanford DNA). But remember, we were thinking about starting a business, and none of those pay for the services, so we moved on.
When SETI@Home spent $10^6 to get everyone to spend $10^8 on electricity alone, how was that a good deal? Have extraterrestrials sent a message that they're about to touch down with a vaccine for AIDS, a formula for cold fusion, a permanent end to unemployment, a sure-fire way to get good representation in government? Could we have spent the money more wisely, Jim?
If Bill paid you folks to do something more than get technically-challenged investors excited, perhaps our software would work better. (And ASN.1 isn't that bad, by the way. Do I need everything going between my machine and the server to be verbose enough to read by hand? When I encode all my messages in XML, in how many cases will I miss the 10000 to 1 ratio just because the encoding is verbose?)
We only look at the cost of SETI from our perspective here on earth...but if you ever consider the enormous cost space aliens have to incur to make their secret communications appear as background noise, then I think more people would oppose the project.
EWe mushed bee ae vharry strAYng pers-on
when trolling always leave the main word misspelled. It helps to bring in comments.
His reasoning sounds good, but what the hey? It sounds pretty obvious that the most cost effective approach is to keep the data close to the CPU doing the crunching.
-- Many men would appreciate a woman's mind more if they could fondle it
A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
Total Cost of Ownership (TCO) is more than a trillion dollars per year.
Operations costs far exceed capital costs. Hardware and software are minor parts of the total cost of ownership.
-- microsoft software is cheap so you should keep buying it. Even if administering it is expensive.
Megaservices like Yahoo!, Google, et al. have relatively low operations staff costs.
-- Open source, if managed properly, doesn't need many people. But this formula can't be applied to the proprietary software shit you buy.
Most applications do not benefit from megaservice economies of scale
--Most microsoft products. We will still take our chunk of flesh no matter what.
Outsourcing has often proved to be a shell game - moving costs from one place to another.
--having a third party vendor manage your Microsoft software for you isn't going to save you much.
Web services reduce the costs of publishing and receiving information.
--But you will need a huge support staff to manage it plus lots of licenses. (see above)
Most Web and data processing applications are network or state intensive and are not economically viable as mobile applications.
--especially once the MS licensing is thrown in.
...because people like me are willing to donate their computers' time and a part of their [and their employers', hawhaw] electric bill.
I do so because I am interested in the project... not because I feel like I want to help cut someone's computing cost. If SAH was a for profit enterprise my interest would quickly evaporate.
I am very small, utmostly microscopic.
If it can't be done on an abacus by an infinite number of monkeys, then it can't be done. It is the simple monkey principle, similar to the duct tape principle.
I think it's probably safer to say that seti@home has a huge surplus of computational power, and uses it to verify each result (though it's not strictly necessary to do so). With only one data source (Arecibo) you can only produce data so quickly, and once you have enough computational power to do the analysis in real time, any extra is just surplus that can be used to verify. They did, however, later add some extra analysis to the data to take better advantage of the huge surplus of computing power they have.
The important point, though, is that for seti@home each individual workunit, while important, isn't critical to the whole project. If a small percentage of workunits aren't computed perfectly, it's not catastrophic. In other words, there's a certain amount of tolerance for inaccuracy. For a project like the OGR (Optimum Golomb Ruler) by distributed.net, each workunit must be calculated perfectly, as the goal is to prove which ruler is the optimum one. If a workunit isn't verified you haven't really proven anything, since it's possible (and probably likely) that hardware failure produced an inaccurate result somewhere in the millions of workunits calculated. (Or perhaps a modified client produced inaccurate results.) Other distributed computing tasks have different amounts of tolerance for inaccurate results.
Your underlying point is a good one though. For some projects the need for integrity of the results is very high, so larger computing power may be necessary to verify each result.
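One common way to get that integrity from untrusted hosts is to hand the same workunit to several of them and accept a result only when enough of them agree. The sketch below is a generic illustration of that idea, not the actual scheme used by seti@home or distributed.net; the function names and the quorum of 2 are assumptions. The second helper shows the price: every extra copy divides the useful throughput, using the 54 teraflops figure quoted earlier in the thread as an example.

```python
from collections import Counter


def accept_result(reported_results, quorum=2):
    """Return the agreed value, or None if no value has enough support.

    A value is accepted only if at least `quorum` hosts reported it and it
    holds a strict majority of all reports for this workunit.
    """
    if not reported_results:
        return None
    value, count = Counter(reported_results).most_common(1)[0]
    if count >= quorum and count > len(reported_results) / 2:
        return value
    return None


def effective_throughput(raw_throughput, redundancy):
    """Useful throughput when every workunit is computed `redundancy` times."""
    return raw_throughput / redundancy


print(accept_result(["0x3fa1", "0x3fa1", "0x0000"]))  # two hosts agree
print(accept_result(["0x3fa1", "0x0000"]))            # no quorum, reissue
print(effective_throughput(54.0, redundancy=2), "teraflops of useful work")
```

A project that tolerates occasional bad workunits can run with low redundancy; a proof-style project like OGR would want every unit to pass a check like this.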
AccountKiller
n/t
Black holes are where the Matrix raised SIGFPE
He does it for J.D. "Boss" Hogg!
Is to put crazy shit in the title like you are a 11 year old boy!!!
...and SkyNet became self-aware (again) at midnight July 3, 2003...
...but, there's other programs that people might find more socially useful/productive than SETI.
How 'bout...this from United Devices? They do a variety of biologically related projects, the most popular one, as far as I can tell, being cancer research...I've been running it for almost 2 years, and I have 100,000 points...how many points do you have?
And what OS were they using?
FRA: STFU GTFO
Clearly, this is an adaptation of a PowerPoint presentation by someone who thinks in PowerPoint. Why is the first sentence of every paragraph bold? Why don't any of the paragraphs lead from one to another? I think it's because each bold sentence used to be a bullet on a PowerPoint slide, and the paragraph was adapted from his notes on the presentation.
It reads like a bad sales pitch for a sucky mutual fund.
The only part that caught my eye:
"If telecom prices drop faster than Moore's law, the analysis fails. If telecom prices drop slower than Moore's law, the analysis becomes stronger. Most of the argument in this paper pivots on the relatively high price of telecommunications. Over the last 40 years telecom prices have fallen much more slowly than any other information technology. If this situation changed, it could completely alter the arguments here. But there is no obvious sign of that occurring."
Everyone knows that telecom == fraud. If Grande Communications can offer a telephone land line for $8.50 a month, even as they are laying all new infrastructure, then how the hell do SBC and Verizon get away with charging $40 a month for an infrastructure that is already there and was paid off decades ago? ( http://www.grandecom.com/ProductsServices/phone_local_rates.jsp )
It is possible that Jim Gray's suggestion that there is "no obvious sign" of a telecommunications price collapse is too pessimistic. With a tight economy and slow but sure growth in consumer choice, the telecommunications giants could still collapse. It would be nice to see SBC completely implode, and those 300,000 parasites it employs be replaced with a dozen companies of a few hundred employees each. The efficiency the economy would gain by not paying Ed Whitacre 90 million dollars a year, and savings of several hundred dollars a year for every household, would offset the burst of unemployed paper pushers.
Massive grid computing currently isn't economical for crunching the daily workload of insurance claims... not that it ever occurred to any of us that it would be, but it's nice to hear it from the world's leading expert on TP.
What if it became a self-employment option?
In other words, my home office, instead of being a revenue-sucking hole filled with computers was instead a source of distributed computing power I could sell on an ad-hoc basis? I eat the tab for the upkeep and get paid cash per work unit I'm able to get done.
This is a good analysis of the hardware side of the cost/benefit analysis of distributed computing, but that's nowhere near the full story.
For example, per the thesis of this article (that network communication is the largest expense of distributed computing), the Salesforce.com model isn't valid. Yet they're a great success story of computational outsourcing. Huh?
The key, I think, is that outsourcing eliminates distractions, and gets your employees back to working at your company's core competency. You're good at selling widgets, so you let Salesforce.com be your HR department.
This sort of analysis is useful for hardware, and could also be applied to meatware resources. It's the same essence of the TCO argument from Microsoft which, I think, does have validity.
When you bill your time at $50 an hour, you'll gladly pay $300 for an operating system rather than use a free one that takes 4 hours to install, 10 hours to learn command line computerese you never wanted to know anyway and many hours dealing with conversion and compatibility hassles.
It would be better if the post also mentioned that Jim Gray is a Turing Award winner.
E.g. If you own a computer, you are most likely a person. People get old and die. You should do medical distributed computing projects since you might benefit from the findings. Looking for aliens is cool and all, but not much in the way of practical application (unless they are beaming encoded drawings of really cool devices).
On the third hand, if you are a stranded "traveler" (not the Irish kind)... Maybe putting out a "beam me up" signal will be of use.
This issue is a bit more complicated than you think.
Tierce
Who sponsors your feelings?
When you've been given lots of funding to do a grid because it's a neat buzzword.
When you're tired of getting lots of real work done and would rather spend all of your time debugging the dozen layer system, none of which do useful error reporting.
When you've decided that dealing with multiple sites, each with different requirements to get a login and different requirements for accepted certificate authorities, is fun.
When you've decided that you love working with alpha quality software that is unsupported because the middleware providers are in the middle of their sixth major rewrite in three years "which will fix all of the problems," but will probably just reintroduce old bugs that you debugged for them and sent them patches for.
When you have too much time on your hands, so you want to reimplement a front end, because the middleware providers won't provide it and no one else provides a useful front end. Of course, said front end needs to provide resource allocation and reliability, because the middleware people decided to just hand that off to end users.
When you've got a horde of bored sysadmins with nothing better to do than clean up the "droppings" left behind on disk by the middleware, which regularly leaves behind unused files but is completely unable to tell you which files are unused.
I'm working on multiple grid projects, some of which have been running for several years at this point. "The Grid" is a filthy lie. It only works if you control all of the machines, and if you control all of the machines, just set up a cluster, you'll be happier.
In general, office parks, libraries, class rooms, and computer labs have a lot of idle time that could be exploited. Generally in cases like these the power is on anyway during certain hours and maintenance is covered anyway, so distributed computing might be used for many things.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
This pigeonholing of distributed computing (i.e., Grid) technologies is flawed thinking, and it would be a worry if IBM and Microsoft have built their "the future is web services" strategies solely on this type of analysis. What the analysis forgets is that you have to get the data INTO the database before you can compute with it. For many processing tasks, it makes more sense to send the bulk of processing to the data rather than the other way round. The barriers to achieving this aren't in the economics; they're in the architecture of current databases. If you extend this thinking to data in a more general sense, you'd have to conclude that file sharing networks aren't efficient ... and neither is the world wide web.
Game console sales are way higher than the percentage of console owners who are online. If the MS gamers' .NET heaven vision does happen without a corresponding drop in spam traffic, the good old /. effect will be meaningless. No one except perhaps Taco will get on.
OH THE SHAME I fell off the wagon and use sigs again!