Slashdot Mirror


Distributed Computing Economics

machaut writes "In a ClusterComputing.org article, Jim Gray, director of Microsoft's Bay Area Research Lab, provides an interesting economic analysis for building distributed systems. When do you choose a grid over a cluster or a supercomputer? When does it pay off to move a task to the data vs moving the data to the task? He takes current hardware and networking costs into account to answer those questions."

8 of 130 comments

  1. Interesting yet shallow economic analysis by Raindance · · Score: 5, Informative

    I was happy that Gray covered SETI@Home, as I think the nature of SETI is akin to where certain aspects of distributed computing may go in the future. However, I argue that he left some key parts of SETI economics at the door, most notably data integrity and security. As I understand it, *over half* of SETI's processing power, bandwidth, and so forth is used to verify data integrity, as it's using untrusted hosts to do its calculations.

    This doesn't make SETI a poor supercomputer, but it does change the economics of it. An economic model of computing resources that treats work done by untrusted hosts as carrying different overhead than work done by trusted hosts would be a much more useful metric.
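
    A rough sketch of what such a model might look like, assuming the verification overhead can be folded into a single redundancy factor (how many times each work unit is computed before a result is accepted). The function names and the factor of 2 are hypothetical, not taken from Gray's article or from SETI@Home itself:

```python
def effective_cost_per_result(cost_per_work_unit, redundancy):
    """Cost of one accepted result when every work unit is computed
    `redundancy` times (1 = trusted host, no re-checking)."""
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    return cost_per_work_unit * redundancy


def useful_fraction(redundancy):
    """Share of total capacity that ends up as accepted results."""
    return 1.0 / redundancy


# Hypothetical comparison: the same unit cost on trusted vs. untrusted hosts.
trusted_cost = effective_cost_per_result(1.0, redundancy=1)
untrusted_cost = effective_cost_per_result(1.0, redundancy=2)

print(trusted_cost, untrusted_cost, useful_fraction(2))
```

    With a redundancy of 2, exactly half the capacity goes to verification; "over half" would correspond to a factor somewhat above 2.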

  2. Spoiler by 4of12 · · Score: 3, Informative

    .
    .
    .
    Conclusions

    Put the computation near the data.

    My own general take on all this is that Moore's Law for CPU/data costs vs. time will beat the decrease in network latency costs vs. time, and we'll generally expect to see communications protocols become more "intelligent" to compensate for this barrier that cannot be overcome. Bandwidth will be relatively cheap, but the cost of building up and tearing down a connection will remain high enough to discourage multi-exchange handshaking (i.e., UDP model vs. TCP model).

    --
    "Provided by the management for your protection."
  3. Re:SETI@home by flabbergast · · Score: 5, Informative

    The author points out: "The ideal mobile task is stateless (needs no database or database access), has a tiny network input and output, and has huge computational demand."

    "And of course, SETI@Home is a good example: it computes for 12 hours on half a megabyte of input."

    So projects that fit this model should save money over supercomputers. But few projects fit this model; the author mentions web and data processing, data loading, and CFD, i.e., anything that "generates a continuous and voluminous output stream", as economically infeasible. So, car companies really do need those supercomputers to virtually crash their cars. =)
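
    One way to put a number on "fits this model" is to compare what a task's CPU time is worth against what it costs to ship its input. The sketch below assumes the unit prices implied by the SETI@Home figures quoted elsewhere in this thread (a million CPU-years worth about a billion dollars, a petabyte of traffic for about a million dollars); the function name is hypothetical, and output size is ignored:

```python
HOURS_PER_YEAR = 365 * 24

# Assumed unit prices, derived from the figures quoted in the thread.
CPU_DOLLARS_PER_HOUR = 1_000 / HOURS_PER_YEAR   # ~$1000 per CPU-year
NETWORK_DOLLARS_PER_BYTE = 1e6 / 1e15           # ~$1M per petabyte


def compute_to_network_ratio(cpu_hours, bytes_moved):
    """Dollar value of the computation divided by the dollar cost of
    moving its data. The higher the ratio, the better the task travels."""
    cpu_cost = cpu_hours * CPU_DOLLARS_PER_HOUR
    network_cost = bytes_moved * NETWORK_DOLLARS_PER_BYTE
    return cpu_cost / network_cost


# SETI@Home work unit: 12 hours of CPU on half a megabyte of input.
seti_ratio = compute_to_network_ratio(cpu_hours=12, bytes_moved=0.5e6)
print(round(seti_ratio))
```

    Under these assumptions the computation is worth a few thousand times the cost of shipping the input. A task that streams out voluminous results would push the ratio toward or below 1, which is where moving it stops making sense.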

  4. Why SETI@Home works... by gatkinso · · Score: 2, Informative

    ...because people like me are willing to donate their computers' time and a part of their [and their employers', hawhaw] electric bill.

    I do so because I am interested in the project... not because I feel like I want to help cut someone's computing cost. If SAH were a for-profit enterprise, my interest would quickly evaporate.

    --
    I am very small, utmostly microscopic.
  5. All this talk about SETI... by warriorpostman · · Score: 3, Informative

    ...but there are other programs that people might find more socially useful/productive than SETI.

    How 'bout...this from United Devices? They do a variety of biologically related projects, the most popular one, as far as I can tell, being cancer research...I've been running it for almost 2 years, and I have 100,000 points...how many points do you have?

  6. Re:Does that include electrical costs? by cicadia · · Score: 2, Informative
    Didja read the article?

    "SETI@Home harvested more than a million CPU years worth more than a billion dollars. It sent out a billion jobs of 1/2 MB each. This petabyte of network bandwidth cost about a million dollars. The SETI@Home peers donated a billion dollars of free CPU time and also donated 10^12 watt-hours which is about 100M$ of electricity."

    No, it doesn't include the value of user-performed maintenance, but as an economic analysis, it would be pretty negligent to not include the value of donated CPU-time and electricity.
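
    The quoted figures can be sanity-checked with a few lines of arithmetic. This sketch takes the electricity figure as 10^12 watt-hours (one terawatt-hour) and assumes a price of $0.10 per kWh, which is not stated in the quote but is what the $100M total implies:

```python
# Network: a billion jobs of half a megabyte each.
jobs = 1e9
bytes_per_job = 0.5e6
total_bytes = jobs * bytes_per_job            # 5e14 bytes, i.e. half a petabyte

# Electricity: 10^12 watt-hours at an assumed $0.10 per kWh.
watt_hours = 1e12
kilowatt_hours = watt_hours / 1000
ASSUMED_DOLLARS_PER_KWH = 0.10                # assumption, not from the article
electricity_cost = kilowatt_hours * ASSUMED_DOLLARS_PER_KWH

# CPU: a million CPU-years valued at a billion dollars.
dollars_per_cpu_year = 1e9 / 1e6

print(total_bytes, electricity_cost, dollars_per_cpu_year)
```

    The job traffic comes to about half a petabyte, so "this petabyte" is a round-up (or includes overhead the quote doesn't break out). The electricity and CPU numbers are consistent with each other at roughly $1000 per CPU-year.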

    --
    Living better through chemicals
  7. Re:Anybody tried it? by Anonymous Coward · · Score: 1, Informative

    No.

    But in principle, the reason you want to go for a P2P-based DBMS is not really scalability, which we can do today with 'shared nothing' clusters (lots of motherboards + local disk connected by Ethernet, rather than 'shared disk' clusters, which are lots of diskless motherboards connected to a SAN/NAS over the network). Rather, it's system availability.

    With shared disk clusters you have a central point of failure/synchronization: the disk (or the disk controller). Today's shared nothing DBMSs all adopt a system model which says that 'if one of us is down we are all down'.

    P2P constitutes an architectural model for having a go at problems like scheduled downtime, i.e., let's not take the system down while we upgrade the software/hardware.
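
    To see why 'if one of us is down we are all down' hurts as the cluster grows, here is a minimal sketch. It assumes nodes fail independently with the same per-node availability, and it models a failure-tolerant design as "at least k of n nodes up"; both are simplifying assumptions for illustration, not a description of any particular DBMS:

```python
from math import comb


def all_or_nothing_availability(node_availability, nodes):
    """System is up only when every node is up."""
    return node_availability ** nodes


def k_of_n_availability(node_availability, nodes, required):
    """System is up when at least `required` of `nodes` nodes are up
    (binomial model, independent failures assumed)."""
    p = node_availability
    return sum(
        comb(nodes, up) * p**up * (1 - p) ** (nodes - up)
        for up in range(required, nodes + 1)
    )


# Hypothetical numbers: 10 nodes, each up 99% of the time.
strict = all_or_nothing_availability(0.99, 10)
tolerant = k_of_n_availability(0.99, 10, required=8)

print(strict, tolerant)
```

    In the strict model every added node lowers system availability, while a design that can lose a couple of nodes (say, one being upgraded) stays well above the availability of any single node.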

  8. Re:Obligatory conspiracy theory by steve_l · · Score: 3, Informative

    No doubt ... to date the Grid is very Java-centric. Now maybe .NET could deliver a speedup, but the nice thing about Java is (a) the latest 1.4.2 JREs use the PIII SSE and P4 SSE2 register sets for better float and double performance, and (b) you can put some serious Unix servers in the grid for bonus speed.

    One thing Jim ignored is the cost of software. Because MS effectively charge per CPU for their system, you cannot afford to build a Beowulf cluster on Windows, let alone a full grid. So if MS do want to play in the grid space, they need a way to price their platform so it makes economic sense. Didn't see that in the paper.

    (NB: MS do clustering already; it is just focused on DBs and big IIS installations, and it costs big numbers.)