Slashdot Mirror


Introduction to Distributed Computing

dosten writes "ExtremeTech has a nice intro article on distributed and grid computing." Someday someone will successfully implement something like Progeny's NOW and all of these assorted hacks at building a distributed computing system will be superseded.

23 of 95 comments

  1. We could spend millions to do this.. by btellier · · Score: 3, Funny

    Or we could just spend 8 hours finding a buffer overflow in Brilliant's Distributed Kazaa software and do it that way.

  2. Distributed... by Renraku · · Score: 3, Interesting

    Distributed computing is actually a pretty simple idea to come up with, seeing as how a lot of things are 'distributed' such as manufacturing, selling products, etc. The thing that makes distributed computing attractive is the speed of data and the unused potential of your average computer. It would be nice to see a company that needed a lot of data processed, and paid people for every data pack they processed and completed. Rules would have to be set up to prevent abuse, but it would be a nice system. Everyone wins.

    --
    Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
    1. Re:Distributed... by SamHill · · Score: 2

      Except for the ``paying people'' part, United Devices does just that.

      The downside of distributed computing is figuring out how to split a given problem into pieces that can be processed separately. Not all problems can be split up, and for those that can be split, figuring out the best way to do so isn't always trivial.
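      A minimal Python sketch of the distinction, with threads standing in for remote machines. A sum of squares splits into independent chunks whose partial results are simply added together. An iterated map such as the logistic recurrence needs each previous step before it can compute the next, so it cannot be cut up the same way. The function names are illustrative only and do not come from any particular system.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous, non-overlapping chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append(range(start, end))
        start = end
    return chunks


def work_unit(chunk):
    # Each piece depends only on its own inputs: no communication needed.
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(n, workers=4):
    # Threads stand in for remote machines; the decomposition is the point.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work_unit, split_range(n, workers)))


def logistic_orbit(x0, steps, r=3.9):
    # Each step needs the previous result, so this loop cannot be cut
    # into independent pieces the way the sum above can.
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x
```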

    2. Re:Distributed... by Krapangor · · Score: 2, Insightful

      While the principle is simple, the idea itself is massively overrated these days. It's not as if distributed computing is a new idea. Massively parallel machines have been around for decades, and distributed computing is just using computers on a (large-scale) network as a massively parallel machine. But history has already shown that many problems can't be solved by parallel computation, which limits the power of distributed computing. The only new benefit is that you don't need to spend $$$ on Cray systems etc.; just buy some processing power in a grid. However, there won't be as many customers as you would expect.
      This stuff is just overhyped by some companies which think that they can make big bucks.
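      The limit described here is usually quantified with Amdahl's law: if a fraction p of a job can run in parallel on N processors, the speedup is at most 1 / ((1 - p) + p / N). A short sketch, with p = 0.95 chosen arbitrarily for illustration:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Upper bound on speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


# Even a million grid nodes cannot get past the serial 5%.
for n in (10, 1_000, 1_000_000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

      With 95% of the work parallelizable, the speedup can never exceed 1 / 0.05 = 20, no matter how many machines are added.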

      --
      Owner of a Mensa membership card.
  3. The whole article at once by Spackler · · Score: 5, Informative

    The whole thing

    Rather than a popup ad per page.

  4. check this out by emir · · Score: 4, Informative

    If you are interested in distributed computing over the Internet, check out this URL: http://www.aspenleaf.com/distributed/.

    There is a short description of all the distributed computing projects, plus lots of other stuff.

    --
    -- http://electronicintifada.net --
  5. My university just got a grant to do grid comp. by paulydavis · · Score: 2, Interesting

    My university just got a $395,000 grant from the NSF. For more info: http://inside.binghamton.edu/March-April/4apr02/grid.html

  6. Try a non-linux distributed protocol... by Frobnicator · · Score: 3, Informative

    ... like the dogma project at Brigham Young University, a distributed application system currently used on a few thousand machines. It is written in pure Java, requires no persistent storage on the local machine, can be interrupted at any time, and is OS-independent, to name a few things.
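    A hypothetical sketch (not dogma's actual design) of how a worker can be interruptible without any local persistent storage. It advances the task in small steps and sends each checkpoint to the server. If the machine is reclaimed, the server can reissue the work from the last reported state. `run_interruptibly` and its callback parameters are invented names.

```python
def run_interruptibly(state, step, report, should_stop):
    """Advance `state` one small step at a time until done or told to stop.

    Every intermediate state is passed to `report` (e.g. an upload to the
    server); nothing is written to local disk, so stopping at any point
    loses at most one step of work.
    """
    while not state["done"] and not should_stop():
        state = step(state)
        report(state)  # the server is the only persistent store
    return state
```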

    --
    //TODO: Think of witty sig statement
  7. Fly in the ointment by Eric+Damron · · Score: 2

    I guess I really like the idea of distributed computing. In a world where everyone works together toward common goals, we would be able to achieve almost anything. The flies in the ointment, however, are the few individuals who would get their rocks off by ruining it for everyone else, the same type of people who write viruses.

    Another networking subject that really interests me is wireless networking. I think that someday in the not too distant future we will see neighborhood networks forming and then a linking of various neighborhood networks to form a new kind of "internet." One that is absolutely not controlled by any group.

    --
    The race isn't always to the swift... but that's the way to bet!
    1. Re:Fly in the ointment by rtaylor · · Score: 2

      Sounds like communism in Russia.

      They really could accomplish nearly anything. Problem was the 'details' of everyday life were missed out on.

      --
      Rod Taylor
  8. Ya know ... by TheViffer · · Score: 2

    cs clan [tgk] 0wnz0red dis post!

    With a large-scale distributed system using the distributed translation project, things like this may in the future look like this:

    "My buddies and I are wimps so we pretend to be big shots online. So therefore we have created a small group called cs group. Online we are also seen as [tgk] to signify our uniqueness from you. We (being cs group) would like to point out the fact that we know a lot on the topic of distributed systems and would like to tell you our thoughts. We know all our posts will get 5's"

    --
    -- Knowing too much can get you killed, but knowing who knows too much can make you rich.
  9. Re:Imagine.... by keesh · · Score: 2

    Some of us have enough karma and spare accounts that the occasional hit doesn't matter.

  10. The problem with distributed computing... by asparagus · · Score: 2, Insightful

    Is that for most intents and purposes, processor cycles are free.

    If a company/organization has an *actual* need for processor cycles (say, genome research), it's cheaper to buy 1000 boxes and admin them in-house. Even ignoring issues such as sending valuable company data to thousands of Internet users, most applications that require large amounts of computation also require large amounts of bandwidth, generally provided over a LAN.

    This is why you'll never get to render a frame for Toy Story 5: Pixar will need to send you 5GB of data just to get back a 2k image.
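    A back-of-the-envelope sketch of why data volume matters. The link speed, output size and render times below are assumptions for illustration, not Pixar figures:

```python
def transfer_seconds(num_bytes, link_bits_per_second):
    """Time to move num_bytes over a link, ignoring latency and protocol overhead."""
    return num_bytes * 8 / link_bits_per_second


def worth_offloading(input_bytes, output_bytes, link_bps, remote_compute_s, local_compute_s):
    """True if shipping data out and back plus remote compute beats computing locally."""
    remote_total = transfer_seconds(input_bytes + output_bytes, link_bps) + remote_compute_s
    return remote_total < local_compute_s


# Hypothetical numbers: 5 GB of scene data in, a few MB of image out,
# a 1 Mbit/s home connection, a 1-hour render on the volunteer's box
# versus 2 hours on a shared in-house machine.
print(transfer_seconds(5e9, 1e6) / 3600)  # hours spent just uploading
print(worth_offloading(5e9, 5e6, 1e6, 3600, 7200))
```

    At the assumed 1 Mbit/s, the 5 GB input alone takes 40,000 seconds (about 11 hours) to send, so offloading loses unless the render itself takes longer than that.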

    Once you weigh the costs of administering a network and writing/distributing your code against the tangible financial benefit of the results, few companies will have a reason to turn to outsiders for a few minutes on their machines.

    1. Re:The problem with distributed computing... by asparagus · · Score: 2

      Yes, but neither Seti@home nor d.net are making any money. They're largely research projects.

      The companies looking to get into this are hoping to make money. I'm saying that's a bad business plan.

      And yes, Pixar already passes about that much data. Large scenes/complicated renders can even go higher per-frame.

    2. Re:The problem with distributed computing... by Greg+Lindahl · · Score: 2


      Sorry, that's a bad example. Pixar's existing compute farm doesn't need much networking.

    3. Re:The problem with distributed computing... by Rajesh+Raman · · Score: 4, Informative

      You're missing the point. Distributed computing is not only about running on machines that aren't yours, but also about efficiently utilizing the machines that are yours (or that you at least have easy access to).

      Consider that a University of Wisconsin study showed that, on average, desktop computers are idle at least 60% of the time. And that doesn't count the cycles lost between keystrokes; I'm talking about extended periods of time. For example, almost all desktop machines are idle at night. That's 50% already. Now add lunch, meetings, etc.

      That's when systems like Condor come in. Researchers at Wisconsin got hundreds of years of CPU time on machines they already had without impacting others.
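      A sketch of the kind of owner-first scheduling policy such a system applies: start a guest job only after the owner has been away for a while, and get out of the way the moment they return. The thresholds and field names are hypothetical and are not Condor's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
START_AFTER_IDLE_S = 15 * 60  # owner away from keyboard this long -> may start a guest job
OWNER_BACK_IDLE_S = 60        # any input within the last minute -> owner is back
MAX_OWNER_LOAD = 0.3          # owner's own processes must be this quiet


@dataclass
class MachineState:
    keyboard_idle_s: float  # seconds since last keyboard/mouse input
    owner_load: float       # load average from the owner's processes only


def decide(state: MachineState, guest_running: bool) -> str:
    """Return 'start', 'wait', 'suspend' or 'continue' for a guest job."""
    if guest_running:
        if state.keyboard_idle_s < OWNER_BACK_IDLE_S or state.owner_load > MAX_OWNER_LOAD:
            return "suspend"  # the owner always wins
        return "continue"
    if state.keyboard_idle_s >= START_AFTER_IDLE_S and state.owner_load <= MAX_OWNER_LOAD:
        return "start"
    return "wait"
```

      Using a long idle period to start but a short one to suspend keeps jobs from flapping on and off while still giving the machine back to its owner immediately.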

      Coming back to your argument, the counter argument is that you may not even need to buy additional boxes --- just use the ones you already have more efficiently by utilizing distributed computing systems.

      As for the "freeness" of processor cycles, let me tell you that optimization researchers can soak up as much CPU as you can possibly throw at them. Also, if you look up the Particle Physics Data Grid (PPDG) and GriPhyN, you'll find that many distributed computing problems are I/O-driven.

      ++Rajesh

    4. Re:The problem with distributed computing... by Fizzlewhiff · · Score: 2

      on average, computers on desktops are idle at least 60% of the time

      Many of us need that 60% idle time to keep our CPUs running at a reasonable temperature. I have my CPU and case cooling under control, but now I think I need to put multiple A/C zones in my house thanks to distributed.net. :)

      --

      'Same speed C but faster'
    5. Re:The problem with distributed computing... by asparagus · · Score: 2

      I've got no problem with research projects that use distributed computing. I myself run d.net and have thrown cycles to Seti@home and Genome@home. It's a great way to pick up free cpu cycles cheaply, if you've got the time.

      However, there are half a dozen companies now that think they're going to make money off people using these programs for large projects.

      The reality of the matter is, if d.net had to support itself financially, it'd get rid of the Internet users and stick to in-house boxes.

      I'm not dissing distributed computing: it has its benefits. But it will probably always be limited to research/educational projects.

      My point is that if I'm a CGI guy who needs CPU cycles today, it's cheaper to buy them myself than to farm them out to a third party. So long as Moore's law holds up, this will remain true. There's a study on this I can't find right now.

    6. Re:The problem with distributed computing... by Zeinfeld · · Score: 2
      Sorry, that's a bad example. Pixar's existing compute farm doesn't need much networking.

      But it sure needs confidentiality, both of the rendering code itself and the data it is working on. Otherwise we will all see random frames from every Pixar movie in advance.

      Plus the rendering code is quite likely huge and has a lot of dependencies on proprietary codebases. I doubt the stuff would run well on Direct-X.

      The liquid metal effect in Terminator 2 cost a million or so to develop and sold for that much the first time, after which it was quickly copied, so that now you can get it in a movie for a few tens of thousands of dollars.

      The idea of using the Internet to do distributed computing is as old as the net itself. We were building SETI-type configurations back in the mid-80s, as soon as the price/performance of the workstation rendered mainframes obsolete.

      Believe it: if Pixar needs more compute cycles, they will go to Dell and buy a room full of cheap machines. It will cost much less to manage than scraping up processing time from around the net.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
  11. Notes and comments by pridkett · · Score: 3, Informative

    First of all, be sure to check out the links at the end of the article to some of the projects that are going on right now. Some of the ones that I find more interesting are the Particle Physics Data Grid and the Access Grid (no link in article).

    One of the great benefits of Grid computing over distributed computing is access to resources such as storage. This is what PPDG seeks to do: provide physicists with near-real-time access to the results of experiments. The problem is that the experiments may be performed at CERN while the researcher is at Caltech. For a telnet session or the like this isn't a problem, but it is when an experiment can produce petabytes of data. For more information on that, see http://www.ppdg.org. There is another project called NEESGrid that will provide remote access to earthquake simulation equipment. Truly cool.
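    For a sense of scale, a rough calculation assuming a dedicated 1 Gbit/s link (an assumed figure, ignoring protocol overhead) and 1 PB = 10^15 bytes:

```latex
t = \frac{8 \times 10^{15}\ \text{bits}}{10^{9}\ \text{bits/s}} = 8 \times 10^{6}\ \text{s} \approx 92.6\ \text{days}
```

    Numbers like this suggest why giving remote researchers access to data where it lives, rather than shipping it wholesale, matters so much for projects like PPDG.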

    I also encourage you to check out Globus. Using a system like the Globus Toolkit along with MDS, I can locate a machine and execute my program on it transparently. This transparency is handled by a network of resource managers, proxies and gatekeepers. It's pretty cool and pretty easy to install on your favorite Linux box.

    Programming Grid enabled applications is pretty easy. There are software libraries called CoG Kits that provide simple APIs for Java, Python and a few other languages. In just a few lines of code you can have a program that looks up a server to run your executable on, connects, executes and returns the data to you.
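    The Globus/CoG Kit APIs are not reproduced here. Instead, a hypothetical Python sketch of the same workflow: query a directory of resources, pick one that matches, run a job there, and collect its output. `DIRECTORY`, `find_resource` and `submit` are invented names, and `submit` simply runs the command locally in place of a real, authenticated gatekeeper.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Resource:
    host: str
    arch: str
    free_cpus: int


# Hypothetical directory standing in for an MDS-style information service.
DIRECTORY = [
    Resource("node-a.example.org", "x86", 0),
    Resource("node-b.example.org", "x86", 4),
    Resource("node-c.example.org", "sparc", 8),
]


def find_resource(arch, min_cpus):
    """Return the matching resource with the most free CPUs, or None."""
    matches = [r for r in DIRECTORY if r.arch == arch and r.free_cpus >= min_cpus]
    return max(matches, key=lambda r: r.free_cpus, default=None)


def submit(resource, argv):
    # Stand-in for a gatekeeper: run the command locally and capture stdout.
    # A real grid would authenticate, stage files and dispatch remotely.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return resource.host, result.stdout


if __name__ == "__main__":
    node = find_resource("x86", min_cpus=2)
    print(submit(node, [sys.executable, "-c", "print(6 * 7)"]))
```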

    The current push is toward OGSA, the Open Grid Services Architecture, which will form the basis for Globus 3.0. OGSA will take ideas from web services, like WSDL, service advertisement, etc., and implement them to create Grid services. This will be the next big thing, with services easily able to advertise themselves and clients easily able to find them.
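    One common way to let services advertise themselves is soft-state registration: each advertisement carries a lifetime and disappears unless it is renewed, so crashed services drop out of the directory on their own. This is a generic sketch, not the OGSA interface; `SoftStateRegistry` is a hypothetical name, and the clock is injectable so the expiry logic can be tested.

```python
import time


class SoftStateRegistry:
    """Services advertise with a lifetime; entries that are not renewed expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (endpoint, expiry time)

    def advertise(self, name, endpoint, ttl_s):
        # Calling this again before expiry renews the advertisement.
        self._entries[name] = (endpoint, self._clock() + ttl_s)

    def lookup(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        endpoint, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # lazily drop the stale advertisement
            return None
        return endpoint
```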

    --
    My Slashdot account is old enough to drink...
  12. ignore the speeds and feeds by xtp · · Score: 3, Informative

    When these projects are described in the lay press, the coverage nearly always skips any analysis of the kinds of algorithms that work well on a distributed system. The first metric to look at is the ratio of communication to computation: how many bytes of data a compute node exchanges with its neighbor(s) before continuing with the next step of computation.

    Render farms are embarrassingly parallel, requiring no communication with neighbors while rendering a frame. They do require a large amount of data before starting on the next frame, but you can either pipeline that (which they don't usually do) or double up on the number of compute nodes (which is more common).
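    Pipelining here means loading the next frame's data while the current one renders. A minimal two-stage sketch using a bounded queue; `load_scene` and `render` are hypothetical callables supplied by the caller:

```python
import queue
import threading


def render_pipeline(frame_ids, load_scene, render):
    """Render frames in order while a background thread loads upcoming scene data."""
    # Bounded queue: the loader can stay at most a frame or two ahead.
    loaded = queue.Queue(maxsize=1)

    def loader():
        for fid in frame_ids:
            loaded.put((fid, load_scene(fid)))  # blocks while the queue is full
        loaded.put(None)  # sentinel: no more frames

    # Sketch only: no error handling; an exception in load_scene would
    # leave the loop below waiting forever.
    threading.Thread(target=loader, daemon=True).start()
    results = []
    while (item := loaded.get()) is not None:
        fid, scene = item
        results.append(render(fid, scene))  # runs while the next scene loads
    return results
```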

    Suppose instead you want to solve a big mesh problem like a 3D cube with 10^10 points on a side, and it's a fairly simple computation. You might need 10^5 or 10^6 nodes, and the data traffic between nodes would look like a DoS attack if it took place on the Internet.
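    The communication-to-computation ratio can be made concrete for a mesh split into cubic blocks, one per node. Assume one unit of work per point and a one-point-thick face exchanged with each of the six neighbors every step (both simplifying assumptions). A block of side n then does n^3 work but sends 6n^2 points, so the ratio is 6/n:

```python
def halo_ratio(n):
    """Communication-to-computation ratio for an n x n x n block of a 3D mesh.

    Assumes each point update costs one unit of work and each step
    exchanges a one-point-thick face with each of 6 neighbors.
    """
    compute = n ** 3
    communicate = 6 * n ** 2
    return communicate / compute  # simplifies to 6 / n


for n in (100, 10):
    print(n, halo_ratio(n))
```

    Shrinking the block from n = 100 to n = 10 raises the ratio tenfold, which is why spreading a fixed mesh over more nodes with weaker links quickly becomes communication-bound.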

    And then there is the rich space of possibilities between these two extremes, and the cross product with storage. It is a fascinating area to work in because there is much yet to learn, and the possibilities for new networks, processors and storage evolve all the time. Things that were impossible to do last year are within reach this year or next.

    But... just as 100 Volkswagen Beetles may have the same horsepower as a huge earthmoving machine yet cannot readily move mountains, 100 or 1000 or 10000 PCs with a low-cost interconnect are not equal to a supercomputer or a supercluster that may support a 10^6 times greater communication-to-computation ratio, and thus a much greater range of useful distributed algorithms.

  13. More a more technical introduction... by BillGodfrey · · Score: 2
    Have a read of my guide, it's at http://www.bacchae.co.uk/docs/dist.html

    This one covers issues such as parasite attacks, spoiler attacks, etc.
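    Assuming a "spoiler attack" means a client deliberately returning wrong results, one common defense is redundancy: hand each work unit to several independent clients and accept an answer only when enough of them agree. A sketch with a hypothetical quorum of 2; `validate` is an invented name:

```python
from collections import Counter


def validate(results, quorum=2):
    """results maps client id -> returned value (must be hashable) for one work unit.

    Returns the agreed value if at least `quorum` clients returned it,
    otherwise None, meaning the unit should be reissued to more clients.
    """
    if not results:
        return None
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes >= quorum else None
```

    The trade-off is cost: a quorum of 2 at least doubles the work for every unit, which is part of why such projects need so many volunteers.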

    Slashdot rejected my guide when I submitted it. Whine whine gripe gripe.

  14. TNC by jc42 · · Score: 2

    This shows a profound lack of knowledge of the computing literature. Back in 1982 (December issue IIRC), there was an article published describing The Newcastle Connection. This was a fully-distributed unix system built on exactly the same model. It was a unix system that incorporated other systems as components, treating the network as a bus. The result was a large multi-processor unix system.

    They weren't nearly the last ones to announce that they had done such a thing. For a while, in the mid-80's, it was somewhat of an inside joke. It seemed that everyone was making their own distributed unix system using the same design.

    I built one myself, and so did a fellow down the hall from me (at Project Athena at MIT). We both spent about a month of our spare time on it, and both of ours worked. One of my demos consisted of a Makefile with source scattered across as many machines as I could get accounts on. I showed that, despite the fact that the clocks on some machines were off by hours or days, my code correctly adjusted for clock skews and compiled the right things. I didn't need to modify make or the compiler, I just linked them to my libcnet.a, which replaced all the system calls with my distributed routines, and they corrected for the clock problems.
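    This is not the libcnet.a code, just a hypothetical sketch of the general technique: estimate each host's clock offset from one round trip, then shift every file timestamp into local time before doing make's "is a source newer than the target?" comparison. `estimate_offset` and `needs_rebuild` are invented names.

```python
def estimate_offset(remote_clock_s, local_before_s, local_after_s):
    """Estimate (remote clock - local clock) from one round trip, assuming
    the remote clock was read halfway between the two local readings."""
    return remote_clock_s - (local_before_s + local_after_s) / 2


def needs_rebuild(target_mtime, target_offset, source_mtimes):
    """Make-style check across hosts with skewed clocks.

    Each mtime is converted to local time by subtracting its host's offset
    before comparing, so a host whose clock runs hours fast or slow does
    not trigger (or suppress) a rebuild by mistake.
    source_mtimes is a list of (mtime, host_offset) pairs.
    """
    target_local = target_mtime - target_offset
    return any(mtime - offset > target_local for mtime, offset in source_mtimes)
```

    For example, a source on a host running 3 hours fast may carry a raw timestamp far newer than the local target, yet after correction it turns out to be older, so nothing is rebuilt.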

    The problem isn't the difficulty in building a truly distributed system. Any competent software engineer should be able to do that. The problem is that the commercial world has no interest in selling such a thing, and the non-commercial world remains ignorant of things like this that were demoed two decades ago.

    One of the true frustrations from having built such a system is having to work with things like NFS, that still can't get its clocks right (at least not without requiring super-user permissions on every subsystem). When I decided to solve this problem so that make would work, it took me a morning, and I didn't use super-user permissions anywhere.

    BTW, the Newcastle system was used internally in a number of corporations. But the many attempts to make it more widespread just hit brick walls. So now we have the kludgery of HTTP and URLs rather than the simple, elegant schemes that the various distributed-system people have used.

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.