Introduction to Distributed Computing
dosten writes "ExtremeTech has a nice intro article on distributed and grid computing." Someday someone will successfully implement something like Progeny's NOW and all of these assorted hacks at building a distributed computing system will be superseded.
Or we could just spend 8 hours finding a buffer overflow in Brilliant's Distributed Kazaa software and do it that way.
Distributed computing is actually a pretty simple idea to come up with, seeing as how a lot of things are 'distributed' such as manufacturing, selling products, etc. The thing that makes distributed computing attractive is the speed of data and the unused potential of your average computer. It would be nice to see a company that needed a lot of data processed, and paid people for every data pack they processed and completed. Rules would have to be set up to prevent abuse, but it would be a nice system. Everyone wins.
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
The whole thing
Rather than a popup ad per page.
if you are interessted in distributed computing over internet check out this url: http://www.aspenleaf.com/distributed/.
there is short description of all distributed computing projects plus lots of other stuff.
-- http://electronicintifada.net --
My University just got a 395,000 dollar grant from the NSF. for more info : http://inside.binghamton.edu/March-April/4apr02/gr id.html">
... like the dogma project at Brigham Young University is a distributed application system currently on used on a few thousand machines. It is written in pure Java, requires no persistant storage on the local machine, can be interrupted at any time, and is OS independant, to name a few things.
//TODO: Think of witty sig statement
I guess I really like the idea of distributed computing. In a world where everyone works together with common goals we would be able to achieve almost anything. The flies in the ointment, however, are the few individuals who would get their rocks off by ruining it for everyone else, the same type of people who write virii.
Another networking subject that really interests me is wireless networking. I think that someday in the not too distant future we will see neighborhood networks forming and then a linking of various neighborhood networks to form a new kind of "internet." One that is absolutely not controlled by any group.
The race isn't always to the swift... but that's the way to bet!
cs clan [tgk] 0wnz0red dis post!
with a large scale distributed system, using the distributed translation project things like this may in the future look like this.
"My buddies and I are wimps so we pretend to be big shots online. So therefore we have created a small group called cs group. Online we are also seen as [tgk] to signify our uniqueness from you. We (being cs group) would like to point out the fact that we know a lot on the topic of distributed systems and would like to tell you our thoughts. We know all our posts will get 5's"
-- Knowing too much can get you killed, but knowing who knows too much can make you rich.
Some of us have enough karma and spare accounts that the occasional hit doesn't matter.
Is that for most intents and purposes, processor cycles are free.
If a company/organization has an *actual* need for processor cycles (say genome research), it's cheaper to buy 1000 boxes and admin the stuff in-house. Even when ignoring issues such as sending valuable company data to thousands of internet users, most applications that require large compuation also require large amounts of bandwidth, generally provided over a LAN.
This is why you'll never get to render a frame for Toy Story 5: Pixar will need to send you 5GB of data just to get back a 2k image.
Once you consider the costs of admining a network, writing/distributing your code, against having a tangible financial benefit from the results, few companies will have a reason to turn to outsiders for a few minutes on their machines.
First of all, be sure to check out the links at the end of the article to some of the projects that are going on right now. Some of the ones that I find more interesting are the Particle Physics Data Grid and the Access Grid (no link in article).
One of the great benefits of Grid computing over distributed computing is the access to resources, such as storage. This is what PPDG seeks to do, provide access to physicists, in near real time, to the results of experiments. The problem is that the experiments may be performed at CERN and the researcher may be at CalTech. While normally for a telnet or what not, this isn't a problem, it is a problem when an experiment can produce Petabytes of data. For more information on that see http://www.ppdg.org. There is another project called NEESGrid that will provide access to earthquake simulation equipment remotely. Truly cool.
I also encourage you to check out Globus. Using a system like the Globus Toolkit along with MDS, I can locate a machine and execute my program on it transparently. This transparency is taken care through a network of resource managers, proxys and gatekeepers. It's pretty cool and is pretty easy to install on your favorite Linux box.
Programming Grid enabled applications is pretty easy. There are software libraries called CoG Kits that provide simple APIs for Java, Python and a few other languages. In just a few lines of code you can have a program that looks up a server to run your executable on, connects, executes and returns the data to you.
The current push right now is towards OGSA which is Open Grid Services Architecture. This will form the basis for Globus 3.0. OGSA will take ideas from web services, like WSDL, service advertisement, etc, and implement them to create Grid services. This will be the next thing with services easily able to advertise themselves and clients easily able to find services.
My Slashdot account is old enough to drink...
These projects when described in the lay press nearly always skip over any analysis of the kinds of algorithms that can work well on a distributed system. The first metric to look at is the ratio of communication to computation. That is, how many bytes of data does a compute exchange with neighbor(s) before continuing with the next step of computation.
Render farms are embarrasingly parallel requiring no communication with neighbors while rendering a frame. They do require a large amount of data before starting on the next frame, but you can either pipeline that (which they don't do usually) or double up on the number of compute nodes (which is more common).
Suppose instead you want to solve a big mesh problem like a 3D cube with 10^10 points on a side. And its a fairly simple computation. You might need 10^5 or 10^6 nodes and the data traffic between nodes would look like a DOS attack if it took place on the internet.
And then there is the rich space of possibilities between these two extremes and the crossproduct with storage. It is a fascinating area to work in because there is much yet to learn and the possibilities for new networks and processors and storage evolves all the time. Things that were impossible to do last year are within reach this year or next year.
But.... just as 100 Volkswagon beetles may have the same horsepower as a huge earthmoving machine, the beetles cannot readily move mountains... and 100 or 1000 or 10000 PCs with a low-cost interconnect are not equal to a supercomputer or a supercluster that may support 10^6 greater communcations to computation ratio - and thus a much greater range of useful distribution algorithms.
This one covers issues such as parasite attacks, spoiler attacks, etc.
Slashdot rejected my guide when I submitted it. Whine whine gripe gripe.
This shows a profound lack of knowledge of the Computing literature. Back in 1982 (December issue IIRC), there was an article published describing The Newcastle Connection. This was a fully-distributed unix system built on exactly the same model. It was a unix system that incorporated other systems as components, treating the network as a bus. The result was a large multi-processor unix system.
They weren't nearly the last ones to announce that they had done such a thing. For a while, in the mid-80's, it was somewhat of an inside joke. It seemed that everyone was making their own distributed unix system using the same design.
I built one myself, and so did a fellow down the hall from me (at Project Athena at MIT). We both spent about a month of our spare time on it, and both of ours worked. One of my demos consisted of a Makefile with source scattered across as many machines as I could get accounts on. I showed that, despite the fact that the clocks on some machines were off by hours or days, my code correctly adjusted for clock skews and compiled the right things. I didn't need to modify make or the compiler, I just linked them to my libcnet.a, which replaced all the system calls with my distributed routines, and they corrected for the clock problems.
The problem isn't the difficulty in building a truly distributed system. Any competent software engineer should be able to do that. The problem is that the commercial world has no interest in selling such a thing, and the non-commercial world remains ignorant of things like this that were demoed several decades ago.
One of the true frustrations from having built such a system is having to work with things like NFS, that still can't get its clocks right (at least not without requiring super-user permissions on every subsystem). When I decided to solve this problem so that make would work, it took me a morning, and I didn't use super-user permissions anywhere.
BTW, the Newcastle system was used internally in a number of corporations. But the many attempts to make it more widespread just hit brick walls. So now we have the kludgery of HTTP and URLs rather than the simple, elegant schemes that the various distributed-system people have used.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.