All-Purpose Distributed Computing

← Back to Stories (view on slashdot.org)

All-Purpose Distributed Computing

Posted by Roblimo on Monday August 9, 1999 @01:05AM from the today's-leading-edge-is-tomorrow's-everyday-tool dept.

Markee writes "Wouldn't it be great if any program you write could be executed automatically in a parallel, heterogenious environment? The TSIA model (Task System and Architecture) suggests a way of programming in which any given program is divided into so-called tasks. An underlying operating system extension (similar to a scheduler) could dispatch these tasks across processors and systems, providing for inherently parallel, fail-safe execution in a heterogenious environment. Actually, the TSIA model is a generaliziation of what is already done in existing distributed computing environments like SETI@home. An implementation of this model would need only minor syntax extensions to given programming languages and a relatively modest OS extension for scheduling and dispatching. The TSIA web site looks primitive and features too many buzzwords, but nevertheless the documents are worth taking a look at. "

8 of 150 comments (clear)

Min score:

Reason:

Sort:

Re:This seems a little dimwitted to me by jd · 1999-08-08 21:03 · Score: 2

Actually, no. There -are- parallelising compilers, developed by the University of Manchester. Such things exist, although they are horribly inefficient.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Automagic Parallel Programming by EngrBohn · 1999-08-08 21:26 · Score: 2

What you're describing is still explicitly identifiying available parallelization. That it's part of the language doesn't alter the fact that you're telling the compiler where it can parallelize the code.
Christopher A. Bohn

--
cb
Oooh! What does this button do!?
Not just a simple recompile by anticypher · 1999-08-08 21:33 · Score: 2

I've been seeing this promise for years. There are a number of stumbling blocks...

The problem has to lend itself to parallel computation. This means most simple inline code with simple branches doesn't take advantage of parallelism. So the program has to be written to break the problem into small chunks which can be processed asynchronously in parallel. The program has to re-assemble the results into a cohesive whole, re-calcing the missing bits as needed.

For multi-machine parallel processing, there has to be a whole suite of network communication protocols. These protocols have to ensure parts are distributed correctly to waiting machines, and valid results are returned. On top of that, you have to re-assemble the parts before returning them to the calling program, since the processing power of individual nodes is generally unknown, forcing results to be returned at random times. There also has to be a mechanism to duplicate the work sent to a node in case no answer is returned within a reasonable amount of time.

There was a parallel computing model called Linda put out in the early 1980s which tried to take advantage of networking. The idea was for something like distributed.net, with hundreds of machines all participating in parallelism. Some machines would be designated as Compute Servers, basically Crays sitting on the networks for any spreadsheet to take advantage of. The designers were eventually overwhelmed with the logistics keeping track of all the outstanding queries and responses. The overhead in the main controlling program grew expotentially as large problems were spread to hundreds of other machines, and eventually the main machine was doing nothing but keeping track, not working on the problem at hand.

Parallism is a different mindset from sequentialism (turing machines), where every large problem has to have clearly defined interfaces between each subset of small problems. It works for cracking MD5 keys, because each subset is a range of keys, and for S@H with their time and frequency ranges.

the AC

--
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
Re:Automagic Parallel Programming by EngrBohn · 1999-08-08 22:35 · Score: 2

You're right. A simple loop with no interloop dependences is something a parallelizing compiler should be able to handle on its own. The problem is such loops are often only a small part of the application, and Amdahl's Law tells us that the speedup we get by parallelizing an application is a function of the fraction of the code that gets parallelized. It would be a waste to through 100 processors at this application if your parallelizing compiler could only parallelize 2% of the code.
IMO, to get near-optimal parallelization, you need the insight only available to a human programmer.
Christopher A. Bohn

--
cb
Oooh! What does this button do!?
Automagic Parallel Programming by EngrBohn · 1999-08-08 20:27 · Score: 2

Unless you identify the parallel nature of your software, then this won't be able to help an individual application (though it might help with distributing multiple apps similar to what you can see with SMP). No OS can be expected to identify the parallel nature on its own. The programmer must do it, either with threading or with message passing. You might try a parallelizing compiler, but they have only limited insight into possible parallel nature, and cannot be expected to provide near-optimal parallelization.
Christopher A. Bohn

--
cb
Oooh! What does this button do!?
1. Re:Automagic Parallel Programming by EngrBohn · 1999-08-08 20:31 · Score: 2
  
  Oh, yeah -- nearly forgot. Another issue is that software that provides good performance on SMP can actually run slower on a distributed system than it would on a uniprocessor, due to the communication overhead introduced by the network.
  Christopher A. Bohn
  
  --
  cb
  Oooh! What does this button do!?
University of Manchester & UMIST by jd · 1999-08-08 20:36 · Score: 2

Have researched into automagically parallelising programs.
One approach was to develop a compiler which attempted to parallelise at the statement level, rather than the procecure level. From what I understand, this worked (kind-of), but the message overheads were very high.
A second approach was to manually parallelise as far as physically possible. The OS would then automatically simulate the "ideal" number of processors over the physical number. (There is a subtle difference between this, and simply allocating threads in a round-robin fashion. For a start, it's adaptive.)
To the best of my knowledge, though, conventional parallel processing use PVM or MPI, which can be royal pains. Give me a Transputer and Occam any day. In any of these cases, though, adaptive parallising is not possible. The program is either written for N processors, or it isn't.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:uh, yeah by Sun+Tzu · 1999-08-08 20:51 · Score: 2

heh. "any program you write" is a little bit inconsistent with "minor syntax changes" unless you write everything for the parallelizing compiler. I'm afraid that, if the programmer isn't designing the program explicitly for parallel execution, performance will suck. Add me to the skeptics list too, please. ;)

--
Geeky modern art T-shirts