New Languages Vs. Old For Parallel Programming
joabj writes "Getting the most from multicore processors is becoming an increasingly difficult task for programmers. DARPA has commissioned a number of new programming languages, notably X10 and Chapel, written especially for developing programs that can be run across multiple processors, though others see them as too much of a departure to ever gain widespread usage among coders."
Parallel is not going to go anywhere but is only really valid for certain types if applications. Larger items like operating systems or most system tasks need it. Whether it is worthwhile in lowly application land is a case by case decision; but will mostly depend on the skill of programmers involved and the budget for the particular application in question.
"Maybe this world is another planet's hell"
Aldous Huxley
A lot of problems are I/O driven -- I would like to see more database client libraries allow a full async approach that lets us not block the threads we are trying to do concurrent work on.
There are some packages on CRAN that claim to implement parallel processing for R -- go to http://cran.r-project.org/web/packages/ and search for the text "parallel" to find several examples. I haven't tried any of them out yet, but sooner or later I'm going to have to.
And actually, I think that "scripting" languages in general will have a very bright future in the parallel processing world. If memory management and garbage collection are implemented invisibly (and well!) in the core language, then the programmer can concentrate on the application logic and not have to worry about the kind of allocation headaches discussed in TFA. Python and R, where I spend most of my coding time these days, both offer very nicely implemented versions of function mapping, which I see as the key to making multiple processors useful for a wide variety of tasks. And no, the memory management and GC aren't quite there yet in either language, but they will be.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Not trying to troll or anything, but I'd always hear of how parallel programming is very complicated for programmers, but then I learnt to use pthread in C to parallelise everything in my C program from parallel concurrent processing of the same things to threading any aspect of the program, and I was surprised by how simple and straightforward it was using pthread, even creating a number of threads depending on the number of detected cores was simple.
OK, maybe what I did was simple enough, but I just don't see what's so inherently hard about parellel programming. Surely I am missing something.
You just got troll'd!
Erlang is an older established language designed for parallel processing.
Erlang was first developed in 1986, making it about a decade older than Java or Ruby. It is younger than Perl or C, and just a tad older than Python. It is a mature language with a large support community, especially in industrial applications. It is time tested and proven.
It is also Open source and offers many options for commercial support.
Before anyone at DARPA thinks that they can design a better language for concurrent parallel programming then I think they should be forced to spend 1 year learning Ada, and a second year working in Ada. If they survive they will most likely be cured of the thought that the Defense department can design good programming languages
The example in the article is atrocious.
Why would you want the withdrawal and balance check to run concurrently?
Bullshit.
Tell that to apache, and oracle, and basically anything that runs in a server room.
Threading i don't count as parallel processing for the desktop. I don't even hear of any games or applications built for parallel.
Uhhhhhhhhhhh? Yes, well done with that...
Check out Clojure. The only programming language around that really addresses the issue of programming in a multi-core environment. It's also quite a sweet language besides that.
Rehash time...
Parallelism typically falls into two buckets: Data parallel and functional parallel. The first challenge for the general programming public is identifying what is what. The second challenge is synchronizing parallelism in as bug free way as possible while retaining the performance advantage of the parallelism.
Doing fine-grained parallelism - what the functional crowd is promising, is something that will take a *long* time to become mainstream (Other interesting examples are things like LLVM and K, but they tend to focus more on data parallel). Functional is too abstract for most people to deal with (yes, I understand it is easy for *you*).
Short term (i.e. ~5 years), the real benefit will be in threaded/parallel frameworks (my app logic can be serial, tasks that my app needs happen in the background).
Changing industry tool-chains to something entirely new takes many many years. What most likely will happen is transactional memory will make it into some level of hardware, enabling faster parallel constructs, a cool new language will pop up formalizing all of these features. Someone will tear that cool new language apart by removing the rigor and giving it C/C++ style syntax, then the industry will start using it
This is a subject near and dear to my heart. I got to participate in one of the early X10 alpha tests (my research group was asked to try it out and give feedback to Vivek Sarker's IBM team). Since then, I've worked with lots of other specialized programming HPC programming languages.
One extremely important aspect of supercomputing, a point that many people fail to grasp, is that application code tends to live a long, long, long time. Far longer than the machines themselves. Rewriting code is simply too expensive and economically inefficient. At Los Alamos National Lab, much of the source code they run are nuclear simulations written Fortran 77 or Fortran 90. Someone might have updated it to use MPI, but otherwise it's the same program. So it's important to bear in mind that those older languages, while not nearly as well suited for parallelism (either for programmer ease-of-use/effeciency, or to allow the compiler to do deep analysis/optimization/scheduling), are going to be around for a long time yet.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
If threading isn't parallelism, then what is? At what level of separation between separate streams of execution does an application become "parallel"?
We all know what to do, but we don't know how to get re-elected once we have done it
You know what, I don't think I'm going to use modern English, either.
Don't you know that early modern English was invented to have something standard into which the bible could be translated? For shame!
As a devoted secularist, I'll just burn all my shakespeare and rushdie after I delete all my perl code.
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
Parent is kinda flamebait, and it's exactly the opposite of my experience.
Scientists (I am one) who also write some of their own code, have much better things to do with our time than to try and make the software efficient. When we figure out what we want done, we hand it over to professional programmers who, if the cost:benefit analysis works out, will parallelize or optimize it as they're told is needed. Even lousy programmers are expensive, and hardware is cheap.
I 100% agree with the end of his statement - was it 10, 15 years ago scientific computing was still done in fortran FOR A REASON - the optimizing compiler didn't completely suck? Some scientific computing is still done in FORTRAN but that's been purely a legacy thing since the optimizing compilers for C caught up. I'm sure someone clever will find some way to get an interpreted language to figure out what depends on what and parallelize your code for you. This is a very hard problem to do perfectly, but sensible people will quickly realize that's okay. For some cases, I can beat an optimizing compiler by writing assembly - am I ever going to do that? Hell no.
Now, this may result in additional good coding practices which will be required of us so that the optimizing compiler can make easier sense of our code. Might it be lower overhead to create an optimization friendly programming language, which I suspect will end up amounting to making such practices an explicit requirement? Probably not, but it depends on how closely these new programming languages adhere to existing languages (I haven't looked at either example discussed in the article.)
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
I've been very disappointed in parallel programming support. The C/C++ community has a major blind spot in this area - they think parallelism is an operating system feature, not a language issue. As a result, C and C++ provide no assistance in keeping track of what locks what. Hence race conditions. In Java, the problem was at least thought about, but "synchronized" didn't work out as well as expected. Microsoft Research people have done some good work in this area, and some of it made it into C#, but they have too much legacy to deal with.
At the OS level, in most operating systems, the message passing primitives suck. The usual approach in the UNIX/Linux world is to put marshalling on top of byte streams on top of sockets. Stuff like XML and CORBA, with huge overhead. The situation sucks so bad that people think JSON is a step forward.
What you usually want is a subroutine call; what the OS usually gives you is an I/O operation. There are better and faster message passing primitives (see MsgSend/MsgReceive in QNX), but they've never achieved any traction in the UNIX/Linux world. Nobody uses System V IPC, a mediocre idea from the 1980s. For that matter, there are still applications being written using lock files.
Erlang is one of the few parallel languages actually used to implement large industrial applications.
There needs to be an equivalent of Donald Knuth's "Art of Computer Programming" as a definitive reference for parallel algorithms. Until then, I don't care how many cores you have, you won't get the most out of them.
I void warranties.
If they are inherently sequential, then obviously they cannot be made to run in parallel. The truth is that the vast majority of computing applications, both existing and future, are inherently parallel. As soon as some maverick startup (forget the big players like Intel, Microsoft, or AMD because they are too married to the old ways) figures out the solution to the parallel programming crisis (see link below), get ready for a flood of super complex parallel applications to hit the market, especially in the AI, gaming and simulation fields. Cars will drive themselves and robots will maintain your home, that kind of stuff. The possibilities are mind boggling.
Now the reason that the old timers cannot solve the problem is that they are all addicted to the Turing Machine model of computing and last century's multithreaded approach to concurrency. The Turing Machine model is evidently no help in solving the crisis and threads are inherently non-deterministic. There is an urgent need to move away from antiquated and flawed paradigms that do not contribute to the solution. Indeed, they got us into this mess to begin with.
How to Solve the Parallel Programming Crisis
Looking at the 99 bottles Chapel code (from original article)
http://99-bottles-of-beer.net/language-chapel-1215.html
This looks like the way you do stuff in Haskell. Functions compute the data and the I/O routine is moved into a "monad" where you need to sequence. This doesn't seem outside the realm of the possible.
I have not read the article (par for the course here) but I think there is probably some confusion among the commenters regarding the difference between multi-threading programs and parallel algorithms. Database servers, asynchronous I/O, background tasks and web servers are all examples of multi-threaded applications, where each thread can run independently of every other thread with locks protecting access to shared objects. This is different from (and probably simpler than) parallel programs. Map-reduce is a great example of a parallel distributed algorithm, but it is only one parallel computing model: Multiple Instruction / Multiple Data (MIMD). Single Instruction / Multiple Data (SIMD) algorithms implemented on super-computers like Cray (more of a vector machine, but it's close enough to SIMD) and MasPar systems require different and far more complex algorithms. In addition, purpose-built supercomputers may have additional restrictions on their memory accesses, such as whether multiple CPUs can concurrently read or write from memory.
Of course, the Cray and Maspar systems are purpose-built machines, and, much like special-build processors have fallen in performance to general purpose CPUs, Cray and Maspar systems have fallen into disuse and virtual obscurity; therefore, one might argue that SIMD-type systems and their associated algorithms should be discounted. But, there is a large class of problems -- particularly sorting algorithms -- well suited to SIMD algorithms, so perhaps we shouldn't be so quick to dismiss them.
There is a book called An Introduction to Parallel Algorithms by Joseph JaJa (http://www.amazon.com/Introduction-Parallel-Algorithms-Joseph-JaJa/dp/0201548569) that shows some of the complexities of developing truly parallel algorithms.
(Disclaimer: I own a copy of that book but otherwise have no financial interests in it.)
I think you're confusing two different uses of parallelism. One is "small" parallelism -- the kind you see in graphical user interfaces. That is to say, if Firefox is busy loading a page, you can still click on the menus and get a response. Different aspects of the GUI are handled by different threads, so the program is responsive instead of hanging. That's done by using threading libraries like Posix and the like. But that's really a negligible application of parallelism. The really important use of parallelism is for really large programs that require lots of hardware to run in a reasonable amount of time.
The threading libraries that you use for GUI applications don't work well for computationally intensive applications requiring parallelism. They require a shared-memory architecture. (A shared memory architecture is one in which all processors see the same values in the RAM. E.g, if processor #1 writes X to memory block 0x87654321, and processor #2 reads 0x87654321, it returns X instead of whatever value processor #2 last wrote there) Shared-memory architectures don't scale -- the biggest ones you can buy have about 64 CPUs. If you do want to run computationally intensive applications on shared memory architectures, then OpenMP is the library of choice. It's also fairly simple to use.
If you want to run a big applications, you need to use a distributed memory architecture. And MPI (message passing interface) is pretty much the only game in town where that is concerned. It's by far the dominant player.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
No widely spoken human natural language was "invented," Modern English included. Where do people come up with these things? Modern English evolved out of Middle English, just as Spanish evolved out of Latin, etc. Modern English was not "invented" in any meaningful sense of "invernted."
reference for WikiWeenies.
Whoever told you that is mistaken.
The easiest way to take advantage of a multiprocessing environment is to use techniques that will be familiar to any high level programmer. For example, you don't write for loops, you call functions written in a low level language to do things like that for you. Those low level functions can be easily parallelized, giving all your code a boost.
Recent versions of gcc support OpenMP, and there's now experimental support for a multithreading library that I gather is going to be in the next c++ standard. These don't solve everyone's problems, but certainly it's getting easier, not harder, to take better advantage of multi-processor multi-core systems. I recently test retrofit some of my own code with OpenMP, and it was ridiculously easy. Five years ago it would have been a much more irritating process. I realize not everyone develops in c/c++, nor does everyone use a compiler that supports OpenMP. But I doubt it's actually getting harder, probably just the rate at which it's getting easier is not the same for everyone.
The main problem faced by each new language is "How do I access all the stuff that's already been done?"
The "Do it over again" answer hasn't been successful since Sun pushed Java, and Java's initial target was an area that hadn't had a lot of development work. Sun spent a lot of money pushing Java, and was only partially successful. Now it probably couldn't be done again even by a major corporation.
The other main answer is make calling stuff written in C or C++ (or Java) trivial.Python has used this to great effect, and Ruby to a slightly lesser one. Also note Jython, Groovy, Scala, etc. But if you're after high performance, Java has the dead weight of an interpreter (i.e., virtual machine). So that basically leaves easy linkage with C or C++. And both are purely DREADFUL languages to link to, due to pointer/integer conversions and macros. And callbacks. Individual libraries can be wrapped, but it's not easy to craft global solutions that work nicely. gcc has some compiler options that could be used to eliminate macros. Presumably so do other compilers. But they definitely aren't standardized. And you're still left not knowing what's a pointer so you don't know what memory can be freed.
The result of this is that to get a new language into a workable state means a tremendous effort to wrap libraries. And this needs to be done AFTER the language is stabilized. And the people willing to work on this aren't the same people as the language implementers (who have their own jobs).
I looked over those language sites, and I couldn't see any sign that thoughts had been given to either Foreign Function Interfaces or wrapping external libraries. Possibly they just used different terms, but I suspect not. My suspicion is that the implementers aren't really interested in language use so much as proving a concept. So THESE aren't the languages that we want, but they are test-beds for working out ideas that will later be imported into other languages.
I think we've pushed this "anyone can grow up to be president" thing too far.
In the let the compiler decide attitude of the C language families ... Inmos C had the correct solution. You add two new keywords to the language, parallel and sequential.
sequential
{
stmt1;
stmt2;
stmt3;
}
as opposed to
parallel
{
stmt4;
stmt5;
stmt6;
}
The stmt1 must be executed before stmt2 which must be executed before stmt3 in the sequential construct. C languages actually already support this in a bit more awkward way with the ravel operator. But sequential is an easier to understand and read method, and balances nicely the parallel keyword. The compiler and runtime have been told that stmt4, stmt5, and stmt6 can be executed in parallel. There is implicit synchronization at the end of the statement block.
This is all well and good and many people look and say that it would not be so tough to do this in other ways, and so on. But combine this with fast iterators as are in Objective C 2.0 and it gets much more interesting. Or for the generalized case where any place a left brace is permissible, either of these two constructs could be substituted. This generalizes to braces enclosing a conventional block of statements as exists now, a forced sequential block of statements (so that side affects from say external inputs or other volatile entities can be dealt with at the specific case where needed) or a statement block where the contained statements may be executed in parallel. The programmer still has to have a bit of knowledge here, but the compiler and runtime can really lighten the load. And it does not have a syntax clash with either C, C++ or Objective C so could be adopted by all of them.
I used this back in 1980s and it was awesomely easy to deal with dispatch of hundreds of lightweight instances. Essentially fibers in a more modern vernacular. By partitioning the work between the complier and the runtime systems I ran the same binary code across quad processor and 64 processor arrays. (Ancillary to this discussion was that Inmos Transputers had also built in message passing on dedicated links in hardware. Of course Fortran was also supported as was Pascal, but the main pushed language was Occam. And hardware timers were there as a data type too to make scheduling a breeze. Processors w/o hardware timers just mimicked them in the runtime. And locks were supported in the hardware as well...)
The point is this was a elegantly solved problem in the 1980s that was mostly forgotten. It was a simple matter to have the runtime aware of the fabric an individual process could access and just turn stuff loose. But that part is a bit outside the main discussion, like I don't drift enough already!
- Tjp
I am in wallow with my inner money grubbing capitalistic pig. ... Oink!
That's not what he meant. Yes, technically the index variable is changing, and it may also be used inside the loop. But the relevant questions are: "does it matter what order the iterations run in", and "does one iteration have to finish before the next can begin".
If the answer to both questions is "no", then you can run several loop bodies at once on different processors. Bingo, instant speedup.
Threads Cannot be Implemented as a Library. That means pthreads is bad. Read: http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf
Then after a few years, work on Java memory model has found a good solution. Read: Foundations of the C++ concurrency memory model [based on the Java memory model] http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf
How fugly can this be for all you C++ wannabe fanguys??? (Phun intended!)
Haeleth is correct. The problem with a for loop is that it may or may not be parallelizable. There are some compilers that attempt to guess, but they generally do a poor job.
The secret to efficient programming in interpreted languages is that whenever you want to do something where you'd normally use a for loop, you call a compiled function to do that for you. Those utility functions are generally explicitly either parallelizable or not, so the compiler doesn't have to guess - it knows.
Suppose I want to do C = A + B, where A and B are large arrays. In a language like C I would write a for loop:
for (int i=0; iaLength; i++) {
C[i] = A[i] + B[i]
}
That's a fairly trivial example, but that simple loop can foul up a parallelizing compiler. The equivalent in, say Python, is:
C = A + B
where the + operator calls a compiled C function. Whoever writes that C function KNOWS the necessary loop can be calculated in parallel, and so can code it that way. From then on, everyone who uses the + operator benefits.
Yes, you can do the same thing with parallel libraries in compiled languages, but programmers in those languages tend not to be used to that way of thinking. As an interpreted programmer you see many, many clever tricks to, for example, do large array computations using provided compiled functions rather than straightforward for loops. Those tricks are precisely the ones you need to learn as a major component of effective parallel programming.
Mandarin is/was the court language of China, it was created in a rather clever way.
The country, being very large had a number of different dialects. Mandarin was developed so that while the spoken word might be different in different areas of the country the Mandarin text was identical regardless of where you were. It was a language of scholars and regular people never really learned it(they didn't need to). The spoken form wasn't used much by anyone who wasn't a courtier(though neither was the written form).
For example, Peking and Beijing are the same place, Peking is what they call(ed) it in the south, and Beijing is what they call it in the north, the written mandarin for both words is the same.
To a certain extent nearly all written languages were initially created because so very few people actually wrote in the early days that there wasn't really any sort of natural evolution for a lot of written languages. For a more specific example, during the early years of the soviet union when the soviets were encouraging the development of their ethnic minorities(as opposed to kicking them off their land and putting them in work camps as they did a few years later. The Soviet government actually sent linguists out to some of the more nomadic of these groups to develop a written form of their language, which prior to this effort had never existed. That's not even counting resurrected languages which likely bear no significant resemblance to their previous forms, but which are spoken by real live people, or made up languages like Klingon that regular folks .
Languages are both created and naturally evolve, and written languages and spoken languages do not always begin at the same time, are not always used by the same people, and are sometimes rather arbitrary.