Slashdot Mirror


New Languages Vs. Old For Parallel Programming

joabj writes "Getting the most from multicore processors is becoming an increasingly difficult task for programmers. DARPA has commissioned a number of new programming languages, notably X10 and Chapel, written especially for developing programs that can be run across multiple processors, though others see them as too much of a departure to ever gain widespread usage among coders."

26 of 321 comments

  1. Re:I'm waiting for parallel libs for R by Daniel+Dvorkin · · Score: 2, Interesting

    There are some packages on CRAN that claim to implement parallel processing for R -- go to http://cran.r-project.org/web/packages/ and search for the text "parallel" to find several examples. I haven't tried any of them out yet, but sooner or later I'm going to have to.

    And actually, I think that "scripting" languages in general will have a very bright future in the parallel processing world. If memory management and garbage collection are implemented invisibly (and well!) in the core language, then the programmer can concentrate on the application logic and not have to worry about the kind of allocation headaches discussed in TFA. Python and R, where I spend most of my coding time these days, both offer very nicely implemented versions of function mapping, which I see as the key to making multiple processors useful for a wide variety of tasks. And no, the memory management and GC aren't quite there yet in either language, but they will be.
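
    A minimal sketch of the function mapping described above, in Python. The names squared_error and parallel_map are made up for illustration, and the worker count of 4 is an arbitrary assumption. Threads are used here only to keep the example self-contained; for CPU-bound pure-Python work, ProcessPoolExecutor offers the same map method and would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_error(pair):
    """Pure function of one input: no shared state, so calls can run in any order."""
    observed, predicted = pair
    return (observed - predicted) ** 2


def parallel_map(func, items, workers=4):
    """Apply func to every item on a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


# Hypothetical data: (observed, predicted) pairs.
pairs = [(1.0, 0.5), (2.0, 2.5), (3.0, 3.0)]
errors = parallel_map(squared_error, pairs)
```

    The point of the design is that the caller never touches threads, locks, or memory: it hands over a function and a collection, and the mapping layer decides how to spread the work.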

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  2. What's so hard? by 4D6963 · · Score: 2, Interesting

    Not trying to troll or anything, but I'd always heard that parallel programming is very complicated for programmers. Then I learnt to use pthreads in C to parallelise everything in my C program, from concurrent processing of the same data to threading any aspect of the program, and I was surprised by how simple and straightforward it was. Even creating a number of threads based on the number of detected cores was simple.

    OK, maybe what I did was simple enough, but I just don't see what's so inherently hard about parallel programming. Surely I am missing something.

    --
    You just got troll'd!
    1. Re:What's so hard? by Unoti · · Score: 3, Interesting

      The fact that it seems so simple at first is where the problem starts. You had no trouble in your program. One program. That's a great start. Now do something non-trivial. Say, make something that simulates digital circuits -- AND gates, OR gates, NOT gates. Let them be wired up together. Accept an arbitrarily complex setup of digital logic gates. Have it simulate the outputs propagating to the inputs. And make it so that it expands across an arbitrary number of threads, and make it expand across an arbitrary number of processes, both on the same computer and on other computers on the same network.

      There are some languages and approaches you could choose for such a project that will help you avoid the kinds of pitfalls that await you, and provide most or all of the infrastructure that you'd have to write yourself in other languages.

      If you're interested in learning more about parallel programming, why it's hard, and what can go wrong, and how to make it easy, I suggest you read a book about Erlang. Then read a book about Scala.

      The thing is, it looks easy at first, and it really is easy at first. Then you launch your application into production, and stuff goes real funny and it's nigh unto impossible to troubleshoot what's wrong. In the lab, it's always easy. With multithreaded/multiprocess/multi-node systems, you've got to work very very hard to make them mess up in the lab the same way they will in the real world. So it seems like not a big deal at first until you launch the stuff and have to support it running every day in crazy unpredictable conditions.

    2. Re:What's so hard? by Yacoby · · Score: 2, Interesting

      Data communication in a foolproof way. Writing a threaded program is easy if the program is simple. You can even get a bit more performance out of a program using multiple threads if you use locking. If you use locking, you end up with the possibility of race conditions, deadlock and other nightmares.
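
      A small sketch of the locking being described, in Python. The Counter class, hammer, and run are hypothetical names, and the thread and iteration counts are arbitrary. It shows only the easy half: one lock around one read-modify-write.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the updates would be lost (a race condition).
        with self._lock:
            self.value += 1


def hammer(counter, times):
    """Increment the shared counter a fixed number of times."""
    for _ in range(times):
        counter.increment()


def run(threads=4, times=10_000):
    """Start several threads against one counter and return the final value."""
    counter = Counter()
    workers = [threading.Thread(target=hammer, args=(counter, times))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

      With a single lock the total always comes out to threads * times. The nightmares start with the second lock: if two threads take the same pair of locks in opposite orders, each can end up waiting on the other forever.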

      Extending this to something like a game engine is much harder. Say we split our physics and rendering into two threads. How does the physics thread update the render thread? We could just lock the whole scene graph, but then we don't get much of a performance increase, if at all. We then could use two buffers. The renderer renders the data from one, and the physics thread updates the other. When we are ready to update the frame, we just swap the buffers. Then we end up with some input lag. There are still complications. What happens if we add an AI thread? How does that add data to the buffer in a way that doesn't conflict with the physics thread?
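
      The two-buffer idea can be sketched roughly as follows in Python. DoubleBuffer and the "ball_y" scene key are invented for the example, and it assumes exactly one writer (physics) and one reader (renderer), which is precisely the assumption an extra AI thread would break.

```python
import threading


class DoubleBuffer:
    """Two scene snapshots: the writer fills 'back', the reader sees 'front'."""

    def __init__(self, initial):
        self._front = dict(initial)   # what the renderer reads
        self._back = dict(initial)    # what physics writes
        self._swap_lock = threading.Lock()  # held only for the reference swap

    def write(self, key, value):
        # Only the single physics thread calls this, so no lock is taken here.
        self._back[key] = value

    def swap(self):
        # Publish the finished frame, then start the next one from a copy.
        with self._swap_lock:
            self._front = self._back
            self._back = dict(self._front)

    def read(self):
        # The renderer grabs a reference to the current front buffer,
        # which is never written to again after it has been published.
        with self._swap_lock:
            return self._front


scene = DoubleBuffer({"ball_y": 10.0})
scene.write("ball_y", 9.8)
before = scene.read()["ball_y"]   # renderer still sees the previous frame
scene.swap()
after = scene.read()["ball_y"]    # new frame visible only after the swap
```

      The lock is held only for the swap, not for the whole scene update, and the value in before is the input lag in miniature: the renderer is always one swap behind the physics.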

      We could use lock free lists, which are very hard to get right. Even some implementations that I have seen end up locking the heap, which we want to avoid. But even then we end up with some issues.
      Don't get me started on debugging threaded applications. Finding that while it works fine on one and two cores, 0.1% of the time on a quad core there is a deadlock.

      So to sum it up: anyone can write a threaded application where it is easy to split the tasks. If you are designing it from the ground up, it is even easier. If you need to write performance critical maintainable code that involves a lot of communication, it suddenly gets much harder.

    3. Re:What's so hard? by Anonymous Coward · · Score: 1, Interesting

      I guess this is where 'restrict' comes in. If a, b and c can be determined to be non-aliased and non-overlapping, the compiler may auto-vectorise that for you on an appropriate architecture.

      That said, Handel C, a dying dialect of C which targeted FPGAs, would let you do the following:

      par (i = 0; i < 1000; i++)
          c[i] = a[i] + b[i];

      This would build a massive amount of logic to perform the 1000 additions in parallel on an FPGA, but it's nice syntax. The par could also be replaced with seq to make a sequential version (still using lots of logic, since seq is like an unrolled loop).

  3. Re:Parallel is here to stay but not for every app by Daniel+Dvorkin · · Score: 4, Interesting

    True enough, but the class of applications for which parallel processing is useful is growing rapidly as programmers learn to think in those terms. Any program with a "for" or "while" loop in which the results of one iteration do not depend on the results of the previous iteration, as well as a fair number of such loops in which the results do have such a dependency, is a candidate for parallelization -- and that means most of the programs which most programmers will ever write. We just need the languages not to make coding this way too painful.
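
    As a rough illustration of the second kind of loop, here is a Python sketch of a loop with a dependency (a running total) rewritten as a reduction. The helper names chunks and parallel_sum_of_squares are hypothetical and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(seq, n):
    """Split seq into n nearly equal consecutive slices."""
    size, extra = divmod(len(seq), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        out.append(seq[start:end])
        start = end
    return out


def parallel_sum_of_squares(values, workers=4):
    """Each worker sums its own slice; the partial sums are combined at the end.

    The serial loop carries a running total from one iteration to the next,
    but because addition is associative the total can be built from
    independent partial totals.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: sum(x * x for x in part),
                            chunks(values, workers))
        return sum(partials)
```

    For integers the result matches the serial loop exactly. For floating point the regrouping can change the rounding slightly, which is one of the small pains a language or library has to decide how to handle.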

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  4. Clojure by slasho81 · · Score: 4, Interesting

    Check out Clojure. The only programming language around that really addresses the issue of programming in a multi-core environment. It's also quite a sweet language besides that.

    1. Re:Clojure by slasho81 · · Score: 2, Interesting

      That's a rather bold statement. You do realize that those neat features of Clojure like STM or actors weren't originally invented for it? In fact, you could do most (all?) of that in Haskell before Clojure even appeared.

      I do realize that many of the innovations in Clojure are not brand new, but Clojure did put them into a practical form that incorporates many "right" innovations into one language. Haskell is a fine language and one of the languages that heavily influenced Clojure. Clojure makes some paradigms used in Haskell far more usable than they are in their original form.

      On a side note, while STM sounds great in theory for care-free concurrent programming, the performance penalty that comes with it in existing implementations is hefty. It's definitely a prospective area, but it needs more research before the results are consistently usable in production.

      In addition, STM is more of a general title for a set of technologies with the same general principles but vastly different implementations. Clojure's implementation, plus the immutability paradigm Clojure embraces, makes its STM darn close to care-free concurrent programming in almost all situations you'll encounter. And I'm well aware that this is an even bolder statement, but I strongly recommend checking it out if you do any kind of concurrent programming. It delivers.

  5. Re:Awful example in the article by awol · · Score: 3, Interesting

    The example in the article is atrocious.

    Why would you want the withdrawal and balance check to run concurrently?

    Because I can do a whole lot of "local" withdrawal processing whilst my balance check is off checking the canonical source of balance information. If it comes back OK then the work I have been doing in parallel is now committable work and my transaction is done, perhaps in no more time than the longer of the balance check and the withdrawal. Whilst the balance check/withdrawal example may seem ridiculous, there are some very interesting applications of this kind of problem in securities (financial) trading systems, where the canonical balances of different instruments would conveniently (and sometimes mandatorily) be stored in different locations and some complex synthetic transactions require access to balances from more than one instrument in order to execute properly.
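
    A sketch of that overlap in Python, under loud assumptions: check_balance, prepare_withdrawal, and withdraw are invented names, the ledger is a plain dict standing in for the canonical balance store, and the 50 ms sleep is a made-up latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def check_balance(account, amount, ledger):
    """Stand-in for a slow round trip to the canonical balance store."""
    time.sleep(0.05)  # simulated remote latency (hypothetical figure)
    return ledger[account] >= amount


def prepare_withdrawal(account, amount):
    """Local work that does not need the balance yet: build the pending record."""
    return {"account": account, "amount": amount, "status": "pending"}


def withdraw(account, amount, ledger):
    """Overlap the remote check with local preparation, then commit or reject."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        ok = pool.submit(check_balance, account, amount, ledger)  # starts now
        record = prepare_withdrawal(account, amount)  # runs while the check is pending
        if ok.result():  # block until the check returns
            ledger[account] -= amount
            record["status"] = "committed"
        else:
            record["status"] = "rejected"
    return record
```

    The local work is only provisional until the check comes back, so the elapsed time is roughly the longer of the two steps rather than their sum.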

    It seems to me that most of the interesting parallelism problems relate to distributed systems and it is not just a question of N phase commit databases but rather a construct of "end to end" dependencies in your processing chain where the true source of data cannot be accessed from all the nodes in the cluster at the same time from a procedural perspective.

    It is this fact that to me suggests that the answer to these issues is a radical change in language toward the functional or logical types of languages like Haskell and Prolog, with Erlang being a very interesting place on that path for right now.

    --
    "The first thing to do when you find yourself in a hole is stop digging."
  6. Re:Old languages designed for parallel processing? by Anonymous Coward · · Score: 1, Interesting

    Labview by National Instruments can do parallel programming automatically. I myself find the coding fairly straightforward as well. Its problem is performance. Well thought out C++ seems to easily win the performance war, especially when you deal with all the buffer copies Labview has a tendency to toss in. Currently I've been using a mix of Labview and C++ (DLLs), which seems to offer some of the benefits (debugging/performance/flexibility) of each, although I'm not sure that combining C# and C/C++ wouldn't have been an even better solution.

    In particular, I think the two biggest flaws in existing Labview are limitations on efficient string manipulation and the lack of true inlined functions. Labview does have subroutines, but that isn't really the same. Some C++ can get around such things, but it is not an ideal solution. Still, if you need to put together things _fast_ Labview is useful for that.

  7. Re:Parallel programming is dead. No one uses it... by sam_handelman · · Score: 2, Interesting

    Parent is kinda flamebait, and it's exactly the opposite of my experience.

      Scientists (I am one) who also write some of their own code, have much better things to do with our time than to try and make the software efficient. When we figure out what we want done, we hand it over to professional programmers who, if the cost:benefit analysis works out, will parallelize or optimize it as they're told is needed. Even lousy programmers are expensive, and hardware is cheap.

      I 100% agree with the end of his statement. Was it 10 or 15 years ago that scientific computing was still done in Fortran FOR A REASON, namely that its optimizing compiler didn't completely suck? Some scientific computing is still done in Fortran, but that's been purely a legacy thing since the optimizing compilers for C caught up. I'm sure someone clever will find some way to get an interpreted language to figure out what depends on what and parallelize your code for you. This is a very hard problem to do perfectly, but sensible people will quickly realize that's okay. For some cases, I can beat an optimizing compiler by writing assembly - am I ever going to do that? Hell no.

      Now, this may result in additional good coding practices which will be required of us so that the optimizing compiler can make easier sense of our code. Might it be lower overhead to create an optimization friendly programming language, which I suspect will end up amounting to making such practices an explicit requirement? Probably not, but it depends on how closely these new programming languages adhere to existing languages (I haven't looked at either example discussed in the article.)

    --
    The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
  8. The mess by Animats · · Score: 4, Interesting

    I've been very disappointed in parallel programming support. The C/C++ community has a major blind spot in this area - they think parallelism is an operating system feature, not a language issue. As a result, C and C++ provide no assistance in keeping track of what locks what. Hence race conditions. In Java, the problem was at least thought about, but "synchronized" didn't work out as well as expected. Microsoft Research people have done some good work in this area, and some of it made it into C#, but they have too much legacy to deal with.

    At the OS level, in most operating systems, the message passing primitives suck. The usual approach in the UNIX/Linux world is to put marshalling on top of byte streams on top of sockets. Stuff like XML and CORBA, with huge overhead. The situation sucks so bad that people think JSON is a step forward.

    What you usually want is a subroutine call; what the OS usually gives you is an I/O operation. There are better and faster message passing primitives (see MsgSend/MsgReceive in QNX), but they've never achieved any traction in the UNIX/Linux world. Nobody uses System V IPC, a mediocre idea from the 1980s. For that matter, there are still applications being written using lock files.

    Erlang is one of the few parallel languages actually used to implement large industrial applications.

    1. Re:The mess by Animats · · Score: 2, Interesting

      Sigh .. my last perl program used JSON messages over System V IPC (msgsnd,msgrcv). And here I was feeling proud of it

      I know, I know. I have an application in production which uses Python "pickle" over pipes to subprocesses.
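
      For anyone who hasn't seen the pattern, a bare-bones sketch of pickle over pipes to a subprocess looks something like this. It is not the production application mentioned above; CHILD and ask_child are invented names, and the child is an inline script so the example stays self-contained.

```python
import pickle
import subprocess
import sys

# Child program: read one pickled request from stdin, write one pickled reply.
CHILD = (
    "import pickle, sys\n"
    "req = pickle.load(sys.stdin.buffer)\n"
    "pickle.dump({'total': sum(req['values'])}, sys.stdout.buffer)\n"
)


def ask_child(values):
    """Send a request to a fresh subprocess and return its unpickled reply."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=pickle.dumps({"values": values}),  # goes down the stdin pipe
        stdout=subprocess.PIPE,                  # reply comes back on stdout
        check=True,
    )
    return pickle.loads(done.stdout)
```

      This is marshalling on top of a byte stream, exactly the arrangement complained about earlier in the thread, and pickle is only safe when both ends are trusted, since unpickling can execute arbitrary code.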

      Incidentally, it's interesting to speculate what the UNIX/Linux world might have been like if, when a process exited, it was able to return a result list, like the parameter list that goes in. Shell scripts, and "make", might not have been so blind to what the subprogram actually did.

  9. Re:Parallel is here to stay but not for every app by peragrin · · Score: 2, Interesting

    Yeah, why can't you buy 6 GHz cores? Is it because unless you super cool them you can't clock them that high?

    The 3.8 GHz P4 was released in 2005. Instead, Intel has focused on power savings and on adding cores while shrinking die sizes.

    Quantum computing is a long way off; heck, they can't even get a good memristor yet. The advantage we are having is that memory speeds are finally catching up to processor speeds. Combine that with a memristor at that speed and computing will take a whole new direction for efficiency and speed. However, clock speed isn't going to significantly increase for a while.

    --
    i thought once I was found, but it was only a dream.
  10. Art of Parallel Programming by rayharris · · Score: 2, Interesting

    There needs to be an equivalent of Donald Knuth's "Art of Computer Programming" as a definitive reference for parallel algorithms. Until then, I don't care how many cores you have, you won't get the most out of them.

    --
    I void warranties.
  11. How to Solve the Parallel Programming Crisis by Louis+Savain · · Score: 2, Interesting

    Exactly, some problems are inherently serial. These programs would run slower if you made them run in parallel.

    If they are inherently sequential, then obviously they cannot be made to run in parallel. The truth is that the vast majority of computing applications, both existing and future, are inherently parallel. As soon as some maverick startup (forget the big players like Intel, Microsoft, or AMD because they are too married to the old ways) figures out the solution to the parallel programming crisis (see link below), get ready for a flood of super complex parallel applications to hit the market, especially in the AI, gaming and simulation fields. Cars will drive themselves and robots will maintain your home, that kind of stuff. The possibilities are mind boggling.

    Now the reason that the old timers cannot solve the problem is that they are all addicted to the Turing Machine model of computing and last century's multithreaded approach to concurrency. The Turing Machine model is evidently no help in solving the crisis and threads are inherently non-deterministic. There is an urgent need to move away from antiquated and flawed paradigms that do not contribute to the solution. Indeed, they got us into this mess to begin with.

    How to Solve the Parallel Programming Crisis

    1. Re:How to Solve the Parallel Programming Crisis by Louis+Savain · · Score: 2, Interesting

      I am not sure why you make your assumptions since the solution that I am proposing emphasizes instructions and timing more than anything else. Data is just the environment where the program effects changes and reacts to changes. If my solution is dataflow then so is a pulsed neural network that consists of connected sensors and effectors. I have never heard neural network programmers refer to their programs as dataflow programs. Yet, this is what I am proposing: a program should be more like a signal-driven neural network.

      AFAIK, dataflow systems do not concern themselves with a program counter. One of the most important aspects of the solution that I am advancing is that determinism is essential to reliable software. There should never be any ambiguity as to whether any two events/operations in a program are sequential or parallel. This requires a program counter to mark time. Another important aspect is that a program should be 100% reactive, i.e., everything should happen in reaction to a change/event.

  12. Chapel by jbolden · · Score: 2, Interesting

    Looking at the 99 bottles Chapel code (from original article)
    http://99-bottles-of-beer.net/language-chapel-1215.html

    This looks like the way you do stuff in Haskell. Functions compute the data and the I/O routine is moved into a "monad" where you need to sequence. This doesn't seem outside the realm of the possible.

  13. Re:Parallel is here to stay but not for every app by Anonymous Coward · · Score: 1, Interesting

    You can make video encoding very parallel, but you reduce the quality (image quality decreases and bitrate increases) because the most efficient use of motion compensation video compression techniques, like the ones used in all MPEG derivatives, requires using the result of processing one entire frame before processing the next frame. In other words, you can make your encoder highly parallel, but you won't get anything resembling the compression and quality of even a video CD.

  14. Re:I'm waiting for parallel libs for R by ceoyoyo · · Score: 4, Interesting

    Whoever told you that is mistaken.

    The easiest way to take advantage of a multiprocessing environment is to use techniques that will be familiar to any high level programmer. For example, you don't write for loops, you call functions written in a low level language to do things like that for you. Those low level functions can be easily parallelized, giving all your code a boost.

  15. it's getting easier, not harder by drfireman · · Score: 2, Interesting

    Recent versions of gcc support OpenMP, and there's now experimental support for a multithreading library that I gather is going to be in the next C++ standard. These don't solve everyone's problems, but certainly it's getting easier, not harder, to take better advantage of multi-processor multi-core systems. I recently retrofitted some of my own code with OpenMP as a test, and it was ridiculously easy. Five years ago it would have been a much more irritating process. I realize not everyone develops in C/C++, nor does everyone use a compiler that supports OpenMP. But I doubt it's actually getting harder, probably just the rate at which it's getting easier is not the same for everyone.

  16. Re:Old languages designed for parallel processing? by Anonymous Coward · · Score: 2, Interesting

    If I recall correctly, the Swedish telecom where Erlang was designed had one server running it with 7 continuous years uptime.

    Certainly their AXD301 telephony-over-ATM switch has achieved 'nine nines' uptime - i.e. 99.9999999% up, which allows for ~31ms downtime a year.
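
    The arithmetic behind that figure, as a short Python check. The function name downtime_ms_per_year is made up, and the 365.25-day year is an assumption.

```python
def downtime_ms_per_year(nines, days_per_year=365.25):
    """Allowed downtime in milliseconds per year for an availability of N nines."""
    unavailable_fraction = 10.0 ** -nines            # nine nines -> 1e-9
    ms_per_year = days_per_year * 24 * 60 * 60 * 1000
    return unavailable_fraction * ms_per_year
```

    Calling downtime_ms_per_year(9) gives a little under 32 ms, in line with the figure quoted, while the more commonly advertised five nines works out to a bit over five minutes a year.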

    British Telecom use them to power their network, and they passed the ultimate test (finals voting for Pop Idol, the UK original for American Idol) with flying colours.

    And Erlang is a fun language to program.

  17. LIBRARIES!! by HiThere · · Score: 2, Interesting

    The main problem faced by each new language is "How do I access all the stuff that's already been done?"

    The "Do it over again" answer hasn't been successful since Sun pushed Java, and Java's initial target was an area that hadn't had a lot of development work. Sun spent a lot of money pushing Java, and was only partially successful. Now it probably couldn't be done again even by a major corporation.

    The other main answer is make calling stuff written in C or C++ (or Java) trivial. Python has used this to great effect, and Ruby to a slightly lesser one. Also note Jython, Groovy, Scala, etc. But if you're after high performance, Java has the dead weight of an interpreter (i.e., virtual machine). So that basically leaves easy linkage with C or C++. And both are purely DREADFUL languages to link to, due to pointer/integer conversions and macros. And callbacks. Individual libraries can be wrapped, but it's not easy to craft global solutions that work nicely. gcc has some compiler options that could be used to eliminate macros. Presumably so do other compilers. But they definitely aren't standardized. And you're still left not knowing what's a pointer so you don't know what memory can be freed.

    The result of this is that to get a new language into a workable state means a tremendous effort to wrap libraries. And this needs to be done AFTER the language is stabilized. And the people willing to work on this aren't the same people as the language implementers (who have their own jobs).

    I looked over those language sites, and I couldn't see any sign that thoughts had been given to either Foreign Function Interfaces or wrapping external libraries. Possibly they just used different terms, but I suspect not. My suspicion is that the implementers aren't really interested in language use so much as proving a concept. So THESE aren't the languages that we want, but they are test-beds for working out ideas that will later be imported into other languages.

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  18. Inmos had it right by Tjp($)pjT · · Score: 2, Interesting

    In the "let the compiler decide" attitude of the C language families ... Inmos C had the correct solution. You add two new keywords to the language, parallel and sequential.
    sequential
    {
        stmt1;
        stmt2;
        stmt3;
    }

    as opposed to

    parallel
    {
        stmt4;
        stmt5;
        stmt6;
    }

    The stmt1 must be executed before stmt2 which must be executed before stmt3 in the sequential construct. C languages actually already support this in a bit more awkward way with the comma operator. But sequential is an easier to understand and read method, and balances nicely the parallel keyword. The compiler and runtime have been told that stmt4, stmt5, and stmt6 can be executed in parallel. There is implicit synchronization at the end of the statement block.
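
    To make the semantics concrete, here is a loose imitation of the two constructs in Python. It is not Inmos C and makes no claim about how that compiler or runtime worked; sequential and parallel are hypothetical helper functions, and each "statement" is passed as a zero-argument callable.

```python
import threading


def sequential(*stmts):
    """Run each statement (a zero-argument callable) strictly in order."""
    for stmt in stmts:
        stmt()


def parallel(*stmts):
    """Run the statements in separate threads and wait for all of them,
    mirroring the implicit synchronization at the end of the block."""
    threads = [threading.Thread(target=stmt) for stmt in stmts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


log = []
results = {}

# stmt1; stmt2; stmt3: order is guaranteed.
sequential(lambda: log.append(1), lambda: log.append(2), lambda: log.append(3))

# stmt4; stmt5; stmt6: any order, but all done when parallel() returns.
parallel(
    lambda: results.update(a=4),
    lambda: results.update(b=5),
    lambda: results.update(c=6),
)
```

    The three parallel statements write to different keys, so their order does not matter; the only promise is that all of them have finished by the time the block ends.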

    This is all well and good and many people look and say that it would not be so tough to do this in other ways, and so on. But combine this with fast iterators as are in Objective C 2.0 and it gets much more interesting. Or for the generalized case where any place a left brace is permissible, either of these two constructs could be substituted. This generalizes to braces enclosing a conventional block of statements as exists now, a forced sequential block of statements (so that side effects from say external inputs or other volatile entities can be dealt with at the specific case where needed) or a statement block where the contained statements may be executed in parallel. The programmer still has to have a bit of knowledge here, but the compiler and runtime can really lighten the load. And it does not have a syntax clash with any of C, C++ or Objective C, so it could be adopted by all of them.

    I used this back in the 1980s and it was awesomely easy to deal with dispatch of hundreds of lightweight instances. Essentially fibers in a more modern vernacular. By partitioning the work between the compiler and the runtime systems I ran the same binary code across quad processor and 64 processor arrays. (Ancillary to this discussion was that Inmos Transputers also had built-in message passing on dedicated links in hardware. Of course Fortran was also supported as was Pascal, but the main pushed language was Occam. And hardware timers were there as a data type too to make scheduling a breeze. Processors w/o hardware timers just mimicked them in the runtime. And locks were supported in the hardware as well...)

    The point is this was an elegantly solved problem in the 1980s that was mostly forgotten. It was a simple matter to have the runtime aware of the fabric an individual process could access and just turn stuff loose. But that part is a bit outside the main discussion, like I don't drift enough already!

    --
    - Tjp

    I am in wallow with my inner money grubbing capitalistic pig. ... Oink!

  19. Re:Parallel is here to stay but not for every app by bertok · · Score: 3, Interesting

    The % utilization metric is a red herring. Most servers are underutilized by that metric, which is why VMware is making so much money consolidating them!

    Users don't actually notice, or care, about CPU utilization. What users notice, is latency. If my computer is 99% idle, that's fine, but I want it to respond to mouse clicks in a timely fashion. I don't want to wait, even if it's just a few hundred milliseconds. This is where parallel computation can bring big wins.

    One thing I noticed is that MS SQL Server still has its default "cost threshold for parallelism" set to "5", which AFAIK means that if the query planner estimates that a query will take more than 5 seconds, it'll attempt a parallel query plan instead. That's insane! I don't know what kind of users Microsoft is thinking of, but in my world, if a form takes 5 seconds to display, it's way too slow to be considered acceptable. Many servers now have 8 or more cores, and 24 (4x hexacore) is going to be common for database servers very soon. In that picture, even if you only consider a 15x speedup due to overhead, 5 seconds becomes something like 300 milliseconds!

    Ordinary Windows applications can benefit from the same kind of speedup. For example, a huge number of applications use compression internally (all Java JAR files, the docx-style Office 2007 files, etc.), yet the only parallel compressor I know of is WinRAR, which really does get 4x the speed on my quad-core. Did you know that the average compression rate for a normal algorithm like zip is something like 10MB/sec/core? That's pathetic. A Core i7 with 8 threads could probably do the same thing at 60 MB/sec or more, which is more in line with, say, gigabit ethernet speeds, or a typical hard-drive.
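
    One way such a parallel compressor could be structured, sketched in Python with the standard zlib module. This is not how WinRAR or any zip tool actually does it; compress_chunks and decompress_chunks are made-up names, the 1 MiB chunk size and 4 workers are arbitrary, and the output is a plain list of blobs rather than a real archive format. It relies on CPython's zlib releasing the interpreter lock while it compresses, which is my understanding of the implementation.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data, chunk_size=1 << 20, workers=4):
    """Compress fixed-size chunks independently so they can run on several cores."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, pieces))


def decompress_chunks(blobs):
    """Inverse of compress_chunks: decompress each chunk and concatenate."""
    return b"".join(zlib.decompress(blob) for blob in blobs)
```

    The catch is that each chunk is compressed without knowledge of the others, so redundancy that spans chunk boundaries is lost and the ratio is typically a little worse than a single-stream pass over the same data.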

    In other words, for a large class of apps, your hard-drive is not the bottleneck, your CPU is. How pathetic is that? A modern CPU has 4 or more cores, and it's busy hammering just one of those while your hard-drive, a mechanical component, is waiting to send it more data.

    You wait until you get an SSD. Suddenly, a whole range of apps become "cpu limited".

  20. Interesting article by Banador · · Score: 2, Interesting

    Threads Cannot be Implemented as a Library. That means pthreads is bad. Read: http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf

    Then, after a few years, work on the Java memory model found a good solution. Read: Foundations of the C++ concurrency memory model [based on the Java memory model] http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf

    How fugly can this be for all you C++ wannabe fanguys??? (Phun intended!)