Slashdot Mirror


Multi-Threaded Programming Without the Pain

holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."

327 comments

  1. Which Comes First? by jforest1 · · Score: 5, Funny

    The multi-threaded chicken or the multi-threaded egg?

    --josh

    1. Re:Which Comes First? by zoefff · · Score: 2, Funny

      nope, the post

    2. Re:Which Comes First? by Kjella · · Score: 4, Funny

      Since the egg thread is always converted to a chicken thread while the chicken thread spawns new subthreads, the egg came first. QED.

      --
      Live today, because you never know what tomorrow brings
    3. Re:Which Comes First? by cronot · · Score: 5, Funny

      the fork()

    4. Re:Which Comes First? by Timesprout · · Score: 3, Funny

      Since the egg thread is always converted to a chicken thread
      Someone forgot about the omelette thread.
      --
      Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
      What truth?
      There is no dupe
    5. Re:Which Comes First? by Anonymous Coward · · Score: 0

      Wouldn't they come at the same time?

    6. Re:Which Comes First? by Rakshasa+Taisab · · Score: 1

      So who's to say the egg thread wasn't executed first? You seem to have missed the fundamental problem in chicken/egg thing.

      --
      - These characters were randomly selected.
    7. Re:Which Comes First? by Anonymous Coward · · Score: 0

      probably the first time a fork was used to produce a chicken rather than consume it

    8. Re:Which Comes First? by Anonymous Coward · · Score: 0

      I've always believed this when it comes to the chicken and egg question.

    9. Re:Which Comes First? by Frozen+Void · · Score: 1

      Depends on the scheduler.

    10. Re:Which Comes First? by FuzzyDaddy · · Score: 1
      Of course, fork() spawns a new process, not a thread...

      --
      It's not wasting time, I'm educating myself.
    11. Re:Which Comes First? by Anonymous Coward · · Score: 0

      Information theory suggests that you can never know the contents of the egg without changing it*, so the observer does not know whether the egg would have been a chicken egg had he not observed it. The only way to assume total "egg before chicken" is to let the egg hatch without interfering, but then there is never a 'chicken egg', only 'unclassified eggs', and only after it has hatched does an observer understand that there is now what is called a chicken. So the chicken has to come first.

    12. Re:Which Comes First? by Anonymous Coward · · Score: 1, Funny

      They're multi-threaded... They can come at the same time.

    13. Re:Which Comes First? by Eythian · · Score: 1

      I'd expect the (multithreaded or otherwise) rooster came first.

    14. Re:Which Comes First? by Anonymous Coward · · Score: 0

      (a) Other animals were laying eggs millions of years before the first birds, let alone the first chickens.

      (b) Eggs generally form before fertilization.

      (c) Eggs generally contain half the genetic makeup of the animal that will form inside the egg upon fertilization.

      But we can consider the question you seem to have answered, viz.: "Which came first, the chicken or the chicken egg?"

      (d) If a chicken-ancestor produces an egg that will, upon fertilization by an appropriate spermatozoon, become a full chicken, but the mother is not fully a chicken genetically, is the egg a chicken egg?

      (d)(1) If so, then the chicken egg came before the chicken, otherwise
      (d)(2) If not, then the chicken comes at the same time as the egg.

      Since at the time of formation, the candidate unfertilized egg is in a superposition of chicken egg (produced by a chicken-ancestor) and chicken-ancestor egg (produced by a chicken-ancestor), the answer to (d) remains unclear until the time of fertilization.

      If we accept (d)(1) then there were many chicken eggs produced by chicken-ancestor hens but fertilized by chicken-ancestor cocks such that the resulting diploid zygote is a chicken-ancestor rather than a chicken.

      If we accept (d)(2) then we accept that a chicken egg is one that contains an actual chicken, no matter what laid or fertilized it, so the chicken and the chicken egg happen simultaneously.

    15. Re:Which Comes First? by Anonymous Coward · · Score: 0

      No, eggs can't come at all; they can't even masturbate.

  2. Deadlocked! by Trimbo2 · · Score: 2, Funny

    Deadlock detected!

    1. Re:Deadlocked! by GIL_Dude · · Score: 1

      So it is a classic race condition to see which chicken gets to peck the corn.

    2. Re:Deadlocked! by dgatwood · · Score: 1

      Un-clucking believable. You spend all day pecking and poking data in kernel memory and all you have to show for it is a frozen chicken. Oh, well. At least it will taste good with soup and crackers.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

  3. Huh? by dreamchaser · · Score: 4, Insightful

    I didn't know the PS3 had thousands of cores ;)

    I think what he meant was 'each tracked in a separate thread'...obviously each core is still handling many threads. I haven't watched the presentation and don't plan on it until later today, too much to do and I'd rather read something about it. It just sounds like it provides an efficient high level way to write a multi threaded app. Evolutionary but not revolutionary?

    1. Re:Huh? by Grashnak · · Score: 1

      I didn't know the PS3 had thousands of cores ;) I think what he meant was 'each tracked in a separate thread'...obviously each core is still handling many threads. I haven't watched the presentation and don't plan on it until later today, too much to do and I'd rather read something about it. It just sounds like it provides an efficient high level way to write a multi threaded app. Evolutionary but not revolutionary?

      Heh, that was my thought too. I was like, "Fuck, what the hell are those chickens doing that it requires an entire core to direct each one? Those are some complex chickens!"
      --
      Life needs more saving throws.
    2. Re:Huh? by Gr8Apes · · Score: 4, Insightful

      Even so, this is a "bad" implementation. There's absolutely no reason for there to be 1 thread per chicken. That's inherently not scalable. What you really want are an optimum number of threads for the number of cores in a pool that handle work units (chickens). This will scale much higher than the 1 thread per object model discussed in this topic.
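
      The pooled approach described above can be sketched roughly as follows. This is a minimal illustration, not the RapidMind API: `update_chicken`, `simulate`, and the tuple layout of a chicken are hypothetical names chosen for the example, and the pool defaults to one worker per reported core. Note that in CPython the global interpreter lock keeps pure-Python, CPU-bound work units from running in parallel, so this shows the structure rather than a speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def update_chicken(chicken):
    """Advance one chicken by one time step (a hypothetical work unit)."""
    x, y, dx, dy = chicken
    return (x + dx, y + dy, dx, dy)


def simulate(chickens, steps, workers=None):
    """Run `steps` updates over all chickens on a fixed-size thread pool.

    The number of threads depends on the machine, not on how many
    chickens there are; each chicken is just a work unit handed to the pool.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # pool.map preserves input order, so chicken i stays at index i.
            chickens = list(pool.map(update_chicken, chickens))
    return chickens
```

      Calling `simulate(flock, steps=10)` with ten chickens or ten thousand uses the same number of threads; only the queue of work units grows.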

      Oh, and there's no such thing as "easy" multi-threading. Hell, the average programmer can't even grasp OO, so what makes them think they can grasp threading which has many many more aspects to it?

      --
      The cesspool just got a check and balance.
    3. Re:Huh? by dreamchaser · · Score: 3, Interesting

      Having written more than my share of threaded apps I agree 100%. I still haven't looked into this more, but it's probably a C++ class library that abstracts the creation and management of threads. Too many threads thrashes the processor nicely in many cases, so unless they have some magic behind the scenes managing the number of threads vs. cores then this is just a hyped up multi threading library.

      Fsck the chickens...show me what this does with a real game or a real world app that lends itself to highly parallel operations, then demo it on a quad quad core Xenon.

    4. Re:Huh? by Gr8Apes · · Score: 3, Interesting

      The chicken scenario described removed any curiosity I had about looking into the library further. Why? Because it's very similar to the Java 101 bouncing ball thread demo (one thread per ball), which is used to show first-time would-be multi-threaded programmers why 1 thread per ball doesn't scale.

      --
      The cesspool just got a check and balance.
    5. Re:Huh? by grumbel · · Score: 4, Interesting

      That's inherently not scalable.

      Not scalable? I beg to differ. Thousands of threads for sure scale a lot better than when you just have two or four or whatever, since with thousands you don't really have an upper limit on how many CPUs you want to throw at the problem. The real issue with threads is that OS threads are extremely slow, so you can't have thousands of threads or your machine would slow to a crawl. Threads are also painful to work with, since the languages just aren't up to the task.

      However for both these issues there exist solutions, namely Erlang, using user-level threads there is no upper limit and you really can have each chicken have its own thread without a problem and the language is also built from the base up to work nicely with threads.

      Now I haven't yet seen the talk, bittorrent still busy downloading it, but I seriously doubt that it will just be yet-another-simple-wrapper class.

    6. Re:Huh? by Gr8Apes · · Score: 5, Informative

      First, last time I ran the ball test just to see how processors had improved in their capabilities to run code, I got to over 2K threads in a single JVM before significant degradation occurred and then it occurred rapidly.

      Using the threadpool concept, however, you can tune the size of the pool from the performance metrics of its threads, and then place as many objects on the pool as you like. The optimum size generally depends on the work each thread has to do. With no I/O blocking, I've found that 2-3 threads per CPU will load it to 100% for moderate work units (ones that take on the order of 100-1000 ms of CPU time to complete). Once you add any type of I/O blocking, including large amounts of memory access, that number goes up. A DB retriever system wound up running 64 threads for my particular workload, primarily because of the lag in the synchronous calls made to the DB. I could have tuned that further by using future tasks (a Doug Lea addition to JDK 1.5, also available in his earlier concurrency library) and reducing the number of threads, but running 64 threads had no negative effects in my case, so we left it at that. This particular DB access module ran across 64 systems (64*64 threads) serving roughly 35K concurrent customers.
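
      As a rough illustration of why blocking pushes the pool size up, here is a sketch of a commonly cited sizing heuristic, threads ≈ cores × (1 + wait time / compute time). The heuristic and the `pool_size` helper are assumptions added for illustration; the comment above arrived at its numbers by measurement, and the figures below are made up, not taken from that system.

```python
def pool_size(cores, wait_ms, compute_ms):
    """Estimate a starting thread-pool size.

    cores      -- number of CPU cores available to the pool
    wait_ms    -- average time a work unit spends blocked (I/O, DB calls)
    compute_ms -- average time a work unit spends on the CPU

    While one thread is blocked, another can use the core, so the more
    time work units spend waiting, the more threads each core can keep busy.
    """
    if cores < 1 or wait_ms < 0 or compute_ms <= 0:
        raise ValueError("cores >= 1, wait_ms >= 0 and compute_ms > 0 required")
    return max(1, round(cores * (1 + wait_ms / compute_ms)))


# Illustrative numbers only: pure CPU work vs. work dominated by blocking calls.
cpu_bound = pool_size(cores=4, wait_ms=0, compute_ms=100)
db_bound = pool_size(cores=4, wait_ms=1500, compute_ms=100)
```

      A figure like this is only a starting point; as the comment says, the real number comes from tuning against measured throughput.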

      I haven't run Erlang, so can't comment. I have heard nice things about it though, and I'm curious about it. One day I'll have enough time to play with it.

      --
      The cesspool just got a check and balance.
    7. Re:Huh? by Anonymous Coward · · Score: 0

      Well, given that the chicken demo did scale, aren't you curious as to what they are doing differently?

    8. Re:Huh? by barrkel · · Score: 1

      For an OS scheduler, yes, having one thread per object would be a bad idea. However, it's a very scalable model for soft threads, see e.g. Erlang.

    9. Re:Huh? by ruffnsc · · Score: 0

      Posting your speculation and not even looking at the video is lame.

      "I was going to give credence to your post but I am too busy right now so I may do it later."

    10. Re:Huh? by gbjbaanb · · Score: 3, Funny

      Well, given that the chicken demo did scale, aren't you curious as to what they are doing differently?

      Not using Java? :-)
    11. Re:Huh? by Gr8Apes · · Score: 1

      This library is for C++. I've had Erlang on my list to check out for a while now.

      --
      The cesspool just got a check and balance.
    12. Re:Huh? by Lars+T. · · Score: 1

      Even so, this is a "bad" implementation. There's absolutely no reason for there to be 1 thread per chicken. That's inherently not scalable. What you really want are an optimum number of threads for the number of cores in a pool that handle work units (chickens). This will scale much higher than the 1 thread per object model discussed in this topic.

      Oh, and there's no such thing as "easy" multi-threading. Hell, the average programmer can't even grasp OO, so what makes them think they can grasp threading which has many many more aspects to it?

      Well, when you propose a "scalable" solution where the code has to be compiled for each possible number of cores available, and you don't take any other threads that may run on the machine into account - of course multi-threading is hard.
      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    13. Re:Huh? by poot_rootbeer · · Score: 1

      However for both these issues there exist solutions, namely Erlang, using user-level threads there is no upper limit and you really can have each chicken have its own thread without a problem and the language is also built from the base up to work nicely with threads.

      I wish you had multithreaded this sentence. It took too long for me to extract all the information from it as a single linear process.

    14. Re:Huh? by Gr8Apes · · Score: 1

      You can code self-tuning thread pools if you like. This is obviously more difficult than hard-coding specifics in, but it is certainly an attainable possibility.

      And not taking other threads on a machine into account is at least an order of magnitude easier than accounting for other threads/programs.

      --
      The cesspool just got a check and balance.
    15. Re:Huh? by Panaflex · · Score: 1

      Maybe he's chicken. *BWOK* *BWOK* *BWOK*

      Seriously though... 10 years ago I could scale to a hundred threads pretty easily... what's so difficult about a few thousand threads on processors that are thousands of times faster and more efficient?

      The issue isn't whether I can do a thousand threads, but whether I can do it simply AND efficiently. The answer is no, unless you know what you're doing. There is no "general model" that fits. Or you use something that already has a good model and your problem fits in that model.

      --
      I said no... but I missed and it came out yes.
    16. Re:Huh? by pmedwards · · Score: 1

      It's not _inherently_ non-scalable. The problem is of the same order of complexity whether the thread scheduler stores and switches states or the program does it explicitly.

      It tends to be slower to use a preemptive thread scheduler, as it has less information than the application program, and the effort of switching the thread states is often greater than the effort of explicitly yielding to another task at the application layer. There are also secondary benefits, such as fewer thread stacks tending to result in greater cache efficiency and less memory usage. None of this is inherent or invariant, though; it is just the current tendency.

      Thread schedulers in the real world are designed for current real world usage and existing CPU architectures. New paradigms can change that.

    17. Re:Huh? by joto · · Score: 2, Interesting

      Thousands of threads for sure scale a lot better than when you just have two or four or whatever, since with thousands you don't really have an upper limit on how many CPUs you want to throw at the problem.

      Yes, the upper limit is thousand(s)! Go directly to jail. Do not pass Go. Do not collect $200.

      Seriously, with companies already offering 4 cores per CPU, and promising to offer 16 cores in the near future, and Moore's law being as it is, you don't exactly have to be a visionary to predict that the future might bring a shitload(TM) of processor cores to somewhere in your vicinity. Note that a shitload(TM) is more than a few measly thousands. Oh, and before you start telling me that nobody needs a shitload(TM) of processor cores, remember that nobody needed more than 640K RAM either.

      The real issue with threads is that OS threads are extremely slow, so you can't have thousands of threads or your machine would slow to a crawl.

      OS threads aren't necessarily slow (I assume you mean switching between them). If this is at all true, it is an artificial limitation of current hardware/software combos that can be easily fixed (at least the fix is much easier than the work involved in creating shitload-core CPUs). Note that the cost of OS-threads, user-space threads, and OS processes vary wildly among different systems already. But shared memory really needs to go. It just doesn't scale.

      However for both these issues there exist solutions, namely Erlang, using user-level threads

      Last time I checked, Erlang used only user-space threads, meaning that even if you had a shitload(TM) of cores, a given Erlang program would only use one of them. Erlang focuses on modelling, not performance. I suspect there to be good ideas in Erlang, but it's not going to be the system programming language of the next century.

    18. Re:Huh? by Raffaello · · Score: 1

      I agree with much of your comment, but please realize that Erlang was designed for, and is now used in, soft real-time telecom switches. When the number of processes becomes very high (in the thousands), Erlang performance does not degrade, unlike threaded systems, which often do degrade when the number of threads gets this high. So Erlang does focus on performance, just a very specific kind of performance: scalability to thousands of processes.

    19. Re:Huh? by Gr8Apes · · Score: 1

      And therein lies one of the issues of MT, and why MT programming is a more difficult topic than, say, OO. It's no longer just about how you write code and your chosen language's compiler, but about the actual target machine(s) you'll be running on. MT includes the larger picture of your computing environment. OO is relatively invariant regarding computing environment.

      --
      The cesspool just got a check and balance.
    20. Re:Huh? by Jeremi · · Score: 1
      The chicken scenario described removed any curiosity I had about looking into the library further. Why? Because it's very similar to the Java 101 bouncing ball thread demo (one thread per ball) which is used to show why 1 thread per ball doesn't scale to first time would be multi-threaded programmers.

      So the chicken demo (allegedly) shows that this guy has solved the thread scaling problem, and because of that you're not interested in it?

      Seems like you're not interested in anything but confirming things you already "know".

      --
      I don't care if it's 90,000 hectares. That lake was not my doing.
    21. Re:Huh? by Procyon101 · · Score: 4, Insightful

      You are using JVM threads. Most massively scalable threaded languages, like Erlang, use green threads. A green thread acts like a thread from the standpoint of the programmer, but carries little or no context switch cost (because it's not really a thread). The underlying platform then load balances these green threads across the actual hardware in an optimal pool of true threads.
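
      To make the idea of a green thread concrete, here is a minimal sketch using Python generators as cooperatively scheduled tasks. It is an illustration only and not how Erlang's runtime is implemented: `chicken` and `run_round_robin` are hypothetical names, and everything runs on a single OS thread, whereas the runtime described above would also spread such tasks across a pool of real threads.

```python
from collections import deque


def chicken(name, steps, log):
    """A green thread: does a little work, then yields control voluntarily."""
    for step in range(steps):
        log.append((name, step))
        yield  # switching tasks is just a function return, not an OS context switch


def run_round_robin(tasks):
    """A tiny user-level scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished; drop it
        ready.append(task)      # still alive; put it at the back of the queue
```

      Each task costs little more than a generator object, which is why thousands of them are cheap where thousands of OS threads are not.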

      What makes massive concurrency easy to grasp in these languages is one of two things:

      1) In Erlang and Termite (a Scheme dialect) there is no mutable state, and no globals. Every function is in essence a "service" that simply gets messages and then responds with replies. There is no need to think about locking in such a system, and there are very easy message passing idioms to do what you would normally do with mutable object orientation.

      2) In languages like Haskell, there is no concept of a "thread" at all... not even a single thread. There is no concept of "ordering". Things are defined as they are in mathematics: as relationships between functions and variables. There is no mutable state allowed. This strictness allows the compiler to make very deep conclusions as to what can be parallelized. The compiler can then load balance under the covers across any number of procs without exposing any issues of concurrency to the user at all.

      So yes, in Java (and OO in general), concurrency is very, very difficult. In other paradigms though it can be trivial, or even transparent.
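
      The "service that gets messages and responds with replies" style from point 1 can be approximated like this. It is a sketch of the idiom only, not Erlang semantics: `counter_service`, `call`, and `demo` are hypothetical names, the mailbox is a standard-library queue, and the service runs on an ordinary OS thread. The running total is touched by one thread alone, so no lock appears anywhere.

```python
import queue
import threading


def counter_service(inbox):
    """A service that owns its state and talks only through messages."""
    total = 0  # private to this service; no other thread can reach it
    while True:
        message, reply_to = inbox.get()
        if message == "stop":
            reply_to.put(total)
            return
        total += message
        reply_to.put(total)


def call(inbox, message):
    """Send a message and wait for the reply on a private reply queue."""
    reply_to = queue.Queue()
    inbox.put((message, reply_to))
    return reply_to.get()


def demo():
    inbox = queue.Queue()
    worker = threading.Thread(target=counter_service, args=(inbox,))
    worker.start()
    running_totals = [call(inbox, n) for n in (1, 2, 3)]
    final_total = call(inbox, "stop")
    worker.join()
    return running_totals, final_total
```

      Callers never see `total` directly; they only ever see copies sent back as replies, which is what removes the need to think about locking.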

    22. Re:Huh? by Anonymous Coward · · Score: 0

      "I got to over 2K threads in a single JVM before significant degradation occurred and then it occurred rapidly. [...] I haven't run Erlang, so can't comment."

      Then why *are* you commenting? And you mods, why are you modding it up?

      Just because Java can't do something, doesn't mean it's not possible.

      Wikipedia: "Creating and managing processes is trivial in Erlang, whereas threads are considered a complicated and error prone topic in most languages." "Erlang processes are neither OS processes nor OS threads, but lightweight threads somewhat similar to Java's "green threads". They are, as a result, extremely lightweight (the estimated minimal overhead for each is 300 bytes) and many of them can be created without degrading performance (a benchmark with 20 million processes was tried[1])."

    23. Re:Huh? by Raenex · · Score: 1

      Last time I checked, Erlang used only user-space threads, meaning that even if you had a shitload(TM) of cores, a given Erlang program would only use one of them.

      Time to check again.
    24. Re:Huh? by Anonymous Coward · · Score: 1, Interesting

      Actually, Haskell has both threads and mutable state. However unlike Erlang, it also has a strong static type system which can isolate stateful bits from purely functional bits. I.e. you still get the benefits of purity (a function is always safe to evaluate on any thread in any order), while having the option to use e.g. shared state concurrency where it makes sense.

      Haskell threads are lightweight as well (and get distributed to the number of cores you specify on startup time), and they can also use software transactional memory (which eliminates the need for locks and other non-composable concurrency abstractions with a very simple programming model). Plus, Haskell is compiled, unlike Erlang, and will typically perform better as a result.

      Haskell is the best language I currently know of for concurrency and parallelism. There's room for improvement though. Perhaps I should say "least broken" rather than "best" to emphasize how pathetically late to the game we on the software side are on this issue. There's tons more to do, but I'd recommend everyone interested in these issues to take a gander at Haskell (specifically STM).

    25. Re:Huh? by Procyon101 · · Score: 1

      So, how does STM compare to Erlang for:

      1) Cross machine boundary pooling of processes
      2) Process failure recovery
      3) Runtime swapping of code

      I am very interested in both languages (and am learning Haskell). I like Haskell for its purity, and Erlang for its lightweight and failable processes. Can the same model be implemented in Haskell?

    26. Re:Huh? by Anonymous Coward · · Score: 0

      Posting your speculation and not even looking at the video is lame.

      Cut the guy some slack. I started downloading the video over nine hours ago, and it still isn't finished. What are folks supposed to do in the meantime, post in other threads? Work? Go outside? Get real. I've made eight posts :-) Hope I don't feel stupid when I finally see the video.

    27. Re:Huh? by bluefoxlucid · · Score: 1

      Inherently not scalable? The OS handles each individual thread and proper scheduling; your other option is to use a kludge in the program to handle "threads" at the application level, with uneven threading. Maybe some group of X chickens scheduled by OS, with scheduling within that group-- so if X.y needs more CPU than Z.n, it will have to get it from ... other processes in the X group. Or decrease all of Z. Either way is not very scalable. This makes me question if you have any idea what in the fuck you're even talking about.

    28. Re:Huh? by Anonymous Coward · · Score: 0

      No, not currently. Haskell isn't currently well suited for distributed apps (though look at GdH for some research into it). If you do need to use those very specific features of Erlang because you happen to be in the 0.00001% of the market that requires it, then Haskell isn't for you, currently.

      OTOH if you're writing apps that currently don't use more than one core because the app itself isn't very "concurrent", then Haskell is excellent, or if you're writing an app which *is* concurrent, but in a context where you don't want transparent messages across the network.

      I think Haskell, or perhaps a lenient or even strict version of it, will be *very* suited for programs which have traditionally been written in a single threaded way, but now will need to use many cores to scale without re-engineering or recompilation when users get newer CPUs (and thus more cores).

      Haskell is also getting nested data parallelism (like in TFA, but *much* more usable since it's nested -- it works on arbitrary recursive structures). So you have:

      1) Excellent explicit concurrency, with STM for shared state when needed but probably with message passing most of the time
      2) Excellent parallelism due to pure semantics, where you can just plug in the "par" operator to run stuff on multiple threads automatically and safely
      3) Nested data parallelism for automatic parallelism over data

      And that's pretty much why I think Haskell is the least broken language so far. It has a multi-pronged approach to helping the programmer make the most of multiple cores, not just one silver bullet (like Erlang's message passing concurrency).

    29. Re:Huh? by Gr8Apes · · Score: 1

      Go read up on threadpools.

      I suppose I can expect a genuflecting apology when hell freezes over, but it makes you no less arrogantly wrong.

      --
      The cesspool just got a check and balance.
    30. Re:Huh? by shutdown+-p+now · · Score: 2, Informative
      Have a read please. As it goes, when it comes to multithreading, the model used by C++, Java and similar languages is rapidly becoming outdated.
    31. Re:Huh? by Gr8Apes · · Score: 1

      Many people already pointed towards Erlang. Note that this is a C++ library we're talking about so my comments are spot on.

      --
      The cesspool just got a check and balance.
    32. Re:Huh? by shutdown+-p+now · · Score: 2, Insightful

      A C++ library could simply implement Erlang semantics on top of C++.

  4. Bah humbug by kahei · · Score: 2, Insightful


    Multithreaded development is commonplace in applications that need it. The places it's not common in are:

    -- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.

    -- places that have absolutely no need for it, which certainly includes the chicken demo. One core per chicken?? Seems more like the guy just discovered threads but hasn't quite grasped what they're for.

    --
    Whence? Hence. Whither? Thither.
    1. Re:Bah humbug by Anonymous Coward · · Score: 0

      No, distributing the workload for animated objects among threads is a standard programming technique, and is especially valuable when the threads can be allocated to separate CPUs or processing cores. He didn't just discover threads, but maybe you're just discovering animation.

    2. Re:Bah humbug by ari_j · · Score: 3, Insightful

      You are, of course, correct. The other thing that people need to keep in mind is that there is rarely only a single process running on a given machine. For applications where it makes sense, such as video rendering on a machine doing nothing else, multithreading can increase overall performance. For applications where it doesn't or where there are other things running on the same machine, you normally end up with worse overall performance by trying to get your naturally single-threaded program to run on multiple cores at once when the extra cores would be better dedicated to running things other than your program.

      Multithreading is a tool. Just like more traditional tools, like the hammer, this one is useful for certain applications. But multithreading is not the only tool at your disposal - people need to stop looking at everything as if it were a nail.

    3. Re:Bah humbug by aldheorte · · Score: 3, Insightful

      "old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'."

      I'm not sure I follow you there. Lightweight process models are perfect for multi-cores. The more the merrier. Given the abundance of high-quality networking and commodity machines, heavyweight programs that use internal threads are, outside of very niche areas, less suitable for distributed computing than lightweight process models that can call across the network or the OS to other lightweight processes. A heavyweight process can only scale to the number of cores available on the machine it is running on, whereas a flock of lightweight processes can scale to the locally available cores and onto other machines in a distributed fashion without a major bump in the road between local and remote. Any machine that has multi-cores today could easily run, say, one Ruby process per core with negligible overhead.

    4. Re:Bah humbug by tepples · · Score: 1

      The other thing that people need to keep in mind is that there is rarely only a single process running on a given machine. For applications where it makes sense, such as video rendering on a machine doing nothing else

      Or playing a video game, right?

      multithreading can increase overall performance. For applications where it doesn't or where there are other things running on the same machine, you normally end up with worse overall performance by trying to get your naturally single-threaded program to run on multiple cores at once when the extra cores would be better dedicated to running things other than your program.

      That may be true of servers, but on desktop machines, there is one foreground task. In order to use more than one core, it needs more than one thread. Or is this part of an elaborate joke whose punchline is that 100 percent of one of the two cores in a dual-core CPU is devoted to antivirus software?
    5. Re:Bah humbug by kcbrown · · Score: 4, Insightful

      -- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.

      And it's silly for it to be "on the way out".

      Anyone remember the Amiga? It had a preemptive multitasking OS that lacked hardware memory protection because the hardware it was running on couldn't support it. And while the OS itself was very fast and efficient, the overall system was relatively crash-prone, because any memory-related programming error in any running application had a decent chance of taking down the system.

      Fast forward to today. Every computer sold has hardware memory protection built-in. Anyone who doesn't know why that's a good thing needs to spend time on an Amiga.

      And yet, despite that, threads are all the rage. Why? Because people have this idiotic belief that they're somehow "more efficient" than processes. Such people probably program about as well as they think, which is to say not very well. Threads are indeed more efficient at context switching than processes, but the real question is: does that really matter? In the vast majority of cases, it doesn't, because in the vast majority of cases multiple threads are being used to make the user interface responsive. There's no way a human being can tell the difference between a millisecond-level context switch time and a microsecond-level one.

      On top of that, processes bring one critical advantage to the table that threads don't: memory protection. And for the same reason memory protection is important at the OS and hardware level, so too is it important at the process and thread level: it allows clean, protected separation of concern and greater overall application stability.

      The vast, vast majority of applications that are multithreaded don't actually need the slight additional context switch performance advantage that threads bring to the table, but they very much need the memory protection facilities that processes bring to the table. Which is another way of saying that if your application needs concurrency, you're a fool if you blindly use threads instead of processes.

      Even Windows supports fork() these days, with the POSIX subsystem (available, as far as I know, on any Windows 2000 and later system), so creating a clone of your current process is dirt simple even under Windows. End result: application authors have no good reason to use threads over processes unless they've actually done the math and can prove that their application really needs the slight performance advantage of threads more than the significant reliability advantage of processes.

      As to the other reason for using threads, the sharing of memory, there's this really cool new technology out these days. Maybe you've heard of it. It's called "shared memory". It's only been available for 20 years or so. No wonder most people haven't heard of it. Being forced to explicitly declare what's shared and what isn't is a good thing, because it makes your program easier to maintain, easier to debug, and more reliable -- all at the same time.
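
      A minimal sketch of what "explicitly declare what's shared" can look like, written in Python for brevity and assuming a Unix system where fork() is available. The function name fork_and_increment is made up for illustration. An ordinary variable is copied on fork and stays private to each process, while an anonymous shared mapping is visible to both:

```python
import mmap
import os
import struct


def fork_and_increment():
    """Fork a child that bumps one private and one explicitly shared counter."""
    private_counter = 0                      # ordinary variable: copied on fork
    shared = mmap.mmap(-1, 8)                # anonymous mapping, shared across fork
    struct.pack_into("q", shared, 0, 0)      # shared 64-bit counter starts at 0

    pid = os.fork()
    if pid == 0:                             # child process
        private_counter += 1                 # changes only the child's copy
        value = struct.unpack_from("q", shared, 0)[0]
        struct.pack_into("q", shared, 0, value + 1)
        os._exit(0)                          # leave without running parent code

    os.waitpid(pid, 0)                       # parent waits for the child
    shared_value = struct.unpack_from("q", shared, 0)[0]
    shared.close()
    return private_counter, shared_value
```

      In the parent, the private counter is still 0 after the child exits, while the shared counter reads 1. Only the eight bytes that were deliberately placed in the mapping can be touched by the other process.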

      The bottom line is this: if you need concurrency in your application, you should be using processes, not threads. If you insist on using threads, you'd better have a damned good reason for it, because the reliability implications of threads are hugely negative while the performance implications are modest at best.

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    6. Re:Bah humbug by Anonymous Coward · · Score: 0

      If one thread can run on core A and one thread can run on core B, what makes you think one process can't run on core A and another on core B?

      An application doesn't have to be one process -- that's the point of the discussion. Threads made much more sense on VMS, where starting a process was expensive, than on any Unix. Windows is pretty efficient at starting processes, too, so multi-process makes some sense there too.

      That's not to say that multi-threaded is wrong. Multi-threaded applications and multi-process applications both have their uses.

    7. Re:Bah humbug by Anonymous Coward · · Score: 0

      No, distributing the workload for animated objects among threads is a standard programming technique
      ...the standard bad programming technique.
      you don't need a dedicated thread-per-object to animate an arbitrary mesh.
      It's not at all necessary and not at all scalable.

      It is common practice -- just as mis-use of OO techniques is common practice.
    8. Re:Bah humbug by odourpreventer · · Score: 2, Insightful

      Please correct me if I'm wrong, but it seems to me this discussion has gone into apples and oranges mode. Threads, as far as I'm aware, are supposed to be used for single, explicit tasks and always under supervision by a parent thread. I've used multi-threading with excellent results, but then I've taken pains to ensure that the threads don't have any privileges whatsoever. Processes, on the other hand, are more like stand-alone programs working in the same context.

    9. Re:Bah humbug by Anonymous Coward · · Score: 0

      For heavy computational programming, threads offer no benefits. For interactive applications, the ability to split the user interface and backend processing into two or more threads is a clear advantage. Threads also have a clear advantage in other areas, e.g. in a producer/consumer model the consumer can be multi-threaded, allowing it to process more than one input at a time. Sure, you could have multiple consumers running as processes, but the synchronisation involved will be just as complex as the threaded consumer; possibly more so because of the lack of shared context between the consumers.
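
      A rough sketch of the producer/consumer arrangement described above, in Python. The names run_pipeline and n_consumers are hypothetical, and squaring a number stands in for whatever real processing the consumer would do. The shared context here is just a queue and a results list guarded by a lock:

```python
import queue
import threading


def run_pipeline(items, n_consumers=4):
    """Feed items through a queue to several consumer threads; return results."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()          # protects the shared results list

    def consumer():
        while True:
            item = work.get()
            if item is None:                 # sentinel: no more work
                break
            squared = item * item            # stand-in for real processing
            with results_lock:
                results.append(squared)

    threads = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for item in items:                       # the producer side
        work.put(item)
    for _ in threads:                        # one sentinel per consumer
        work.put(None)
    for t in threads:
        t.join()
    return sorted(results)
```

      Note that in CPython the threads mostly help when the processing step waits on I/O; the structure of the queue and the lock is the point of the sketch, not the speedup.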

    10. Re:Bah humbug by deletedaccount · · Score: 1

      Nice rant :)

      But pssst, the gazillions of java based application servers in the world would like a word with you.

    11. Re:Bah humbug by ari_j · · Score: 1

      There are certainly cases where the foreground task can properly make use of multiple threads. However, it takes a great deal of burying one's head in the sand to think that the foreground task is the only program doing anything related to the user experience on a desktop machine.

    12. Re:Bah humbug by br0k_sams0n · · Score: 1

      Sorry, those application servers will have to wait until Apache mpm-worker is done destroying mpm-prefork benchmarks. Sheesh, I guess all those developers working on the most popular, successful Web server ever have a lot of learning to do.

      Threads have been a far more efficient way to write socket-based servers for decades. Or, perhaps now you plan to re-write history and know something the late, great R. Stevens didn't. Suggest the OP read his detailed analysis and benchmarks starting here:

      http://www.amazon.com/Unix-Network-Programming-Vol-Networking/dp/0131411551/ref=pd_bbs_sr_1/104-5537299-2831938?ie=UTF8&s=books&qid=1174574811&sr=8-1

    13. Re:Bah humbug by tuskentower · · Score: 2, Interesting

      Huh? Threads and processes are two different animals.
      Since you need a reason, here's one: it's called concurrency. With processes I have to consume finite system resources to handle concurrency issues or roll my own, which is called reinventing the wheel (aka a waste of time). Thread libraries will do this for me.

    14. Re:Bah humbug by porlw · · Score: 1

      The main reason that multi-threaded programming is the in thing is that process creation on Windows is many times slower than thread creation. You can see this if you compare the performance of a complex shell script on Unix vs. Windows. See also Apache 1 vs Apache 2.

      Right now C (and other iterative languages) are starting to look like assembler was in the 50s and 60s, lots of people insisting that the only way to get decent performance was to program at the lowest level possible. As the number of cores increases we're going to have to accept that the shared-nothing approach of functional style languages will be better suited to exploit these resources, even if they are slower on smaller scale SMP systems.

    15. Re:Bah humbug by kcbrown · · Score: 1

      Sorry, those application servers will have to wait until Apache mpm-worker is done destroying mpm-prefork benchmarks. Sheesh, I guess all those developers working on the most popular, successful Web server ever have a lot of learning to do.

      Look, the point wasn't that threads are always a bad idea. The point is that if you're going to write a concurrent application using threads, you should have a really good reason for making use of them instead of processes. Processes should be the default.

      In the case of Apache, yes, it benchmarks better (perhaps even significantly better) with a threaded server than with a multiprocess server. That will matter to some people who deploy web servers. But only to some.

      To most people, the amount of performance gained by using the threaded Apache server will make no difference whatsoever to them. Those people will be better served by the improved stability and isolation that mpm-prefork brings to the table.

      As for the Apache developers, they wrote the multithreaded server in response to demand for improved performance in a high-traffic environment. In other words, in a situation in which the advantages of threads really do matter. Such situations are rare in practice. And on top of that, the preforking server had already been written and tuned so its performance characteristics were already well-known. At the time the multithreaded server was written, it was no longer an either-or choice as to which model to go with.

      So I stand by what I said before: you should be using processes unless you can make a convincing argument that you really do need the context-switch performance benefits of threads more than the reliability and isolation that processes give you.

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    16. Re:Bah humbug by itlurksbeneath · · Score: 1

      The bottom line is this: if you need concurrency in your application, you should be using processes, not threads. If you insist on using threads, you'd better have a damned good reason for it, because the reliability implications of threads are hugely negative while the performance implications are modest at best.

      You haven't done a lot of GUI programming, have you?



      From your text above, I'd guess you worked in Microsoft for the Outlook programming team.

      --
      Have you ever considered piracy? You'd make a wonderful Dread Pirate Roberts.
    17. Re:Bah humbug by 644bd346996 · · Score: 1

      You seem to be offering evidence in support of his position. You act as though the performance difference between threads and processes is the primary factor contributing to the difference you have perceived between Windows and Linux desktops. It is not. Most of the time, when Windows programs are faster, it is because they do less. On Linux, simple apps are run by slow, complex language interpreters (e.g. CPython) and at any given moment there can be half a dozen GUI libraries in use, with all the necessary abstraction layers to arbitrate between them. Every step of the way, there are multiple ways of doing things. That comes at a cost, especially compared with an operating system that essentially can't do many things, or can only do them with compromised security.

    18. Re:Bah humbug by hamanu · · Score: 1

      Yep, threads are faster, but what about stability and security? If there's a buffer overflow attack against Apache with pre-forking, can the attacker see everybody else's credit card numbers at the end of the SSL stream where it's been decrypted? Nope. But they could see them if you're running the multi-threaded server, since there is no protection. And if the multithreaded server has a single thread die, they all die.

      --
      every _exit() is the same, but every clone() is different.
    19. Re:Bah humbug by gbjbaanb · · Score: 4, Interesting

      Unfortunately you say processes have their own memory protection which is better than threads that have to do their own synchronisation when accessing shared memory, but then go on about process-based shared memory needing its own additional protection.

      If you need concurrency in your apps, there isn't that much between threads and processes. However, if you need interprocess-communication then you are far better off with threads, they are significantly faster wrt locking than processes as all process-based locks must be done at the OS level, using shared (and finite) system resources. Threads can just use a critical section and have done with it, almost no overhead.

      Threads are not more efficient at context switching than processes, the same procedure happens whether a thread is switched, or a process is (in fact, a process is really an app with 1 thread). However, as threads can share memory more efficiently, locking is often not needed as much so they appear to be more efficient.

      The best argument for threads v processes is Apache. Personally, I agree with the Apache group that Apache 2 with its thread-based model is better. They should know.

    20. Re:Bah humbug by Viol8 · · Score: 1

      The main problem with shared memory between processes compared to using shared globals between threads is

      1) With shared memory you need to specifically create it, which requires finding unique keys that don't clash with anything else. Not a big deal, but extra hassle nonetheless.
      2) Unless you want to create little bits of shared memory all over the place (e.g. 4-byte integer sizes) to emulate normal variables, you'll have one big block of the stuff. And with one big block you have to make damn sure you get your demarcation right and you don't have processes writing to or reading from sections they shouldn't. This of course is in the same problem realm as buffer overflow issues with arrays.
      3) Shared memory has no intrinsic locking abilities, unlike POSIX threads (unless you set aside certain bits to be read/write flags). This means you have to *spit gag* use SysV semaphores. A more counterintuitive API I have yet to have the misfortune to use. Whoever designed it should be shot.

      Also, fork() can be quite expensive. Not a problem if the process you're firing off will be running for hours, but it is an issue if the process simply (let's use Apache as an example) reads from a socket, spits some data back down, then quits.

      In general I myself would use multiprocess instead of multithread but not always.

    21. Re:Bah humbug by newt0311 · · Score: 1

      Ummm... In Linux, the kernel doesn't differentiate between threads and processes. Everything is just a "task." Thus the argument for using threads due to context switch requirements is incorrect. With mmap and FIFOs, memory sharing between processes (among other things) becomes trivially easy, and thus ease of IPC is also a horrible argument to use. Therefore, in Linux, there is no point in using actual threads except laziness. Case in point: postgres spawns a different process for every connection and uses memory-mapped files to take care of table access. It is extremely fast, especially in environments with several thousand connections and more forming every second, and extremely reliable. Note also that I have sometimes crashed postgres connections by inserting incorrect pieces of C code (accidentally) and requesting execution, but that has never resulted in any problems in any of the other connections or tables. Note that in Windows, process creation/duplication is very slow compared to thread creation (in Linux, it's still pretty fast). Thus there may actually be an argument to use threads there, but some people re-implemented postgres' connection strategy to use threads instead of processes in Windows and did not notice any appreciable performance increase.

    22. Re:Bah humbug by wfeick · · Score: 1

      Threads are more efficient than processes for context switching because you don't have to switch all of the memory management state. It's been a while since I worked on Unix kernel context switch code (back when the 486 was common) but back then switching to a different process' page tables meant flushing the processor's paging caches.

      Switching threads only requires saving the registers of one thread and restoring the registers of another.

      Big difference.

      Somebody educate me if this is more efficient in modern processors.

    23. Re:Bah humbug by kcbrown · · Score: 0

      If you need concurrency in your apps, there isn't that much between threads and processes. However, if you need interprocess-communication then you are far better off with threads, they are significantly faster wrt locking than processes as all process-based locks must be done at the OS level, using shared (and finite) system resources. Threads can just use a critical section and have done with it, almost no overhead.

      Unfortunately for your argument here, there is no difference between processes and threads in this regard (with one possible exception. See below). A critical section makes use of a synchronization mechanism like a semaphore at entry and exit time to ensure that only one thread of execution (whether that be a process or a thread) is running in the critical section at any one time.

      A process can make use of the same synchronization mechanisms that a thread can here. So the thread gets no advantage in this respect.

      The only case where that isn't true is when the threading mechanism is strictly user-space (no kernel support), but in that case the process itself is doing its own internal thread switching. And obviously, such a setup can't make use of multiple processors at the same time. There's a version of pthreads that does this, on systems which don't have support for kernel-level threads.

      That is, of course, unless the kernel threading interface has an explicit way of denoting critical sections that isn't available to processes. I suppose that's possible, but it's obviously not going to be true on all systems.

      Regardless, the amount of overhead in making use of locking mechanisms for accessing shared memory in the process model, and thus the amount of performance that can be gained by using the threaded model, is usually (but not always) trivial in comparison with the total overhead of the application. Apache is something of a special case here, because its purpose is to act as a framework rather than an application in its own right.

      Threads are not more efficient at context switching than processes, the same procedure happens whether a thread is switched, or a process is (in fact, a process is really an app with 1 thread). However, as threads can share memory more efficiently, locking is often not needed as much so they appear to be more efficient.

      Threads generally use fewer resources for a context switch than processes, but this depends greatly on the hardware architecture and the OS. In the general case, there is less state to be saved and restored upon a context switch for a thread than there is for a process, and some of that state is expensive to save and restore. The fact that the page tables don't have to be switched out is in some cases a rather large win for threads.

      The best argument for threads v processes is Apache. Personally, I agree with the Apache group that Apache 2 with its thread-based model is better. They should know.

      It just depends on what's important to you. The process model buys you much better isolation between contexts, so if one thread of execution scribbles on the wrong bit of memory it won't affect any other threads of execution (unless the bit of memory is explicitly shared, which generally won't be the case for Apache processes), but under the multithreaded server it will affect all of them. That is a big, big reliability win for the multiprocess model.
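
      To make the isolation point concrete, here is a small sketch in Python, assuming a Unix system. The helpers run_isolated and crash are invented for this example, and sending SIGSEGV to oneself is only a stand-in for a real memory error. The child dies, and the parent simply observes that and carries on:

```python
import os
import signal


def run_isolated(task):
    """Run task() in a forked child; report how the child ended."""
    pid = os.fork()
    if pid == 0:                             # child process
        try:
            task()
        except BaseException:
            os._exit(1)                      # Python-level failure in the child
        os._exit(0)
    _, status = os.waitpid(pid, 0)           # parent is unaffected by the crash
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with code %d" % os.WEXITSTATUS(status)


def crash():
    """Simulate a fatal memory error by sending ourselves SIGSEGV."""
    os.kill(os.getpid(), signal.SIGSEGV)
```

      Calling run_isolated(crash) returns a "killed by signal" message in the parent. The same fault inside a thread would have taken the whole process down with it.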

      And that's just for Apache. There are definitely some Apache installations that get so much traffic that going with the threaded version instead of the multiprocess version is a big win. But those installations are the exception, not the rule, and by doing so they give up the isolation and protection that the multiprocess model brings to the table. They may get better performance, but unless their code is very well written and debugged, they'll get worse stability as a result. Performance is almost always much easier to buy than stability: just buy more and/or faster hardware.

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    24. Re:Bah humbug by Bluesman · · Score: 3, Informative

      Nope, still the same. The OS has to flush the TLB when it switches processes, which is the cache for virtual memory address lookups.

      This and the reduced startup time are the most compelling reasons to use threads instead of processes on a single core.

      However, on a large number of cores, things aren't so clear-cut, since if you have as many cores as active processes, you're not doing the context switching as much, and the benefit of using threading to reduce cache flushes isn't so clear. You'd still benefit from the quick startup of threads, so for things like a highly concurrent web server that creates a thread per user, threads may still be a better solution.

      Interestingly, the much maligned cooperative threads (user-space) are the fastest of all since the programmer can control when the context switch happens. However, if there's blocking or an infinite loop, the whole application will hang. You have to use asynchronous I/O and make sure no thread runs for too long.
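
      A toy illustration of cooperative user-space threads, using Python generators as the "threads" and a round-robin loop as the scheduler. All names here (worker, run_round_robin, demo) are made up for the sketch. Each yield is the only place a context switch can happen, which is exactly why a task that never yields would hang everything:

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one step of work, then yields control."""
    for i in range(steps):
        log.append("%s:%d" % (name, i))
        yield                                # explicit context-switch point


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)                       # run until the task's next yield
            ready.append(task)               # still alive: back of the queue
        except StopIteration:
            pass                             # task finished; drop it


def demo():
    log = []
    run_round_robin([worker("a", 2, log), worker("b", 3, log)])
    return log
```

      demo() interleaves the two tasks as a:0, b:0, a:1, b:1, b:2. No locks are needed because nothing can interrupt a task between its yields.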

      Like most things, it's a trade off between protection from various mistakes and errors vs. speed and control. Processes give you the most protection with the greatest amount of overhead, while user level threads give you the best performance, but only if you design everything correctly.

      --
      If moderation could change anything, it would be illegal.
    25. Re:Bah humbug by kcbrown · · Score: 1
      I wrote:

      A process can make use of the same synchronization mechanisms that a thread can here. So the thread gets no advantage in this respect.

      This appears to be where I may be wrong. Pthreads has its own mutex mechanism which has direct kernel support in Linux. That mechanism isn't necessarily available across processes.

      I would be surprised if they're not usable across processes, though, since in Linux a process and a thread are both simply special cases of the more generic "task".

      Even so, the locking overhead likely represents a very small percentage of the overall runtime of most applications. Lower hanging fruit almost certainly exists, so my original point (that one should prefer processes to threads in the general case) stands.

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    26. Re:Bah humbug by ceswiedler · · Score: 1

      You're certainly right that shared memory is more of a pain in the ass than threads, but two of your points are wrong.

      1) you can memory-map files instead of using SysV shmget.

      3) you can put pthread mutexes and condition variables in shared memory instead of using SysV semaphores.

      Your point #2 is correct, but that's the same thing the grandparent was saying--you get memory protection, meaning that you have to explicitly share data. The biggest pain is that you can't put pointers into shared memory since it will be mapped to different locations in different processes, so you have to use offsets and calculate pointer values per-process.
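
      The offsets-instead-of-pointers trick can be sketched like this, in Python with an anonymous mmap standing in for a real shared mapping. The record layout and the names NODE, END, build_list and walk_list are assumptions made for the example. Each node stores the byte offset of the next node relative to the start of the region, so the list can be walked wherever the region happens to be mapped:

```python
import mmap
import struct

NODE = struct.Struct("qq")                   # (value, offset of next node)
END = -1                                     # sentinel offset: end of list


def build_list(buf, values):
    """Write values as a linked list into buf; return the head offset."""
    for i, value in enumerate(values):
        offset = i * NODE.size
        nxt = offset + NODE.size if i + 1 < len(values) else END
        NODE.pack_into(buf, offset, value, nxt)
    return 0 if values else END


def walk_list(buf, head):
    """Follow offsets from head and collect the stored values."""
    out = []
    offset = head
    while offset != END:
        value, offset = NODE.unpack_from(buf, offset)
        out.append(value)
    return out


def demo():
    region = mmap.mmap(-1, 4096)             # stands in for a shared mapping
    head = build_list(region, [10, 20, 30])
    copy = bytearray(region[:])              # same bytes at a different address
    result = walk_list(region, head), walk_list(copy, head)
    region.close()
    return result
```

      Walking the copy gives the same values as walking the original region, which is the property you need when two processes map the same memory at different addresses.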

      Creating processes on Linux is exactly the same overhead as threads (they both use the same syscall, clone()) and both are comparable in performance to creating threads on Windows and Solaris.

      Shared memory isn't perfect for every problem (go figure, what is?), and is definitely more difficult to use, but it does offer better protection and robustness.

    27. Re:Bah humbug by Doctor+Memory · · Score: 1

      Also, fork() can be quite expensive. Not a problem if the process you're firing off will be running for hours

      It's generally only expensive if you try to use the resulting process like a thread. If you use it for the purposes it was originally designed for (create a subprocess and execute a new program within it), then it's quite reasonable. Especially with today's machines that can use hardware copy-on-write support to avoid having to duplicate the parent's address space.
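
      For reference, the "create a subprocess and execute a new program within it" pattern looks roughly like this in Python on a Unix system. The function names spawn_and_wait and demo are hypothetical, and the child program here is just a second Python interpreter that exits with a known status:

```python
import os
import sys


def spawn_and_wait(argv):
    """fork(), exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:                             # child: replace image with argv
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)                    # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def demo():
    # Run a second Python interpreter that exits with status 7.
    return spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
```

      Because the child calls exec almost immediately, very little of the parent's address space ever needs to be copied.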

      but it is an issue if the process simply (let's use Apache as an example) reads from a socket, spits some data back down then quits.

      Exactly.
      --
      Just junk food for thought...
    28. Re:Bah humbug by Explo · · Score: 1

      Anyone remember the Amiga? It had a preemptive multitasking OS that lacked hardware memory protection because the hardware it was running on couldn't support it. And while the OS itself was very fast and efficient, the overall system was relatively crash-prone, because any memory-related programming error in any running application had a decent chance of taking down the system.

      You probably know this, but for the benefit of random readers I'd like to mention some trivia: While there eventually was Amiga hardware that could have supported the memory protection (A3000, A4000 and actually pretty much any other models if fitted with a 3rd party expansion card, commonly known as turbo cards), the OS wasn't upgraded to make use of it accordingly because it would have broken compatibility with some existing features, including the message passing mechanism between processes, thus breaking a significant portion of existing applications badly.

      (Yes, I realize that this is a bit off-topic :)

      --
      Everyone who makes generalizations should be shot.
    29. Re:Bah humbug by kroepoek · · Score: 1

      There are *lots* of reasons why threads should be used. Suppose your program has a GUI and needs to do some I/O, but you want it to stay available to the user while doing it. The solution: queue all I/O operations and let them run in a worker thread. Suppose you use a library that has its own event loop, and you want your process to be doing other things (for instance running a torrent) while it's waiting for those events. Solution: call the library function from the main thread, and do your other stuff from another. Suppose your program needs to decode and encode multimedia (audio/video/both) rapidly, for some sort of (post)production or just entertainment, then being able to use multiple cores in a process is *very* useful. Suppose your program is a kernel, then you don't have a process to begin with ;) (You could still "outsource" work to userland apps, but only when a userland app is capable of dealing with the task).

      You may get by with using only processes most of the time, but you still need to have synchronisation. If you do all your I/O in separate processes, then you still need to transfer all the writes /to/ the other processes, and the reads /from/ the other processes. That sounds like unnecessary overhead to me. Your proposed solution has problems of its own. When your programs use shared memory, then not only can /your/ programs access it, but /other/ programs (at least those within the same security context) as well. This can open a lot of security holes you'd rather not have to deal with.

      And about fork(), I use GNU/Linux and I don't dual boot but I do write win32 programs for fun (and profit). There is no fork() in win32, and there's no POSIX (or OS/2) subsystem in Windows XP (and later). (They introduced the system with NT4 IIRC, not with 2000.) Even if they hadn't dropped it, it would be useless for programs running under the win32 subsystem. I just looked at the Cygwin implementation, and it creates a child process, so handles with a security descriptor set not to inherit handles, or handles without security descriptors (that aren't inherited by default) will break, so it's not a true fork() (according to the POSIX documentation, man 3posix fork). There may be handles of this type or other state data held in DLLs that will not seamlessly be copied over without breaking them too.

      With single-threaded processes, the user still wins when running multiple such programs at once. Especially when running multiple desktops simultaneously (which is getting increasingly popular, both over the network and locally) multiple cores have a clear advantage.

    30. Re:Bah humbug by Anonymous Coward · · Score: 0

      You are, of course, correct. The other thing that people need to keep in mind is that there is rarely only a single activity done by a given person. For applications where it makes sense, such as the 100m dash by a person doing nothing else, using both legs can increase overall performance. For people who aren't or who have other things to do that day, you normally end up with worse overall performance by trying to get your naturally one-direction locomotion to use both legs at once when the extra leg would be better dedicated to doing things other than your movement.

      Walking with two legs is a tool. Just like more traditional tools, like the hammer, this one is useful for certain applications. But walking is not the only tool at your disposal - people need to stop looking at everything as if it were a race.

    31. Re:Bah humbug by try_anything · · Score: 1

      If you're willing to give up a little bit of performance for a lot of reliability, why not use a language with a few safety and concurrency features built in? Then you don't have to worry about code scribbling all over memory, and you have to worry less about protecting shared resources. Shared memory isn't much protection against either of those. If a process needs to maintain any state at all, then putting incorrect data into its shared memory space will put it off on the wrong track, requiring a shutdown or really complex recovery logic. Using multiple processes with shared memory doesn't do anything to decrease the likelihood of poor protection of shared resources, either, which is the most common cause of concurrency issues.

      The difference between processes and threads isn't very important compared to the quality of the program design and the safety of the programming language. In practice I think the biggest advantage of using processes is that the awkwardness of using shared memory motivates lazy designers to come up with better designs with less sharing.

    32. Re:Bah humbug by gbjbaanb · · Score: 1

      Cheers. One thing: if you have multiple cores with independent threads, then you're right about the overhead of context switching being better. Once you add locks between these threads then context switching becomes much worse (as the chance of hitting a locked lock is greater than when running on a single CPU).

    33. Re:Bah humbug by gbjbaanb · · Score: 1

      On Windows a critical section is a 'lightweight synchronisation object', i.e. it is only available within a process so only applies to threads. If you want to synchronise between processes you need a kernel object, whether that's a semaphore, event or a mutex. I know Solaris has 'local mutexes' which I assume are the same kind of thing.

      Locking a critical section is almost a no-op because it's in-process only, whereas locking a mutex takes quite a bit of time relatively speaking. (Ah, numbers: 9800 CPU cycles for a mutex, 112 for a critical section without any contention occurring.)

      Locking can introduce very significant overhead, when it is used, obviously. We had a server app that spent more time locking and switching than it did performing useful work! I wouldn't be surprised if more people start coding up threads unnecessarily now we're in the world of multiple cores and find that their apps go slower and become more unreliable.

    34. Re:Bah humbug by Foolhardy · · Score: 2, Informative

      A critical section makes use of a synchronization mechanism like a semaphore at entry and exit time to ensure that only one thread of execution (whether that be a process or a thread) is running in the critical section at any one time.
      On Windows at least, a critical section requires no kernel object, and only a few instructions with no syscall to acquire and release as long as there is no contention on the object. If a thread tries to enter a section that is already marked as owned, a kernel notification event is created for waiters to sleep on. A kernel mutex, OTOH, always requires a kernel object and a syscall for both acquire and release. Syscalls are quite expensive, making critical sections much faster in most cases. A design involving a large number of small lockable objects with rare contention would benefit from being able to use them in particular. I know that Solaris also has lightweight mutexes that can't be shared between processes, and I assume they avoid syscalls in most cases as well.
    35. Re:Bah humbug by try_anything · · Score: 1

      Creating processes on Linux is exactly the same overhead as threads (they both use the same syscall, clone()) and both are comparable in performance to creating threads on Windows and Solaris.

      There are a bunch of flags passed to clone() that specify which aspects of the task are shared between the old and new tasks and which ones aren't. I imagine there's a performance difference between "all shared" and "all different." But I'm just guessing; please comment.

    36. Re:Bah humbug by DaleGlass · · Score: 1

      The best argument for threads v processes is Apache. Personally, I agree with the Apache group that Apache 2 with its thread-based model is better. They should know.


      Are you joking? That switch was one of the main reasons why apache2 wasn't adopted for a very long time, as the myriad of PHP extensions wouldn't work because they weren't thread safe.

      In fact, I didn't upgrade to apache2 until I found out that threads can be completely removed from it, so that it works the same way as apache1 did:

      [ebuild R ] net-www/apache-2.0.58-r2 USE="apache2 mpm-prefork ssl -debug -doc -ldap -mpm-itk -mpm-leader -mpm-peruser -mpm-threadpool -mpm-worker (-selinux) -static-modules -threads" 0 kB
      Might be a bit slower, but I'll very gladly sacrifice a tiny bit of performance for reliability.
    37. Re:Bah humbug by Anonymous Coward · · Score: 0

      You seem to be offering evidence in support of his position

      His?

      How do you know it was a him?

      Damned sexists /.ers!

      :-D

    38. Re:Bah humbug by Branko · · Score: 1

      Threads are indeed more efficient at context switching than processes, but the real question is: does that really matter?

      Thread switches can be less expensive than process switches, as illustrated in this table: http://www.linuxjournal.com/articles/lj/0057/2941/2941f1.png. I suspect the difference (when it exists) is not important in many cases, with notable exceptions such as games and other near-real-time systems.

      As to the other reason for using threads, the sharing of memory, there's this really cool new technology out these days. Maybe you've heard of it. It's called "shared memory".

      You can't easily put complex memory states there. With shared memory, you basically need to serialize the data at the input and then de-serialize it at the output. This is not a very efficient way of doing things (from both the CPU-performance and the brain-performance perspective).

      For example, while not being a game developer myself, I suspect games have this big interconnected graph of objects and each thread of execution needs to access numerous objects from this graph. Putting this into processes and then designing what is essentially a "communication protocol" using shared memory would complicate (not to mention slow down) things considerably.
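      A minimal sketch of that serialize/de-serialize step, assuming Python and a made-up record layout (`RECORD`, `write_entity`, `read_entity` and `round_trip` are hypothetical names, not from any game engine). The region is an anonymous shared mapping, the kind a child created by fork() would also see; it can only hold flat bytes, so even one small object has to be packed on the way in and unpacked on the way out:

```python
import mmap
import struct

# Hypothetical flat layout: entity id (uint32), then x and y (float64 each).
RECORD = struct.Struct("<Idd")


def write_entity(region, entity_id, x, y):
    # "Serialize": flatten the fields into raw bytes at offset 0.
    RECORD.pack_into(region, 0, entity_id, x, y)


def read_entity(region):
    # "De-serialize": rebuild Python values from the raw bytes.
    return RECORD.unpack_from(region, 0)


def round_trip(entity_id, x, y):
    # Anonymous shared mapping; only bytes live here, never object references.
    region = mmap.mmap(-1, RECORD.size)
    try:
        write_entity(region, entity_id, x, y)
        return read_entity(region)
    finally:
        region.close()
```

      One fixed-size record is the easy case. A graph of objects pointing at each other would need offsets or ids in place of pointers, which is roughly the "communication protocol" complained about above.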

      The bottom line is this: if you need concurrency in your application, you should be using processes, not threads.

      The reason for keeping your threads in separate processes might be that the communication required is simple and/or the memory protection is extremely important (such as in DBMS - Oracle is multi-process for that reason).

      However, it is unjustified to make blanket statements of type: "you should always do X" and then skip the valid reasons for doing Y.

    39. Re:Bah humbug by mihalis · · Score: 1

      I think this is a bit too strong a statement.

      From what I recall, different processes have different virtual memory spaces. CPUs have to translate virtual addresses into real addresses, and they cache such translations in a Translation Lookaside Buffer (TLB). Now I'm no chip architect, but from what I've read, switching TLB entries is becoming more and more expensive in terms of how many cycles the CPU is stalled. So process switching is getting slower relative to thread switching. Thread switching does have to save and restore registers, but not switch virtual address space. I seem to recall that process switching might also be harder on cache hit rates than thread switching, but I'm not 100% sure about that (I guess it depends on whether the cache is "virtually indexed").

      Some good discussion on http://en.wikipedia.org/wiki/CPU_cache

    40. Re:Bah humbug by jazir1979 · · Score: 1

      They may be apples and oranges, but they are still both fruit.

      The discussion above is valid because some technologies use threads where others use processes - e.g. a Java web server generally uses threads to handle requests, whereas a Ruby web app results in a separate Ruby process for each request.

      --
      What's your GCNSEQNO?
    41. Re:Bah humbug by dkf · · Score: 3, Insightful

      On a single core without hyperthreading, your best bet (if you can) is to write very efficient single-threaded code, using non-blocking I/O as much as possible. Some language runtimes require the use of lots of threads even on single-core systems, but that's horrible.

      Once you've got multiple cores, getting multiple threads of execution (either in multiple processes or in multiple threads) makes a lot of sense. I believe hyperthreading particularly benefits code that has multiple threads executing in the same bit of code, since the parallelism there is within a memory management domain, so OpenMP is better there than pthreads, and pthreads is (probably) better than processes. On the other hand, if you're potentially working across a cluster (cue the beowulf jokes!) your code had better be written with processes (and probably MPI) in mind. Of course if you're going that way, you also ought to spend on getting a good interconnect network...

      All in all, getting proper high performance is tricky. The best guide to making things go faster is to try to reduce the amount of shared state between threads-of-execution. Reducing shared state also helps to make the code easier to debug. (Alas, dealing with the bits of state that must be shared is what makes life hard.)
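      To make "reduce the amount of shared state" concrete, here is a rough sketch in Python (the function names are made up for illustration). Each worker gets its own slice and its own local accumulator, and the only point where results meet is the final sum after every worker has finished, so there is nothing to lock:

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    # Works only on its own slice and its own local accumulator.
    return sum(v * v for v in chunk)


def split(values, parts):
    # Cut the input into at most `parts` contiguous slices.
    size = max(1, -(-len(values) // parts))  # ceiling division, never zero
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, workers=4):
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The only "shared" step: combining results after the workers are done.
    return sum(partials)
```

      In CPython the interpreter lock means these threads will not actually run the arithmetic in parallel; the structure is the point here, and the same shape should carry over to a process pool or to another language's native threads.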

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    42. Re:Bah humbug by Viol8 · · Score: 1

      I'm not convinced clone() is even used anymore in the latest versions of libc with 2.6 since "real" threads were implemented in the kernel.

    43. Re:Bah humbug by Viol8 · · Score: 1

      >you can put pthread mutexes and condition variables

      If the implementation underneath uses pointers you're buggered. Mutexes are only guaranteed to work between threads, not processes.

  5. "hundreds of cores"? by CarpetShark · · Score: 1

    Where is the abstract getting "hundreds of cores in desktops on the horizon" from? Is this actually expected soon, or are they just looking ahead a bit too eagerly?

    1. Re:"hundreds of cores"? by dreamchaser · · Score: 2, Interesting

      If by 'on the horizon' they mean 'possibly in the next ten years', then sure. I can see that happening. Quad cores are already here. If they double the number of cores every 18 months that means in 7.5 years we'll have 128 cores. I'm just throwing that out as an example, but it's certainly possible even if all the cores are not on the same package. Take 8 physical CPU's with 16 cores each for example.

      Just rampant speculation, but it is certainly possible.
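      The arithmetic behind that 128 figure, as a small Python sketch (the function names are made up, and the 18-month doubling period is the assumption stated above, not a prediction): 7.5 years is five doubling periods, and 4 cores doubled five times is 128.

```python
import math


def projected_cores(start_cores, years, doubling_period_years=1.5):
    # Number of doublings that fit into the time span.
    doublings = years / doubling_period_years
    return start_cores * 2 ** doublings


def years_until(target_cores, start_cores, doubling_period_years=1.5):
    # Inverse question: how long until we reach a given core count?
    return doubling_period_years * math.log2(target_cores / start_cores)
```

      With these, `projected_cores(4, 7.5)` gives 128 and `years_until(128, 4)` gives 7.5.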

    2. Re:"hundreds of cores"? by Bastard+of+Subhumani · · Score: 3, Funny

      that means in 7.5 years we'll have 128 cores.
      What a pity 640 isn't a round power of 2. That ought to be enough for anybody.
      --
      Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
    3. Re:"hundreds of cores"? by AmIAnAi · · Score: 1

      I guess they are thinking of this: Intel's 80-core research CPU.

      --
      Any sufficiently advanced bug is indistinguishable from a feature.
    4. Re:"hundreds of cores"? by wgaryhas · · Score: 1

      5 processors, 128 cores per processor. I wonder if the 8-socket motherboards for AMD processors work with only 5 in there?

      --
      "For every complex problem, there is a solution that is simple, neat, and wrong." - H.L. Mencken
    5. Re:"hundreds of cores"? by Lazerf4rt · · Score: 3, Informative

      Yeah, they're looking ahead too eagerly. That's what academics do.

      Let's not forget that Intel and IBM both recently found a manufacturing process to keep Moore's law going for the next several years. Most people in 2006 thought we hit a wall, and that the multicore revolution was inevitably under way, but that just might not be true anymore. That said, it is always nice to have at least a few cores available in your system.

      At the same time, AMD's Fusion strategy looks pretty interesting. I really wonder what's going to become of that.

    6. Re:"hundreds of cores"? by srvivn21 · · Score: 1

      If by 'on the horizon' they mean 'possibly in the next ten years', then sure. I can see that happening. Quad cores are already here. If they double the number of cores every 18 months that means in 7.5 years we'll have 128 cores. I'm just throwing that out as an example, but it's certainly possible even if all the cores are not on the same package. Take 8 physical CPU's with 16 cores each for example.

      Just rampant speculation, but it is certainly possible.

      Intel Pledges 80 cores by 2011 (http://www.google.com/search?hl=en&q=intel+80+cores). Grain of salt optional.

  6. RapidMind = vendor lock-in by Anonymous Coward · · Score: 5, Insightful

    Both RapidMind and Peakstream are proprietary commercial solutions, and those companies are trying to lock users into their particular framework. What we really need is the equivalent as a true open-source solution, perhaps as a gcc extension. Does anyone know if there is progress being made on this?

    1. Re:RapidMind = vendor lock-in by Anonymous Coward · · Score: 4, Informative

      OpenMP is implemented in GCC 4.2 (I think, I've never used it in GCC).

    2. Re:RapidMind = vendor lock-in by acidrain · · Score: 3, Informative

      Does anyone know if there is progress being made on this?

      The GPUs will ship with C compilers soon enough. They are already supporting limited forms of C. Actually, in time we will see hybrid CPUs (the Cell being a first example) with units capable of massive amounts of parallel math operations stacked in alongside some of your CPU cores. As the number of cores grows, room is made for specialized processors where that makes sense in the market.

      --
      -- http://thegirlorthecar.com funny dating game for guys
    3. Re:RapidMind = vendor lock-in by Wesley+Felter · · Score: 1

      The GPUs will ship with C compilers soon enough.

      Except NVidia's NVCC has NVidia-specific syntax extensions, which is just more lock-in. Maybe somebody will propose a metacompiler that can output Peakstream-C, RapidMind-C, or CUDA-C...

    4. Re:RapidMind = vendor lock-in by Prune · · Score: 1

      Why not use something most recent versions of C/C++ compilers already support? Just use OpenMP. GCC, Intel's compiler, Visual C++, all support it.

      --
      "Politicians and diapers must be changed often, and for the same reason."
    5. Re:RapidMind = vendor lock-in by BillAtHRST · · Score: 1

      Boehm's (and the others') point is that it is not possible to provide robust and efficient synchronization support in the form of a library -- that it must be part of the language itself. A good example of this is the double-checked locking "anti-pattern": http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

    6. Re:RapidMind = vendor lock-in by ralph.corderoy · · Score: 1

      Anyone interested in multiple threads of execution should be familiar with the excellent information presented by Russ Cox on Bell Labs and CSP Threads. This model of threading is far easier to reason about than what most of the world mucks around with, and is available to Unix as part of the Plan 9 from User Space project as libthread.

    7. Re:RapidMind = vendor lock-in by Prune · · Score: 1

      OpenMP is a language extension, and thus part of the language.

      --
      "Politicians and diapers must be changed often, and for the same reason."
  7. C++ can't be made safe by bytesex · · Score: 0

    The programming language must provide thread safety as part of its own paradigm, not as an add-on to the language in the form of functions, classes, templates or whatnot. Even Java doesn't do it; you can define public accessors in a Java class that are a mix of synchronized and not synchronized. That is provably unsafe (provided that these public accessors access the same private data), yet the language and the compiler allow it. Elegant degradation during runtime in case of deadlock detection or race conditions is nonexistent in the world of programming languages, yet you can easily think them up (binding some sort of preference to a thread and letting the most preferential thread break the lock - experimenting with sleeping a thread for a number of seconds if it seems to race - notifying an 'exception' thread in case of all of this). I don't think that _any_ language that I know (and that isn't saying much, I'm not boasting) is really thread-ready.
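    A sketch of the mixed-accessor hazard, written in Python rather than Java purely to keep it short and runnable (`Account`, its methods and the two events are hypothetical names; the events are test hooks that force the bad interleaving so the outcome does not depend on timing). One accessor updates two fields under a lock, another reads the same fields without it, and nothing in the language objects:

```python
import threading


class Account:
    """Hypothetical class: one accessor takes the lock, another does not."""

    def __init__(self, checking=100, savings=0):
        self._lock = threading.Lock()
        self.checking = checking
        self.savings = savings

    def transfer(self, amount, midpoint=None, resume=None):
        # "Synchronized" accessor: the two-step update happens under the lock.
        with self._lock:
            self.checking -= amount
            if midpoint is not None:
                midpoint.set()   # test hook: announce the half-done update
                resume.wait()    # test hook: pause until told to continue
            self.savings += amount

    def total_locked(self):
        # Takes the same lock, so it never observes a half-done transfer.
        with self._lock:
            return self.checking + self.savings

    def total_unlocked(self):
        # "Unsynchronized" accessor on the same data: nothing stops it from
        # running in the middle of transfer().
        return self.checking + self.savings


def demo():
    account = Account()
    midpoint, resume = threading.Event(), threading.Event()
    writer = threading.Thread(target=account.transfer,
                              args=(40, midpoint, resume))
    writer.start()
    midpoint.wait()                   # writer is inside the lock, half done
    torn = account.total_unlocked()   # reads checking=60, savings=0
    resume.set()
    writer.join()
    return torn, account.total_locked()
```

    Here `demo()` returns `(60, 100)`: the unlocked read sees money that has left one field and not yet arrived in the other, while the locked read sees the consistent total.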

    --
    Religion is what happens when nature strikes and groupthink goes wrong.
    1. Re:C++ can't be made safe by Gr8Apes · · Score: 2

      In Java, this is allowed because of performance issues. You can make it almost 100% thread safe (note I said almost) by synchronizing every method, but there's still some gotchas in the JDK.

      Multi-threaded programming is a skill that comes with a level of understanding, much like students of mathematics must reach a level of understanding to comprehend Algebra, Calculus, Differential Equations, and Partial Differential Equations (yeah, that last one's a bear, especially when you apply it to various physical models) respectively. It's why you can be a wiz at adding or subtracting, but utterly fail at algebra, or a wiz at algebra, but never be able to differentiate or integrate a function, and so on.

      Writing code is easy. Any moron can do it, as the Java and PHP hordes have shown. Writing good code is harder, designing good OO code is even harder, and designing and writing good multi-threaded code is yet a step beyond that. It's why there are so few well-written multi-threaded apps, and most of those are in server land.

      --
      The cesspool just got a check and balance.
    2. Re:C++ can't be made safe by Coryoth · · Score: 1

      Writing good code is harder, designing good OO code is even harder, and designing and writing good multi-threaded code is yet a step beyond that.

      In theory writing good multi-threaded code shouldn't be much harder than designing good OO code - it's a matter of actually learning the right paradigm to think about things, and then it all flows easily (presuming you've got a language that supports your paradigm well - otherwise it is doable, but a little clunky, much like OO). If you're willing to let go of shared state concurrency and think in terms of message passing then things get much easier. Think of actors passing messages and try writing multi-threaded code in Erlang. You'll find it is surprisingly easy to do well. If that's too much trouble then try SCOOP for Eiffel which attempts to convert the OO model into a message passing model (which in some sense it was originally intended to be). In that case designing multi-threaded code is hardly different than designing good OO code - just declare which objects are allowed to be handled in parallel and let the compiler do all the work of sorting everything out.
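      For a feel of the actor style without installing Erlang or Eiffel, here is a rough Python approximation (`CounterActor` and `demo` are made-up names, and this is only a sketch of the idea, not how Erlang or SCOOP implement it). The counter is touched by exactly one thread, the actor's own; everybody else can only drop messages into its mailbox:

```python
import queue
import threading


class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # private state: only the actor's thread touches it
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        # Handle one message at a time, in arrival order.
        while True:
            msg, reply = self._mailbox.get()
            if msg == "stop":
                break
            if msg == "incr":
                self._count += 1
            elif msg == "get":
                reply.put(self._count)

    def send(self, msg):
        # Fire-and-forget message.
        self._mailbox.put((msg, None))

    def ask(self, msg):
        # Message that expects an answer on a private reply queue.
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((msg, reply))
        return reply.get()

    def stop(self):
        self.send("stop")
        self._thread.join()


def demo(n_senders=8, per_sender=1000):
    actor = CounterActor()

    def sender():
        for _ in range(per_sender):
            actor.send("incr")

    threads = [threading.Thread(target=sender) for _ in range(n_senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = actor.ask("get")  # queued behind every "incr" sent above
    actor.stop()
    return total
```

      No sender ever takes a lock explicitly; the queue is the only synchronization point, and `demo()` counts every increment because the mailbox delivers them one at a time.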
    3. Re:C++ can't be made safe by ratboy666 · · Score: 2, Informative

      This stuff is an outgrowth of the Sh work done at Waterloo U. Anyway, the idea is that the declarations in your code are replaced. The new types redefine standard operators to generate code for a parallel machine (say, an nVidia card, or a PS/3 cell).

      The code so generated can be run immediately, or deferred (note -- it's been a while since I've looked at Sh, so I am being vague).

      I didn't think that this was a GENERAL multi-threading solution; more a way to easily generate code for the parallel machines that are becoming available.

      --
      Just another "Cubible(sic) Joe" 2 17 3061
    4. Re:C++ can't be made safe by Gr8Apes · · Score: 1

      Well, if you're going to remove 99+% of the common trouble spots of multi-threaded coding by moving to a messaging paradigm, then yes, it probably is conceptually easier than OO. It can also be significantly slower depending upon the application's design and function and greatly increase its memory footprint. e.g., I don't think a game like Quake would work all that well under this paradigm. BTW, webservers generally work under your paradigm.

      You'll also still have the potential of concurrent modifications in this scenario, but at least you won't be working on the same memory storage locations, potentially reading indeterminate/incoherent values. Instead you'll have inconsistent values displayed, depending upon which thread's data you're displaying.

      I'll still make the argument that good multi-threaded coding is harder than good OO coding.

      --
      The cesspool just got a check and balance.
    5. Re:C++ can't be made safe by bytesex · · Score: 1

      But isn't that the same old excuse? I mean, if a language prides itself on 'there's just one way to do it' (and don't tell me Java doesn't do that - it does), then why would it allow such a glaring hole? I'll tell you why - it was never thought about. I know: I've inspected every JDK source tree since it was 1.0.2 - the people then, brilliant as they were, never gave a second thought to making classes re-entrant, and when they started doing that, the language wasn't really fit to apply it to. So you get ugly things like re-entrant and non-re-entrant varieties of the same class. Yuk.

      So you come up with the oldest excuse in the book, and you shift responsibility from the tool to the people who use it. I'll tell you something: that only holds when the tool as such is finished, which it obviously isn't. The tool must go back to the toolmaker, but that's something we've heard about Java so many times now, it isn't even funny.

      --
      Religion is what happens when nature strikes and groupthink goes wrong.
    6. Re:C++ can't be made safe by Gr8Apes · · Score: 1

      You choose to focus purely on the Java statement. That's fine. It was thought about. The problem is that there's only so many ways to skin the cat, and Java's approach was to focus on performance. You'd argue that they should have focused on thread safety instead and made it slower?

      The architectures of Java, C, C++, and C# are all such that there will always be issues with re-entrant code. I'm sure you'll have some suggestion for a language that doesn't suffer from this issue.

      As for Java, the core is finished. Refinements are being done to accommodate new features, like the concurrency library and changes in the memory model to address issues brought up by that addition. It's not that the features couldn't have been accommodated with the original JVM, but it would not have been as clean or easy to implement, and certainly not as performant as what resulted from the modifications.

      --
      The cesspool just got a check and balance.
    7. Re:C++ can't be made safe by Coryoth · · Score: 2, Informative

      Well, if you're going to remove 99+% of the common trouble spots of multi-threaded coding by moving to a messaging paradigm, then yes, it probably is conceptually easier than OO. It can also be significantly slower depending upon the application's design and function and greatly increase its memory footprint. e.g., I don't think a game like Quake would work all that well under this paradigm.

      I think it is nowhere near as bad as you seem to think - it all depends on how message passing is handled. If you're doing it via some slow complex scheme then, sure, it will be slow. But the trick is to think conceptually in terms of message passing - that doesn't mean it actually has to be handled with a big clunky message passing interface internally; just in terms of how you think about it. Take SCOOP for instance. The "message passing" mechanism there is feature calls, the message being parameters passed to the feature/method. The preprocessor and compiler handle all the messy details of locking etc. and in practice it runs about as fast as hand written threading. The difference is that you think in the simpler terms of actors and messages, while the computer (in this case the compiler) handles the grunt work of converting that into efficient code. This is no different than OO, or garbage collection, of course: you simplify what you need to write by writing in a higher level paradigm and leave the hard work of turning that into efficient machine code to the compiler.

      You'll also still have the potential of concurrent modifications in this scenario, but at least you won't be working on the same memory storage locations, potentially reading indeterminate/incoherent values. Instead you'll have inconsistent values displayed, depending upon which thread's data you're displaying.

      Read the page on SCOOP, or this draft paper, to see what it actually does - it is well worth it: it's the best mix of OO and concurrent programming I've ever seen. You won't end up with inconsistent values because everything will block accordingly with the compiler handling all the necessary locking/blocking/waiting and letting you get on with just writing code.
    8. Re:C++ can't be made safe by bytesex · · Score: 1

      Hm. I focussed on Java, because you seemed to do so - it was the only specific thing you reacted to. With regards to performance: if locking were an aspect of a class, rather than of its accessors, then Java might not have suffered so much in performance. The architectures are all the same (within C, C++ and Java) because they were chosen to be the same (something existing programmers knew). Kernighan and Ritchie had an excuse - their predecessors are long gone and threads didn't exist when they invented C. C++, C# and Java have no such excuse - they were invented long after that, when computer power was many times that of a PDP-11.

      I agree with you that MT programming is something that you have to understand the (almost mathematical) basics of. I don't agree with you that modern languages have an excuse for ignoring threads as an intrinsic aspect of their being. Java's core may be finished, but if we're really to write in languages that allow for proper usage by multiple CPUs, then C++ or Java aren't the right choices.

      FWIW, I don't personally think that this 'multiple core programming revolution' is around the corner in any way. If only because what they do at the moment (services and batch-processes) can already be programmed successfully in many other ways. The revolution that the topic at hand seems to imply, requires a radical shift in what people expect programs to do in the first place.

      --
      Religion is what happens when nature strikes and groupthink goes wrong.
    9. Re:C++ can't be made safe by Gr8Apes · · Score: 1

      Well, that's probably because most of my recent multi-threaded work has been in Java, and hence is at the top of the heap, so to speak.

      Yes, Java largely copied the C/C++ architecture regarding threads, with minor tweaks. C# is of course mostly a reimplementation of Java with a strong C++ flavor. Regarding C++, it was merely an OO wrapper on top of C initially, or at least that's how it was implemented with the (g)cc compilers, as they translated C++ to C, then compiled and linked that C code IIRC. (it's been so long I wouldn't bet a beer on it, and I'm too lazy to look it up;)

      To write efficient, production usable MT code currently I have only seen it done with C/C++/Java. I have yet to see MT C# production code, although I'm sure some exists somewhere, even if it's only in unsafe sections. I've already been directed to Erlang and SCOOP and will look at them soon.

      I agree with you that an MC programming revolution is not around the corner. The only real consumer applications that already exist are photo and movie editing packages. Some of these effectively run across multiple CPUs. They're pretty much all written in C/C++ with some assembly from what I gather from various reviews, discussions, articles, etc. After all, does your mail program really benefit from MC? (MT yes, but MC?) I doubt it, unless you're getting thousands of emails a day and have some form of AI agent processing it. How about your word processor? Nope, can't type faster than my fingers will go. Speech recognition? Now there's a concept, but I personally believe that will be handled more effectively via an add-on DSP specialty chip. Multi-media processing including audio? Sure, I can see it there too, and I thought some already were MT. The audio compressors I've used to date are single threaded though. I've countered this by running my ripping software in MT with the ability to compress multiple tracks simultaneously in separate processes (EAC is truly a wonderful product, can't promote it enough). EAC itself only benefits from a single processor though - it's so CPU light while ripping I can barely tell it's running in the CPU monitor.

      Games. Now games are an area that could have some serious MC/MT goodness. It requires a whole new paradigm, however, to run a game effectively in MT/MC. The only game I know that did this was Galactic Civilizations for OS/2. While a windows version came out, I do not know whether it was MT as well. I suppose I could run it and see... :) It certainly allowed for much improved AI, although it's still somewhat limited. At least it was the first turn based game I played that the AI didn't outright cheat.

      --
      The cesspool just got a check and balance.
    10. Re:C++ can't be made safe by joto · · Score: 1

      In theory writing good multi-threaded code shouldn't be much harder than designing good OO code

      Which theory?

      OO is just an aesthetic quality of code, so basically single-threaded OO code is to single-threaded code, as proper paragraph breaks are to a block of text. On the other hand, multi-threaded code opens up a can of worms. There are whole new classes of bugs to be avoided, and most of them can NOT be found by inspecting code locally.

      If you're willing to let go of shared state concurrency and think in terms of message passing then things get much easier.

      Yeah, yeah, lots of paradigms are supposed to make concurrency easier. And message-passing is nice. It's still not easier than single-threaded code, though...

      If that's too much trouble then try SCOOP for Eiffel

      SCOOP, like all concurrent programming models, has all kinds of unexpected non-intuitive quirks. It's not easier than single-threaded programming.

    11. Re:C++ can't be made safe by joto · · Score: 1

      There was no such thing as "(g)cc compilers". If that expression is to have any meaning, it would be any vendor supplied C/C++ compiler on a unix system and/or the gcc compiler. In that case it's untrue. Some compilers, such as CFront, originally compiled C++ to C. Today almost all compilers, such as gcc, compile C++ to native code.

    12. Re:C++ can't be made safe by Gr8Apes · · Score: 1

      I figured when I wrote that that someone would take me to task about the (g)cc compilers. I didn't recall which particular flavors I used, but I ran them on several different OSes and those were the compiler executables we called. (Of course, they could have been linked/aliased to anything, but that's not the point:)

      I did state that they originally operated in this fashion. This was purely in support of the argument that C++ was a syntactical overlay of C, nothing more. That changed later as did the C++ spec.

      --
      The cesspool just got a check and balance.
    13. Re:C++ can't be made safe by pmedwards · · Score: 1

      Elegant degradation during runtime in case of deadlock detection or race conditions is nonexistent in the world of programming languages, yet you can easily think them up (binding some sort of preference to a thread and letting the most preferential thread break the lock - experimenting with sleeping a thread for a number of seconds if it seems to race - notifying an 'exception' thread in case of all of this)
      It's easy to think them up, but they'd be wrong :-). You can't just revoke one thread's lock arbitrarily. If you think you can, you haven't thought through the consequences: What happens next? The complexity isn't (just) a feature of the language in this case: concurrent programming is tricky.
    14. Re:C++ can't be made safe by Coryoth · · Score: 1

      SCOOP, like all concurrent programming models, has all kinds of unexpected non-intuitive quirks. It's not easier than single-threaded programming.

      Who said anything about easier? I was commenting with regard to not significantly harder. Sure there are quirks, just like there are quirks to using garbage collection, or OO design. It isn't that much to learn, however, and it is a damn sight easier than trying to do threading with hand coded mutex locks.
    15. Re:C++ can't be made safe by Procyon101 · · Score: 1

      I don't think that _any_ language that I know (and that isn't saying much, I'm not boasting) is really thread-ready.

      Check out Erlang and Haskell. Haskell does it by eliminating the concept of state and instruction ordering (so there is no concept of a thread of execution). The compiler can then do provably safe things on any number of threads it feels like. Erlang takes a different approach by making EVERYTHING its own little thread (functions, variables, everything). Both of these are production languages.
  8. Full respect for offering theora downloads by Anonymous Coward · · Score: 0

    At 411MB this'd better be some demo because chicken little it ain't.

  9. Functional programming by Cthefuture · · Score: 3, Interesting

    Also note that certain programming languages can make multithreaded programming a lot easier. Nothing against C++ (one of my favorite languages) but no matter what you do it's relatively hard to use in multithreaded applications compared to a functional language. We are already seeing more functional features put into existing languages.

    The main problem I see is that there is a lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just a general lack of refinement. Arguably Haskell comes the closest but suffers from a complicated and large backend support requirement like Java.

    Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.

    --
    The ratio of people to cake is too big
    1. Re:Functional programming by kcbrown · · Score: 2, Insightful

      Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.

      What?

      Sorry, that's bullshit. If you want to take advantage of multiple processor cores, use multiple processes! Even Windows has fork() these days, thanks to its POSIX subsystem, so creating a clone of your process is very easy.

      You should use threads over processes only if you can prove that the context switch savings really do cause a big performance improvement for your application. If you really think about it, you'll find it's very rare indeed that the context switch overhead difference really matters, even on an OS like Windows where it's relatively high.
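
      A minimal sketch of the multi-process approach described above, assuming a POSIX system where os.fork() is available. The function name parallel_sum and the use of pipes to return results are illustrative choices, not anything from the post.

```python
import os


def parallel_sum(numbers, workers=2):
    """Sum `numbers` by forking one child process per chunk (POSIX only)."""
    chunk = (len(numbers) + workers - 1) // workers
    children = []
    for i in range(workers):
        part = numbers[i * chunk:(i + 1) * chunk]
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: works on its own copy of the address space and
            # reports its partial result through the pipe.
            os.close(read_fd)
            os.write(write_fd, str(sum(part)).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            total += int(pipe.read())  # read until the child closes its end
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return total
```

      Each child can be scheduled on a different core, and nothing is shared between them except the pipe, so no locking is needed.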

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    2. Re:Functional programming by Anonymous Coward · · Score: 0

      Erlang has support for multicore/SMP since the R11 release. Also do you have backing for the statement "slower message passing" ? That would be of interest to some of us thinking of using message passing.

    3. Re:Functional programming by Anonymous Coward · · Score: 0

      Erlang would be all but useless if it supported native threads. Native threads share state, which is a very (very) bad thing in Erlang, and the overhead for native thread context switching is too much...which is why Erlang uses green threads.

    4. Re:Functional programming by Lazerf4rt · · Score: 2, Insightful

      I recently shipped an Xbox 360 game and am about to ship a PS3 game, and having done a lot of system-level programming and optimization for both, I can tell you don't know what you're talking about. You're probably a smart guy and a good programmer but you're obviously speaking out of academic experience without having much real-world experience.

      The key to performance and stability does not lie in the discovery of high-level tools that abstract away all the hardware details for you. And it definitely doesn't lie in a functional language. The key is knowing your hardware and designing your software for it, right down to the low level. You have to create and manage your tasks/jobs/threads/fibers, each to do a specific thing, and you manage their lifetimes and the flow of data between them. If you need more performance, you need clever ways of pipelining your data.

      Anyway, just thought I'd share that. If you make a career in programming, you'll eventually learn that having a low-level understanding of each platform, and just using existing tools, is far more productive than trying to research and develop new programming tools. I'm downloading TFA's video right now, look forward to hearing whatever it is they have to say.

    5. Re:Functional programming by Communomancer · · Score: 2, Informative

      The main problem I see is that there is lack of focus in the functional arena.

      Whoa whoa whoa! You may not like Erlang's implementation, but you can hardly attribute it to a lack of focus. The whole language was built with concurrency in mind. Heck, the concurrency even has built-in network awareness. And Erlang's been multi-core since last May.

      Erlang goes multi-core

      Yeah, that doesn't say anything about your VM worries. I don't have those, though. Seamless multi-threading and a language paradigm designed for concurrency more than make up for the VM performance hit, imo. When I have to write non-trivial concurrent systems, I reach for the language that already has the plumbing excellently implemented. I'm sure it's better done than anything I could implement myself, and since the system is concurrent, cheap hardware is easily added to improve performance.

      Man, this is the second time this week I've had to stick up for Erlang around here.

      --
      "UNIX" is never having to say you're sorry.
    6. Re:Functional programming by arevos · · Score: 1

      The main problem I see is that there is lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement. Arguably Haskell comes the closest but suffers from a complicated and large backend support requirement like Java.

      Languages like Erlang do not use native threading not because they are unable, but because it is generally less efficient and more prone to error. There's a large overhead associated with creating a thread in Windows or Linux, whilst the user space processes in Erlang are extremely lightweight. If Erlang used native threads, it would use more memory, and be able to run fewer concurrent processes. I'd imagine I'd have some trouble creating a program that used a million or more pthreads under Linux, but I would not if I created the same number of Erlang processes.

      The other problem with native threads is that they are designed around the shared memory model. Functional languages like Erlang do not share memory between processes, but use message passing instead, as you correctly point out. This factors out a whole category of potential errors, and makes developing large concurrent applications considerably easier.

      I've only started looking into Erlang's concurrency model, however, so I'm probably not the best person to point out its merits. However, from what I've read, I don't see functional languages benefiting much from native threads, which are generally designed around the shared memory/imperative model, a model that does not translate well to pure functional languages.
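
      The message-passing style described above can be imitated in other languages. The following is only a rough analogy in Python, not Erlang: the names counter_process and run_counter are hypothetical, and a native thread plus queue.Queue stands in for a lightweight Erlang process and its mailbox. The point it shows is that the counter is private to one worker and is only ever changed in response to messages.

```python
import queue
import threading


def counter_process(mailbox, replies):
    """Own a private counter and react only to messages from the mailbox."""
    count = 0  # state that no other thread can reach
    while True:
        message = mailbox.get()
        if message == "stop":
            replies.put(count)  # hand the final value back as a message
            return
        count += message


def run_counter(increments):
    mailbox, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=counter_process, args=(mailbox, replies))
    worker.start()
    for n in increments:
        mailbox.put(n)  # communicate by sending, never by sharing
    mailbox.put("stop")
    worker.join()
    return replies.get()
```

      Because the counter never leaves its owner, there is nothing to lock, which is the category of errors the post says message passing factors out.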
    7. Re:Functional programming by Eli+Gottlieb · · Score: 1

      Do you have any idea how much stuff is stuffed into process control blocks nowadays? In addition to simply a thread of control and an address space, they've now got file handles, security tokens (UID and GID on *nix), disk or memory quotas, and IPC ports/mailboxes for message passing. And that's just the stuff I remembered! Threads actually provide a good solution when you only need concurrency, not separate file handles.

    8. Re:Functional programming by fnc · · Score: 1

      About Erlang: old versions really needed multiple VM instances to use native threads, but the current implementation distributes its user processes in native threads transparently.

    9. Re:Functional programming by deKernel · · Score: 0

      I have a feeling that you don't really understand computer systems as a whole. When you develop high-end solutions there are more items to consider than just context switch times. Getting information between processes is slow compared to say shared memory (aka heap) with threads as just one trade-off.

      Threads are great when your process has multiple IO points to the world which are slow. This does allow your application to accept multiple inputs concurrently. The downside is that threads don't allow for scaling with multiple processors or cores. Think about this for a minute: say your application has two threads running. Can each of those threads run on separate CPUs in parallel effectively? Note the word 'effectively'. If the threads have any type of dependency on each other like data sharing, the answer more than likely is no because the system will spend more time thrashing data back and forth. Think about where the process space data resides: in memory, or in which cache level? How do the caches keep in sync for threads that reside in the same process space?

      The arguments here can only be answered on a case-by-case basis and only by a person who understands what the desired output is and what needs to be done to generate the correct response.

      Arguing about generalities is a waste of time in my opinion.

    10. Re:Functional programming by kscguru · · Score: 1
      Windows doesn't have fork(). The POSIX subsystem has it, but you can't mix the posix subsystem with the win32 subsystem in the same app, and all the Windows apps you're thinking of run as win32. (win32's C library has a lot of posix calls ... fork() isn't one of them).

      On Windows, process creation is expensive and complex, while thread creation is cheap, thus Windows design favors using threads for parallelism. On most Posix systems (unix, linux), process creation is cheap (e.g. fork, which has 40 years of optimization behind it), threads didn't enter the design until much later (Linux: NPTL, late 2.4 kernels?) and are mostly hacked in as "lightweight processes", so Posix design favors using processes linked by shared memory or pipes for parallelism.

      But the underlying point is fine - every modern OS has support for some variant of parallelism. It doesn't have to be baked into the language or the compiler. And to anyone who understands multiprogramming, it's quite easy to switch between programming models.

      --

      A witty [sig] proves nothing. --Voltaire

    11. Re:Functional programming by Anonymous Coward · · Score: 0

      They are still disconnected VM's using message passing though.

    12. Re:Functional programming by Cthefuture · · Score: 1

      Note that I said lack of focus around functional programming languages in general, not specific to Erlang. Individual languages focus on certain things but none of them try to be everything a functional language should. A native compiled version of Erlang would be pretty nice. HiPE doesn't count because that is a tacked-on system that is far from nice. If Erlang had been designed for both VM and native execution earlier then it would be better off (see O'Caml).

      Please note, I'm a fan of Erlang too. I wouldn't consider it a high performance platform though. Mostly people don't notice all the overhead it has when handling multiple cores because the whole thing is relatively slow.

      --
      The ratio of people to cake is too big
    13. Re:Functional programming by joto · · Score: 1

      I agree that if there was a language called "O'Caml, Lisp", then it would probably have a horrible syntax. On the other hand, lisp has a clean beautiful syntax. O'Caml has a syntax that is at least no worse than e.g. C++ or java.

    14. Re:Functional programming by SpinyNorman · · Score: 1

      What you say is true if you're looking to squeeze max performance out of a custom platform, but does not apply as a general solution towards concurrent programming.

      As multi-core and multi-processor systems are becoming more the norm, it'd be good to see software technology such as languages and compilers keeping up. The easiest way to take advantage of hardware parallelism would be if you can somehow express that parallelism in your program rather than use a low level imperative language like C++ to painstakingly spell out how the concurrency should be implemented. An explicit thread model is about the worst possible solution because it's the most low level one, although conceivably one might create a new language by extending C++ with concurrency constructs that are compiled down to threaded code the same way early C++ compilers compiled to C. Complete paradigm shifts such as functional languages are also an option, but I don't see that becoming mainstream.

      As far as low level and threaded programming - been there done that. I've implemented multiprocessor (mixed 68000/Z80) embedded systems entirely in assembler back in the day, and nowadays write multi-threaded Solaris code in my sleep - it's easy enough with experience, but experience also says that there are much more productive ways we could be doing this.

    15. Re:Functional programming by Cthefuture · · Score: 1

      Your view is skewed due to the market you work in.

      By the way I'm not coming from an academic background. I have not been in school for several decades and my career in programming is just fine thankyouverymuch.

      --
      The ratio of people to cake is too big
    16. Re:Functional programming by The_Wilschon · · Score: 2, Insightful

      That approach works just fine if you know exactly what hardware your code is going to be running on, and you know that it will never have to run on any other hardware, and you know that you won't have to ever work on it again once it is released.

      In the Real World (ie not game consoles), programs must be portable. They must be maintainable. They must be writeable in a short time. Your approach completely ignores these requirements which enormously outweigh the tiny performance gains that you can get by tweaking for the hardware.

      High level languages remove the portability problem entirely, by shifting it to the shoulders of the language implementors. If I implement a language on one platform, and you implement the same language on a completely different platform, any program written in that language will now run just fine on both platforms. Sure, unless your compiler or interpreter is incredibly smart, programs written in that language won't run as fast as endlessly hand-tweaked assembly programs written for that platform. But if your compiler or interpreter is even just passingly not bloody stupid, then programs in that language will run pretty close to as fast.

      Higher level languages allow the programmer to express more directly exactly what he means. Suppose I have a very very low level language that manipulates bits (ie logic gates). I want an addition routine for arbitrary length sequences of bits. Implementing this in logic gates is not only very difficult and very very time consuming, but the end result doesn't look like addition at a glance. Now suppose I use a high level language. I can just write (a + b). It is clear to anyone who made it through elementary school that this code adds two things, a and b. Of course, I could abuse this, and actually make the language so that (a + b) deletes all files on the hard disk, but a realistic high level language is essentially self documenting. This means that 5 years down the road, when I've forgotten why I even wrote this bit of code, I can pull it up in a text editor and remember that it adds two numbers very easily. Suppose I then need it to add three numbers. Well, now that's a very easy change to make. The code is immensely more maintainable, not only by the original author, but by anyone, than code in a low level language.

      Many of the extremely high level languages that are disparagingly referred to as "research languages" have an extremely high information density. That is, I can express in 1 line of code what it might take you perhaps as many as hundreds of lines of code to express in a low level, close-to-the-metal language. This also contributes to the maintainability, because there is a lot less room for mistakes in one line of code than in 20 lines of code. More importantly, this very high expressiveness means that I can write code in a high level language much more quickly. When you're trying to compete with other companies, or trying to finish a PhD thesis in less than 10 years, or pretty much anything at all, being able to code 20 times faster is a very important thing.

      If you make a career outside of the very sheltered world of Xbox and PS3 programming, you'll see that endless performance tweaking in very low level languages is not only useless, but it is wastefully stupid.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
    17. Re:Functional programming by DragonWriter · · Score: 1

      Individual languages focus on certain things but none of them try to be everything a functional language should.


      That's actually the opposite of lack of focus; rather, it's lack of universality. But then, again, I think it's as unreasonable to think that an FP language should be the one tool for every job as to think that about anything else.

      It also shouldn't be surprising that a capacity that is only now just becoming common and viable as a target doesn't have a lot of mature support for it yet, either.
    18. Re:Functional programming by Cthefuture · · Score: 1

      Don't change my words. I would have said lack of universality if that's what I meant. I also think it's unreasonable to think that one language can do everything but I never said otherwise.

      It has always been viable but no one has put the pieces together well enough to make developers actually want to use them (outside of academia). C is an example of a language that put things together well enough to take off in its respective arena.

      --
      The ratio of people to cake is too big
    19. Re:Functional programming by Lazerf4rt · · Score: 1

      If I implement a language on one platform, and you implement the same language on a completely different platform, any program written in that language will now run just fine on both platforms.

      You mean "Write Once, Run Anywhere" like in Java? You know how that ended up turning out, right?

      If you make a career outside of the very sheltered world of Xbox and PS3 programming, you'll see that endless performance tweaking in very low level languages is not only useless, but it is wastefully stupid.

      I've had an array of development experience, and I will say that game console programming is, in fact, less sheltered than other environments. The whole point of the higher-level approach used in other types of development is that it shelters you. And the performance tweaking that we do is not endless. It ends when we ship. And while it's a must to understand assembly (the very low level language you must be referring to), nobody writes in it. The closest you get is a few compiler intrinsics.

      Anyway, I feel like I'm writing this more for your benefit than for my own.

    20. Re:Functional programming by DragonWriter · · Score: 1

      Don't change my words.


      I didn't.

      I would have said lack of universality if that's what I meant. I also think it's unreasonable to think that one language can do everything but I never said otherwise.


      You did say that no FP language has focussed on doing everything a functional programming language should do. That's the statement I was referring to. There are many things one might imagine functional languages should do, and some of them conflict.

      It has always been viable but no one has put the pieces together well enough to make developers actually want to use them (outside of academia).


      Erlang was developed in private industry, IIRC, for telephone switching, and best as I know that remains a major use, if not the major use, for it; since being released, it's been adopted for use by communications carriers other than Ericsson, where it was invented. Functional programming features are, outside of academia, being bolted on to newer versions of popular, widely-used OO programming languages (e.g., Java), and are central to the design of other popular languages (e.g., Ruby).

      C is an example of a language that put things together well enough to take off in its respective arena.


      Great. So are R, Mathematica, and Erlang, among other functional languages: each has taken off in its own respective arena. It's just that C's "arena" is systems programming, which is rather fundamental to any computing system, whereas the "arena" for FP languages tends to generally be more narrow.

      Now, as certain FP languages have some attractive features for concurrency, and new applications of those are seen, the "arena" for FP languages, and the demands placed on them, may certainly evolve. And, certainly, the abundance of multi-core processors now likely changes the implementation considerations in concurrent languages (functional or not), and implementation design, if not language design, will certainly respond to that.
    21. Re:Functional programming by Cthefuture · · Score: 1

      You did say that no FP language has focussed on doing everything a functional programming language should do. That's the statement I was referring to. There are many things one might imagine functional languages should do, and some of them conflict.

      You're reading way too much into what I wrote. The "should" could mean anything. Nothing conflicts in my definition of what a language should do (note I never said what I thought a language should do because that is beside the point).
      --
      The ratio of people to cake is too big
    22. Re:Functional programming by The_Wilschon · · Score: 1

      Yeah, the Write Once, Run Anywhere thing doesn't work quite as advertised. Therefore we should throw sense to the winds and deliberately write code which will have to be completely rewritten if we want to run on another platform. Java is one example, and a particularly poor one. Perl, python, ruby, scheme, TeX, etc, etc, etc, actually do work quite close to write once run anywhere. Not quite, of course, as you can almost always still do things that depend on the hardware at hand, but nearly always, barring bugs in the implementation of the language on the target platform.

      No, the whole point of the higher-level approach is maintainability, portability, and writeability. As I quite clearly said. Funny. I feel like I'm writing for your benefit, not my own. I've been down the low-level optimize the hell out of it road, and I've been down the high level let the compiler optimize the hell out of it road. The end result is nearly always that the high level approach is better on all counts (often including performance, as with any reasonably complex program, a decently smart compiler can do much more, much safer, and much better optimization than a human).

      As far as endless tweaking goes, no, I do realize that nearly nobody does that in assembler anymore. But doing it in C is almost as bad.

      Game console programming is completely sheltered from the concerns of portability. You know exactly what hardware your code will run on, and you know it will never ever run on any different hardware. This allows you to make assumptions about the hardware that nobody else gets to make, and it turns out that relying on the details of the hardware actually makes for worse (unreadable, and thus unmaintainable, and thus far more fragile) code. Also, your development team is the only group of people that will ever see your code, much less work on it. Thus as long as all of you understand the horrible crufty nonsense that results, you don't have to worry about it. In the Real World, other people see, work on, and use the code that you write. So if it isn't nice and readable and understandable, you ain't gonna get very far. So, in game console programming, you are sheltered from readability concerns as well. Finally, with most console games, once they are released, that is it. You never have to come back a year later and fix some obscure bug. So you are sheltered from the concerns of maintainability as well.

      You might read this: http://www.cs.indiana.edu/~jsobel/c455-c511.updated.txt. It points out quite clearly that writing in a high level language often results in much faster (and less buggy!) code than writing in a low level language.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
    23. Re:Functional programming by shutdown+-p+now · · Score: 2, Interesting

      Well, speaking of game development, I hope you'll excuse me if I will hold the opinion of that guy Tim Sweeney, you know, the one behind Unreal, higher than yours? 'Cause he seems to disagree with you pretty strongly on many things, threading issues among them. Tools (which languages are) are key to solving this problem, and a lot of it does come from academia, just as all things heavily used in the industry today (like OOP) did.

  10. What?! by eldavojohn · · Score: 4, Interesting

    Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers.
    I'm a developer. I may not be the greatest one but I enjoy it. This declaration baffles me.

    You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?

    My point is that there have been a few applications I've written that require a multi-threaded solution. Perhaps this API would have made my life easier but I doubt it as I had to pretty much structure each thread by hand. There are frameworks and graphical libraries that also use multi-threading, which the scheduler has taken care of in the past. Hurray for multi-core if you use those.

    A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is an unneeded layer of abstraction or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do with the other cores whatever he wants. Let the scheduler work with the futuristic hardware and sort that crap out.

    Also, not everyone is multi-core already. Take us into consideration please!
    --
    My work here is dung.
    1. Re:What?! by acidrain · · Score: 2

      why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?

      I heard stories of this being done by games companies when their publishers complained they weren't using the VU1 on the PS2 enough. That was the VU which was really hard to utilize because it had no access to the rendering hardware. And yes, publishers ran the diagnostic tools available when you submitted builds.

      --
      -- http://thegirlorthecar.com funny dating game for guys
    2. Re:What?! by Anonymous Coward · · Score: 0

      The OS is usually running, (...)

      I'm using Windows ME, you insensitive clod!
    3. Re:What?! by zx75 · · Score: 4, Insightful

      I think you've missed the mark a little.

      I believe what he is saying is that if you're an application developer who is pushing the limits of what a single core is capable of in terms of performance, then you are going to see a decreasing rate of improvement and then stagnation because the focus of hardware development is shifting away from more power in a single core to more power because there are more cores.
      At some point you will hit a wall, and for single-threaded applications you're going to reach a point where there isn't any more power to be had.

      Therefore if you want to tap that extra power that a multi-core processor has, you will by definition *need* to start multi-threaded programming. This isn't about you people who are happy with the speed and power that you already have, research is pointless if you already have everything you could possibly need. This is for the people who push the edge, at some point if you need more you will need to learn to multi-thread correctly.

      And a simpler way to do it, is gold in my books.

      *From a former University classmate of Stephanos*

      --
      This is not a sig.
    4. Re:What?! by LWATCDR · · Score: 1

      You are right. Some programs will never need to be multi-threaded. However if a program is running slow right now you can no longer bet on the next generation of hardware to give you a performance boost unless it is multi-threaded. It will really depend on your application.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    5. Re:What?! by D-Cypell · · Score: 1

      "Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book."

      This is not about dreaming up ways to add concurrency, but utilizing concurrency options that already exist. For example, when a user of your application double clicks a row in your table, you need to grab the detail from the server and create a complex dialog to display that data. Clearly these tasks can run concurrently, but generally they are coded sequentially. On a single core, the benefit of concurrency is outweighed by the complexity involved. This may not be so when multiple cores are available.

      Multi-core aware software will break its individual functions down into nice clean atomic units that can be processed concurrently. You can already code this way (server developers are used to it, they have been dealing with SMP for a while now), the growth in availability of multi-core desktop systems means that client-side developers will have to begin to learn the same tricks as server side developers if they want their code to take advantage of the hardware. Constructs like a 'pool of worker threads' should become far more common in client-side code.
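
      The double-click example and the 'pool of worker threads' idea above can be sketched roughly as follows. This is a hedged illustration only: fetch_detail, build_dialog and open_detail_dialog are hypothetical names, and the sleeps stand in for a real server round trip and real GUI construction.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_detail(row_id):
    """Hypothetical server call; the sleep stands in for network latency."""
    time.sleep(0.2)
    return {"id": row_id, "name": f"row-{row_id}"}


def build_dialog():
    """Hypothetical GUI setup that does not need the server data yet."""
    time.sleep(0.2)
    return {"title": "Details", "fields": {}}


def open_detail_dialog(row_id, pool):
    # Submit both tasks to the worker pool so they overlap in time.
    detail_future = pool.submit(fetch_detail, row_id)
    dialog_future = pool.submit(build_dialog)
    # The only synchronization point: wait for both before combining.
    dialog = dialog_future.result()
    dialog["fields"] = detail_future.result()
    return dialog
```

      A caller would create the pool once, for example with ThreadPoolExecutor(max_workers=2), and reuse it for every double-click rather than spawning new threads each time.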

    6. Re:What?! by Anonymous Coward · · Score: 0

      Sir, you are wrong. I also am a developer, only it appears that I am slightly closer to the hardware industry than you are.

      Intel, still the largest CPU vendor on the planet, is actively pursuing multi-core designs (think 16 cores in several years, not 2 cores). As clock speeds will stop increasing, very soon the only option to boost your application's speed will be to utilize the redundant cores which you oh-so-much-despise. Not everything scales well, some things don't scale at all, but looking at most popular games and applications there's still a lot of catching up to do. Whereas your application in several years will run as fast as today, an application of a smart programmer will scale to more cores and will be faster.

      It's not the question of whether you want more threads, because it has already been decided for you by the chip maker. It's the question of how to split your problem into more threads to make it scalable. If you don't want to do that, you don't care for your customers and/or you're a lazy human being. Or maybe the concept of two simultaneous threads is just too difficult to comprehend?

    7. Re:What?! by gbjbaanb · · Score: 1

      not so, see your example of 'grab detail from the server and create a complex dialog to display it' is still single-thread of execution. If you split it into 2 threads, chances are the 2nd thread will be sitting waiting for the first thread to complete in order to have data to display!

      Server devs are used to multi-threaded coding, but generally this is because on a server you write code to do one thing, and then you find that 10 people want to do that same thing at the same time (eg think of a webserver, it only takes data from a client, processes it, returns it to the client - a single threaded app effectively. But then you want to do that 1 thing for many simultaneous clients).

      So I think concurrency is something that still needs a certain kind of problem to solve before it should be used. And a salutary lesson for us all: when Win95 came out, it had threads and the explorer team decided they'd use this cool new feature.. so every part of explorer found its way into a thread, one to draw the tree, one to draw the current folder, one for each directory to check to put a cross in the tree.. etc etc. and it was as slow as clay off a shovel. Just because you have threads doesn't mean you have to use them. Just because you have dual cores doesn't mean you have to write MT apps to use them either.

    8. Re:What?! by grahamwest · · Score: 1

      You mean VU0 (which is accessible using coprocessor instructions of the EE), not VU1 (which feeds directly into the GIF via PATH1). And I've never heard of anyone doing that, nor of publishers even caring. Publishers care about things they can put on the back of the box. VU0 utilisation is not such a thing. The only way I could see it coming up is if a developer says "I can't do after all." and then publisher says, "Well does, are they using the system better than you?"

      For all that, not many people use VU0 that much. Some people do particles on it, some people do part of their character animation on it and some people do some of their skinning on it. Most people have optimised math routines to use it via the coprocessor instructions, but that's not exactly a heavy amount. I've written about a dozen lines of VU0 assembly in my career.

      --
      Graham
    9. Re:What?! by grahamwest · · Score: 1

      I should've previewed. The developer is supposed to be saying "I can't do [high-level game feature] after all." and the publisher, "Well [other game] does...". My forgetfulness of angle brackets was not sufficiently mitigated by office doughnuts.

      --
      Graham
    10. Re:What?! by D-Cypell · · Score: 1

      "not so, see your example of 'grab detail from the server and create a complex dialog to display it' is still single-thread of execution. If you split it into 2 threads, chances are the 2nd thread will be sitting waiting for the first thread to complete in order to have data to display!"

      This is a thread synchronization problem and is common in multi-threaded environments. Semaphores and mutexes are already used to solve this problem. The difference is that the theoretical execution time of the concurrent approach is as long as the longest task, which in many cases can be significantly better than the execution time of all tasks combined.
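To make the "as long as the longest task" point concrete, here is a minimal sketch in standard C++. The two functions `fetch_details` and `build_dialog` are hypothetical stand-ins (they just sleep), and the sketch assumes the dialog can be laid out before the server data arrives, which is exactly the assumption the parent post disputes.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for "grab detail from the server": sleeps, then returns data.
std::string fetch_details() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    return "details";
}

// Hypothetical stand-in for "create a complex dialog": slow, but needs no server data.
std::string build_dialog() {
    std::this_thread::sleep_for(std::chrono::milliseconds(150));
    return "dialog";
}

// Runs both tasks concurrently and blocks only where the data is actually needed.
std::string show_dialog_concurrently() {
    std::future<std::string> details = std::async(std::launch::async, fetch_details);
    std::string dialog = build_dialog();   // overlaps with the fetch
    return dialog + ":" + details.get();   // synchronization point
}
```

Run sequentially, the two calls would take roughly the sum of both sleeps; overlapped like this, the wall time should be closer to the longer of the two. If the dialog cannot be built without the data, there is nothing to overlap and the sequential version is the simpler choice.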

      What happens in your web server example when those threaded requests all want to log their activity to the central log file? Or update the same record in the database? You word this paragraph as if multi-threaded servers are a simpler problem. Not true, dealing with 10 people wanting to do the same thing is easy... dealing with 10 people who want to do conflicting things is much harder; and it is a problem you should not have with client-side threading.
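A rough sketch of the shared-log case, assuming standard C++ threads. `SharedLog` and `serve_clients` are made-up names, and the "log file" is just an in-memory vector so the example stays self-contained.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory stand-in for the central log file.
class SharedLog {
public:
    void append(const std::string& line) {
        std::lock_guard<std::mutex> guard(mutex_);  // one writer at a time
        lines_.push_back(line);
    }
    std::size_t size() {
        std::lock_guard<std::mutex> guard(mutex_);
        return lines_.size();
    }
private:
    std::mutex mutex_;
    std::vector<std::string> lines_;
};

// Simulates `clients` request threads that each log `per_client` lines.
std::size_t serve_clients(int clients, int per_client) {
    SharedLog log;
    std::vector<std::thread> workers;
    for (int c = 0; c < clients; ++c) {
        workers.emplace_back([&log, c, per_client] {
            for (int i = 0; i < per_client; ++i) {
                log.append("client " + std::to_string(c) + " request " + std::to_string(i));
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return log.size();
}
```

Without the lock, concurrent `push_back` calls on the same vector are a data race. With it, no lines are lost, but every request thread now queues up behind the same mutex.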

      I wasn't aware of the threading mistakes made by the windows 95 development team and threading in the way described is exactly what not to do. One thread to build the GUI, while another reads the disk is sufficient. Threading every object on the GUI will just kill you with the cost of 'context switching' (which, I presume, is much more efficient in multi-core hardware).

    11. Re:What?! by 0xdeadbeef · · Score: 1

      The people "pushing the edge" are quite capable of using the threading APIs of the operating system. If you can't do that, the edge is going to cut you.

      Ok, maybe I shouldn't criticize without seeing the implementation, but for some reason instead of a paper the link leads to a 400 megabyte movie. Talk about wasting resources...

    12. Re:What?! by snark23 · · Score: 1

      I've seen several presentations this year from bleeding-edge researchers from different disciplines (operating systems, programming languages, hardware) and everybody agrees that the only way we're going to keep up with Moore's Law in the next decade is by adding more cores. The hardware guys are nearing the limit of just how many transistors they can jam into a single core (which is why clock speeds have stopped increasing), so the new trend is to add more cores to a chip.

      Windows and Linux do a nice job of dispatching threads to your 2 or 4-way SMP system; since we're usually running a few programs simultaneously, adding a single core will give us a speed boost even when all of our programs are single-threaded. But those days will end: We're not running thousands of programs, and people ARE talking about thousands of cores. In order to take advantage of those cores, programming tools are going to need to change dramatically; it's just too hard to effectively leverage that sort of parallelization by hand, especially with heterogeneous cores, and it's pretty much impossible to automatically infer very much parallelism from the usual imperative languages.

    13. Re:What?! by DrVomact · · Score: 1

      A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is unneeded layer of abstraction or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do with the other cores whatever he wants. Let the scheduler work with the futuristic hardware and sort that crap out.

      OK, I'm going to reveal my ignorance here for all to see. I do some programming, but as a sideline (I make tools for myself when automation would make a job easier). I had thought that if a program was "multithreaded", then this meant that the OS could assign different threads to run on different processors of your multiprocessor machine, and thus permit quicker execution of CPU-intensive applications. I have a dual processor machine here at work, and it's frustrating to see a job that takes a half hour to run consistently using only 50% of CPU. (No, I didn't write the software in question.) I keep thinking that if only the application was multi-threaded, then it would finish in 15 minutes. Am I wrong?

      I'm more than a little suspicious that multiprocessor motherboards and multicore CPUs are just another piece of marketing hype to separate me from my money. (Sort of like having a 64 bit chip when I run a 32 bit OS because there aren't that many drivers and apps out there that will run on 64 bit machines...but all the AMD CPUs are 64 bit now, so I pay for all those bits whether I like it or not...)

      --
      Great men are almost always bad men--Lord Acton's Corollary
    14. Re:What?! by akypoon · · Score: 1

      I agree with eldavojohn. Multi-core is *not* a must for application development. If your problem is better off with a multi-threaded solution, and time allows you to do it, I don't see why not. However, keep in mind that:

      A) If your problem does not involve or has very little parallelism in it (e.g. scanning in compilers), I see little point to solve it in a parallel way;

      B) Multi-threaded programs are harder to develop than sequential programs. Why? You have more things to get right. And if you don't understand the problem well enough, you often end up with a parallelized program that is more complex than its sequential counterpart, with little (or no) performance gain.

    15. Re:What?! by try_anything · · Score: 1

      And of course everyone here will get bored and leave before they finish downloading the movie.

    16. Re:What?! by gbjbaanb · · Score: 1

      Context switching is probably just as bad on a multi-core - you still have to save the registers, which is most of the problem. On a multi-core box, though, you will get more context switches as the locks get used more compared to a single core (i.e. there's much more chance a switch will be needed compared to running all those threads 'sequentially' on a single core).

      On a multi-core CPU you also have the issue of cache thrashing. If your thread gets preempted, when it's reloaded on the other core the cache line will be invalid and has to be completely reloaded.

      My points though were that threads aren't best used to complicate a consumer-producer scenario (eg, thread 1 gets data, thread 2 uses that data), unless you can service the 2nd thread while thread 1 is still processing. If you can't have the consumer doing something without that data thread 1 will provide then it is more efficient, as well as seriously simpler and therefore more maintainable, to run them in sequence.

      My point is that threads are best used to do a simple thing many times over. Eg. if you have a ray tracing prog, 1 thread per core is optimum and you will get results 4x faster on a quad core. 8 threads will not get you results any faster. Similarly, if you have a webserver that wants to log to a central location, the time your first thread spends logging (ie waiting for IO) will allow the server to process other clients. But putting your logging in a thread will not make the server run any faster overall, will require some slow and fragile memory handling as you pass log lines to it (if it cannot write to the log as fast as you pump logs to it, say goodbye to your server under heavy load; or add some complex locking systems when the write queue is full which takes you back to where you started, logging in single threaded mode!).

      So, unless the logging process is significantly slow, don't bother, you'll make things far more complex for no gain whatsoever.
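The "one simple thing, many times over, one thread per core" pattern might look like the sketch below. It sums a vector rather than tracing rays, and `parallel_sum` is a made-up name; the point is only that each worker owns its own slice and its own output slot, so no locks are needed.

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using one worker per hardware core. Each worker reads one slice
// and writes one slot of `partial`, so the workers never touch shared state.
long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may return 0 when the count is unknown.
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::vector<long long> partial(cores, 0);
    std::vector<std::thread> workers;
    std::size_t chunk = (data.size() + cores - 1) / cores;
    for (unsigned t = 0; t < cores; ++t) {
        std::size_t begin = std::min(data.size(), t * chunk);
        std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Starting more workers than cores would not make this any faster, which matches the ray tracing remark above.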

    17. Re:What?! by argent · · Score: 1

      it's frustrating to see a job that takes a half hour to run consistently using only 50% of CPU. (No, I didn't write the software in question.) I keep thinking that if only the application was multi-threaded, then it would finish in 15 minutes. Am I wrong?

      You might be. Some programs get very little benefit from parallelising, and some actually become slower if you try to spread them over more processors: when every thread needs to coordinate with another thread at every step, more time is spent coordinating the threads than doing the computation.
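One common way to put a number on this is Amdahl's law, which the post does not name but which fits the half-hour question. A small sketch, assuming you can estimate what fraction of the run time is parallelizable (the function names are made up for illustration):

```cpp
// Amdahl's law: ideal speedup on `cores` processors when only the fraction
// `parallel_fraction` (0.0 to 1.0) of the run time can be parallelized.
// Coordination overhead is ignored, so real programs do worse than this.
double amdahl_speedup(double parallel_fraction, int cores) {
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}

// Estimated run time in minutes for a job that takes `minutes` on one core.
double estimated_minutes(double minutes, double parallel_fraction, int cores) {
    return minutes / amdahl_speedup(parallel_fraction, cores);
}
```

For the 30-minute job on two processors: if all of it parallelizes, `estimated_minutes(30.0, 1.0, 2)` gives the hoped-for 15 minutes; if only half does, it gives about 22.5; if none does, it stays at 30.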

      Start reading here

  11. what a joke by acidrain · · Score: 5, Insightful

    From the site:

    • 1. Replace types: The developer replaces numerical types representing floating point numbers and integers with the equivalent RapidMind platform types.
    • 2. Capture computations: While the user's application is running, sequences of numerical operations invoked by the user's application can be captured, recorded, and dynamically compiled to a program object by the RapidMind platform.
    • 3. Stream execution: The RapidMind platform runtime is used for managed parallel execution of program objects on the target hardware platform, which can be a GPU, the Cell processor, or a multicore CPU.

    Man, that's some funny stuff. Wow that cracked me up. A *games* company using a tool that has this level of indirection?!? I sure hope these guys got a lot of money from their sucker VC to roll in.

    Look guys. There is no multi-processing silver bullet. It isn't even such a hard problem, *if you stop trying to solve it at such a low level*. Break your application into separate pieces that, *don't need to communicate very often.* Then this is the same kind of problem scalable websites like Google, MySpace, Hotmail and so on, have already, just without having to factor in the reliability issues. Finer grained multi-threading just leads to deadlocks and is really hard to debug. If you *really must* render the same sphere on 100 processors at the same time, then you need the speed of a custom coded solution. But you don't, so let it go. The main loop of your program will be just fine as a single threaded implementation, 1 processor will do, and farm the 10% code / 90% heavy lifting out in big clean chunks to other processors. If you find yourself writing some bizarre multi-threaded message passing system so that you can have 100s of threads all modifying the same live object model at the same time -- you are fucked, just forget about it 'cause you will never be able to debug that one killer bug that you know is going to get you right as you go to ship.

    --
    -- http://thegirlorthecar.com funny dating game for guys
    1. Re:what a joke by Anonymous Coward · · Score: 0

      Who says it has to be a single object? Hell, I'd think games would be the perfect setting for this approach, as would any simulation of multiple entities. The demo was a perfect example of a practical use (though not very practical in and of itself :).

      Someone elsewhere mentioned that it's not for everything, but certain approaches simply cry out for this sort of an approach. Being able to spit out multiple tasks at need and not worry about setting them up and keeping more than cursory track of things or trying to decide whether you should be making them asynchronous or not would certainly come in handy at times, wouldn't you agree?

      No, it's not a silver bullet; but it sure sounds like a handy hammer to have in your toolbox to me.

    2. Re:what a joke by dcam · · Score: 1

      Break your application into separate pieces that, *don't need to communicate very often.*

      The problem is knowing where to create the breaks.

      Then this is the same kind of problem scalable websites like Google, MySpace, Hotmail and so on, have already

      Serving pages scales well with multiple cores/processors/machines. Games don't because there is just one user not millions.

      --
      meh
    3. Re:what a joke by byteherder · · Score: 1

      Finer grained multi-threading just leads to deadlocks and is really hard to debug.

      It is statements like this that just burn me up inside. Let me put the proper qualification on the above statement.

      Finer grained multi-threading just leads to deadlocks, if you suck at programming or if you don't know what you are doing and is really hard to debug if you are some clueless hack that cannot keep track of more than one thread in their feeble mind.

      I will get off my soapbox now

      byteherder

  12. 400MB download by MobyDisk · · Score: 1

    Is there a version that isn't a 400+MB movie file? I was expecting an article.

    1. Re:400MB download by aadvancedGIR · · Score: 5, Funny

      Me, I was expecting 100 4MB movie files that you would have to play concurrently.

    2. Re:400MB download by Anonymous Coward · · Score: 0

      Right, and you were even going to read it.

  13. Re:Don't Bother by Anonymous Coward · · Score: 1, Informative

    99.99% of the time, multi-threaded programming is not needed, and can actually *Slow* things down as mutexes block each other, killing performance - it takes a certain amount of time to establish a mutex, so two threads working on a single bit of memory can perform like a dog as each tries to block and unblock the other.

    Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner? Aren't you referring to poor design/algorithms rather than anything specifically to do with multi-threading? Saying mutexes are slow and should be avoided is like saying disk I/O is slow and should be avoided. While true, if my app needs to deal with storage, then that's what it needs to do.
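One possible answer to that question is "take the lock less often", not "avoid the lock". The sketch below contrasts the two designs on a shared counter; both function names are made up, and no timing is claimed here, only that both produce the same result with very different numbers of lock acquisitions.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Every increment takes the lock, so the threads may spend much of their
// time contending for the mutex instead of counting.
long count_fine_grained(int threads, int per_thread) {
    long total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++total;
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return total;
}

// Each thread counts privately and takes the lock once to publish its result.
long count_batched(int threads, int per_thread) {
    long total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            long local = 0;
            for (int i = 0; i < per_thread; ++i) ++local;
            std::lock_guard<std::mutex> guard(m);
            total += local;
        });
    }
    for (std::thread& w : workers) w.join();
    return total;
}
```

With 4 threads and 10,000 increments each, the first version acquires the mutex 40,000 times and the second only 4 times. That is a design difference, which supports the point that the mutex itself is not the culprit.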

    Also, multi-core CPUs have made using Windows bearable when under load, because current single-threaded processes leave one spare core available for the OS to use when any app decides to eat all the CPU!

    Actually multi-core has made it cheaper. Multi-processor boxen have been around forever and those who've used them have been enjoying the benefits for a gazillion years. Plus your statement is misleading: very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, and there are daemons/services running all the time. There is no such thing as a simple "a single threaded app leaves the other processor/core free for OS use"; all cores are constantly being used by all sorts of crap unless the OS is configured to actually force the above to happen (processor affinity). In that case even a multi-threaded app can be forced onto a single processor/core.

  14. Yep, concurrency is a problem, not a solution! by argent · · Score: 2, Interesting

    100% agree. Concurrency is a problem, not a solution, and it needs to be abstracted out early if you need it at all.

    1. Re:Yep, concurrency is a problem, not a solution! by grumbel · · Score: 2

      Concurrency is a problem, but it's one that you *can't* avoid. Everything in today's CPU development points very strongly in a multi-core direction: in a few years you won't be able to buy single cores any more, and a few years further down the road something like eight cores might be the norm. So how exactly do you want to write programs then? Single thread, using only 12.5% of the available computing power? I don't think so. Now it is of course hard to write multi-threaded code in C++, but other languages such as Erlang, or functional languages in general, have a lot fewer issues with it. Sooner or later we might actually need a big switch in programming languages, since C++ simply seems to be no longer up to the task.

    2. Re:Yep, concurrency is a problem, not a solution! by jc42 · · Score: 1

      Concurrency is a problem, but it's one that you *can't* avoid. ... So how exactly do you want to write programs then? Single thread, using only 12.5% of the available computing power? I don't think so.

      I don't think so either, but I intend to handle 99% of the "multi-threading" tasks in the same way that was pioneered in unix systems back in the 70's: a flock of small, communicating processes. That works quite well with most of the things that people are hyping threads for, and is debuggable.

      Let's face it, writing multi-threaded code is easy. What's difficult is debugging it, because multiple threads in a single address space invalidates almost all the debugging tools we have available. But if you factor out the parallelism into separate processes (i.e., threads with separate address spaces), most of the debugging tools work.

      And yes, I've built apps that run on machines with more than one processor. Back in the early 80's, I worked on a number of projects that used test machines with 50 to 200+ processors. It wasn't all that difficult with the debugging tools we had then. I have fond memories of a parallelized version of make, which took advantage of the tree of dependencies to build as much in parallel as possible, up to the limit of an environment variable that you could define. It was fun watching system builds spit out "cc" lines as fast as the display would show them, and deliver a new kernel in under a minute. I worked on a number of kernel mods that required a system build for every test, and this was really handy. But you could see from ps that make wasn't really parallelized; it forked to do the parallel work. I'd guess it took only a few dozen lines to hack this into the original make.
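For anyone who hasn't seen the fork-and-communicate style, here is a minimal POSIX-only sketch (it assumes `fork`, `pipe` and `waitpid` are available, so Linux or another Unix). `sum_range` is a hypothetical unit of work standing in for one compile job; this is not the code of that parallel make, just the shape of the idea.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical unit of work; in the make example this would be one compile job.
static long sum_range(long begin, long end) {
    long total = 0;
    for (long i = begin; i < end; ++i) total += i;
    return total;
}

// Forks one child to handle the upper half of the range; the child reports
// its answer over a pipe. Returns -1 on failure.
long sum_with_child(long n) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                       // child: separate address space
        close(fds[0]);
        long part = sum_range(n / 2, n);
        ssize_t written = write(fds[1], &part, sizeof part);
        _exit(written == (ssize_t)sizeof part ? 0 : 1);
    }
    close(fds[1]);                        // parent keeps only the read end
    long mine = sum_range(0, n / 2);      // runs while the child works
    long theirs = 0;
    ssize_t got = read(fds[0], &theirs, sizeof theirs);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);             // reap the child
    if (got != (ssize_t)sizeof theirs) return -1;
    return mine + theirs;
}
```

The two halves share nothing except the pipe, so each process can be debugged with ordinary single-process tools, which is the advantage being argued for here.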

      There may be a few tasks that actually require multi-threading within a single process. I feel a bit sorry for the poor souls that have to debug them. I hope someone is working on debugging tools that will help. But it's not obvious what they might be. The public discussion mostly seems to devolve to "Well, if you'd just program it right, you wouldn't have bugs, dummy." Such cluelessness doesn't give one with experience a lot of hope.

      (I'd work on it myself, but I don't think I have the funds for a decent test setup, and I don't see anyone trying to hire for the task. Debugging isn't a profit center in most companies. This fact also doesn't give one hope. ;-)

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    3. Re:Yep, concurrency is a problem, not a solution! by ClosedSource · · Score: 1

      I see multicore as a plan B strategy because chip makers can't provide the super high speed processors the market really wants. If new technologies that let us break through the current speed barrier become practical, the interest in multicore and the associated "everything needs to be multithreaded" philosophy will fade away.

    4. Re:Yep, concurrency is a problem, not a solution! by argent · · Score: 1

      Concurrency is a problem, but it's one that you *can't* avoid.

      Sure you can. As the parent article suggested, you divide the program up into loosely coupled components and use an asynchronous coordination mechanism. For a lot of workloads that happens more or less automatically: jobs like software builds can trivially be distributed across multiple cores, and any problem that can be addressed using a dataflow approach is well suited to the job. In a game you have many components that can be made independent: physics, AI, rendering, and these can be further broken down: mob AI and character AI; large and small scale physics; special objects like vehicles; and so on.
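As a rough illustration of "loosely coupled components plus an asynchronous coordination mechanism", here is a sketch of two components that share nothing but a queue. `Channel` and `run_pipeline` are made-up names, and the "physics" and "rendering" roles are placeholders, not real game code.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal blocking queue: the only state the two components share.
template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Hypothetical "physics" component sends one position per frame; the caller
// plays "rendering". A negative value marks the end of the stream.
int run_pipeline(int frames) {
    Channel<int> positions;
    std::thread physics([&positions, frames] {
        for (int f = 0; f < frames; ++f) positions.send(f * 2);
        positions.send(-1);
    });
    int rendered = 0;
    while (positions.receive() >= 0) ++rendered;
    physics.join();
    return rendered;
}
```

All the locking lives inside `Channel`; the components themselves contain no synchronization code, which is roughly what abstracting concurrency out early means in practice.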

      The programming language is much less important than the programming model. You can write FORTRAN code in any language, and I'm sure that Erlang is no exception.

  15. Square peg, round hole. by Ihlosi · · Score: 2, Informative
    Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers.

    No. Whether something can be done effectively on multiple cores doesn't depend on the programmer, but on the type of processing. Some things have to be done in a certain order, and there's nothing even the best programmer in the world can change about that, period. If you try hacking something together that uses multiple threads for this type of processing, you'll just end up making things slower and messier.

    On the other hand, there are other types of processing that just lend themselves fantastically to being done multithreaded.

    1. Re:Square peg, round hole. by AutopsyReport · · Score: 1

      Square peg, round hole.

      I'll be damned... that's the same excuse my wife uses when I try to get her in bed!

      --

      For he today that sheds his blood with me shall be my brother.

  16. Thumbs up, thumbs down... by Anonymous Coward · · Score: 1, Insightful

    Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++.
    Thumbs up, since this is great for PCs, where you never know the hardware configuration you're running on. At the very least your applications will have the ability to "do better" with more hardware if it exists, yet still run on older stuff if it doesn't.

    For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core.
    Thumbs down, since you're on fixed hardware, and for the best benefit you're going to want to dedicate specific processors to specific tasks at specific times rather than have an algorithm just "figure it out for you."

    It may not seem like a bad idea now, but if we're to go down this path eventually we'll be praising researchers for getting a flavor of BASIC that can take advantage of all the PS3's processors seamlessly, but still has the same issues BASIC has.

    I think in general this begs the question, "Are we better off in the long run empowering ignorant programmers, or informed programmers?"

    For instance, would it have been better for the researcher to perhaps focus on how to teach programming for multiple processors rather than coming up with a library that "abstracts it away"?

    This is not to take away from the fact that he got it running on the PS3 though. Kudos for that.
  17. Re:Don't Bother by szobatudos · · Score: 1

    Some researchers actually do bother: http://manticore.cs.uchicago.edu/

  18. Toy Supercomputer by Doc+Ruby · · Score: 3, Interesting

    The problem with programming the PS3 is that once the complexity of its parallel processors is handled, the CPU is so fast that it consumes and produces data much faster than the IO available. The Cell is a basically 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow. It's not clear whether the USB slots are 1, 2 or 4 buses at 480Mbps each, but even 2Gbps more isn't so much. Maybe another gig-e can plug into its CompactFlash slot, bringing the total up to 4Gbps, but that's still only 0.25% the chip bus. In desperation, perhaps the SATA bus could also be used for another 1.3Gbps. Adding the HDMI output with some fancy codecing (especially on the receiving host) gives 10.2Gbps out, so the other 5.3Gbps can be used for input, but that's still only 5.3Gbps throughput, probably a lot less at under 100% efficiency per channel. The Cell can spin its wheels with 2000 instructions on the data it's got before it gets more. There are lots of "multimedia mixing" and transformation applications that could run multiple cycles in that 2K instructions, which instead need more machines for more IO.

    The PS3 doesn't seem to have the PCI-Express bus that would solve all these problems. For some reason Sony left out its old pet, FireWire, which could have added buses at 800Mbps each. There doesn't seem to be any expansion whatsoever, except changing the HD on the single SATA connector. To use what it's got, a huge amount of complex, heterogeneous IO management is necessary to use its power.

    It's strange to think that a $600 machine with around 5Gbps throughput and 7Tbps processing is a "toy", but the cropped IO makes the PS3 look that way, relative to its full power. Maybe a HW mod, even at $500 or possibly up to $2000, that adds PCIe for a half-dozen 2x10Gig-E cards, or even InfiniBand, will make this crazy little toy into more than just a development platform for games or prototypes for really expensive Cell machines. Who's got the way out?
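For anyone checking the numbers in this post, the arithmetic appears to work out as follows. It uses only the figures quoted above, and reading the "7Tbps" figure as the FLOPS rate times the 32-bit word size is an assumption, not something the post states.

```latex
\begin{align*}
\text{external I/O} &\approx 1 + 2 + 1 + 1.3 = 5.3\ \text{Gbps}
  && \text{(gig-E, USB, second gig-E, SATA)} \\
\frac{4\ \text{Gbps}}{1.6\ \text{Tbps}} &= 0.0025 = 0.25\%
  && \text{(ports before SATA vs. the chip bus)} \\
204\ \text{GFLOPS} \times 32\ \text{bits} &\approx 6.5\ \text{Tbps}
  && \text{(rounded up to ``around 7Tbps'')}
\end{align*}
```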

    --
    make install -not war

    1. Re:Toy Supercomputer by Anonymous Coward · · Score: 0

      errr... it is a toy. It's a gaming system, why does it need more IO bandwidth? What it seems like you're saying is, "Waaaaahhh!!! Why isn't the PS3 a good general purpose system!!!! I want to use it in my research!!!! Waaaah!!!" Here's an idea, why don't you put together your own platform made of PCI-Express cards and the cell processors?

    2. Re:Toy Supercomputer by Doc+Ruby · · Score: 2, Insightful

      Because it's a toy supercomputer. If I find a way to expand its IO, I'll have a $600 supercomputer, scalable into a supercomputer cluster.

      If I listen to you, Anonymous defeatist Coward, and just cry "waaaahhh, I'm too dumb to hack a toy into a tool", then I'll just have a really cool toy.

      Allow me to introduce you to the term hack, which is what Slashdotters used to do before we were mostly posers.

      --
      make install -not war

    3. Re:Toy Supercomputer by Anonymous Coward · · Score: 0

      Heh, ESR is the biggest poser of them all, he missed out on the precious MIT hackers generation and has been trying to recreate it ever since. Also, what part of building your own system (fully open of course) using those components is not part of ``hacking''? Sony doesn't give a rat's ass about you and they really don't want you to be able to modify the PS3 in this way. They really want you to buy a PS3 and then buy games for it. Don't bitch about what sony did or didn't do, just go out there and make what you want yourself.

    4. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      Anonymous troll Coward, what the hell has ESR got to do with anything, except collecting the Jargon File, and getting you to reveal that you're a troll?

      The part of building my own Cell machine that's not "hacking" is that it's really system design, while getting the PS3 to do things its designers didn't expect is what hacking is all about. Using the existing device to exploit all its pent-up power is the soul of hacking.

      I'm not bitching about what Sony didn't do - I even pointed out that they did quite a lot. I just asked the rest of the Slashdotters if they had other ideas for hacking more IO into the limited offering of the PS3.

      Apparently you don't even know that Sony has officially inserted Linux support into the PS3 release from the beginning. Why don't you go bother someone who cares what you think about things about which you know nothing, instead of just serving as an excuse for me to bash an anonymous troll who doesn't understand hacking or community collaboration?

      --
      make install -not war

    5. Re:Toy Supercomputer by giminy · · Score: 1

      Your post seems to imply that *every* operation a CPU does requires some I/O. This is hardly the case. Most of the computation that occurs in a normal well-written program only requires the CPU. For example, compare these two code snippets and tell me which one is more likely to be seen in an actual program that gets run on your computer:

      Snippet 1:

      /* Walk the list until a node holding i is found. */
      while (list != NULL) {
          if (list->val == i) {
              break;
          }
          list = list->next;
      }

      Snippet 2:

      /* Same search, but with an I/O call at every step. */
      fprintf(IOPORT, "Starting search\n");
      while (list != NULL) {
          fprintf(IOPORT, "Trying %d\n", i);
          if (list->val == i) {
              fprintf(IOPORT, "Found it\n");
              break;
          }
          list = list->next;
          fprintf(IOPORT, "Advancing list\n");
      }
      fprintf(IOPORT, "Done\n");

      If you said "Snippet 1", congratulations. You just proved my point. The CPU did a lot of operations in Snippet 1 and the only thing it had to talk to "outside" the CPU was to copy data from memory into registers. If the CPU has a somewhat competent caching mechanism and the program was written well (so 'list' is bunched together in memory), you'll see something like a > 99% cache hit rate, so you won't even have to wait for memory, only for CPU cache most of the time. Snippet 2 demonstrates a process that does as much I/O as you think we need: one I/O operation per CPU instruction, but really we don't need that much I/O because we're not debugging everything in an amateur way.

      --
      The Right Reverend K. Reid Wightman,
    6. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      Not only did I not imply that, I explicitly mentioned that the CPU:IO throughput means 2000 instructions per IO.

      You don't seem to know much about DSP. The vast majority of DSP is not logic, but arithmetic - the logic isn't usually that fast (except sometimes zero-overhead looping), but the arithmetic is extremely fast. The entire game in DSP is keeping the pipeline full. 2000:1 keeps the compute pipeline, the critical link, empty much of the time.

      Moreover, there's no time for cache fetches in DSP loops - it's all working registers (the SPEs have many) and signal data from the bus.

      Really, your comment demonstrates ignorance of both DSP and my post. Don't try to tell me about doing "everything in an amateur way" when you are the amateur and I am the expert.

      Unless you have something to tell us about increasing the PS3 IO, which your comment leads me to doubt.

      --
      make install -not war

    7. Re:Toy Supercomputer by mihalis · · Score: 1

      Perhaps after the 20GB and 60GB PS3 editions (already out) there will be the HomeHPTC edition :)

    8. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      All it would take would be including PCI-Express.

      --
      make install -not war

    9. Re:Toy Supercomputer by Anonymous Coward · · Score: 0

      You really don't have to worry about I/O bandwidth of the PS3, as long as its main memory is so tiny (256MiB) you can hardly fit anything interesting in there in supercomputing terms. Additionally, the Cell SPEs are pretty bad at double-precision floating-point operations (which is essentially all that matters in scientific computing), as compared to single precision (where it is really fast) - it's 14 times slower with doubles. If you can solder a few gigabytes of memory in there, and have one of the rare problems that can get away with single precision, then you may consider starting a cluster and calling it a supercomputer. But as long as you have to wait for external IO, even with SATA, Infiniband, 10Gbps Ethernet or whatever, you'll be starved of data for most of the time.

    10. Re:Toy Supercomputer by imsabbel · · Score: 1

      No, its a NON SUPERCOMPUTER TOY.

      Open your eyes. And no, stating the bandwith in bits/s instead of bytes/s to make the numbers seem larger isnt cool either. You arent going to fool anybody.

      If you consider the PS3 a supercomputer, you have to consider all modern GPUs supercomputers, too.

      (and your brainless adding up of bandwidth of the different connection ports for IO is really creepy. Have you some kind of disorder in that regard? If yes, try it on a $250 A64+MB. Come back when you are finished with jerking off to the Gbit/s..)

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    11. Re:Toy Supercomputer by illumin8 · · Score: 1

      The Cell is a basically 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow.
      You fail to grasp that the internal bus is not for I/O. It's for transferring from RAM to video memory and vice-versa. When you're trying to render to 2+ megapixels in 24-bit color at 60 fps while doing real-time 3d shading effects, you can use as much memory->GPU bandwidth as possible. Gigabit ethernet is just an afterthought compared to the bandwidth modern games need between system memory and the GPU.
      --
      "When the president does it, that means it's not illegal." - Richard M. Nixon
    12. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      Except that more IO with such a fast bus means that the pipeline could refill the SPE local storages every time they're used up. And most of the apps that I want to process with a (hotrodded) PS3 are fine in single-precision float, which would need a supercomputer to crunch anyway. I want a supercomputer for nonscientific apps. Otherwise, I'd have more than $600 to spend on it.

      --
      make install -not war

    13. Re:Toy Supercomputer by Doc+Ruby · · Score: 0, Flamebait

      Fuck you. Stop fantasizing about me jerking off.

      You don't win any arguments with "open your eyes" and nothing backing it up. What part of 204GFLOPS don't you understand? What part of the serial nature of IO, whether over SATA, USB, or ethernet, don't you understand? What is it about adding multiple channels for a total that you don't understand?

      When modern GPUs are as programmable as in the Cell in the PS3, as demonstrated most recently by the work in the story we're now discussing, then yes, they're supercomputers.

      I know your disorder. You're a STUPID ASSHOLE.

      Did I mention FUCK YOU? FUCK YOU.

      --
      make install -not war

    14. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      Why would I grasp that, when it's not true? Of course the bus is to handle IO: it's connected directly to the FlexIO on the Cell to handle all the IO on an equal basis.

      But then, you don't seem familiar with the PS3 architecture: the Cell is not the GPU, that's the (even faster and more esoteric) nVidia RSX. And I'm not even programming games, I'm using the Sony provided feature of installing Linux and programming this beast as a general purpose computer. Starting with its embedded Power RISC, then adding its parallel DSPs. So I return to what I said in my original post: if this thing only had more IO, I'd get even more use out of the Cell (and maybe the RSX, too).

      Does anyone who wants to respond in this subthread have any actual knowledge of the Cell, preferably of ways to increase the IO? Or is it just a Slashdot orgy of ignorance?

      --
      make install -not war

    15. Re:Toy Supercomputer by Anonymous Coward · · Score: 0

      Are you on fucking crack?!?!?

    16. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      Put the pipe down, Anonymous crackhead Coward. Try programming the PS3 instead. It's faster, and less delusional.

      --
      make install -not war

    17. Re:Toy Supercomputer by shplorb · · Score: 1

      If you want I/O bandwidth, you're going to have to call up IBM and plonk down the cash for a Blade Center and a Cell Blade or two.

      Also, your 1.6Tb/s figure is wishful thinking, as the actual throughput to the EIB is choked by the XDR and RSX bandwidth, which is 50GB/s(?) combined at best. Although if you're just shuffling data between the SPUs then I guess it can get that fast. I think I'll leave it at that though, since I'll probably step into NDA territory if I say anything more.

    18. Re:Toy Supercomputer by Doc+Ruby · · Score: 1

      The point of the 1.6Tbps EIB bandwidth citation is that it's the speed of IO for computation, just as you say. In fact, due to the crossconnects between SPEs, it's even faster, depending on which SPEs are linked. The RSX bandwidth isn't relevant, because the RSX is unavailable to Linux under the Hypervisor. But the FlexIO bandwidth is certainly relevant. 50GBps/200Gbps is a lot faster than the single GigE. Even getting data to the HDMI output, an unproven stunt, would at least get 10Gbps of the 200. The CompactFlash and SDIO ports could each support a GigE card, ganged together with maybe (cropped to 480Mbps) 1-4 USB GigE adapters, which could mean 3.5-6Mbps/in + 10.2Mbps/out. With a lot of fancy programming. Maybe some kind of waveguide connecting the 802.11g to an AP for another measly 54Mbps, and maybe some crazy SATA protocol for another 1.3Mbps.

      Or maybe I have to solder a PCIe onto a bus somewhere (if only) ;). Or spend $8K for each blade, without any other HW, plus its backplane, probably from Mercury. At $600 each PS3, it's worth trying something to widen the bottleneck.

      --
      make install -not war

  19. How many cores? by Aladrin · · Score: 2, Informative

    "For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. "

    Wait wait wait... How many cores does a PS3 have? Thousands? I suspect someone has their facts sadly mistaken. I think they meant 'each with its own thread and using multiple cores to process the threads,' but that isn't nearly as impressive sounding.

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    1. Re:How many cores? by Anonymous Coward · · Score: 1, Insightful

      Respectfully, one core per chicken does not equate to one chicken per core, sir.

  20. Re:Don't Bother by statusbar · · Score: 1

    Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner?

    "Transactional memory"

    --jeffk++

    --
    ipv6 is my vpn
  21. Where are the chicks? by Meneth · · Score: 2, Funny

    I browsed through the 411 MB ogg file, but could not find any chicks. Where are they?

    1. Re:Where are the chicks? by dim5 · · Score: 1

      I browsed through the 411 MB ogg file, but could not find any chicks. Where are they?
      Dude, if you're expecting to get chicks by posting on slashdot about browsing an ogg file, I've got some bad news...
      --

      Is something burning?
      Oh, it's my karma.

    2. Re:Where are the chicks? by Anonymous Coward · · Score: 0

      Spoken like a true 35-year-old teenager. You must be a virgin as well.

    3. Re:Where are the chicks? by alex4u2nv · · Score: 0

      That's what to expect when a geek gets into the porn business ;)

  22. Relativity by stratjakt · · Score: 2, Interesting

    However, multi-threaded development has been notoriously hard to do

    Only at first, once you wrap your head around it it becomes second nature.

    To a newbie, recursion is hard to do. To somebody who's been writing functional FORTRAN for 25 years, object oriented is hard to do.

    It's just another way of thinking about problems. The real bitch is having the toolkits and thread safe libraries at your disposal.

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:Relativity by fitten · · Score: 1

      Yup. Creating threads is easy... have a function of the appropriate signature and call pthread_create or CreateThread to your heart's content. Data partitioning (solving/minimizing data contention), that's the 'hard' part and they haven't really had luck doing this automagically yet. Even with OpenMP you still have to tell the compiler where it can do its 'magic'.

    2. Re:Relativity by Anonymous Coward · · Score: 2, Informative

      The real bitch is when you have a bug, because it is not as easily reproducible as in any other programming method.
      So no, it's not only 'another way of thinking'.
      And good luck trying to 'extend' multithreaded stuff.
      Multithreading should only be used on very special occasions where it is really needed.
      That is hardly ever in most end-user applications.

  23. Active Objects by lefticus · · Score: 2, Interesting

    I'm not sure what techniques the developer is using, as the um, "article" is a little light on details (unless I missed something). But the concept of Active Objects (a trivialized way of using threads) has been around for a while, with generic implementations of them rapidly becoming more mainstream. In the past week there has been much discussion about active objects and "futures" on the boost mailing list, and it is likely that both will become part of boost shortly. To put it simply, an active object is an object which has its own threaded message queue, so it is asynchronous from the rest of the system, and a future is a return value from an asynchronous method call, a "future" value. These techniques are quite reasonable today because of concepts like fibers and the NPTL.
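
    For anyone who wants to see the two ideas side by side, here is a minimal sketch in Python. It is not taken from the linked implementation or from boost; `ActiveCounter` and its methods are made-up names, and creating a `Future` by hand is a shortcut for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class ActiveCounter:
    """Hypothetical active object: owns its state and a worker thread."""

    def __init__(self):
        self._total = 0                      # only touched by the worker thread
        self._inbox = queue.Queue()          # the object's message queue
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Messages are handled one at a time, so _total needs no lock.
        while True:
            message = self._inbox.get()
            if message is None:              # shutdown sentinel
                break
            func, args, future = message
            try:
                future.set_result(func(*args))
            except Exception as exc:         # report failures through the future
                future.set_exception(exc)

    def _add(self, amount):
        self._total += amount
        return self._total

    def add(self, amount):
        """Asynchronous call: returns immediately with a future."""
        future = Future()
        self._inbox.put((self._add, (amount,), future))
        return future

    def stop(self):
        self._inbox.put(None)
        self._worker.join()


counter = ActiveCounter()
futures = [counter.add(n) for n in (1, 2, 3)]   # none of these calls block
results = [f.result() for f in futures]         # block only when a value is needed
counter.stop()
print(results)
```

    The caller never sees a lock: every call becomes a message, and the future is the handle for picking up the answer later.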

    And of course, a shameless plug for my active objects implementation (bsd style license). Actually, that page also does a decent job of demonstrating the concepts.

    1. Re:Active Objects by lefticus · · Score: 1

      Hate to reply to my own message but the shameless plug should be http://source.emptycrate.com/projects/activeobjects/... I didn't check my URLs closely enough.

  24. Re:Don't Bother by Nimey · · Score: 3, Funny

    99.99% of the time
    ...people who use that phrase don't know what they're talking about, and especially don't grok statistics.
    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
  25. Mod parent up, PLEASE by VorlonFog · · Score: 1

    Mod parent up, PLEASE

  26. Life is Pain by gillbates · · Score: 2, Insightful

    First of all, I, and many others before me, have been writing multithreaded applications for years in the likes of Linux and UNIX. I have had to maintain multithreaded applications created by others. My collective experience tells me:

    It is not trivial.

    Let me repeat: It is not a trivial task. Even if you have libraries and an API which abstracts out the ugly stuff, you still have the problem of concurrency, proper locking, deadlocks, etc...

    The majority of problems with using multithreaded programming come not from "ugly" parts of the OS/API layer, but from a misunderstanding of the problem. A few problems in computer science - particularly in the physical sciences - do benefit from multithreading. And it is easier to use threads when writing a game than just to execute all of the IO in one big loop (Hello DOS!). But for most applications, using threads is not only unnecessary, but overkill, and introduces the possibility of yet another class of bugs for which the application must be tested. Furthermore, as deadlock and race conditions are often timing related, they are the most difficult type of bug to find and fix. Finding and fixing this class of bugs is still somewhat of a black art in the industry, and is highly dependent on the skill and experience of the programmer.

    In short, unless your system/application design cannot do without multithreaded programming, it is best not to use it. Even with a glossy API, you still cannot escape the fact that debugging a multithreaded application is an order of magnitude more difficult than a single threaded one. In any case, you shouldn't be using threads just because you can.

    --
    The society for a thought-free internet welcomes you.
    1. Re:Life is Pain by MS-06FZ · · Score: 1

      Besides, pain and guilt can't be taken away with the wave of a magic wand. They're the things we carry with us, the things that make us who we are. If we lose them, we lose ourselves. I, for one, don't want my pain taken away. I need my pain!

      --
      ---GEC
      I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
    2. Re:Life is Pain by shutdown+-p+now · · Score: 1

      The problem is not with multithreading, it's with the shared writeable data multithreading model used by C++/Java/... world. When your threads only communicate by message passing, for example (see Erlang), many problems just go away. Even more do when you don't have any mutable data at all (e.g. Haskell) - in this case the compiler can create threads on its own as it sees fit, and not worry about locking, since there's no place a write conflict could occur.

  27. Re:Don't Bother by fitten · · Score: 1

    99.99% of the time, multi-threaded programming is not needed, and can actually *Slow* things down as mutexes block each other, killing performance - It takes a certain amount of time to establish a mutex, so two threads working on a single bit of memory can perform like a dog as each tries to block, and unblock the other..

    Your first statement is generally true - you should use multi-threaded programming when/where it makes sense. The rest of your statement calls out a common issue that all multi-threaded (even multi-process) programming has to deal with - data contention. Most people who program multi-threaded code know about this already, since data partitioning and contention are what you typically spend the majority of time analyzing when you write multi-threaded code. Your example displays what would be horribly written multi-threaded code and is not the norm, since it is the very thing that multi-threaded programmers spend 90% or more of their time trying to avoid.
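
    A rough sketch of what data partitioning looks like in practice, written in Python purely for illustration (`partitioned_sum` is a made-up helper, and in CPython this shows the structure rather than a real speedup):

```python
import threading


def partitioned_sum(values, num_threads=4):
    partials = [0] * num_threads          # one private slot per thread

    def work(index):
        # Each thread reads its own slice and writes only its own slot,
        # so no mutex is needed while the work is running.
        partials[index] = sum(values[index::num_threads])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)                  # single-threaded combine step


total = partitioned_sum(list(range(1000)))
print(total)
```

    The threads never fight over a single memory location; the only coordination is the join before the final combine.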
  28. VB.net by dlhm · · Score: 1

    I program in VB.net and use multithreading for work on pictures: JPEGs and MPEGs and such. I know I'm not a great programmer, but multithreading has always been a pain in the keester for me. I still use vb.net 2000, but hopefully they fixed the issue with starting a stopped thread, or maybe reading a thread's state accurately without having to pause the system for a few ns. I need to learn C, I guess. Maybe some classes would help. Maybe I should just buy the newest rendition of Visual Studio.

    --
    Ad eundum quo nemo ante iit!
    1. Re:VB.net by revlayle · · Score: 1

      Yeah, .NET 2.0 has some improved thread functionality/libraries - still works the same, just access to a bit more stuff

  29. Re:Don't Bother by tonywestonuk · · Score: 1

    I take your point about multi-core being around for years, however... What I should have been more clear about is that if 1 thread ends up in an infinite loop, then, on a single-core Windows PC, the whole OS starts to crawl, sometimes making it difficult to even open 'Windows Task Manager' in order to kill the rogue process. Multiple processors/cores get around this problem - there will always be an idle CPU for Windows to make up for its dodgy task scheduler. "Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner?" - Nope, I can't. But, if you create some multithreaded code, in which you create 2 threads that compete with each other for the same resource, then this is shite design, and the performance of your code will almost certainly be singly threaded code doing the same thing.

  30. Pthreads? by gillbates · · Score: 3, Informative

    Pthreads has been out for a while. It is open source, and runs on Linux, Windows, and Mac(?).

    Whether or not you believe concurrency should be an explicit library or a matter of compiler extension is a bit of a religious argument. But pthreads does offer the functionality, and works fairly well.

    --
    The society for a thought-free internet welcomes you.
    1. Re:Pthreads? by Gospodin · · Score: 1

      And Pthreads is a C API. TFS says this is C++. Still, it's not clear how this is better than Boost.Threads.

      --
      ...following the principles of Heisenburger's Uncertain Cat...
    2. Re:Pthreads? by Wesley+Felter · · Score: 1

      Pthreads don't work on clusters, GPUs, or Cell, and they are about 100 times harder to program than new data-parallel APIs/languages that are now being developed. This is probably explained in the video but since this is Slashdot, neither of us bothered to watch it.

  31. JUST GIVE US THE CHICKENS by steveoc · · Score: 1

    Geez - at 411MB, it better be a complete operating system, plus development toolchain.

    Many complete distros fit in under 411MB.

    Nuts --

    Just give us a little animated GIF of the chickens, we don't believe the rest of your claims anyway.

    1. Re:JUST GIVE US THE CHICKENS by Anonymous Coward · · Score: 0

      Check out the 2D version of the chicken game here : FreeRange.
      Sorry, only uses one core.

  32. Dr Stephan better hold off that new mortgage! by BillGatesLoveChild · · Score: 1

    Agreed. The question is "will this be free?" If the answer is "no" then everyone leaves and Dr. Stephan finishes giving his speech to the guy picking up the rubbish.

    Despite what the USPTO* clerks tell you, programming ideas are a dime a dozen. He's got as much chance of getting you to pay for this as I have of convincing all you C++ programmers to switch to my new proprietary (*D)++++(R)(TM) language. Only $1,250 a seat! What are you waiting for boys?

    * = At least the guy who picks up garbage knows trash when he sees it.

  33. Re:Don't Bother by Anonymous Coward · · Score: 0

    "Transactional memory"

    How is that more performant? If I have thread A and thread B trying to modify object C, if I use the traditional lock method I lock C while I'm writing to it. In the transactional memory model, some mechanism has to eventually do the same thing. You might have taken the load off of the higher level developer, but you've simply pushed the performance problems onto someone else.

    As a side note, back in the late 80's, I worked on a 4GL that utilized a VM that used the transactional memory model. It was quite nice from an application developer's standpoint, and ANY modifications made by an application (memory, file, db, etc) were handled by the transaction manager, so when a logical operation failed, EVERYTHING would get rolled back. So this concept is not new (and I'm sure others can come up with examples from even earlier).

  34. Hey, Why so many -1 Overrated moderations?? by tonywestonuk · · Score: 0

    My comment is valid, true, and anyone who has spent months dealing with code developed by cowboy coders who thought nothing of dropping in a new thread to perform almost every operation, without thought, will agree with me.

  35. OS/2? by SCHecklerX · · Score: 1

    I seem to recall pretty much every app I used under OS/2 took advantage of threading. The workplace shell, of course, being the prime example. This was in 1992.

    The problem, I think, is that the majority of programmers out there today who were just hobbyists back then were learning on a very single-threaded platform. Because the model was never there, it's 'hard'. With OS/2 3+, it was always there, and anybody who dabbled on that platform was immediately exposed to how to implement threads, as they were such a core piece of the OS.

    1. Re:OS/2? by arthurpaliden · · Score: 1

      The problem is that today "programmers" are not taught to program, they are taught to use IDEs.

  36. Comments from the presenter by sdt · · Score: 5, Insightful

    Good morning slashdot!

    As the (slightly terrified to find himself mentioned on slashdot) presenter in the video linked to above I thought I'd respond to a couple of comments in bulk. First off, I'm part of a much bigger team at RapidMind that builds this software to make targeting multicore and stream processors easier -- the system and the "chicken demo" was a group effort, and you can read more about it and the company in general in the article linked to from here, which unfortunately is PDF-only.

    For those crying out about multi-threading not being the solution: you're absolutely right! Our platform's approach to programming multi-core processors is to expose a data parallel model. In this model, the programmer explicitly deals with parallel programming (writing algorithms to work well on arbitrarily many cores) but all of the standard multi-threading issues such as deadlocks and race conditions are avoided, and the developer doesn't worry about how many cores there actually are.

    And no, the chicken demo didn't run each chicken on an individual core ;). But it did automatically scale to however many cores were available -- 6 SPUs and a PPU on the PS3, and 16 SPUs and 2 PPUs on a Cell Blade (on which we originally showed the simulation at GDC 2006).

    If you want to learn more, drop by our website at http://www.rapidmind.net. You can sign up for a free no-strings-attached evaluation version if you want to try it yourself.

    1. Re:Comments from the presenter by qualidafial · · Score: 1

      Some of the slides are whitewashed on the video, making it difficult to read. Can you post a link to the slides in pdf or powerpoint format online?

  37. Re:Don't Bother by Dog-Cow · · Score: 3, Informative

    "Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are demons/services running all the time."

    Plus, you completely and utterly missed the point of the poster you replied to. Most apps (who cares about major?) are single-threaded. The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one. The other resources will get used by the OS or by other apps (that may, God forbid, *also* be single-threaded). He wasn't talking about dedicating a computing resource to an app. He was saying that an app should only use what it needs, with the understanding that the OS will make good use of any remaining resources for other tasks.

    What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.

  38. Or you could just use Ada by Black+Parrot · · Score: 2, Insightful

    which has had easy-to-use multithreading constructs built right into the language for the past 25 years or so.

    --
    Sheesh, evil *and* a jerk. -- Jade
    1. Re:Or you could just use Ada by Coryoth · · Score: 2, Insightful

      Unfortunately I think many programmers that read Slashdot are scared off by the clear, readable, maintainable syntax. Typing end is clearly too much work, or something, and as we all know IDEs can't possibly help with that... I would like to see Ada get more use, but unfortunately I doubt it is going to happen.

    2. Re:Or you could just use Ada by Anonymous Coward · · Score: 0

      Ada83's multitasking was a bit ordinary - sometimes you would use the rendezvous mechanism for simple data passing, resulting in abstraction inversion - using high level abstraction for a low level job.

      Ada95 was much better in this regard - introducing protected objects which do all the funky monitor-like protection with good stuff such as guards. All very -niiiaaace-!

  39. Tripe by l4m3z0r · · Score: 1

    This is ridiculous tripe. Multi-threaded programming is hard not because the libraries are hard to use but because it requires a lot of planning and thought to decide if you can actually gain a benefit by going multi-threaded.

    The main benefit of multiple cores will not happen in userland. It will be in the kernels and the libcs. Once userland processes can effectively get memory from the heap with minimal locking we will see a performance boost system-wide (I'm talking 100 processes can all request memory from the heap with 0 locking). This is why cores are important: because we will be running more and more applications on our computers, we won't need the performance in a specific app, we will need all of our apps to be responsive all the time.

  40. Re:Don't Bother by Anonymous Coward · · Score: 0

    But, if you create some multithreaded code, in which you create 2 threads that compete with each other for the same resource, then this is shite design, and the performance of your code will almost certainly be singly threaded code doing the same thing.

    I assume you meant to say that the singly threaded code will certainly be better/faster? I would wholeheartedly disagree with that generalization. It is very common to solve problems by multithreading where some common resources do need to be shared and the design is quite reasonable. As an extreme case, take something like SETI. You are processing large datasets where you break the data into chunks and kick off threads to churn on each chunk. Let's say it takes several minutes for each thread to process its chunk and then it updates an internal results object. So you serialize access to the results object by using locks. How is this crappy design?
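
    Sketched in Python (an illustration only, with made-up names like `Results` and `analyze`, and a trivial `max` standing in for the minutes of real number crunching), that pattern looks something like this:

```python
import threading


class Results:
    """Shared results object; the lock serializes the brief updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self.peaks = {}

    def record(self, chunk_id, peak):
        with self._lock:
            self.peaks[chunk_id] = peak


def analyze(chunk_id, samples, results):
    peak = max(samples)              # stand-in for the long-running analysis
    results.record(chunk_id, peak)   # lock is held only for this update


signal = [3, 9, 1, 7, 4, 8, 2, 6]
chunks = [signal[i:i + 2] for i in range(0, len(signal), 2)]
results = Results()
threads = [threading.Thread(target=analyze, args=(i, c, results))
           for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.peaks)
```

    The shared resource exists, but each thread touches it for a tiny fraction of its lifetime, so the lock is almost never contended.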

    Now I agree, it is certainly possible to create some crappy code because you are more intent on multi-threading the thing vs coming up with a good solution that isn't necessarily multi-threaded. But there are tons of problems that do fit quite nicely into MT/MP patterns that don't scale well or at all being serialized. My opposition to your statement is that I believe you are overgeneralizing.

  41. 22 hours to 22 minutes with threading by dbdweeb · · Score: 1

    I have some practical experience on this one.

    I inherited a massive data collection routine which I identified as a good candidate for threading. When I mentioned the idea to the original duhveloper his eyes just kinda glazed over. The objection was that the routine has always "worked" so why introduce risk with increased complexity? After he left I jumped in and multi-threaded it. I was able to thread 20 already busy servers all working for me at the same time and each server was threading stuff without any performance degradation. Normal 24/7 operations with many concurrent production sessions were not impacted in the least. The end result was that the former 22 hour long process now completes in 22 to 41 minutes and it's much more stable and reliable. And with a good thread pool class it really wasn't that complicated. Actually, the hardest thing was getting beyond the glazed eyed reservations of a clueless duhveloper who was too timorous to try something "new."
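
    To give a flavor of the thread pool approach, here is a small Python sketch. It is not the actual routine: the server names and the `collect` function are invented, and a short sleep stands in for the slow remote call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def collect(server):
    # Stand-in for a slow remote call; the sleep simulates network latency.
    time.sleep(0.05)
    return server, len(server)


servers = [f"server{n:02d}" for n in range(20)]   # hypothetical host names

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    # All 20 requests wait on their "network" at the same time.
    collected = dict(pool.map(collect, servers))
elapsed = time.perf_counter() - start
print(f"collected from {len(collected)} servers in {elapsed:.2f}s")
```

    Run one after another, the 20 calls would take about a second in total; with the pool they overlap, which is the same effect as the 22 hours shrinking to minutes when the work is mostly waiting on other machines.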

  42. Why did the multi threaded chicken cross the road by JackMeyhoff · · Score: 1, Redundant

    Why did the multi threaded chicken cross the road? to the other side To get the other side To get to the

    --
    http://www.rense.com/general79/wdx1.htm
  43. use ADA by Anonymous Coward · · Score: 0

    A programming language designed for multithreads.

  44. Re:Don't Bother by Retric · · Score: 2, Interesting

    You must not work with network apps all that much.

    Think of the most basic email app possible. Now when a user presses "send mail" would you create a new fork(), try and micromanage the remote connection in a thread that handles the GUI, or force the user to wait around?

    Next think about video where you have a resource intensive task AND you still want a highly responsive GUI.

    Granted, if all you ever work with is simple biz apps with one user, you have a point, but I think your 99.99% estimate says more about the work you do than programming in general. Because threads can often simplify demanding applications.
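
    For the "send mail" case, a bare-bones Python sketch of handing the slow part to a worker thread might look like this. Everything here is a stand-in: `send_mail` just sleeps instead of talking SMTP, and the loop fakes a GUI event loop.

```python
import queue
import threading
import time


def send_mail(message, done):
    time.sleep(0.1)                 # stand-in for a slow SMTP conversation
    done.put(f"sent: {message}")


done = queue.Queue()
threading.Thread(target=send_mail, args=("hello", done), daemon=True).start()

ticks = 0
status = None
while status is None:
    ticks += 1                      # stand-in for handling one GUI event
    try:
        status = done.get(timeout=0.01)
    except queue.Empty:
        pass                        # nothing yet; the UI stays responsive
print(status, "after", ticks, "UI ticks")
```

    The GUI thread keeps turning over while the send is in flight, and only picks up the result once the worker posts it.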

  45. Ah, but... by Svartalf · · Score: 1

    SH, from which RapidMind's core tech came, is FOSS and you can do many of the things their stuff does with SH.

    --
    I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
  46. There is already an open source solution by master_p · · Score: 1

    And it is called boost::futures.

    The theory behind it, though, is not new: the Actor model is quite old, and it has been used in Erlang for quite some time.

  47. CSP Occam and Transputers by Anonymous Coward · · Score: 3, Interesting

    The Communicating Sequential Processes style of programming allows for many lightweight simple threads that communicate over channels rather than the monitor based thread synchronization.
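
    As a loose illustration of the style (in Python, not OCCAM or JCSP), here is a three-stage pipeline where the "processes" share nothing and talk only over channels. Bounded `queue.Queue` objects are used as an approximation of channels; they are not true CSP rendezvous, and all the names are made up.

```python
import queue
import threading

DONE = object()   # end-of-stream marker passed down the pipeline


def numbers(out, count):
    for n in range(1, count + 1):
        out.put(n)
    out.put(DONE)


def doubler(inp, out):
    while (item := inp.get()) is not DONE:
        out.put(item * 2)
    out.put(DONE)


def summer(inp, out):
    running = 0
    while (item := inp.get()) is not DONE:
        running += item
    out.put(running)


a, b, c = (queue.Queue(maxsize=1) for _ in range(3))
processes = [
    threading.Thread(target=numbers, args=(a, 5)),
    threading.Thread(target=doubler, args=(a, b)),
    threading.Thread(target=summer, args=(b, c)),
]
for p in processes:
    p.start()
total = c.get()   # the only way to learn the answer is to read the channel
for p in processes:
    p.join()
print(total)
```

    Each stage is plain sequential code; the synchronization falls out of the blocking reads and writes on the channels.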

    The OCCAM language implemented this style of processing and the Transputer chip implemented a fast context switching hardware that OCCAM could run on.

    This was all done back in the 1980s.

    I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.

    1. Re:CSP Occam and Transputers by N+Monkey · · Score: 1

      I'd mod the parent up if I had any points but as I haven't, I'll just add my two cents' worth. I used to program in Occam and it soon became second nature to throw threads at a problem because it was so easy to do it.

    2. Re:CSP Occam and Transputers by TwistedSquare · · Score: 1

      I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.
      Assuming that's true, that would make you Paul Austin. Hi. JCSP is still going strong -- we've recently been adding extended rendezvous and poison, new release to follow shortly. If you're interested in developing on JCSP again, let me know. While we're on the subject of C++ and CSP in this thread: C++CSP. Again, new version to follow shortly (so much to do...)

  48. Re:Don't Bother by Anonymous Coward · · Score: 0

    Most apps (who cares about major?) are single-threaded.

    Oh really. Hmm let's see on my box right now:

    firefox - 20 threads
    thunderbird - 6 threads
    vlc - 7 threads
    acrobat - 3 threads
    foobar2k - 7 threads

    And of course all the Office apps which are all multi-threaded.

    The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one.

    Actually I agree with the statement. However the original poster might have meant that, but his words taken explicitly painted a different, and quite wrong, picture of multi-threading.

    What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.

    I don't multithread because I'm worried about "the other resources going to waste"; I use it because MY app's resources might be going to waste. If I'm blocked waiting for some IO, there might be other things I could be doing in the meantime. I don't care that the OS is happily using those spare cycles for other stuff, it's my app that _could_ benefit from those cycles. I've lived in the "server" world for a very long time, and those times blocked by IO and taking advantage of multi-cores/CPUs come up over and over. In my world saying that "99.99%" of multi-threading (and multi-processes) is a waste is simply wrong. Heck, look at this conversation we're having. It is utilizing server code that spawns multiple code paths to share common resources (web servers, most likely not multi-threaded, but multi-processed). Our visualization app utilizes multiple threads such that it can do IO, render contents, and accept user inputs concurrently to provide a better user experience. How can anyone say that the overwhelming majority of multi-threading is not necessary to the point of being stupid?

  49. Transactional Memory by omnirealm · · Score: 3, Interesting

    For those who have not caught wind of this yet, transactional memory is currently the most promising solution to this problem and perhaps the most-covered subject in research conferences on parallel computing today. There have been several proposals for both hardware-based (at the cache level) and software-based architectures. Transactional memory greatly simplifies concurrent programming. When using transactions instead of locks, deadlocks go away completely and there is increased concurrency.

    --
    An unjust law is no law at all. - St. Augustine
  50. This isn't multithreading in the traditional sense by igomaniac · · Score: 3, Informative

    There are a lot of posts saying that multithreading is really hard, which is completely true... But what RapidMind is providing is something else, something more like a SIMD model or vector computations. It solves things like elementwise operations on large arrays in an efficient manner using whatever parallel computing resources are available. It's a language with semantics that don't require complicated synchronisation because you're basically telling the compiler which operations are independent, and then it can go off and compute them in the most efficient way possible. RapidMind was designed to make GPGPU programming easy, so it's a generalisation of the pixel shader model where you have a lot of 'threads' computing the color of each pixel on the display in parallel. This is an easy problem, because there is basically no communication between threads.

    --

    The interactive way to Go -- http://www.playgo.to/iwtg/en/
  51. use gcc4.2 by drerwk · · Score: 2, Informative

    Yes, gcc 4.2 supports OpenMP. As others note, parallel programming is still not trivial. But OpenMP is very nice. I have a write-up on building and testing gcc 4.2 on OS X here: http://alphakilo.com/openmp-on-os-x-using-gcc-42/. A serious advantage is that OpenMP can be retrofitted to existing C/C++ and Fortran code. I know that everyone prefers to start from scratch and use Erlang or some other solution, but in a project I am working on, we already have about a million lines of C++. Current OpenMP implementations favor SMP machines, but one can go even further with the Intel OpenMP for clusters solution. I have not tried it myself yet, but I understand that it makes the issue of non-shared memory across the cluster machines transparent. As in all cases, YMMV. But if your code is amenable to parallelizing, OpenMP is a pretty straightforward way to go.
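
    To make the "retrofit" point concrete, here is a minimal sketch of an existing serial loop with one OpenMP directive added. The function name sum_of_squares is made up for illustration; only the pragma is OpenMP.

```cpp
#include <vector>

// Sums the squares of the input. The pragma is the only OpenMP-specific line.
// Built with -fopenmp, the iterations are split across threads and the
// per-thread partial sums are combined by the reduction clause. Built without
// it, the pragma is ignored and the loop runs serially.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

    Because the directive is just a pragma, the same source still builds with a compiler that knows nothing about OpenMP, which is what makes this approach workable on a large existing code base. Note that a floating-point reduction may add the partial sums in a different order than the serial loop does.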

  52. Re:Don't Bother by tonywestonuk · · Score: 1

    I work with network apps on a daily basis, thanks! Instead of using a new thread to handle each network transaction, the comms layer is fully asynchronous and never blocks. And so the GUI can submit requests to the server without fear of becoming unresponsive. Responses are placed back onto the GUI thread when received.

  53. Re:Don't Bother by statusbar · · Score: 1

    How is that more performant?

    As the articles say, the lock pressure is moved from the reader to the writer. Transactional Memory scales amazingly better when you have multiple threads which are reading common data. Please note that in today's system architectures even READING data on different cores at the same time may not be thread safe without memory barriers put into place to synchronize the caches.

    There have been many papers written about the efficiency gains.

    And as a bonus, writing multithreaded software with a "Transactional Memory" scheme is easier to "prove correct".

    --jeffk++

    --
    ipv6 is my vpn
  54. Valve was already working on this by wellingj · · Score: 1

    So maybe it wasn't on the PS3, but Valve seems to take a more practical approach.
    This article has some good info on what Valve is doing to bring threads into gaming.
    Personally I'd rather use their model because it's more utilitarian and less 'Because I can'.

  55. This is about propping up an obsolete technology by ClosedSource · · Score: 2

    "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers."

    The marketplace wants and needs new technologies for more powerful processors. Multicore serves the needs of chip makers, not their customers. Making all software multi-threaded is trying to solve the wrong problem. It's going to result in lower-quality software without a significant increase in performance.

  56. Been There, Done That by Anonymous Coward · · Score: 0

    The real need for threading over multiple processors ("distributed computing") is in scientific applications, where tons of data needs to be tediously crunched.

    The company "Interactive Supercomputing" offers a product that takes your code, and "parallelizes (multi-threads) it" with almost no additional work, allowing you to run it on multi-processor systems, or even on networked grids and clusters of numerous computers.

    They started with a module for MATLAB, since that's a language commonly used to perform data-intensive scientific tasks, but are now working on a Python implementation!

  57. Making multithreaded programming easier by SlideRuleGuy · · Score: 1

    Back in April of 1989, the Communications of the ACM published an article describing a language for just this sort of purpose, called Linda. It provided elegance and simplicity in multithreaded programming at the expense of more overhead for coordination (always a tradeoff). Communication was done by each thread putting the results of its processing into a shared pool, from which downstream threads would periodically take messages and perform further processing. No synchronization really, just producers and consumers operating on this shared pool of data. Obviously this would not be a silver bullet for every multithreaded need, but strong in the "simple" department.
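
    For anyone who hasn't seen the model, the following is a heavily simplified sketch of the shared-pool idea in C++, not Linda itself. TupleSpace and square_via_space are hypothetical names, tuples are reduced to (tag, int) pairs, and matching is by tag only. Real Linda matches on whole tuple patterns and makes no ordering promise, whereas this sketch happens to be first-in, first-out per tag.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical, simplified stand-in for a Linda tuple space.
class TupleSpace {
public:
    // Linda's "out": deposit a tuple into the shared pool.
    void out(const std::string& tag, int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pool_[tag].push(value);
        }
        changed_.notify_all();
    }

    // Linda's "in": block until a tuple with this tag exists, then remove
    // it from the pool and return its value.
    int in(const std::string& tag) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [&] {
            auto it = pool_.find(tag);
            return it != pool_.end() && !it->second.empty();
        });
        std::queue<int>& matches = pool_[tag];
        int value = matches.front();
        matches.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::map<std::string, std::queue<int>> pool_;
};

// Two-stage pipeline: the caller deposits a "raw" tuple, a worker thread
// withdraws it and deposits a "squared" tuple, and the caller collects that.
int square_via_space(int x) {
    TupleSpace space;
    std::thread worker([&space] {
        int v = space.in("raw");
        space.out("squared", v * v);
    });
    space.out("raw", x);
    int result = space.in("squared");
    worker.join();
    return result;
}
```

    The locking lives entirely inside the pool, so the producer and consumer code contains no explicit synchronization. That is the simplicity being described, and the lock every operation takes is the coordination overhead.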

    1. Re:Making multithreaded programming easier by Anonymous Coward · · Score: 0
      Right, coordination languages like Linda are one thing Edward Lee mentions in his IEEE article, "The Problem With Threads".

      Threads are a seemingly straightforward adaptation of the dominant sequential model of computation to concurrent systems. Languages require little or no syntactic changes to support threads, and operating systems and architectures have evolved to efficiently support them. Many technologists are pushing for increased use of multithreading in software in order to take advantage of the predicted increases in parallelism in computer architectures. In this paper, I argue that this is not a good idea. Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed. The consequences of this principle are profound. I argue for the development of concurrent coordination languages based on sound, composable formalisms. I believe that such languages will yield much more reliable, and more concurrent programs.
    2. Re:Making multithreaded programming easier by Anonymous Coward · · Score: 0
      I can't log in right now, but just wanted to mention that I have a little experience with a variation of the Linda system (it's an elegant add-on to a language, not a language itself), and I like the idea a lot. But it has one downside, in that there's no guarantee that your "in" operations are ever going to match anything put "out" by another process, making many bugs invisible. This is similar to the problem that relational databases have without foreign key constraints, and if there were no table definitions you could imagine the garbage that would accumulate in a database "tuple space". Additional schema constraints are vital to make the Linda concept work reliably in a production environment.

      Still, the idea was too good to be ignored - JINI (and JavaSpaces) is based on the idea, and fairly successful in its niche.

  58. Re:Don't Bother by Retric · · Score: 1

    I don't think you're getting the kind of advantages you might assume with this approach. First off, with non-blocking IO you're wasting a lot of CPU time checking to see if you have a response vs. blocking. Second, you're manually balancing performance issues, which is something modern operating systems are very good at. Finally, you're limiting the kinds of third-party tools you can use.

    Now you could easily have a GUI thread that uses message queues to talk with the networking layer(s) without changing much at all. Add in a simple thread pool to handle performance issues and you're going to have a more responsive and scalable app which is also more flexible. (Granted, if you have already spent a lot of time tuning things it's not such an issue.)
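
    A rough sketch of that arrangement, assuming a single networking thread instead of a pool and string messages standing in for real requests. MessageQueue and run_requests are hypothetical names, not any toolkit's API, and the "network" work is simulated.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking message queue (a hypothetical helper).
template <typename T>
class MessageQueue {
public:
    void push(T message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            messages_.push_back(std::move(message));
        }
        ready_.notify_one();
    }

    // Blocks until a message is available, so the waiting thread sleeps
    // instead of polling.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !messages_.empty(); });
        T message = std::move(messages_.front());
        messages_.pop_front();
        return message;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> messages_;
};

// Stand-in for the GUI side: queues every request up front, then collects
// the replies while a separate thread does the (simulated) slow work.
std::vector<std::string> run_requests(const std::vector<std::string>& urls) {
    MessageQueue<std::string> requests;
    MessageQueue<std::string> responses;

    std::thread network([&] {
        for (;;) {
            std::string url = requests.pop();
            if (url.empty()) break;  // an empty message means "shut down"
            responses.push("response for " + url);
        }
    });

    for (const std::string& url : urls) {
        assert(!url.empty());  // empty is reserved for the shutdown message
        requests.push(url);
    }
    requests.push("");  // ask the networking thread to stop

    std::vector<std::string> results;
    for (std::size_t i = 0; i < urls.size(); ++i) {
        results.push_back(responses.pop());
    }
    network.join();
    return results;
}
```

    In a real GUI the responses would be drained from the event loop instead of a blocking pop, and several worker threads could share the request queue. With more than one worker the replies would no longer be guaranteed to arrive in request order.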

    And granted, in some ways it's easier to debug your style of app, but as soon as you have random developers mucking around in the code you're heading for trouble.

  59. Web apps = multithreaded programming made easy by xxxJonBoyxxx · · Score: 1

    Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do.


    Actually, the rise of web apps backed by a multithreaded web server and a multithreaded DB in the 1990s made this pretty easy for millions of people.
  60. Define "Core"... by Anonymous Coward · · Score: 0

    Each one of those 80 cores only contains two FPUs and a simple message router. Such a core is *nowhere* near the complexity of an Opteron or a Core 2. The chip resembles a DSP or a GPU more than a real CPU. Sure I could fit more than a thousand 286 cores onto a chip, but I wouldn't want to run Linux on that (because even the simple x86 ops would take several cycles apiece -- transistors weren't thrown at pipelining or superscalarity). Not that Intel's baby isn't laudable as an engineering achievement, but they are grossly mis-marketing and over-hyping it -- almost lying about the nature of that beast.

  61. revolution in programming by Anonymous Coward · · Score: 0

    I think it is pretty clear this represents a *paradigm shift* in the programming of thousands of animated chickens. I'm throwing away all the old software tools I used to *leverage* for this purpose. This will vastly increase my *efficiency*. I have a feeling it will also bleed over into different, but similar tasks -- like programming thousands of animated pigs or possibly ducks. It is not just an *evolution* in our ability to program thousands of chickens, but rather a *revolution* in the sense that it is *disruptive technology*.

  62. Threads are threads; Processes are processes by Anonymous Coward · · Score: 0

    Separate processes allow related tasks, such as pipelines, to run on *part* of the task. So, Pipes, socketpairs, and FIFOs require a master slave organization. Shared Memory (like threads) still requires locks to prevent two processes changing data at once. (I believe) Job controlled processes can run apart, with un-related processes in-between, slowing the calculation.

    Threads support better co-operative or competitive models. They do not need to build IPC sockets to communicate. Threads allow "Thread Specific Data" blocks where it's vital to not share variables. Also, it is a matter of *documentation* to mark "shared variable" sections, rather than use Shared Memory to tell sections apart. Since threads run *only* when their common process is active, It is easier to synchronize them for a common goal.

    Matrix algebra is one such co-operative activity. Each dot-product calculation can occur concurrently. The true number of dot-product threads depends on the hardware's CPU count. Modern Fortran compilers will create this in "parallel mode" without any code change.
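
    As a hand-written illustration of the same idea (a sketch, not what a parallelizing Fortran compiler actually emits), a matrix-vector product can deal its rows out to as many threads as the hardware reports. mat_vec and the Matrix alias are names made up here, and every row is assumed to have at least x.size() elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Computes y = A * x. Every y[i] is an independent dot product, so rows are
// dealt out to worker threads without any locking: each thread writes only
// its own elements of y.
std::vector<double> mat_vec(const Matrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.size(), 0.0);

    // hardware_concurrency() may return 0 when the count is unknown, and
    // there is no point in starting more threads than there are rows.
    const unsigned hw = std::thread::hardware_concurrency();
    const std::size_t n_threads =
        std::max<std::size_t>(1, std::min<std::size_t>(hw, a.size()));

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles rows t, t + n_threads, t + 2 * n_threads, ...
            for (std::size_t i = t; i < a.size(); i += n_threads) {
                double dot = 0.0;
                for (std::size_t j = 0; j < x.size(); ++j) {
                    dot += a[i][j] * x[j];
                }
                y[i] = dot;
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return y;
}
```

    No mutex is needed because no two threads ever write the same element; the only synchronization is the final join.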

    Games are competitive models. Each thread represents a contender, such as a race car, or a Prolog search tree. Once a contender reaches its goal, it tries to lock a mutex for itself. The race car found an open space on the track. Prolog found the best (first) path to a goal.

  63. Re:Don't Bother by Anonymous Coward · · Score: 0

    First off with non blocking IO you're wasting a lot of CPU time checking to see if you have a response vs. blocking.

    Not true if the lower layer is using callbacks. You're assuming that you're using some type of polling, which isn't necessarily the case. Of course if you're using a callback, then you have an implicitly multi-threaded app, but that's another story. But generally yes, it's been my experience that people use some manner of polling, which can be OK, but I've seen more poorly written polling code than multi-threaded code, so you haven't really bought much from a pragmatic POV.

  64. You're lost :) by Duncan3 · · Score: 1

    "However, multi-threaded development has been notoriously hard to do.

    if ( statement.agree() )
        leave_slashdot_now();

    No one I've ever met thinks this is even a tiny bit difficult. Maybe the problem is that there are not yet compilers AT ALL for these new chips? Nah, blame multithreading!

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:You're lost :) by Raffaello · · Score: 1

      Hans Boehm thinks that it is not only difficult, it is impossible. In fact, he has a paper showing why.

      The bottom line is that those who think it is easy or straightforward have been unwittingly creating non-deterministic programs that do unexpected things at seemingly random intervals. These non-deterministic things are usually written off as random crashes, but they are due to the fact that threading cannot be implemented at the library level - it must be built into the language standard and compiler to work correctly.

  65. Sequoia by Wesley+Felter · · Score: 1

    Stanford's Sequoia compiler is supposed to be open source eventually. Plus, it looks a little easier to use than RapidMind, PeakStream, or CUDA.

  66. Dining Chickens Problem by MS-06FZ · · Score: 3, Insightful

    I'm sure the demonstration would've been a lot more difficult if he'd used philosophers instead of chickens. Thing is, chickens can't even hold chopsticks. A chicken just goes straight for the feed, so there's just one resource being acquired. It's still possible for a chicken to starve, but as chickens don't eat that much it's more likely that any shut-out chickens would simply go hungry for a while, and then get to eat before starving.

    --
    ---GEC
    I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
  67. painful by nanosquid · · Score: 1

    I've probably used half a dozen different parallel extensions of C and C++ over the years. This one doesn't look revolutionary, it merely looks painful.

  68. Re:Don't Bother by Anonymous Coward · · Score: 0

    As the articles say, the lock pressure is moved from the reader to the writer. Transactional Memory scales amazingly better when you have multiple threads which are reading common data.

    OK, maybe I'm being dense here, but even if readers don't have to lock explicitly, at some point they need the new view of the data (i.e., a writer has come along and modified the data). What is the mechanism to perform this "switch" that can be done in a way that doesn't require some type of lock, even if it's to simply switch some pointers around? I guess one question I should ask is: are we talking about userland code here? I've been assuming yes. If you have support at the OS/hardware level for these transactional memory mechanisms, then sure, I can easily see how you can manipulate memory at a low enough level to maintain performance. Please enlighten me and/or point me to some articles.

  69. The simplest way to get it right. by arthurpaliden · · Score: 1

    Proper analysis of the problem to be solved and then the creation of a functional specification, with lots of diagrams of process and data flow, that is then vetted by your peers.

  70. Re:Don't Bother by drxenos · · Score: 1

    Most apps? I think you are living in a PC-centric world (and even then I do not believe it's true). Ever write a real-time embedded system? It's pretty much impossible to prove you can meet several hard deadlines with a single thread. And most systems in this realm do not use time-slicing (which I assume you meant when you wrote "multi-tasking"). Not all multitasking OSes are time-sliced. In the real-time world we usually use OSes that do cooperative multitasking that are priority preemptive. Unless a higher priority task wakes, or the owning task blocks, a task has the processor until it gives it up (kind of how Windows 3 worked). It's wrong to assume that all programs and programmers use a hosted system (Windows, Unix, etc.). Actually, there are more embedded systems in the world than PCs, Macs, and such. People who rail against multi-threading have no concept that there are systems out there unlike their home computers.

    --


    Anonymous Cowards suck.
  71. Database! by Tablizer · · Score: 1

    Those of us who've worked on client/server types of apps have grown accustomed to managing multiple processes (users) via databases. The database becomes the central "coordination engine", and A.C.I.D.-compliant databases ensure there are no stuck locks or race conditions. There may be lessons from this client/server world for a smaller scale. (Actually, I used to do it with desktop databases also, such as FoxPro; however, it was not ACID-compliant at the time, but close enough to be practical. As long as a user knew who had locked something, they could coordinate it among themselves.)

  72. He so punny! by nick_davison · · Score: 1
    From the abstract:

    The Multi-core revolution promises to provide unparalleled increases in performance.
  73. Was the question answered? by Numbah+One · · Score: 1

    Did any of the chickens cross the road? If so, did the chicken say why?

  74. Bjarne knows C++ isn't very suitable for MT by Anonymous Coward · · Score: 0

    Isn't one of C++0x's main features MultiThreading support?

  75. Re:Transactional Memory is no panacea by cpeterso · · Score: 1
  76. Multithreaded programming the easy way by pbrooks100 · · Score: 2, Informative
  77. Download and play with QtConcurrent today by IceFox · · Score: 2, Interesting

    A project that you can download and play with today is Trolltech's QtConcurrent. Given a task it will automatically manage creating threads and distributing the task among your cores.

    From the project page:

    The classes and functions available in the Qt Concurrent package allows you to write multi-threaded applications without having to use the basic threading synchronization primitives such as mutexes and wait conditions. This makes it easier to reason about and test parallel programs to make sure that they are correct.
    The Qt Concurrent components manage the threads they use automatically. Each application has a global thread counter, which limits the maximum number threads used at the same time. The maximum is scaled according to the number of CPU cores on the system at runtime. This means that programs written with Qt Concurrent today will continue to scale when deployed on many-core systems in the future.

    Very cool.

    --
    Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
  78. Processes, threads, it's all parallelization by tepples · · Score: 1

    If one thread can run on core A and one thread can run on core B, what makes you think one process can't run on core A and another on core B? Because processes and threads are ultimately the same thing. Processes can share memory, and threads can pass messages. Breaking up an app into efficient processes is as hard as breaking it up into efficient threads. If you were making a web browser or a first-person shooter, what tasks would you give to one process and what to another?
  79. Waste of a second core? by tepples · · Score: 1

    However, it takes a great deal of burying one's head in the sand to think that the foreground task is the only program doing anything related to the user experience on a desktop machine. But do the other processes use the entire second, third, and fourth cores? If not, then time on the other cores will go unused unless the foreground application is broken up into threads or processes that execute in parallel.
    1. Re:Waste of a second core? by ari_j · · Score: 1

      If there is an actual performance benefit to doing so, then a careful reading of my comments here shows that I'm not at all opposed to doing so. It's when people go out of their way to break a single-threaded concept into multiple threads and get a net loss of performance that I take issue with.

  80. Inmos had the elegant solution by Tjp($)pjT · · Score: 2, Interesting
    Inmos Transputers C language development had an elegant solution. It should be migrated to mainstream C and C++ in my opinion:

    parallel
    {/* execute these statements in parallel if possible */
    statement1;
    statement2;
    ...
    statementn;
    }

    sequential
    {/* execute these statements in order as written */
    statement_1;
    statement_2;
    ...
    statement_n;
    }
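
    Standard C++ has no such keywords, but the two-statement case can be approximated with std::async from the C++11 <future> header. This is only a sketch of the idea under that assumption, not the Inmos construct, and sum_both_halves is a made-up example function.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Rough analogue of
//     parallel { statement1; statement2; }
// Each "statement" sums one half of the data. The halves do not overlap,
// which is the same independence a parallel block would require.
long sum_both_halves(const std::vector<int>& data) {
    const auto middle =
        data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);

    // statement1: launched on another thread.
    std::future<long> left = std::async(std::launch::async, [&data, middle] {
        return std::accumulate(data.begin(), middle, 0L);
    });

    // statement2: runs on the calling thread in the meantime.
    const long right = std::accumulate(middle, data.end(), 0L);

    // End of the "parallel" block: wait for every statement to finish.
    return left.get() + right;
}
```

    The "sequential" block needs no translation, since ordinary statements already run in the order written. Unlike a language-level block, nothing here stops the two tasks from touching the same data; that is left to the programmer.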
    --
    - Tjp

    I am in wallow with my inner money grubbing capitalistic pig. ... Oink!

    1. Re:Inmos had the elegant solution by Anonymous Coward · · Score: 0

      The Transputer was originally designed to run Occam, or rather the two were designed together. The message passing and parallel constructs in Occam directly map onto the Transputer hardware, and so are incredibly efficient. I would imagine that Inmos's C extensions were also designed to be efficiently implemented on the Transputer hardware - they may not be such a good general purpose solution for implementing on top of an abstract threading model such as Pthreads, or via OpenMP, or whatever.

    2. Re:Inmos had the elegant solution by Tjp($)pjT · · Score: 1

      While it might not be best to implement the message passing and timer primitives of the transputer as C/C++ language extensions, the "parallel" and "sequential" blocks could be in the same category as many of the other "compiler hint" items. They can't really be pragmas as that gets to be very awkward to read and understand, but as keywords that take a statement or statements (the degenerate case is one statement) they can usually be safely ignored. It is really a case where a compiler that is aware of multiple core processors and hyper-threaded architectures could shine. Like generative programming with templates, it is an area where the compiler design can be better used to implement the function. "Sequential" is for orthogonality and to add only a little bit to the readability and function of the little used "," (comma) operator. "Parallel" is a lower level than pthreads, etc., as one does not have the ability to kill a thread or whatever other ancillary functionality might be used at a higher level; as such it is a much lower level (fiber-like, rather than thread-like, might be the way to describe it). No separate contexts and so on. It was used for independent operations. Bad things would happen (well, non-deterministic things) if you tricked the compiler into using the same memory, register or stack location (I know, same difference, as the registers were just low memory mapped) between parallel statements. The compiler caught most but not all. Like a compiler should. Modern compiler tech would do a much better job.

      Personally I would love to see a modern transputer chip. You could probably get a full switch and 16 transputers with lots of cache on a chip at easily a tenth the power of a current generation processor. The digital phase locked loop clock multipliers of today are much better, so links of 200 MHz or better and internal clock speeds of several gigahertz would be killer. The basic patents for the architecture are all expired ... might make an interesting PhD project. :) I loved the direction the chip family was taking with specialized transputers for disk I/O and so forth. Maybe update the link architecture to HyperTransport as it is an open standard as well. Could make a nice open hardware project too. Transputer GPU, Transputer NPU, Transputer Multi-I/O chip ... lots of silicon design reuse potential...

      --
      - Tjp

      I am in wallow with my inner money grubbing capitalistic pig. ... Oink!

    3. Re:Inmos had the elegant solution by xmedar · · Score: 1

      I agree completely. I remember coding a ray tracing app on 4 T800s 17 years ago and it stomped all over the x86 implementation, and I loved Occam and the nice folding editor they had. Inmos were well ahead of the game, and the death of Inmos was particularly sad. If anyone wants to resurrect them I'd be more than interested.

      --
      Any sufficiently advanced man is indistinguishable from God
  81. Re:Don't Bother by statusbar · · Score: 1

    Ideally, there would be hardware support for the data to be updated in a single transaction.

    Another way to implement this is via pointers, where the pointer is updated once the new values are put into place. This pointer switch must be atomic. When a writer decides to write, it gets a copy of the current pointer. It starts the calculation and allocates a new result. This new result is put into place via changing the pointer with an atomic operation only if the original pointer has not changed.

    If it did, then it must THROW OUT the calculations that it did and start them over again.

    The reason why this is more efficient is that in typical programs more threads read the same data, and very little data is actually needing to be written to by multiple threads.

    --jeffk++

    --
    ipv6 is my vpn
  82. Method by bob.appleyard · · Score: 1

    OK chaps, care to enlighten me as to whether this method relies on specific language constructs, or whether it is implicitly managed after the fact?

    I.e. do you need thread objects and synchronisation primitives (such as critical sections), or can one design a program as though it were serial, and then let the compiler judge how to manage threading and concurrency?

    --
    How dare you be so modest!! You conceited bastard!!
  83. Use the torrent by FliesLikeABrick · · Score: 1

    If you're downloading the video, do their servers a favor and use the torrent. HTTP downloads are incredibly slow off of their server right now ....

  84. Re:This is about propping up an obsolete technolog by try_anything · · Score: 1

    When technology hits a dead end on one track, the industry goes looking for another track where progress can be made. A few brilliant people stay at the dead end beating their brains against the wall. You can stand there with your arms folded, tapping your foot, waiting for the brilliant people to knock down the wall, but I'm going to pursue the low-hanging fruit elsewhere. When they do knock it down, I'll be right behind you, and what will I have lost?

  85. OpenMP anybody? by tricaric · · Score: 1

    OpenMP is an open standard for multi-platform shared-memory parallel programming in C/C++ and Fortran. It is supported by GCC 4.2 and greater.

  86. Yet another low swipe at FP from a C++er by Anonymous Coward · · Score: 0

    The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement.

    Lisp is famous for its syntax. (Some of the really hardcore people say it has no syntax.) In any event, it does allow unrivaled support for defining your own syntax. That's basically how the language is used: you use Lisp to build the language you need.

    So if you think Lisp has "horrible syntax", I question how you're using it. Part of the beauty of Lisp is that anything you don't like, you can change. If you really like Erlang or Haskell, you can add their syntax to Lisp yourself. (The opposite, as you noted, is not true.)

    I admit it's a hard feature to wrap your brain around, if you're used to other languages. People coming from C++ tend to have the mindset:
      - learn the language's data types and raw syntax (how to define a function, how big ints are, etc.)
      - write your program using those features
    and so the Lisp strategy of "imagine the perfect language for your application" doesn't occur to them, because in C++-land it wouldn't do any good. Your language doesn't support syntactic abstraction.

    For example, if you were to think "gee, Aspects would really make this program simpler", in Lisp you might say "OK, so I'll spend an afternoon writing a macro to do that" (it's been done). In most any other language you'd have to say "Gee, I hope Gregor Kiczales and a few of his buddies from Xerox PARC decide to spend a few years founding a new company to write a compiler to add support for them to my language" (also been done).

    One of the Lisp web application servers famously doesn't use the Common Lisp "if" -- the author thought its implicit "else" hurt readability, so he wrote his own ("if*"). A C++ programmer would call this "syntax", and assume that you can never write your own new control structures (because in C++ you can't).

    I don't know OCaml, but I suspect there's a good chance you're dismissing it out of hand, too, simply because it doesn't look much like C++. Go on and use C++ if you like it; if it works for you, that's great. But stop badmouthing languages you've never really used.

    1. Re:Yet another low swipe at FP from a C++er by Anonymous Coward · · Score: 0

      Yet another person who dismisses someone based on conjecture.

      I learned Lisp before I learned C by the way (more years ago than I care to say). I also spent about a year writing in O'Caml so I at least have a little experience.

    2. Re:Yet another low swipe at FP from a C++er by Anonymous Coward · · Score: 0

      Yet another person who dismisses someone based on conjecture.

      What conjecture? He dismissed Lisp because of its syntax ... but in Lisp, you can make your own syntax. You may as well say C is crap because it doesn't have a built-in function to find the roots of a quadratic equation, even though C lets you define your own functions.

      The whole thing is right up there with "paint sucks because it doesn't come in the right colors".

  87. I do know ocaml. by Generic+Player · · Score: 1

    And its syntax does suck. I don't think that's a good enough reason to dismiss it however, especially since it's so damn good in every other way:

    * Fast bytecode compilation that can be used in #! scripts
    * optimizing native code compilation that performs as well as or better than every other language outside of C/C++/D
    * debugger with backstepping
    * profiler
    * modules with separate compilation
    * strong static typing
    * variant types and pattern matching
    * imperative and OO features for when you want them
    * a full lex and yacc
    * camlp4 for creating syntax extensions or redefining syntax

    How anyone could pass on all that just because the syntax is crappy is beyond me.

  88. no POSIX by NuShrike · · Score: 1

    Win2K is the last one to support POSIX: http://support.microsoft.com/default.aspx?scid=kb;en-us;308259

    You probably mean CreateProcess(), a Win32 API.

    I think what you're really arguing is for work/task loops built up by message passing. These work queues can be either different threads or processes, and spend most of their time looping through the queue, instead of being created and destroyed per "user".
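
    A minimal sketch of that work-queue idea, in Python for brevity. The names run_workers, inbox, and handler are made up for illustration; the only real APIs used are the standard queue and threading modules. A fixed set of workers is started once, each loops on a shared queue, and a sentinel message tells it to exit.

```python
import queue
import threading


def run_workers(jobs, handler, num_workers=4):
    """Process jobs on a fixed pool of long-lived worker threads."""
    inbox = queue.Queue()    # messages going to the workers
    results = queue.Queue()  # messages coming back
    stop = object()          # sentinel message telling a worker to exit

    def worker():
        # Each worker loops over the queue for its whole lifetime
        # instead of being created and destroyed per job.
        while True:
            job = inbox.get()
            if job is stop:
                break
            results.put((job, handler(job)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        inbox.put(job)
    for _ in threads:
        inbox.put(stop)  # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have exited, so draining the result queue is safe here.
    out = {}
    while not results.empty():
        job, value = results.get()
        out[job] = value
    return out
```

    Calling run_workers(range(10), lambda x: x * x) returns a dict mapping each job to its square. The same loop shape would work with processes instead of threads; only the queue type would need to change.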

  89. Bad example. by Generic+Player · · Score: 1

    Apache 2 is for windows. The threaded model doesn't perform any better than the prefork model on unix. For unix users, apache 2 is just a way to get a less reliable version of apache that has had many new security holes introduced.

  90. Re:This is about propping up an obsolete technolog by ClosedSource · · Score: 1

    "When technology hits a dead end on one track, the industry goes looking for another track where progress can be made."

    There are two different industries involved here: the hardware processor industry and the software industry. It looks like the former is looking for the latter to bail it out. Multicore isn't really a new track, it's more of a repackaging effort.

    "I'm going to pursue the low-hanging fruit elsewhere."

    I think all the discussion over this issue suggests that it isn't low-hanging fruit.

    "When they do knock it down, I'll be right behind you, and what will I have lost?"

    As always, it depends. If you and your co-workers don't make any additional mistakes due to the added complexity, probably nothing bad will happen. Obviously if multicore doesn't actually add a lot of multithreading to your project because you're already using it extensively, it won't have any effect at all.

  91. "a mostly seemless bittorrent alternative" by Master+of+Transhuman · · Score: 1


    Seems to be what?

    You gotta love the American educational system...or the lack of Web spellchecking...one or the other...

    --
    Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  92. Re:This is about propping up an obsolete technolog by try_anything · · Score: 1

    There are two different industries involved here: the hardware processor industry and the software industry. It looks like the former is looking for the latter to bail it out.

    Considering that hardware designers have been leaning over backwards for years to help software programmers stick with one fairly easy programming model, I wouldn't be too quick to blame the hardware industry. They know which side their bread is buttered on. The hardware company that requires the least adaptation from software developers has the advantage, just like the software company that requires the least change and adaptation from users always has the advantage.

    I think all the discussion over this issue suggests that it isn't low-hanging fruit.

    This is Slashdot. People argue against something so they'll never have to try it :-) I've never had a choice; my first professional job was at a Java shop that used multithreaded servers, and at my current gig one of my ongoing tasks is to add multithreading to single-threaded C++ apps so that our customers get as much power as possible out of their multi-CPU workstations.

    What I've learned is that there's basically a one-time cost when you redesign and refactor a single-threaded app to a multi-threaded app. Multi-threaded design is harder, but the resulting designs are often simpler, since single-threading forces you to specify many arbitrary choices about sequencing. Then when you consider resource utilization, it turns out those arbitrary choices aren't arbitrary after all, so you end up interleaving work in weird ways to keep resource utilization up. With multi-threaded apps, I just leave it up to the thread scheduler, which does a pretty damn good job.

    On the programs I've worked on, the advantages of cleaner design (no need to specify serialization where it doesn't exist, no spaghetti flow control) have cancelled out the disadvantages of more complicated implementation (ensuring protection for every shared resource.) The cost of ensuring protection is mitigated by the fact that most of the resource-protection code in my programs is contained in a few shared data structure implementations that I reuse in all my projects.
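
    To make "resource-protection code contained in a few shared data structures" concrete, here is a small sketch in Python. SharedCounter and count_in_parallel are hypothetical names, not anything from the poster's projects. The point is only that the lock lives inside the class, so the calling code never mentions it.

```python
import threading


class SharedCounter:
    """A counter map whose locking is hidden inside the class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def add(self, key, amount=1):
        # The whole read-modify-write happens under the lock.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount

    def snapshot(self):
        # Return a copy so callers never touch the shared dict directly.
        with self._lock:
            return dict(self._counts)


def count_in_parallel(num_threads=8, per_thread=1000):
    """Have several threads bump the same key and return the final counts."""
    counter = SharedCounter()

    def work():
        for _ in range(per_thread):
            counter.add("hits")

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.snapshot()
```

    The worker function contains no locking at all, which is the reuse argument: get the structure right once, then use it everywhere.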

    Of course I write lots of single-threaded programs, too. Most of my programs start out single-threaded, with an eye toward future multithreading. Single threading has a decided advantage for simple programs; I just believe that advantage disappears when programs get large and complicated.

  93. Sheltered does not mean what you think it means. by Generic+Player · · Score: 1

    Game console programming is just as sheltered as any other narrowly focused programming field. Higher level languages aren't about sheltering, they are about abstraction. I understand the low level implementation details, but I can choose to leave those details out of my mind, and concentrate on the high level problem. You use this same technique all the time, but you have the unfortunately common misconception that abstracting certain things is good, and abstracting other things is bad.

    You don't have to implement functions/procedures yourself, because you are using a high enough level language that it provides an abstraction for that. This makes you more productive, and your code easier to modularize and maintain. Keep moving up the high level chain and more and more things get abstracted, making you more productive and your code better. Especially since most low level programmers fail to implement many abstractions that are very powerful, because they have never bothered to try existing high level languages and discover the power those abstractions provide.

  94. I would like to see a business programming example by Anonymous Coward · · Score: 0

    Well, every time someone posts a multithreading example it has to do with something that is not done by 55% of the programmers out there; programming a business application accessing a database. I can understand an OS or a DB engine running multiple threads but how about a report or an OLTP transaction oriented example?

    bah!

  95. Re:C++ can't be made safe (SAFK) by Steveftoth · · Score: 1

    Maybe you should rethink your position about synchronizing every method to make Java 'thread safe', because what that will really do is cause all your code to freeze when it is actually run in a multithreaded environment, unless it is very simple.

    You cannot just make code thread safe. You have to define where you are going to share memory between threads, and then figure out all interactions between the threads and define where the data can be shared. Otherwise you are asking for trouble. Using a process model makes it HARDER (but not impossible) to hang yourself, by forcing you to use shared memory, message passing, or other techniques to force you to think about the shared state of your code.
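
    One way to see why locking every method is not the same as being thread safe is sketched below in Python. The Account class is hypothetical, and each method taking the lock is meant to mimic a Java class with every method synchronized. The "race" is replayed by hand in a single thread so the outcome is deterministic; it is one schedule that two real threads could hit, not a measurement.

```python
import threading


class Account:
    """Every method takes the lock, like a class with all methods synchronized."""

    def __init__(self):
        self._lock = threading.Lock()
        self._balance = 0

    def get(self):
        with self._lock:
            return self._balance

    def set(self, value):
        with self._lock:
            self._balance = value

    def deposit(self, amount):
        # The whole read-modify-write is one critical section.
        with self._lock:
            self._balance += amount


def interleaved_deposits():
    """Replay the schedule where A and B each do set(get() + 1)."""
    acct = Account()
    seen_by_a = acct.get()   # thread A reads 0
    seen_by_b = acct.get()   # thread B reads 0 before A writes
    acct.set(seen_by_a + 1)  # A writes 1
    acct.set(seen_by_b + 1)  # B also writes 1, so A's update is lost
    return acct.get()


def atomic_deposits():
    """The same two deposits through the single shared-state operation."""
    acct = Account()
    acct.deposit(1)
    acct.deposit(1)
    return acct.get()
```

    interleaved_deposits() ends with a balance of 1 even though get and set are both locked, while atomic_deposits() ends with 2. The fix was not more locks but deciding which operation on the shared state has to be indivisible.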

    Writing code is easy, but the tools we choose to use make it hard to write good multithreaded code.

  96. Re:This is about propping up an obsolete technolog by ClosedSource · · Score: 1

    "Considering that hardware designers have been leaning over backwards for years to help software programmers stick with one fairly easy programming model, I wouldn't be too quick to blame the hardware industry. They know which side their bread is buttered on. The hardware company that requires the least adaptation from software developers has the advantage, just like the software company that requires the least change and adaptation from users always has the advantage."

    I didn't see the "leaning over backwards" you refer to. What hardware improvements have hardware designers avoided because it would have caused the current programming model to change? You don't have to look any further than the x86 architecture with its segmented memory to conclude that ease of programming has never been a key goal at Intel.

    "I've never had a choice; my first professional job was at a Java shop that used multithreaded servers, and at my current gig one of my ongoing tasks is to add multithreading to single-threaded C++ apps so that our customers get as much power as possible out of their multi-CPU workstations."

    One advantage of server apps is that they inherently lend themselves to a multithreading approach since each request can be seen as a separate unit of work. As you probably know, the challenge comes when no such natural break-down is apparent.

  97. Re:C++ can't be made safe (SAFK) by Gr8Apes · · Score: 1

    I only claimed synchronizing would almost make it thread-safe. I did not ever state this was a good approach. You should read the second and third paragraphs of my GP.

    --
    The cesspool just got a check and balance.
  98. Re:This is about propping up an obsolete technolog by try_anything · · Score: 1

    I didn't see the "leaning over backwards" you refer to. What hardware improvements have hardware designers avoided because it would have caused the current programming model to change? You don't have to look any further than the x86 architecture with its segmented memory to conclude that ease of programming has never been a key goal at Intel.

    Intel desktop and workstation microprocessors, or any superscalar microprocessors for that matter, are a great example. The programmer (or compiler) writes a single-threaded assembly language program for an extremely simple machine that is basically a fiction. The assembly programmer gets to think in terms of ISA abstractions like "the EAX register" when in the real microprocessor, with its superscalar pipelines, out-of-order execution, branch prediction, and so forth, there really isn't anything you could reasonably point to and say, "This is the EAX register." The microprocessor is a dizzyingly complex hardware simulation of the simple machine that software programmers use as their mental model of "the hardware."

    Thanks to the insulation provided by the instruction set architecture, programmers have been able to mostly ignore the growing complexity of microprocessors and continue thinking in terms of the same single-threaded in-order execution model and a relatively slowly evolving set of ISAs. The cost of this mismatch between hardware and programming model becomes evident when you look at how much more performance you can get when you shift some of the burden from the hardware to the programmer, as with the Cell processor. (You can also save lots of power by forcing software tools to manage instruction scheduling. Microprocessors in cell phones and other battery-powered devices typically can't afford to spend power on analyzing, reordering, and creatively dispatching instruction streams. They leave that to compiler writers and programmers.)

    When you look at the alternatives (and I'm sure Intel, AMD, and IBM have worked with more alternatives than I'll ever even hear of), symmetric multiprocessing is the smallest, least disruptive step that software developers could be asked to take. We still get a nice simple machine model that thankfully reflects very little of the complexity of the underlying hardware. The only thing that has gotten worse is that instead of getting unconditional, dramatic performance improvements for no effort at all, we get dramatic performance improvements conditional on our ability to use cores efficiently. And heck, single-threaded programs might keep getting faster anyway.

    P.S. I admit I don't know anything about Intel's introduction of segmentation, but reading what Wikipedia says, it seems like the key selling point of segmentation was the ease of porting old software. Requiring users to move from straightforward 16-bit addressing to straightforward 32-bit addressing would have kept both hardware and software simpler than using Intel's segmentation idea, but segmentation made it easier to port old software, so it was a success. It sounds more like a present vs. future tradeoff than a software vs. hardware tradeoff.

  99. Re:This is about propping up an obsolete technolog by ClosedSource · · Score: 1

    "Intel desktop and workstation microprocessors, or any superscalar microprocessors for that matter, are a great example. The programmer (or compiler) writes a single-threaded assembly language program for an extremely simple machine that is basically a fiction. The assembly programmer gets to think in terms of ISA abstractions like "the EAX register" when in the real microprocessor, with its superscalar pipelines, out-of-order execution, branch prediction, and so forth, there really isn't anything you could reasonably point to and say, "This is the EAX register." The microprocessor is a dizzyingly complex hardware simulation of the simple machine that software programmers use as their mental model of "the hardware.""

    This sort of "fiction" is a fundamental requirement of these devices to be classified as microprocessors; otherwise you'd have to perform "programming" by doing digital logic design or microcoding. It's like saying that car manufacturers accommodate drivers' legacy expectations by providing wheels for the car.

    In addition, a number of these added features don't really preserve legacy expectations anyway. Hard real-time software can't really be written for most modern processors because the execution time of a particular section of code is not deterministic.

    "P.S. I admit I don't know anything about Intel's introduction of segmentation, but reading what Wikipedia says, it seems like the key selling point of segmentation was the ease of porting old software."

    Given the fact that the 8086/8088 couldn't natively run 8085 programs, I don't think backward compatibility was a major goal. If you have to translate instructions anyway, changing from an 8 bit address to 16 bit address isn't that much harder.

  100. Re:Don't Bother by Anonymous Coward · · Score: 0

    This pointer switch must be atomic.

    Ahh, but isn't this the crux of the matter? The only way to make the pointer switch atomic without assist from the OS is to use a lock, which means that all readers must test the lock to make sure the pointer isn't in the middle of being updated. Voila, you have a protected section of code that even readers have to do some type of test. Forgive me if I'm still not understanding something but I just can't see how this can be done without hardware/OS help, then at that point you could also devise a much cheaper lock while you're at it.

    If it did, then it must THROW OUT the calculations that it did and start them over again.

    I guess this is where knowing your app comes in. If your app happens to have many writers (say an OLTP app), then the cost for those collisions gets extremely high, so it may be worth your while to block writers. Then again, while many OLTP apps have many writers, they tend not to have high level data collisions, so this may only be applicable if you're dealing with some type of index cache or tree. Anyway, interesting topic (ok, my true nerd self has been exposed).
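
    The "throw out the calculations and start over" scheme can be sketched as below, in Python. VersionedCell, try_commit, and optimistic_update are made-up names. One loud assumption: Python's standard library exposes no compare-and-swap primitive, so a short lock stands in for the atomic switch here; with hardware support that check-and-store would be a single atomic instruction.

```python
import threading


class VersionedCell:
    """Holds a value plus a version number bumped on every successful write."""

    def __init__(self, value):
        self._lock = threading.Lock()  # stand-in for an atomic compare-and-swap
        self._value = value
        self._version = 0

    def read(self):
        with self._lock:
            return self._value, self._version

    def try_commit(self, expected_version, new_value):
        # Succeeds only if nobody else committed since our read.
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = new_value
            self._version += 1
            return True


def optimistic_update(cell, compute):
    """Apply compute() to the cell, redoing the work after each collision."""
    retries = 0
    while True:
        value, version = cell.read()
        candidate = compute(value)  # the expensive part, done without any lock
        if cell.try_commit(version, candidate):
            return retries
        retries += 1                # throw the result away and start over


def run_writers(num_threads=4, per_thread=500):
    """Several writers increment one cell; returns the final value."""
    cell = VersionedCell(0)

    def work():
        for _ in range(per_thread):
            optimistic_update(cell, lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read()[0]
```

    The retry count returned by optimistic_update is the cost being discussed: with few collisions it stays near zero, and with many writers hammering the same cell it grows, which is when blocking writers starts to look cheaper.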

  101. Re:This is about propping up an obsolete technolog by try_anything · · Score: 1

    This sort of "fiction" is a fundamental requirement of these devices to be classified as microprocessors; otherwise you'd have to perform "programming" by doing digital logic design or microcoding. It's like saying that car manufacturers accommodate drivers' legacy expectations by providing wheels for the car.

    It isn't necessary for the ISA to be a fiction. In simple processors, the ISA *can* reflect the real, concrete structure of the processor. Assembly statements like "mov r2, r1" might tell you quite literally what is happening inside the processor. In fact, the programmer doesn't even need to know whether the ISA reflects the real structure of the processor or not. What matters to the programmer is that the behavior of the processor conforms to the model specified by the ISA.

    Hardware designers went from processors almost as simple as a simple ISA to processors thousands of times more complex than the most complex ISA. Meanwhile, software programmers didn't have to come along for the ride. ISAs grew, but not at the same rate as hardware complexity. This is part of what I mean by hardware designers bending over backwards for software programmers. When it becomes possible to put more and more transistors on a microchip, yet the programming interface doesn't scale with the amount of processing power, it becomes hard to use the power efficiently. A Xeon can do a huge amount of work per unit time, but it's like five hundred people getting together to build a small house. When you throw all those transistors at a single instruction stream, most of them get applied to fairly small improvements like improving branch prediction by 2%. The instruction stream is simply inadequate to take advantage of that much hardware. Hardware efficiency suffers so programmers can keep using a simple model.

    That's the logic behind multicore chips, and it's also why the Cell processor can put up such ludicrous performance numbers without any kind of technology breakthrough. Instead of wasting vast numbers of transistors on making small tweaks to single-process execution, why not make them available for real work? Let the programmer decide what to do with them. The programming model gets more complicated, but the programmer gets to use a bunch of processing power that has until now been inefficiently applied because the programming interface has been too narrow.

    Given the fact that the 8086/8088 couldn't natively run 8085 programs, I don't think backward compatibility was a major goal. If you have to translate instructions anyway, changing from an 8 bit address to 16 bit address isn't that much harder.

    I didn't mention backward compatibility; I mentioned porting. I'm pretty sure that changing the pointer size is in fact a big deal, especially for languages like C where types aren't exactly respected. If I understand what I'm reading, Intel's design allowed old programs that used 16-bit addressing to be recompiled to use 16-bit addressing inside a single segment. That meant that nobody had to dig through old code to find and correct all the places where a pointer was stored in a 16-bit datatype. Meanwhile, new programs could use up to 20 bits of address space.
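
    For anyone who wants the arithmetic behind "16-bit addressing inside a single segment" and "up to 20 bits of address space", here is a small sketch in Python. The function name physical_address is just for illustration; the formula is the 8086 real-mode one, where the 16-bit segment is shifted left four bits and added to the 16-bit offset.

```python
def physical_address(segment, offset):
    """8086 real-mode address: 16-bit segment shifted left 4 bits, plus 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The 8086 has 20 address lines, so the sum wraps at 1 MiB.
    return ((segment << 4) + offset) & 0xFFFFF
```

    A program that never changes its segment register only ever manipulates the 16-bit offset, so its pointers still fit in 16-bit datatypes. A new program that also varies the segment can reach the full 20-bit range, at the cost that different pairs can name the same byte: (0x1234, 0x0010) and (0x1235, 0x0000) both land on 0x12350.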

  102. Ada 2005 by krischik · · Score: 2, Informative

    You forgot to mention that Ada 2005 now adds Interfaces to both protected and task objects. See:

    http://en.wikibooks.org/wiki/Ada_Programming/Tasking

    Ada's multi-threading is not only without the pain but great fun!

    Martin

  103. Ada is thread ready since 1983... by krischik · · Score: 2, Informative

    ...and Ada 2005 even supports Real-Time programming. It is possible - just not with C++.

    Find a short intro here:

    http://en.wikibooks.org/wiki/Ada_Programming/Tasking

    Martin