Slashdot Mirror


Faster Chips Are Leaving Programmers in Their Dust

mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."

87 of 573 comments

  1. 2005 Called by brunes69 · · Score: 5, Funny

    ....it wants its article back.

    Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.

    1. Re:2005 Called by CastrTroy · · Score: 5, Insightful

      It's not just making your app multithreaded, it's completely changing your algorithms so that they take advantage of multiple processors. I took a parallel programming course in university, so I'm by no means an expert, but I'll give what insight I have. You can't just take a standard sort algorithm and run it multithreaded. You have to change the entire algorithm. In the end, you end up with something that sorts faster than n log (n). However, doing this type of programming where you break up the dataset, sort each set, and then gather the results can be very difficult. Many debuggers don't deal well with multiple threads, so that adds an extra layer of difficulty to the whole problem. Granted, I don't think that we really need this level of multithreadedness, but I think that's what the article is referring to. I think that 10+ core CPUs will only really help for those of us who like to do multiple things at the same time. I think it would even be beneficial to keep most apps tied to a single CPU so that a runaway app wouldn't take over the entire computer.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    2. Re:2005 Called by gazbo · · Score: 4, Interesting
      In the end, you end up with something that sorts faster than n log (n).

      Not without an infinite number of processors you don't.

    3. Re:2005 Called by MrSteveSD · · Score: 2, Insightful

      A lot of multi-threading up until now has been about keeping applications responsive, rather than breaking up tasks. That makes sense since multi-core chips haven't been around that long in most people's homes. Another issue is that once you have more than one processor, two threads really can run at the same time, which can show up all kinds of bugs you would never notice on a single-core system. The main problem I can see is with testing for errors. With multiple threads it's up to the OS how it juggles them around, and that juggling may be different for every test run. So you could run the same test a hundred times, then suddenly, you could get a failure. So multi-threading introduces a certain random aspect into the software which never used to be there.
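
      A minimal Python sketch of the kind of bug described here. The names `run_lost_update` and `unsafe_increment` are made up for illustration. A real race only bites on some runs; this version uses a `threading.Barrier` to force the unlucky interleaving on purpose, so the lost update happens every time.

```python
import threading

def run_lost_update():
    """Two threads each try to add 1 to a counter; one update is lost."""
    state = {"counter": 0}
    barrier = threading.Barrier(2)  # releases only once both threads arrive

    def unsafe_increment():
        seen = state["counter"]      # read
        barrier.wait()               # by now both threads have read 0
        state["counter"] = seen + 1  # write back a stale value

    threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

print(run_lost_update())  # two increments ran, but the counter ends at 1
```

      Without the barrier the same code would usually return 2 and only occasionally 1, which is exactly why a hundred passing test runs prove little.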

    4. Re:2005 Called by ZeroFactorial · · Score: 5, Funny
      This sounds to me like a great example of passing the buck.

      EE Guy #1: We can't seem to build faster chips.
      EE Guy #2: No problem. We'll just put tons of processor cores in instead.
      EE Guy #1: But people have spent the past 30 years creating algorithms for single core machines. Almost none of the programmers have any experience writing multi-core algorithms!
      EE Guy #2: Exactly! We'll be able to blame the programmers for being lazy and not wanting to learn new complicated algorithms that require an additional 4 years of university.
      EE Guy #1: Brilliant! We should come up with a catchy headline like "The Free Lunch is Over" or something like that.
      EE Guy #2: Yeah, and we could get Slashdot to post a link to the article. Slashdot users are sure to sympathize with our devious plans...
    5. Re:2005 Called by caerwyn · · Score: 4, Informative

      As you know, multiple threads in a program do not actually execute concurrently - processing is still serial, it's just so fast that threads can appear to execute simultaneously - and it's not just about queuing execution either.

      That holds only for multithreaded programming on a single core. As soon as there are multiple cores available, processing does, in fact, happen simultaneously.

      --
      The ringing of the division bell has begun... -PF
    6. Re:2005 Called by chaboud · · Score: 3, Informative

      Well, 2005 called...

      it wants its reply back.

      The parent is exactly how I would have replied a couple of years ago. I was doing lots of threading work, and I found it easy to the point of being frustrated with other programmers who weren't thinking about threading all of the time.

      I was wrong in two ways:

      1. It's not that easy to do threading in the most efficient way possible. There's almost always room for improvement in real-world software.

      2. There are plenty of programmers who don't write thread-safe/parallel code well (or at all) that are still quite useful in a product development context. Some haven't bothered to learn and some just don't have the head for it. Both types are still useful for getting your work finished, and, if you're responsible for the architecture, you need to think about presenting threading to them in a way that makes it obvious while protecting the ability to reach in and mess with the internals.

      The first point is probably the most important. There are several things that programmers will go through on their way to being decent at parallelization. This is in no strict order and this is definitely not a complete list:

      - OpenMP: "Okay, I've put a loop in OpenMP, and it's faster. I'm using multiple processors!!! Oh.. wait, there's more?"
      Now, to be fair, OpenMP is enough to catch the low-hanging fruit in a lot of software. It's also really easy to try out on your code (and can be controlled at run-time).

      - OpenMP 2: "Wait... why isn't it any faster? Wait.. is it slower?"
      Are you locking on some object? Did you kill an in-loop stateful optimization to break out into multiple threads? Are you memory bound? Blowing cache? It's time to crack out VTune/CodeAnalyst.

      - Traditional threading constructs (mutices, semaphores): "Hey, sweet. I just lock around this important data and we're threadsafe."
      This is also often enough in current software. A critical section (or mutex) protecting some critical data solves the crashing problem, but it injects the lock-contention problem. It can also add the cost of round-tripping to the kernel, thus making some code slower.

      - Transactional data structures: "Awesome. I've cracked the concurrency problem completely."
      Transactional mechanisms are great, and they solve the larger data problem with the skill and cleanliness of an interlocked pointer exchange. Still, there are some issues. Does the naive approach cleanly handle overlapping threads stomping on each other's write-changes? If so, does it do it without making life hell for the code changing the data? Does the copy/allocation/write strategy save you enough time through parallelism to make back its overhead?

      Should you just go back to a critical section for this code? Should you just go back to OpenMP? Should you just go back to single-threading for this section of code? (not a joke)
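
      The lock-contention trade-off in the list above can be sketched in Python. This is an illustration under assumptions, not a benchmark: `count_with_shared_lock`, `count_with_local_totals` and `_run` are hypothetical names. The first version takes the lock on every increment; the second lets each thread work privately and takes the lock once to merge.

```python
import threading

def _run(worker, n_threads):
    """Start n_threads copies of worker and wait for all of them."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def count_with_shared_lock(n_threads, n_items):
    """Correct, but every single increment queues up on the same lock."""
    total = [0]
    lock = threading.Lock()

    def worker():
        for _ in range(n_items):
            with lock:
                total[0] += 1

    _run(worker, n_threads)
    return total[0]

def count_with_local_totals(n_threads, n_items):
    """Each thread counts privately and locks once to publish its result."""
    total = [0]
    lock = threading.Lock()

    def worker():
        local = 0
        for _ in range(n_items):
            local += 1
        with lock:
            total[0] += local

    _run(worker, n_threads)
    return total[0]
```

      Both return the same answer. Whether the second is faster in a given program is something to measure rather than assume, which is the point of the questions above.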

      Perhaps as processors get faster by core-scaling instead of clock-scaling this will become less of a dilemma, but to say that "[to do multi-threaded programming effectively] is not that difficult" is akin to writing your first ray-tracer and saying that 3D is "not that difficult." Sometimes it is. At least at this point there are places where threading effectively is a delicate dance that not every developer need think about for a team to produce solid multi-threaded software.

      That doesn't mean that I object to threading being a more tightly-integrated part of the language, of course.

    7. Re:2005 Called by egomaniac · · Score: 2, Informative

      The difference between parallel programming and multithreaded programming is this ... with a parallel algorithm, different parts of one task/thread are done on separate CPUs, whereas with multithreaded programming each one thread/task is done entirely on one processor.

      Wait... what? "Different parts of one thread are done on separate CPUs"?

      In what (real world, non-research) system is a single thread run on multiple processors at the same time? And why are you claiming that running each thread on a single processor, as is done by all major OSes, is not parallel programming?

      It's not a semantic difference. Threads are basically just lightweight processes...so each thread of a program execution can be thought of as a different process.

      I've re-read that about five times, and I still don't have a clue what point you're trying to make here. From an algorithmic standpoint, all that matters is "these instructions are run in sequence, and these two sets of sequential instructions can run in parallel". The terminology that generally describes the concept of a sequential set of instructions is "thread". Sure, on a given operating system you might use a lightweight process or even a full-blown process to implement each 'thread', but that's an implementation detail and has nothing to do with the algorithm. What are you trying to say?

      OTOH, in parallel programming, a thread/task is broken down into pieces and brought back together when the pieces are done. Think SETI@Home, but on a much smaller scale.

      You're referring to "data parallelism" versus "task parallelism". Breaking a single computation's data set up into parallelizable chunks a la SETI@Home is "data parallelism", whereas running two relatively unrelated tasks in parallel is "task parallelism". They are both forms of parallel programming and your assertion that only data parallelism 'counts' is simply false.
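
      A small Python sketch of the two terms, using the standard `concurrent.futures` thread pool. The function names are invented for the example. In default CPython builds, threads doing pure-Python CPU work do not run in parallel because of the global interpreter lock, so read this for the shape of the two patterns rather than for speed.

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel_sum(values, n_chunks=4):
    """Data parallelism: the same operation applied to slices of one data set."""
    size = max(1, (len(values) + n_chunks - 1) // n_chunks)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(sum, chunks))  # partial sums, then gather

def task_parallel(values):
    """Task parallelism: two unrelated computations running side by side."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        smallest = pool.submit(min, values)
        widest = pool.submit(lambda: max(len(str(v)) for v in values))
        return smallest.result(), widest.result()

print(data_parallel_sum(list(range(101))))  # 5050
print(task_parallel([3, 10, 250]))          # (3, 3)
```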

      --
      ZFS: because love is never having to say fsck
    8. Re:2005 Called by bfields · · Score: 3, Insightful

      Oh, good grief--since the original poster didn't specify units, and since it's highly unlikely the running time would be exactly n log n for any choice of units, and since it's pretty common to leave out the O() in casual conversation, the only sensible interpretation is that the O() was implied....

    9. Re:2005 Called by gazbo · · Score: 2, Interesting
      Indeed, sorting is highly parallelisable. My point was that you can't change general algorithmic complexity by adding k more processor cores.
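
      Written out as a rough bound (a sketch, assuming a comparison sort that does about c n log n work on one core, and k cores that split that work perfectly with no overhead):

```latex
T_k(n) \;\ge\; \frac{T_1(n)}{k} \;=\; \frac{c}{k}\, n \log n \;=\; \Theta(n \log n) \qquad \text{for any fixed } k .
```

      A fixed k shrinks the constant, not the order of growth. Getting below n log n asymptotically needs a processor count that grows with n.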

      Regarding the other "he said n log n not O(n log n)" comment...well, that's already been answered (and with considerably more tact than I would have used).

    10. Re:2005 Called by AuMatar · · Score: 3, Interesting

      Yes it is something the app developer needs to deal with. The problem isn't creating threads- the OS does that. The problem is in protecting data from concurrent unsynchronized access (this needs to be done by the app, as the OS has no clue what can/can't be accessed synchronously, that's an application detail) and parallelizing algorithms (again, not something the OS can do).
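
      A sketch of why this has to live in the application, in Python. `Account` and `run_withdrawals` are hypothetical. Only the application knows that "check the balance, then subtract" must happen as one step; the lock is how it says so.

```python
import threading

class Account:
    def __init__(self, balance):
        self._balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:  # check and update as one indivisible step
            if self._balance >= amount:
                self._balance -= amount
                return True
            return False

    @property
    def balance(self):
        with self._lock:
            return self._balance

def run_withdrawals(account, n_threads, amount):
    """Have n_threads threads each attempt one withdrawal."""
    results = [None] * n_threads

    def worker(i):
        results[i] = account.withdraw(amount)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

account = Account(100)
outcomes = run_withdrawals(account, 8, 30)
print(outcomes.count(True), account.balance)  # 3 10
```

      Which three threads succeed varies from run to run, but the balance can never go negative.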

      Eventually, good parallel algorithm libraries will pop up. That will help some subset of problems. I'd expect frameworks to pop up as well, helping another. But in many cases it just comes down to changing how we write programs.

      And you're right, this isn't really a desktop issue; it's mainly a server one. Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it. That doesn't make it any less important of a problem to solve. Although I expect in the end people will still end up disappointed: parallelization is not magic pixie dust, you can only get so much of a speedup. I wouldn't be surprised if those 8 cores only give a 2x speedup over a single core on many apps.
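
      The usual way to put numbers on that guess is Amdahl's law, which the post does not name but which fits it. A quick Python sketch (function names are made up here), assuming a fixed fraction of the run time parallelizes perfectly and the rest stays serial:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup when `parallel_fraction` of the run time scales across `cores`."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def parallel_fraction_for(speedup, cores):
    """Invert the formula: how much must be parallel to reach `speedup`?"""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

print(parallel_fraction_for(2, 8))  # about 0.571
print(amdahl_speedup(0.9, 8))       # about 4.7
```

      So a 2x speedup on 8 cores corresponds to an app where roughly 57% of the work parallelizes, and even a 90% parallel app gets well under 8x.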

      --
      I still have more fans than freaks. WTF is wrong with you people?
    11. Re:2005 Called by chaboud · · Score: 4, Insightful

      It's not quite like that.

      On modern systems, threads are themselves first-class constructs, and it runs somewhat like this:

      A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).

      A thread generally has a stack (in the host-process's virtual address space, so everyone can read it), some thread-local storage to make life easier for some APIs (you don't need to care about this in most cases), and lives in a process. This means that threads can use virtual addresses for memory interchangeably with other threads in the same process.

      Additionally, some operating systems support fibers. A fiber is like a thread except that it has to be explicitly or cooperatively (not quite the same thing) multi-tasked. Fibers use even less memory than threads, and you really don't have to care about them.

      When you're in, say, Visual Studio, there's a "threads" window for all of the threads of the process that you are debugging. You can end up stepping through code on one thread while other threads are running.

      The modern hardware designs lead to interesting performance side-effects from cache location and memory location. It's not quite as hard as systems that have asymmetric access to resources (e.g. Playstation 2), but it makes for fun work.

    12. Re:2005 Called by EatHam · · Score: 5, Funny

      pretty common to leave out the O() in casual conversation
      I would say that it is *extremely* common for casual conversation to not have anything whatsoever to do with O().
    13. Re:2005 Called by workdeville · · Score: 2, Informative

      It's not particularly unlikely that an algorithm runs in exactly n log n operations (for a sane choice of units). That's kind of the point of Big O notation. Dropping scalar factors into your units. (Though I will note that I prefer using notation like O(n^2 + n) over just O(n^2) since it is more informative, and admit that it complicates my "summary"). The point is: the GGP said n log n, not O(n log n). n/2 log n is faster than n log n, even if they're both of the same algorithmic order.

    14. Re:2005 Called by Pollardito · · Score: 2, Funny

      actually they divided the buck into 10 dimes and then passed them all in parallel.

      seriously though, it was only a few years ago that people were scoffing at the usefulness of dual processor desktop machines and arguing the value of being able to run multi-threaded apps and multiple apps faster at the expense of poorer performance on the vast majority of apps and games which people were running in isolation. it doesn't seem like applications or operating systems have seen a major overhaul since that time (just incremental gains), but the enthusiasm with which they're piling on more and more cores has drowned out all the questions people had. i think this has more to do with chip marketers needing to be able to trumpet something with great excitement than actual newfound utility of multiprocessing

    15. Re:2005 Called by pthisis · · Score: 2, Informative


      On modern systems, threads are themselves first-class constructs

      Not in, say, Linux or Plan 9. Contexts of execution (COEs) are first-class constructs, and both threads and processes are special cases of COEs.

      A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).

      The latter sentence here is nonsensical on many modern systems.

      The core distinction which applies to most common modern systems (Windows, OS X, Linux, modern Unices, etc) is that:

      In a multithreaded program, the threads all share memory (aside from the stack and possibly thread-local storage). This can be alternately phrased as threads lack memory protection from each other. Processes do not share memory except what is specifically allocated as shared memory (through CreateFileMapping, mmap, shmget, or whatever).

      When you are making the choice of whether to use threads or processes, your fundamental question should be "Do I want to implicitly share all memory?", or "Do I want to throw out memory protection?". Sometimes the answer is yes, but more often it's no (in which case you probably want to go with multiple COW processes, which the Unix/Mac crowd is familiar with through fork() but the equivalent NtCreateProcess with a NULL SectionHandle is much underpublicized on Windows).
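
      The distinction is easy to see in a few lines of Python. This sketch is Unix-only because it calls `os.fork()` directly; `demo_isolation` is a made-up name. A forked child appends to its own copy-on-write copy of a list, then a thread appends to the real one.

```python
import os
import threading

def demo_isolation():
    """Return what the parent sees after a child process, then a thread, appends."""
    shared = []

    pid = os.fork()             # child gets a copy-on-write view of memory
    if pid == 0:
        shared.append("child")  # changes only the child's copy
        os._exit(0)             # leave the child without running parent code
    os.waitpid(pid, 0)
    after_fork = list(shared)

    t = threading.Thread(target=shared.append, args=("thread",))
    t.start()
    t.join()
    after_thread = list(shared)  # the thread wrote to the same list

    return after_fork, after_thread

print(demo_isolation())  # ([], ['thread'])
```

      The child's write is invisible to the parent unless you set up sharing explicitly; the thread's write is visible whether you wanted that or not.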

      Additionally, some operating systems support fibers.

      Fibers are pretty tangential to the conversation and can also be implemented in user space. They're not really threads (or processes) at all, they're just coroutines. Java's "green threads" are one common example.
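
      To illustrate "just coroutines" in user space, here is a sketch with Python generators. `worker` and `round_robin` are hypothetical names, and this is not how any particular fiber API works. Each task runs until it explicitly yields, and a plain loop decides who goes next; no OS scheduler is involved.

```python
from collections import deque

def worker(name, steps):
    for i in range(steps):
        yield f"{name}{i}"  # explicit yield = cooperative hand-off

def round_robin(tasks):
    """Run generator-based tasks in turn until all have finished."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # resume the task until its next yield
        except StopIteration:
            continue                  # task finished; drop it
        queue.append(task)
    return trace

print(round_robin([worker("a", 2), worker("b", 3)]))
# ['a0', 'b0', 'a1', 'b1', 'b2']
```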

      --
      rage, rage against the dying of the light
    16. Re:2005 Called by felix9x · · Score: 2, Insightful

      Actually you can sort the same data with multiple threads in parallel. Consider, for example, dividing an array of items into two halves and sorting each half with a separate thread using quicksort. There is no problem with synchronizing the data since the two threads will be working on separate data. The merge of the two sorted sets can be done single-threaded, which is of linear complexity. You can also get fancy with the merge but it gets more complex.
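
      A sketch of that scheme in Python, with two swaps to note: it uses the built-in `list.sort` rather than quicksort, and `heapq.merge` for the linear merge. `two_thread_sort` is a made-up name. In default CPython the two sorts will not actually overlap because of the global interpreter lock, so this shows the structure, not a speedup.

```python
import heapq
import threading

def two_thread_sort(items):
    """Sort each half in its own thread, then merge on the calling thread."""
    mid = len(items) // 2
    halves = [items[:mid], items[mid:]]  # separate lists, so no shared writes

    threads = [threading.Thread(target=half.sort) for half in halves]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return list(heapq.merge(*halves))    # single-threaded linear merge

print(two_thread_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```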

      As far as sorting stuff like drop-down boxes you will not have enough data to justify using multiple cores on it, unless you got millions of items in it but then you got other problems.

    17. Re:2005 Called by Darinbob · · Score: 3, Insightful

      There's a wide variance in what "parallel computing" means. For multicore, you've essentially just got a cheaper version of SMP (symmetric multiprocessing). This is worlds away from what occurs in a parallel computer and what most parallel programming algorithms deal with. With multicore and SMP you program mostly like you're doing multithreading on a single CPU.

      The algorithms programmers have to deal with here involve concurrency, and have been in use for decades by anyone writing an OS or device driver. The Dining Philosophers problem, readers and writers synchronization, etc. These are used on what most people think of as single processor computers and are essential. So I don't really think of these as "parallel programming", but as "parallel-light".

      Parallel programming to me means dealing with SIMD or MIMD machines. MIMD has multiple processors each with its own memory and data, not multiple processors all sharing the same memory like SMP does. They may have high speed connections to a subset of other processors, such as being arranged in a grid or cube. SIMD has multiple processors all with their own data space but executing the same instruction sequences; the simplest form of which might be vector processors. The algorithms for these machines have very little in common with multithreading types of algorithms.

      The parallel algorithms that require lots of sharing between processors will hit a bottleneck on the RAM with these multicore CPUs.

    18. Re:2005 Called by dbIII · · Score: 2, Insightful

      Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it.

      Doing things with digital video and photoshopping still images will use as much CPU as you can feed it. These are now mainstream uses for home computers.

    19. Re:2005 Called by try_anything · · Score: 2, Insightful
      More and more cores? Consumer desktops and laptops have gone up to a whopping two cores -- four cores only if you blow a wad of dough for bragging rights. Two processors is definitely not overkill for the average user, especially since most users have a browser full of Ajax-ridden web pages open 24/7. I doubt that four cores will be overkill, either, once we start to realize all the various ways we've crippled applications to make them well-behaved citizens of the vanishing single-core desktop.

      The massively multicore processors are exactly where they need to be: in servers and workstations, and on the desks of hardware queens who absorb the cost of product development so I don't have to.

      poorer performance on the vast majority of apps and games which people were running in isolation

      People run the vast majority of their applications concurrently with other applications. The only significant exception is gamers. When you're dealing with a sluggish app on a single-core machine, what are the odds it's unresponsive because of another application vs. being unresponsive because of its own problems? Now, same question, on a dual-core machine? The odds drop quite a bit. It's nice to have a spare core so when one app gets fussy the rest of your applications keep responding normally.

      it doesn't seem like applications or operating systems have seen a major overhaul since that time (just incremental gains)

      All the more reason to have multiple cores. In my experience, having multiple processors actually compensates for application-level and OS-level multiprocessing deficiencies, because let's face it, one hoggish app can make it very annoying to use a single-core machine. OSes are supposed to mitigate that, but since they don't do a perfect job, multiple cores help keep the system usable. Granted, there are other resources besides CPU that can suffer from contention, but every little bit helps.

    20. Re:2005 Called by joto · · Score: 2, Interesting

      Actually, Itanium was a fairly good idea. That it didn't work out could just as well be ascribed to politics and real-world issues as to technical issues. For example, the requirements said it should be able to run x86 unmodified (why? if you want x86 you already know where to get it, right?). It was oversold (the next desktop processor), and underperformed (late delivery, bad performance). None of these issues indicate that explicit instruction level parallelism (EPIC) is a bad idea. And they certainly have good Itanium compilers now. The main problem with Itanium (apart from the initial delays) was that it was a solution in search of a problem. It still is. But what a marvellous solution!

  2. M$ programmers should be already capable by scafuz · · Score: 5, Funny

    just start a multithread process: 1 core for the program itself, the remaining 7 for the bugs...

  3. hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Chordonblue · · Score: 5, Funny

    II hhaavvee aann XX22 pprrocceessssoor? Ii ccaann ggooeess TTWWIICCEE aass ffaasstt nnooww?

    --
    "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
    1. Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by ByOhTek · · Score: 4, Funny

      my eyes, they bleed.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    2. Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Nova1313 · · Score: 3, Interesting

      When the first AMD X2 chips came out the Linux kernel had issues with the clock on those chips. The clock would be several times (presumably 2 times?) faster than it should be; the cores' clocks were not synchronized for some reason or the kernel would lose track... When you typed a letter it would repeat multiple times as you described. :)

      --
      There exists some positive integer N that you are the Nth person to read this signature.
  4. OS/2? by SCHecklerX · · Score: 4, Interesting

    I remember learning to write software for OS/2 back in the early 90's. Multi-threaded programming was *the* model there, and had it been more popular, it would be pretty much standard practice today, making scaling to multiple cores pretty effortless, I'd think. It's a shame that the single-threaded model became so ingrained in everything, including linux. For an example that comes to mind, why do I need to wait for my mail program to download all headers from the IMAP server before I can compose a new message on initial startup? Same with a lot of things in firefox.

    Does anybody remember DeScribe?

    1. Re:OS/2? by eMartin · · Score: 3, Insightful

      Neither does Microsoft's Outlook Express, but I don't think that was his point.

    2. Re:OS/2? by shoor · · Score: 2, Interesting

      I was working at a very small software shop when OS/2 came out. We would get a customer who wanted something to work on an Apollo workstation, another one wanted it for Xenix, a third for Unix BSD 4.2 (my favorite), or Unix System V (ugh!), or DOS. So, we got a project to port something to OS/2 version 1.0, and I got it to work, and it used multi-threading which I thought was pretty cute and I was proud of myself for figuring it all out just from the manuals. Then the new revision of OS/2 came out and everything I had done was broken. My boss was so mad he swore off OS/2 forever after that.

      --
      In theory, theory and practice are the same; in practice they're different. (Yogi Berra & A. Einstein)
    3. Re:OS/2? by pthisis · · Score: 3, Insightful

      It's a shame that the single-threaded model became so ingrained in everything, including linux. For an example that comes to mind, why do I need to wait for my mail program to download all headers from the IMAP server before I can compose a new message on initial startup?

      I'm of the opposite opinion; it's a shame that so many people equate parallel processing with threads. When there's not much shared data, using multiple processes keeps memory protection between your parallel "things", decreasing coupling, increasing isolation, and generally resulting in a more stable system (and for certain things where you can avoid some cache coherency problems, a faster system). Your example is perfect; there's really no good reason to use a thread for such lookups. Another process would do, or even better just use select() and avoid all the pain (and bugs) of a multithreaded solution.
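
      For readers who haven't used it, a sketch of the select() approach in Python. Two local socket pairs stand in for the IMAP connection and the keyboard; the function name and payloads are invented. One thread waits on both sources and handles whichever is ready, so nothing blocks the other.

```python
import select
import socket

def multiplex_two_sources():
    """Read from two sockets in one thread using select()."""
    imap_rd, imap_wr = socket.socketpair()  # stand-in for the mail server
    keys_rd, keys_wr = socket.socketpair()  # stand-in for user input
    names = {imap_rd: "imap", keys_rd: "keyboard"}
    try:
        imap_wr.sendall(b"header 1 of 500")
        keys_wr.sendall(b"Dear Bob,")

        received = {}
        pending = list(names)
        while pending:
            readable, _, _ = select.select(pending, [], [], 1.0)
            if not readable:
                break                        # timed out; nothing more to read
            for sock in readable:
                received[names[sock]] = sock.recv(1024)
                pending.remove(sock)
        return received
    finally:
        for sock in (imap_rd, imap_wr, keys_rd, keys_wr):
            sock.close()

print(multiplex_two_sources())
```

      A real mail client would keep looping instead of stopping after one message per source, but the control flow is the same: no threads, no locks, and no shared state to protect.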

      OS developers spent a lot of engineering time implementing protected memory. Threads throw out a huge portion of that; a good programmer won't do that without very good reasons. Some tasks, where there really are tons of complicated data structures to be shared, are good candidates for threading. More commonly, though, threads are used either because the programmer doesn't know any better or because they allow you to be a slacker about defining exactly what is shared and mediating access to it. The latter is especially dangerous; defining exactly what (and how) things are shared goes most of the way toward eliminating multiprocessing bugs, and threads make it easy to slack off on that and get a "mostly working" solution that occasionally deadlocks, fails to scale, etc.

      Use processes or state machines when you can, and threads when you must.

      --
      rage, rage against the dying of the light
    4. Re:OS/2? by TheRaven64 · · Score: 3, Interesting

      I'm of the opposite opinion; it's a shame that so many people equate parallel processing with threads.

      I read that and wished I had mod points. Anyone who has programmed with a language designed for concurrency, like Erlang, Termite, or a few Haskell dialects, hates using threads. Threads are something that two kinds of people should use: operating system designers and compiler writers. Everyone else should be using a higher-level abstraction.

      The big problem is not the operating system designers, it's the CPU designers. They integrated two orthogonal concepts, protection and translation, into the same mechanism (page tables, segment tables, etc). The operating system wants to do translation so it can implement virtual memory. The userspace program wants to do protection so it can use parallel contexts efficiently. Mondrian memory protection would fix this, but no one has implemented it in a commercial microprocessor (to my knowledge).

      --
      I am TheRaven on Soylent News
  5. Thank god by Fizzl · · Score: 4, Funny

    Thank god that Java, C# and other piles of shit I hate do this quite intuitively and easily.
    Guess I had it coming.
    /me closes his eyes and embraces C++ for the last time before the inevitable doom

    1. Re:Thank god by zifn4b · · Score: 5, Informative

      The only significant thing that managed languages make easier with regard to multithreading other than a more intuitive API is garbage collection so that you don't have to worry about using reference counting when passing pointers between multiple threads.

      All of the same challenges that exist in C/C++ such as deadly embrace and dining philosophers still exist in managed languages and require the developer to be trained in multi-threaded programming.

      Some things can be more difficult to implement like semaphores. You also have to be careful about what asynchronous methods and events you invoke because those get queued up on the thread pool and it has a max count.

      I would say managed languages are "easier" to use but to be used effectively you still have to understand the fundamental concepts of multithreaded programming and what's going on underneath the hood of your runtime environment.

      --
      We'll make great pets
    2. Re:Thank god by Fizzl · · Score: 2, Informative

      Ugly, clumsy and complicated compared to Java's way.

      I know how to do threading in C++ on every platform I have used for development. It's just that the modern languages have an elegant system, with forethought given to threading while designing the platform/language. Why would anyone new want to learn how to do clumsy non-standard threading in C++?
      I think the options are to adapt, or to continue riding the dinosaurs until they die out and be left behind. Sorry that I am sending mixed signals. I have always worked with C++ and only used these newfangled languages when forced to. I feel that I have done something stupid by being too closed-minded.

    3. Re:Thank god by gbjbaanb · · Score: 2, Interesting

      Yeah, but they do it really slowly 'cos you're tied to the framework that has to do it safely no matter what - even if you have 2 threads that never interact with each other, the framework will slap synchronisation all over them anyway.

      (I know - I had a discussion with a chap about C# thread-safe singleton initialisation. A simple app to test performance on my little laptop had a static initialised singleton taking 1.5 seconds, lock-based initialisation in 6 seconds. No big deal, we expect that, but then I ran the same tests on a dual-CPU server and both apps took 30 seconds - the framework decided it knew best).

    4. Re:Thank god by Richthofen80 · · Score: 2, Insightful

      Speaking of C#, MS just released a technology preview that adds extensions / namespaces to C# that make it pretty easy to write parallel-executing code:
      http://www.microsoft.com/downloads/details.aspx?FamilyID=e848dc1d-5be3-4941-8705-024bc7f180ba&displaylang=en

      Essentially, they turn
      for (int i = 0; i < 100; i++) {
          a[i] = a[i]*a[i];
      }

      into

      Parallel.For(0, 100, delegate(int i) {
          a[i] = a[i]*a[i];
      });

      and the hint tells the .NET runtime to execute the solution in parallel. No shared memory, no locks, all done for you. That's the way parallelism should work, IMHO
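
      For comparison, the same loop shape can be sketched in Python with the standard `concurrent.futures` module. This is not the .NET preview API; `parallel_for` and `square_in_place` are names made up for the illustration, and in CPython the threads will not speed up pure arithmetic. The point is only the structure: the caller supplies the loop body and the library decides how to spread iterations across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(start, stop, body, max_workers=4):
    """Call body(i) for every i in range(start, stop) on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator, which waits for every call and
        # re-raises any exception thrown inside a worker.
        list(pool.map(body, range(start, stop)))


a = list(range(100))


def square_in_place(i):
    # Each iteration touches only its own slot, so no lock is needed here.
    a[i] = a[i] * a[i]


parallel_for(0, 100, square_in_place)
```

      Note that the array is still shared between workers; the loop is safe only because no two iterations write the same element.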

      http://msdn.microsoft.com/msdnmag/issues/07/10/Futures/default.aspx

      --
      Reason, free market capitalism, and individualism
    5. Re:Thank god by anwyn · · Score: 2, Insightful
      Yet another annoying attempt to force garbage collection on C++!

      Garbage collection is a one size fits all solution, that is not appropriate for all the applications in the C++ problem space. Further there is a lot of C++ code already out there that does its own memory management. It would be difficult to retrofit this code to garbage collection.

      Furthermore, many garbage collected languages lack proper destructors. At best they have a finalize method. This interferes with the C++ idiom "object creation is resource allocation; object destruction is resource release". This is the way C++ manages all resources. There are other resources besides memory, like open files, descriptors, network connections and many others. Because the garbage collected languages lack proper destructors, they actually make the management of these other resources more difficult. This can make garbage collected languages more complex and buggy. What the garbage collected languages give with one hand, they take away with the other!

      I wish someone would develop a language with optional garbage collection and with proper destructors!
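
      As one illustration of how a garbage collected language can approximate scope-bound release, here is a minimal Python sketch using a context manager. The `Connection` class is hypothetical and stands in for any non-memory resource; the cleanup in `__exit__` runs when the `with` block ends, whether normally or through an exception, rather than whenever the collector gets around to it.

```python
class Connection:
    """Hypothetical resource that must be released promptly."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.open = False

    def __enter__(self):
        self.open = True
        self.log.append(f"open {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions, much like a C++
        # destructor running at the end of a scope.
        self.open = False
        self.log.append(f"close {self.name}")
        return False  # do not swallow the exception


log = []
try:
    with Connection("db", log) as conn:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

      The difference from C++ remains that the caller has to remember to write the `with` block; nothing forces it.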

    6. Re:Thank god by CoughDropAddict · · Score: 2, Interesting
      Essentially, they turn
      for (int i = 0; i < 100; i++) {
              a[i] = a[i]*a[i];
      }

      into

      Parallel.For(0, 100, delegate(int i) {
              a[i] = a[i]*a[i];
      });

      and the hint tells the .NET runtime to execute the solution in parallel. No shared memory, no locks, all done for you. That's the way parallelism should work, IMHO


      So let me get this straight: the runtime is going to
      1. find one or more other threads to farm this work out to, either by creating new ones or taking them from an existing pool
      2. make the thread(s) runnable, and wait for them to get scheduled by the OS
      3. coordinate the communication between the main thread and the other thread(s) about what part of the solution each thread should work on
      ...and this is supposed to be faster than a simple for loop with 100 iterations?

      Sounds like a losing proposition to me. I don't think this is the kind of parallelism that is going to bring noticeable gains.
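
      One common way to keep the overhead listed above from swamping a small loop is to hand each worker a contiguous chunk of iterations instead of one iteration at a time, so the scheduling cost is paid once per chunk. Whether the preview library does this is not stated in the thread; the Python sketch below, with made-up helper names `chunk_ranges` and `square_chunk`, only shows the idea.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(start, stop, chunks):
    """Split [start, stop) into at most `chunks` contiguous (lo, hi) ranges."""
    size, extra = divmod(stop - start, chunks)
    ranges, lo = [], start
    for k in range(chunks):
        # The first `extra` chunks get one additional element.
        hi = lo + size + (1 if k < extra else 0)
        if hi > lo:
            ranges.append((lo, hi))
        lo = hi
    return ranges


def square_chunk(data, lo, hi):
    # One task processes a whole slice, not a single element.
    for i in range(lo, hi):
        data[i] = data[i] * data[i]


data = list(range(100))
ranges = chunk_ranges(0, 100, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square_chunk, data, lo, hi) for lo, hi in ranges]
    for future in futures:
        future.result()  # wait, and surface any worker exception
```

      With 100 cheap iterations the plain loop will very likely still win; chunking only starts to pay off when each chunk carries enough work to outweigh the cost of dispatching it.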
  6. The basic problem by ucblockhead · · Score: 5, Insightful

    Some algorithms are inherently not amenable to parallelization. If you have eight cores instead of one, then the performance boost you can get can be anywhere from eight times faster to none at all.
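
    The range from "eight times faster" to "none at all" is what Amdahl's law describes: if only a fraction p of the running time can be parallelized, n cores give a speedup of 1 / ((1 - p) + p / n). A small Python sketch of the arithmetic (the function name is just for illustration):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the work scales across `cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: {amdahl_speedup(p, 8):.2f}x on 8 cores")
```

    On eight cores, a program that is half serial gets about 1.78x and one that is 90% parallel gets about 4.71x; only a fully parallel program reaches 8x.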

    So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.

    In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.

    --
    The cake is a pie
    1. Re:The basic problem by Anonymous Coward · · Score: 2, Interesting

      In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
      Processors don't "wait" on blocked IO calls. Your program waits while the processor switches to another task. When the processor switches back to your program, it checks to see if the blocked IO call has completed. If it has, it continues executing your program again. If not, your program continues to wait while the processor again switches to other tasks.
      So it is you (as the programmer) that determines if your program just sits and waits for blocked IO to complete. Or you could spawn a thread for blocked IO calls so your main program thread continues executing (if it is viable to your situation).
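
      A minimal Python sketch of the "spawn a thread for the blocking call" approach follows. The `blocking_read` function is a stand-in that just sleeps; a real program would make its socket or disk call there.

```python
import threading
import time


def blocking_read(result):
    # Stand-in for a blocking socket or disk call (hypothetical).
    time.sleep(0.2)
    result["data"] = "payload"


result = {}
io_thread = threading.Thread(target=blocking_read, args=(result,))
io_thread.start()

# The main thread keeps working while the I/O thread is blocked.
work_done = sum(i * i for i in range(1000))

io_thread.join()  # wait for the I/O result before using it
```

      The main thread only pays for the wait at `join()`, and only for whatever part of the I/O has not finished by then.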

      With more processors, your program and its blocked IO calls will be checked more frequently. So even blocked IO calls will see a performance increase.

    2. Re:The basic problem by savuporo · · Score: 3, Insightful

      So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
      They diminish, but they never disappear. Even in algorithms where you have to wait for the results of the previous computation before going on, you can still get a speedup with branch prediction. In essence, while one core is crunching the numbers, other cores do the what-if work, and even if you mispredict in lots of cases, you can still get speedups with large datasets, because in some cases, when your first core comes up with a result, you will discover that the what-if computation started out with the right guess.
      Hey, I hear they are doing essentially the same stuff with all those newfangled multiscalar processors and branch prediction anyway.

      --
      http://validator.w3.org/check?uri=http%3A%2F%2Fwww.slashdot.org Errors found while checking this document as HTML5!
  7. YOUR eyes?! by Chordonblue · · Score: 2, Funny

    Just be glad I didn't upgrade to the X4 yet! :)

    --
    "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
  8. Re:concurrency - the developer's responsibility? by LWATCDR · · Score: 2, Insightful

    "But seriously, isn't the OS responsible for the heavy lifting with regards to task scheduling and concurrency? Oh, wait, this is Microsoft, right? Perhaps this is similar to their take on Security being somebody else's problem."
    Huhhh?
    My guess is that you never wrote any code.
    Linux doesn't do any more heavy lifting for you than Windows does. I doubt that OS/X does.
    So what are you talking about?
    An OS will never figure out what part of your program is going to need to be in which thread. A compiler MAY at some time do it but they are just now doing a good job with vectors.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  9. Personal computing? by Dan+East · · Score: 5, Interesting

    "processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing"

    Exactly what areas of "personal computing" are requiring this horsepower? The only two that come to mind are games and encoding video. The video encoding part is already covered - that scales nicely to multiple threads, and even free encoders will use the extra cores to their full potential. That leaves gaming, which is basically proprietary. The game engine must be designed so that AI, physics, and other CPU-bound algorithms can be executed in parallel. This has already been addressed.

    So this begs the question, exactly how will the average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?

    Dan East

    --
    Better known as 318230.
    1. Re:Personal computing? by bogie · · Score: 2, Insightful

      "So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores"

      AOL 10.0 will say "You got mail!" .25ms faster.

      --
      If you wanna get rich, you know that payback is a bitch
    2. Re:Personal computing? by kebes · · Score: 2, Insightful

      You point out that few desktop tasks require parallel processing... but think about the flip-side of this: if we could speed-up many tasks, how would that affect desktop computing?

      There are plenty of tasks that people do routinely on computers that are not "instantaneously" fast (spreadsheets, photo-editing, etc.). Furthermore there are many aspects of modern user interfaces that would be better if they were faster (generating thumbnail previews, sorting entries, rescanning music collections, searching, etc.). Also, it's important to realize that the commonplace desktop elements of tomorrow may not have been imagined today. Many things that we don't even consider (and certainly don't consider as "necessary") may become possible (and thus "necessary") with greater computer power (complex graphs/images/previews that update in realtime as a user slides a control, instantaneous re-encoding of video when you drag-and-drop to an external device, etc.).

      My only point is that it is tempting to say that computers are "fast enough" and yet in my own computer-use (and watching the computer use of others) there are definitely times when the user must wait for the computer to finish a task (whether it is a split-second page render or a many-seconds refresh of a spreadsheet or a many-minute generation of a complex image). Until all of these tasks are "instantaneous" (shorter than human reaction time), then there is definite room for improvement in computer speed; and moreover improvements that the end-user will appreciate and come to rely on.

      You'll notice that of the examples I've mentioned, many of them could in principle be parallelized (and thus benefit from multi-core systems).

    3. Re:Personal computing? by 99BottlesOfBeerInMyF · · Score: 2, Insightful

      Exactly what areas of "personal computing" are requiring this horsepower?

      Video, audio, gaming, emulators, and VMs are starters. But I think you're missing some of the picture. Most computer users have one or two programs open at a time and end up quitting everything when they want to run something processor intensive like a game or Photoshop. With the move towards multi-core and with a little work from developers, people might be able to leave 90% of the apps they use running, all the time. Multiple cores also provide something of a buffer. When a thread goes rogue, their machine does not grind to a halt. Heck, just yesterday my girlfriend was complaining because she tried to open a page in Firefox and it locked up the whole application including the other 8 tabs she had open. That means she had to kill it (which took a while itself) and then try to decide if she wanted to reopen all those tabs and risk it locking up again, or just try to remember what she had open and reopen them all by hand. If each tab, however, is running in its own thread and there are enough cores to handle it, this could easily have been a much better experience for her. She could have just closed the unresponsive tab.

      Basically, I'd argue that if you provide the resources, smart developers will find a way to make clever use of those resources. Dual core has already sparked a revolution for virtualization and led to some other, really cool OS changes to increase speed. Many cores will provide diminishing returns (we have 2 eyes for a reason), but I bet 8 cores will be well utilized within a few years.

    4. Re:Personal computing? by ppanon · · Score: 2, Informative

      Well, you could parallelize recalculation of large spreadsheets. Create dependency trees for cells and split the branch recalculations among different threads. Some accountants and executives with large "what-if?"-type spreadsheets could find that quite useful.
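
      The dependency-tree idea can be sketched in Python with the standard `graphlib` module (3.9 and later). The sheet below is invented for the example: each cell lists the cells it depends on and a formula over the values computed so far. A cell is submitted to the thread pool as soon as everything it depends on is done, so independent branches can be recalculated at the same time.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical sheet: cell -> (dependencies, formula over computed values).
cells = {
    "A1": ((), lambda v: 10),
    "A2": ((), lambda v: 32),
    "B1": (("A1", "A2"), lambda v: v["A1"] + v["A2"]),
    "B2": (("A1",), lambda v: v["A1"] * 2),
    "C1": (("B1", "B2"), lambda v: v["B1"] - v["B2"]),
}


def recalculate(cells, max_workers=4):
    sorter = TopologicalSorter({name: deps for name, (deps, _) in cells.items()})
    sorter.prepare()
    values = {}
    pending = {}  # future -> cell name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every cell whose dependencies have all been computed.
            for name in sorter.get_ready():
                pending[pool.submit(cells[name][1], values)] = name
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                name = pending.pop(future)
                values[name] = future.result()
                sorter.done(name)  # may unlock dependent cells
    return values


values = recalculate(cells)
```

      Here B1 and B2 can run concurrently once A1 and A2 are known, while C1 has to wait for both. Formulas only read cells that are already finished, which is what keeps the shared `values` dictionary safe in this sketch.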

      Browsers could have separate threads for data transfer and rendering. If the web site is using div tags and CSS, you could split the rendering work for each div to a separate thread. More rapid and frequent partial screen updating can give today's generation of MTV-style re-orientation addicted workers the perception of faster performance.

      Parallelize WYSIWYG document preparation with a backend using TeX text-layout algorithms.

      But probably the biggest advantage would be obtained from more parallelism (both coarse and fine-grained) in GUI operations. That probably requires a re-architecting of display and GUI subsystems. But that's a bit of a chicken-and-egg problem because, to do that properly, you also need GPUs to become multi-core to remove the GPU as a single-thread bottleneck. GPUs are going to hit the same wall general purpose CPUs are hitting now, with a few years' delay. There's hope that today's Crossfire/SLI approaches could provide a hardware base to find an evolutionary path for that.

      I figure it will take at least another 5 years or more for a graphics subsystem redesign, and my guess is that it will happen on Linux first. I don't see Microsoft being first in re-architecting the Windows display subsystem to do it. Certainly not for the next Windows version in 2010(?), and thus they probably won't implement it until 2014 at the earliest. I think it's more likely to happen with somebody replacing large parts of the X.org server as a PhD thesis.

      But, yeah, fundamentally the biggest bottleneck with personal computer systems is the bandwidth between the user and the computer and there's no way to parallelize the user.

      --
      Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
  10. Re:melt in your mouth not in your mobo by $RANDOMLUSER · · Score: 2, Informative

    Well then you're not remembering very well. There was some crazy statistic floating around that a Prescott at ~25Ghz would put out as much heat per cm^2 as the surface of the sun.

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
  11. Re:How many languages have multithread support? by ILongForDarkness · · Score: 2, Interesting
    Matlab isn't that smart; you still have to tell it that the for loop is parallelizable, for example. I might be wrong but I don't think Java or C# do either. Their frameworks/VMs supply APIs to do multi-threading; you simply call into them for the support that you need. C has had pthreads for a long time (since it was standardized?); for some reason the C++ committee never agreed on an implementation.

    There is a great talk by Bjarne Stroustrup (http://csclub.uwaterloo.ca/media/C++0x%20-%20An%20Overview.html) about the new version of C++ coming out and some of the difficulties getting things added. Essentially, if a new feature will only help 100,000 developers, it isn't important enough to be implemented. With such a huge developer community all the "little" things get left for non-standard API implementations; only big features that almost everyone will find useful get added. That is probably why this version or the next of C++ will get a standard thread library, because almost everyone has access to a multicore system. Oh yeah, also, and it sucks, anyone with a few thousand dollars to waste can get added to the committee, but most people don't care enough to go get their feature implemented for that much money (you also have the travel/time off to attend the meetings) except big business, so guess who runs the show (I don't expect anyone to be surprised).

  12. HPC by ShakaUVM · · Score: 2, Interesting

    As someone who got a master's in computer science with a focus in high performance computing / parallel processing, and has taught on the subject, *yes*, it does take a bit of work to wrap one's mind around the concept of parallel processing, and to correctly write code with concurrency. But *no*, it's not really that hard. Once you get used to the idea of having computation and communication cycles over a processor geometry, it becomes little more difficult to write parallel code than serial.

    It's kind of like when people see recursive functions for the first time. If they don't understand the base condition and inductive step, then they can easily fall into infinite loops or write bugs. Parallel code is the same way... just a bit more tricky.

    1. Re:HPC by curunir · · Score: 2, Insightful

      I think the one thing that makes parallel computing more difficult, and quite a bit more so than recursion, is the fact that it makes your program non-deterministic. With a single-threaded application, it's pretty obvious when you've made your application non-deterministic...you reference the time or some resource external to your application. And those kinds of non-deterministic behaviors are much easier to understand...they're mostly just data. But if your application is running on multiple processors using multiple threads, that's not the case. You can run your application multiple times and see different results depending on which threads execute the fastest. And in the worst-case scenario, you get deadlocks that are a nightmare to debug.

      It's often quite difficult to wrap your head around that unpredictability, especially since so much of the beginning computer science education teaches programmers to evaluate each instruction in their programs in source order, as the computer is likely to end up doing when the program is run. This is made even worse by the fact that some languages (I know Java, but there may be others too) allow a compiler to re-order instructions to improve performance provided it doesn't alter that thread's behavior. This is fine for a single-threaded application, but can be quite confusing for a multi-threaded application when you can no longer assume source ordering of instructions from other threads.

      It took a while before I got comfortable with essentially asking myself "What am I assuming and do I actually know that at this point or do I just think I know it at this point" with every line of code that I write that might execute in a multi-threaded environment. Even with that, I still run into occasions where it takes over an hour to debug a race condition when that error only happens a small percentage of the time.
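
      A small Python sketch of the kind of assumption in question: `value += 1` looks like one step but is a read, an add and a write, so two threads can interleave between them. The `Counter` class below is illustrative only; the lock turns the read-modify-write into one indivisible step, which makes the final count the same on every run.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old
        # value and one of the updates would be lost.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

      Remove the `with self._lock:` line and the program may still print the right total most of the time, which is exactly why this class of bug takes so long to track down.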

      --
      "Don't blame me, I voted for Kodos!"
  13. Sameless Plug: Qt 4.4 by scorp1us · · Score: 5, Informative

    Full disclosure: I am a Qt Developer (user) I do not work for TrollTech

    The new Qt4.4 (due 1Q2008) has QtConcurrent, a set of classes that make multi-core processing trivial.

    From the docs:

    The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.

    QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:

            * QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
            * QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
            * QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
            * QtConcurrent::filter() removes all items from a container based on the result of a filter function.
            * QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
            * QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
            * QtConcurrent::run() runs a function in another thread.
            * QFuture represents the result of an asynchronous computation.
            * QFutureIterator allows iterating through results available via QFuture.
            * QFutureWatcher allows monitoring a QFuture using signals-and-slots.
            * QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
            * QRunnable is an abstract class representing a runnable object.
            * QThreadPool manages a pool of threads that run QRunnable objects.
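
    To give a feel for the map / filter / reduce style described in the list, here is a rough Python analogue built on the standard `concurrent.futures` module. It is not Qt code and does not use the QtConcurrent API; the helper names `mapped`, `filtered` and `mapped_reduced` are chosen only to echo the list above.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def mapped(fn, items, pool):
    """Return a new list with fn applied to every item (order preserved)."""
    return list(pool.map(fn, items))


def filtered(keep, items, pool):
    """Return a new list of the items for which keep(item) is true."""
    flags = pool.map(keep, items)
    return [item for item, flag in zip(items, flags) if flag]


def mapped_reduced(fn, combine, items, pool, initial):
    """Map in parallel, then fold the results on the calling thread."""
    return reduce(combine, pool.map(fn, items), initial)


words = ["multi", "core", "chips", "are", "here"]
# With no arguments the pool picks its own worker count from the CPU count.
with ThreadPoolExecutor() as pool:
    lengths = mapped(len, words, pool)
    long_words = filtered(lambda w: len(w) > 4, words, pool)
    total = mapped_reduced(len, lambda acc, n: acc + n, words, pool, 0)
```

    As in the description above, the caller never touches a mutex or a thread object; the trade-off is that the mapped function must not depend on shared mutable state.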

    This makes multi-core programming almost a no-brainer.

    --
    Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
    1. Re:Sameless Plug: Qt 4.4 by Rodyland · · Score: 2, Insightful
      This makes multi-core programming almost a no-brainer.


      While you did say 'almost', I'm still going to take exception with that statement.

      That is a very dangerous thing to say without reams of qualifications.

      Programming (of any non-trivial nature) is not currently, nor is it likely to be any time soon, a 'no-brainer'. No library, no framework, no toolset, no abstraction takes away from the core fact that programming is hard. Sure, you can take away the boring/trivial stuff and give the programmers more time to work on the hard/interesting stuff, but that doesn't make it a 'no-brainer'.

      Abstracting away mapReduce just means you don't have to know how to write your own mapReduce implementation. It doesn't automatically make the user of Qt (or whatever) an expert in designing parallel algorithms, nor parallel debugging, nor the performance benefits and tradeoffs and gotchas of parallel programming.

  14. Evolution that halted at 4 ghz.... by MindPrison · · Score: 2, Interesting

    It's not easy... especially since things sort of halted at 4 GHz, what on earth am I typing about? Well...picture this...limitations...yes they do exist..and sometimes it's important to think beyond what lies just straight ahead (such as the next cycle speed)...and think into a second...maybe even a 3rd dimension to expand your communication speed. I have for over 6 years been thinking..of a 3d-dimension processor that cross communicates over a diagonal matrix instead of the traditional serial and parallel communication model. Imagine this folks...if your code could "walk" across a matrix of 10 x 10 x 10 instead of just 8 x 8 or 64 x 64 if you want...get the picture, no? Imagine that your data could communicate on a 3 dimensional axis - imagine that you had 10 stacks of cores on top of each other - and instead of just connecting the communication bus to a parallel or a serial model...they could in fact communicate on a diagonal basis... this would make it possible to send commands...data..etc....in a 3d-space rather than just a "queue". This of course...would demand a different "mindset" of coding... everything would have to be written from scratch....though...but the benefits would be tremendous .....you could 10 fold existing computational speed by increasing the communication across processor-cores...maybe even more! Even by today's technology standards. Ok..ok...sounds far fetched for you doesn't it? Well..get this...this was my invention 6 years ago (maybe even 9 years ago...I am getting older so I don't really care...I do care for freedom of information and sharing...Not so much wealth so listen on)...The theory of what I just wrote here on Slashdot (which has more implication on your life in the future than you will ever be capable of comprehending...yes...I am full of myself ain't I....Who cares? You don't know me) .. point is...
    There was once a missing brick to the idea of diagonal cross matrix computing....with yesteryear's technology it just would not be feasible to do it... but ...if you have ANY understanding of what I write here (yes...I am not kidding...this may change history as we know it...and I am drunk right now...and I don't want to keep a lid on it anymore)...here we go... Please think about what I just wrote - and - look up Frances Hellman's lecture on magnetic materials in semiconductors...and you WILL have your 4-th link in the 3-B-E-C (Base, Emitter, Collector) construction...to make the Cross Matrix Processor possible....just understand this....JoOngle invented this...Frances made it possible - YOU read it from a drunk nobody of Slashdot.org.... now...go make it real!

    --
    What this world is coming to - is for you and me to decide.
    1. Re:Evolution that halted at 4 ghz.... by Animats · · Score: 4, Informative

      I have for over 6 years been thinking..of a 3d-dimension processor that cross communicates over a diagonal matrix instead of the traditional serial and parallel communication model.

      Six years, and you haven't discovered all the machines built to try that? This was a hot idea in the 1980s. Hypercubes, connection machines, and even perfect shuffle machines work something like that. There's a long history of multidimensional interconnect schemes. Some of them even work.

    2. Re:Evolution that halted at 4 ghz.... by Skrynkelberg · · Score: 5, Funny

      You may want to switch off the rapid fire-mode for your "."-key.

  15. And so it goes...... by Nonillion · · Score: 5, Insightful

    processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing....

    Translation:

    Code will get even more inefficient / bloated and require faster hardware to do the same thing you are doing now. While I'm all for better / faster computer hardware, most if not all Jane and Joe Sixpack users never need Super Computer power to surf the net, read e-mail and watch videos.

    --
    "I bow to no man" - Riddick
  16. Erlang by Niten · · Score: 4, Informative

    Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang's model for concurrency. The slides are available here:

    http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf

    The presentation itself (OGG Theora video available here) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."

    The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?

    The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.
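
    The message-passing style can be imitated, very loosely, in Python with threads that talk only through queues. This is not Erlang and has none of its lightweight processes or fault handling; the `accumulator` worker and its `None` stop sentinel are invented for the sketch. What carries over is the discipline: the worker's state is private, and the only way to affect it is to send it a message.

```python
import queue
import threading


def accumulator(inbox, outbox):
    # All state is local to this worker; nothing is shared except mailboxes.
    total = 0
    while True:
        message = inbox.get()
        if message is None:  # sentinel: stop and report the result
            outbox.put(total)
            return
        total += message


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=accumulator, args=(inbox, outbox))
worker.start()

for n in range(1, 101):
    inbox.put(n)  # asynchronous send: the sender does not wait for processing
inbox.put(None)

result = outbox.get()
worker.join()
```

    Because no other thread can reach `total`, there is nothing to lock, which is the property the presentation is arguing for.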

    This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.

    For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie.

  17. Re:Oh, wow by bladesjester · · Score: 5, Insightful

    A guy who's on the C++ standards committee AND works for Microsoft.

    Actually, according to the latest Dr Dobbs, Herb is the *chair* of the ISO C++ Standards committee. (He had an article on lock hierarchies being used to avoid deadlock)

    He's really going to know what he's talking about, then.

    As chair of the committee, I'd say there's a pretty fair chance that he *does*.

    I really love people who bash things just because Microsoft is involved. Contrary to what seems to be a popular belief here, they have some incredibly intelligent people who are very good at what they do there.

    --
    Everything I need to know I learned by killing smart people and eating their brains.
  18. Threads considered harmful by richieb · · Score: 4, Interesting
    Check out this article on O'Reilly's site. Threads are actually very low level constructs (like pointers and manual memory management). Accordingly the future belongs to languages that eliminate threads as a basis for concurrency. See Erlang and Haskell.

    --
    ...richie - It is a good day to code.
  19. Re:Threads Are Not the Answer by caerwyn · · Score: 5, Interesting

    This is very, very wrong. Data-set partitioning is certainly one way of achieving parallelism in programming, but it is hardly the only way- nor is it applicable to all domains, as many problems have solutions with too many inter-cell data dependencies. In addition, threads provide a wealth of benefits to application developers by allowing multiple unrelated tasks to be performed simultaneously.

    There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization. It's not at all uncommon for overall performance to be improved by doing something *unrelated* at the same time, requiring less synchronization overhead.

    Are threads perfect for everything? No. But calling them the second worst thing to happen to computing is, at best, disingenuous.

    --
    The ringing of the division bell has begun... -PF
  20. This has been coming for a while. by jskline · · Score: 4, Interesting

    The fact is that programming by and large has gotten lazy, shiftless and sloppy over time and not any better or faster. Programmers really did rely on processing and memory architectures getting faster to overcome their coding bottlenecks. The words "optimized code" have little or no significance in today's programming shops because of budgets. Because of the push to get stuff out the door as quickly as possible, corners are cut all over the place on many things.

    There once was a time when debugging was part of your job. Now someone else does that and, at most, the better coders do some unit testing to ensure their code snippet does what it is supposed to. There generally isn't any "standard" with regard to processes except in some houses that follow *recommended coding guidelines*, but these are few and far between. Old school coders had a process in mind to fit a project as a whole and could see the end running program. Many times now, you are to code an algorithm without any regard or concept as to how it might be used. A lot of strange stuff going on out there in the business world with this!

    If there is a fundamental change in the base for C++, et al., this is going to possibly have a detrimental effect on the employment market as there will be many who cannot conceptualize multi-threading methodologies much less modeling some existing processing in this paradigm; and leave the markets.

    I left the programming markets because of the clash of bean counters vs quality, and maybe this will have a telling change in that curve. I always did enjoy some coding over the years and maybe this would make an interesting re-introduction. I have personally not coded in a multi-threading project but have the concepts down. Might be fun!

    --
    All content in this message is copyright (c) 2008. All rights reserved. RIAA is prohibited here.
  21. It's the OS Stupid !!! by deweycheetham · · Score: 2, Insightful

    It's the OS Stupid, Not Parallel Programming !!!

    Just because the latest and greatest release of a New OS by a certain vendor is dog slow doesn't mean it's time to start blaming Programmers and calling them LAME.

    There are several good Operating Systems out there that handle multiple threads on multi core machines just fine. They even do this in their basic scripting languages native to those Operating Systems and many have been doing them since the 70's.

    There are techniques out there that handle work just fine in a Parallel Program/Core Environment. On a side note, Data Encapsulated Object Oriented techniques are not always the best way to handle performance issues. A look back in time has several answers to this question and more. (Lest We Forget)

    --- Old engineers never die, they just build away. (By deweycheetham) ---

  22. There's not much hope for the C++ committee by Animats · · Score: 3, Insightful

    I have little hope for the C++ standards committee. It's dominated by people who think really l33t templates are really cool. Everything has to be a template feature. They're fooling around with a proposal for declaring variables atomic through something like atomic<int> n; This allows really l33t programmers to write really l33t code using really l33t lockless programming. But without the proofs of correctness needed to make that actually work reliably.

    It's also long been Stroustrup's position that concurrency is a library problem. As long as the OS provides threads and locking, it's not a language problem. This isn't good enough.

    The fundamental problem is that, as currently defined, a C++ compiler has no idea which variables are shared between threads, and which are never shared. The compiler has no notion of critical sections. Fixing this requires some fundamental changes to the language. It's known what to do; Modula, Ada, and Java all have synchronization and isolation built into the language. But there's nothing like that in C++, and the designers of C++ don't want to admit their mistakes.

    It's not just a C++ problem. Python has a similar issue. Python as a language doesn't deal with concurrency adequately. The main implementation, CPython, has a "global interpreter lock" that slows the thing down to single-CPU speed.

    1. Re:There's not much hope for the C++ committee by Animats · · Score: 2, Insightful

      Critical sections are a high level feature which must be in a library.

      The problem is that a C++ compiler doesn't know what data is locked, and which data items are locked by which lock, because the language has no way to talk about that subject. OS-level primitives lock everything. The compiler has a hard time telling which data needs concurrency protection. Thus, the compiler can't diagnose race conditions.

      If the language understood locking, one could do more checking at compile time. One could take a hard-nosed approach. Every variable has to be locked by something. Either it's locked by the object of which it is a member (like Java's "synchronized"), or the thread to which it is local, or by some other object which owns the variable. This last is something for which a language needs descriptive syntax.

      One approach would be syntax where the programmer declares a critical section, and lists everything that can be referenced within the critical section. But that might not be necessary. A system more like the way an SQL database decides transaction locking issues might be easier on the programmer.

      The big memory headache in C and C++ is always "who owns what", something with which the language provides no assistance. That's the cause of dangling pointers and memory leaks, but it's also the cause of much locking trouble.
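      The "which data is locked by which lock" idea can be sketched roughly as follows. This is a Python illustration, not anything from a C++ proposal: the `Guarded` class and `count_in_parallel` function are hypothetical names, and Python can only enforce the pairing by convention at run time, not at compile time as the comment asks for.

```python
import threading
from contextlib import contextmanager


class Guarded:
    """Pairs one value with the single lock that protects it (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    @contextmanager
    def locked(self):
        """Yield a one-slot list holding the value; the lock is held for the whole block."""
        with self._lock:
            box = [self._value]
            try:
                yield box
            finally:
                # Write back whatever the caller left in the slot.
                self._value = box[0]


def count_in_parallel(n_threads=4, n_increments=10_000):
    """Increment one guarded counter from several threads and return the total."""
    counter = Guarded(0)

    def work():
        for _ in range(n_increments):
            with counter.locked() as box:
                box[0] += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with counter.locked() as box:
        return box[0]
```

      The value is only reachable through `locked()`, so the question "who owns this?" has one visible answer, although nothing stops a caller from keeping a reference to the box after the block ends.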

  23. The rise of Erlang and Haskell? by steveha · · Score: 2, Interesting

    I know that languages like Erlang and Haskell are better for concurrent programming than more traditional languages. However, so far they have not been as popular as more traditional languages.

    Will the new world of concurrency cause a shift in language popularity? Or will traditional languages remain more popular, perhaps with some enhancements? C++ is gaining concurrency enhancements; C++, Python, and many other languages work well with map/reduce systems like Google MapReduce; and even with no enhancements to the language, you can decompose larger systems into multiple threads or multiple processes to better harness concurrency.
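    As a rough illustration of the map/reduce style of decomposition mentioned above, here is a small word count in Python. The function names `map_chunk` and `word_count` are made up for this sketch, and it uses a thread pool only to keep the example self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one independent chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, n_workers=4):
    """Split the input into chunks, count each chunk concurrently, then merge."""
    lines = list(lines)
    size = max(1, -(-len(lines) // n_workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the per-chunk counters into one.
    return reduce(lambda a, b: a + b, partials, Counter())
```

    In CPython, threads will not speed up this CPU-bound work because of the global interpreter lock; swapping in `ProcessPoolExecutor`, which offers the same `map` interface, is the usual way to spread the chunks over cores.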

    If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity? People grow very attached to their familiar languages and tools; to displace the entrenched languages, alternative languages need to not just be better, they need to be a lot better.

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
    1. Re:The rise of Erlang and Haskell? by John+C+Peterson · · Score: 2, Interesting

      The purely functional approach has a lot of merit. But functional programming by itself probably won't solve the big problems. Erlang has a very specific approach to threads and parallelism that works very well when it's appropriate. A more general approach is taken in Concurrent Haskell (http://en.wikipedia.org/wiki/Concurrent_Haskell), in which a Software Transactional Memory (STM) replaces the lower level mechanisms such as locks that are so crucial to threaded programming. I expect the real breakthrough to occur when high level concurrency tools like STM come into use to replace the existing parallel programming framework of threads and locks. There was a time when everyone assumed that automatic memory management was "too high level / slow / buggy" to be practical in a real programming language but now most programmers are happy to build their programs without worrying about memory allocation. In the parallel world, threads and locks are the malloc / free of the past and something like STM could well be the basis for a higher level approach that will make concurrency a natural way to program.

    2. Re:The rise of Erlang and Haskell? by yermoungder · · Score: 2, Informative

      You could try Ada.

      Ada is a multi-paradigm language (i.e. procedural or OO) that has threads ("tasks") built in. The experience of Ada83 tasking wasn't brilliant - the OS/hardware available at the time just wasn't up to the job and was hopelessly expensive. This left a nasty taste for some, which in turn led to FUD about the language as a whole - you wouldn't believe the rubbish I've heard over the years about what Ada is or is supposed to do!

      Ada95 (and in particular the $0, Open Source GNAT compiler) changed that, making an affordable-for-the-masses, fast Ada environment available on GNU/Linux and Windows platforms. It now comes with an Eclipse plug-in too.

      Now, Ada2005 has arrived which even extends OO into the domain of active objects (i.e. extensible, polymorphic tasks).

  24. Chip makers want developers to pay for lunch by ClosedSource · · Score: 2, Interesting

    Instead of developing single-core chips with better performance, chip makers are now making multicore machines and expecting developers to provide the extra performance.

    Without the work of developers, multi-core chips will be like the extra transistors in transistor radios in the 1960s: good for marketing but functionally useless.

  25. Re:Wow, this is a great idea! by poot_rootbeer · · Score: 2, Insightful

    People used to optimise everything way back when, but now I suspect that most people just let the faster processor take care of things rather than trying to squeeze every nanosecond of performance out of their apps :(

    Thank God for that.

    I'm glad that coders today can use high-level tools and languages without having to spend half their time on performance tweaking.

    Take as an example a game like Halo (or Guitar Hero, or World of Warcraft, or whatever your favorite modern game is). If the developers of these titles had to exercise the same amount of care in optimization as developers did on the Atari 2600 -- where often, the author had to unroll simple countdown loops because they could not afford the overhead of DEC and BEQ instructions -- yes, the game kernel would probably run twice as fast. But on the other hand, each game would take a decade to complete!

    I'd happily trade some (but not all) efficiency in program execution for an increase in efficiency in program authoring. And that's exactly what we've done.

  26. Fine grained vs Coarse grained parallelism by White+Flame · · Score: 2, Insightful

    Fine grained (spread your for loops across processors) and coarse grained parallelism (different independent actors exchanging messages and working on tasks separately) are two completely different approaches, though they generally use the same mechanisms. Everybody always focuses on the fine grained and how that affects algorithms, but I personally believe that personal computing yields more benefit from coarse grained parallelism, where nothing in your program blocks because every task that it's performing is independent. Having modal, sequential operations that you have to wait for your computer perform before you get control back for an unrelated task in the same program is absolutely absurd in this day and age.

    The few instances where a personal application does spend significant time in a single task (media manipulation, mostly) could use fine grained parallelism, but that is not the common case. Stop whining about algorithm parallelism and get your system/application design broken out into independent components and tasks properly.

    Besides, as others have said, neither is particularly difficult to do properly. It's when you try to hack in threaded shared access without having properly contained the mutable data that you shoot yourself in the foot.
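    The coarse-grained style described above, with independent components that only exchange messages, might look something like this in Python. The names `stage` and `run_pipeline` are hypothetical, and error handling is left out: if a stage function raises, this sketch simply stalls.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def stage(fn, inbox, outbox):
    """Start one independent component: read from inbox, apply fn, write to outbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # pass the shutdown signal downstream
                return
            outbox.put(fn(item))

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def run_pipeline(items, first, second):
    """Push items through two stages that run concurrently and share no mutable state."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    stage(first, q_in, q_mid)
    stage(second, q_mid, q_out)
    for item in items:
        q_in.put(item)
    q_in.put(_STOP)
    results = []
    while True:
        result = q_out.get()
        if result is _STOP:
            return results
        results.append(result)
```

    The only shared objects are the queues, so the mutable data stays contained inside each stage, which is the property the comment says keeps you from shooting yourself in the foot.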

  27. Re:Wait for the new C++ standard before you switch by mariuszbi · · Score: 3, Informative

    Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have Boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up? As for the compiler "optimizing" everything, there is a little keyword, volatile, that just tells the compiler not to optimize memory access for that variable. I think the real problem is working in a new programming paradigm: if you have a problem with sharing variables, code everything using pure functions.

  28. LabVIEW [& other graphical environments] by mosel-saar-ruwer · · Score: 2, Interesting


    my current major language (Igor pro) will use all the cores automatically, and how many languages do multithread this way? Matlab(?), Octave(?)

    LabVIEW, by its very nature [which is graphical - based on "G" - the "Graphical" programming language] is kinda/sorta topologically self-threading: If a piece of LabVIEW code sits off in its own connected component, then [more or less] it gets its own thread.

    Of course, all your ".h" & ".c" [or ".cc"] files [& their innards] might very well break down into little distinct connected components which are ripe for running their own threads, it's just that you can't - unless you're some sort of a super genius - you can't readily visualize all those connected components as they exist in your code.

    Now you and your colleagues could try to anticipate the connected components a priori, during the "planning" phase: You could draw huge pictures on the dry-erase board, and everyone could yell and scream at each other about the topological structure which the code should ultimately embody, and then everyone would have to promise - Scout's Honor! - that they would stick to the blueprint [which they might very well resent as having been shoved down their throats by some pointed-headed suit who didn't have any clue what he was talking about] - but the beauty of LabVIEW is that THE CODE IS THE BLUEPRINT [which I think is a point that Jack Reeves used to make].

    There's actually a Slashdotter, MOBE2001, who maintains a blog called Rebel Science News, who's got some pretty interesting ideas here - he seems to be leaning towards a graphical approach to this [realizing that the fundamental nature of the problem tends to be topological, rather than anything which we (YET!) would recognize as semantic], but his program is very, very ambitious [if I had a couple of spare lifetimes, I might just throw one in that general direction].

    Another line of thought which everyone should keep an eye on is the discipline of Petri nets - it's kinduva big graphical/topological approach to state machines, which [if someone were to put the necessary elbow grease into it] might prove to be very useful in squeezing the most bang for the buck out of these massively-multicore CPU's.

  29. Microsofts view on cores by bjb_admin · · Score: 4, Funny

    No need for parallel computing all cores are already used.

    Core one: For the OS
    Core two: Anti-virus
    Core three: Anti-Spyware / Windows Defender
    Core four: Firewall
    Core five: Windows update notifications and installations
    Core six: Windows Genuine advantage checks
    Core seven: Eye Candy (Vista) with XP you get a bonus CPU
    Core eight: Whatever the user wants to run, except when you get a virus, then
    you have to share it with the SPAM bot.

    Guess we will be waiting for 16 core CPU's.

    Oh and don't start me on memory requirements :-)

  30. Sutter's article is awesome by athloi · · Score: 2, Interesting

    When I first started programming, in BASIC on an Apple ][ (not IIe), I remember being baffled by the fact that the computer did not operate with multiple concurrent streams. To me, this seemed the point of making something that was "more than a calculator," and the only way we would be able to do the really interesting stuff with it.

    When I first started writing object-oriented code, I was somewhat dismayed to find that OO was an extension to the same ol' linear programming. It seemed to me that objects should be able to exist as if alive and react freely, but really, they were just a fancy interface to the linear runtime. Color me disappointed yet again.

    It's an important paradigm shift to recognize parallel computing. Maybe when the world realizes the importance of parallel computing, and parallel thinking, we'll have that singularity that some writers talk about. People will no longer think in such basic terms and be so ignorant of context and timing. That in itself must be nice.

    Sutter's article hits home with all of this. His conclusion is that efficient programming, and elegant programming that takes advantage of, not conforms to, the parallel model is the future. Judging by the chips I see on the market today, he was right, 2.5 years ago. He will continue to be right. The question is whether programmers step up to this challenge, and see it as being as fun as I think it will be.

  31. Er... What drugs are you taking? by djelovic · · Score: 2, Insightful

    > This makes multi-core programming almost a no-brainer.

    What uttermost and complete crap.

    We are nowhere near multi-core programming being a no-brainer.

    Here's what we know right now:

    1. We know how to manually create threads to perform specialized tasks. This comes nowhere near the ideal, which is loading all the CPUs roughly the same, taking into account CPU affinity for some tasks in order to keep the caches warm and work well on NUMA architectures.

    2. We know how to exploit data parallelism in those cases where we have large quantities of data.

    Other than that we are still trying to find any paradigm that would make arbitrary systems scale well on a massive number of cores. Some of them are based on pi calculus, some on join calculus, some on more practical foundations.

    At this point some things are obvious:

    1. CPU threads are useless except as part of the foundation on which other abstractions are built. All really scalable systems use either lightweight threads/processes or smaller tasks which are scheduled in user space.

    2. Native stacks are evil.

    3. Thread affinity, as implemented by Windows USER and GDI modules and STAs is evil. Don't know how this works under Linux as I never did any GUI work there but I assume many components have similar limitations.

    4. Any solution that exposes locks to the user instead of hiding them in the infrastructure is evil. Locks are not composable and are very error-prone in real-world scenarios.
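
    The "not composable" point can be seen in a small Python sketch. The `Account` class and the `transfer` and `demo` functions are invented for this example. Each account is safe on its own, but an operation spanning two accounts has to know about both locks and agree on a global order for taking them; locking the source first and the destination second can deadlock when two transfers run in opposite directions.

```python
import threading


class Account:
    """A balance protected by its own lock (hypothetical example type)."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move money between two accounts, taking both locks in a fixed global order."""
    if src is dst:
        return  # nothing to do, and taking the same non-reentrant lock twice would hang
    # Ordering by id() means opposing transfers always acquire the locks the same way.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def demo(rounds=1000):
    """Run opposing transfers concurrently and return the final balances."""
    a, b = Account(100), Account(100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

    The ordering rule lives in `transfer`, outside either account, so every new multi-object operation has to repeat it correctly. That is the burden the comment wants hidden in the infrastructure.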

    Dejan

  32. Re:Threads Are Not the Answer by 0xABADC0DA · · Score: 2, Insightful

    There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization.

    I think what you meant to say is that with os threads it requires a lot of effort and overhead. For example on Tera/Cray's MTA it took basically no extra overhead at all to run a loop in parallel over N hardware threads. The only 'hard' part was letting the compiler know which loops to do in parallel.

    The problem with os threads is that the things that benefit the most from parallel processing are the finest grained, but the os threads are only usable for the coarsest grained problems. So, OS threads are generally only useful for concurrency and not for parallel execution. That is, os threads can let you do two mostly different 'tasks' at the same time (repainting the GUI while the data is being processed), but are really bad at actually making a single task run faster.

    You can, sometimes, with incredible effort make os threads run one task faster. But that doesn't change the fact that they are a really really bad solution for this.
  33. Re:Diaspora by Chirs · · Score: 3, Insightful

    For many large-scale software projects (I work in industry so I have some experience with this) it is far easier to find more cpu power than more programmers.

    Making code easy to read and maintain is critical to maximizing the efficiency of the programmer. The efficiency of the code is generally a secondary issue, and is only a factor if the code in question is found to be a bottleneck.

    Brian Kernighan once said,

    "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

  34. Re:C++? by Yetihehe · · Score: 4, Informative

    ...while all the clever folks have already started writing their scalable applications in something reasonable, like Erlang?
    From erlang site:

    1.4. What sort of problems is Erlang not particularly suitable for?

    People use Erlang for all sorts of surprising things, for instance to communicate with X11 at the protocol level, but, there are some common situations where Erlang is not likely to be the language of choice.

    The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant-factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination.
    That's why most applications are still in c/c++
    --
    Extreme Programming - Redundant Array of Inexpensive Developers
  35. The cure is the Actor programming model by master_p · · Score: 2, Interesting

    The cure for solving all of parallel programming problems (deadlocks, priority inversion etc) is the Actor model: each object is a separate thread, and calling a method does not invoke code, it only puts a request in the message queue of the called object. Then the thread behind the object wakes up and processes the requests.

    If an object wants a result from another object, then it obtains a future value that represents the result of the computation when it will be ready. When the caller wants the actual value, it blocks until the result is available.

    Of course, blocking on a result would cause a deadlock in recursive algorithms...therefore, objects don't wait for a result, they simply enter a new message loop at the position they wait for a result. When the result is ready, the callee wakes up the caller by putting a 'terminate current loop' message in the caller's message loop after the result is computed.
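
    A stripped-down sketch of the mechanism described so far, in Python rather than the C++ of the demo mentioned below, could look like this. The `Actor` wrapper, its `send` method and the `Tally` class are hypothetical. It departs from the description in several ways: each actor gets a full OS thread, `queue.Queue` uses ordinary locks internally rather than spinlocks, and a caller blocks on the future instead of entering a nested message loop, so an actor that waits on its own result would deadlock here.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """Wraps an object so that method calls become messages handled by one thread."""

    def __init__(self, target):
        self._target = target
        self._inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Only this thread ever touches the wrapped object.
        while True:
            name, args, future = self._inbox.get()
            try:
                future.set_result(getattr(self._target, name)(*args))
            except Exception as exc:
                future.set_exception(exc)

    def send(self, name, *args):
        """Queue a request and return a future for its eventual result."""
        future = Future()
        self._inbox.put((name, args, future))
        return future


class Tally:
    """Plain, unsynchronized state; safe only because one actor thread owns it."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value
```

    A caller would write `f = Actor(Tally()).send("add", 5)` and later `f.result()`, which blocks until the actor's thread has processed the request.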

    The Actor model, implemented as described above, not only solves the problems of classical parallel programming (deadlocks, priority inversion, etc), but it also exposes whatever parallelism is there in a program.

    Synchronization is performed only in two places:

    1) when inserting/removing elements in an object's queue.
    2) when adding the current thread into the waiting list of a future value.

    Both synchronizations are implemented via spinlocks. In the case of the queue, there is no need to synchronize on all the queue, just on the edges.

    I have made a demo in C++, using Boehm's garbage collector (it is a quite complex system, it needs gc), and it works beautifully. With this model, there is no need to use mutexes, semaphores, wait conditions, or any other synchronization primitive.

    I chose C++ because:

    1) operator overloading allows future values to be treated naturally like non-future values.
    2) when waiting for a result, the waiting thread puts itself in the waiting list of the future. The nodes of the list are allocated on the stack; only c/c++ can do this, and it is crucial, because it minimizes allocation.

    Another advantage of this system is that tail recursion comes for free: when you call a method which you don't want the result of, the local stack is not exhausted, because there is no call, only a message placed in a queue.

    Patterns like the producer/consumer pattern come for free: one object simply invokes the other.

    Data parallelism comes for free: invoking a computation on an array of objects will execute the computations in parallel, on each element of the array. For example, increasing the elements of an array can take O(N) with one CPU and O(1) with N cpus.

    Of course, it is much slower on two or even four cores than the same sequential code. But given 10 or more cores, programs start to exhibit linear increase in performance, depending on algorithm of course.

    The system is much like the nervous system of an animal: signals are transmitted slowly from one nerve to another, but processing is parallel, so the organism can do many things at the same time.

    Another similarity between this system and the nervous system of an animal is that when a nerve wants to transmit an electrical signal to another nerve, the nerves must synchronize, much like there should be synchronization when an object puts a message in the object of another thread.

  36. Memory matters, too by Furry+Ice · · Score: 2, Insightful

    I see a lot of comments indicating that all a programmer needs to do to scale to more cores is just multithread your algorithms. If only that were true! Unfortunately, memory access patterns become extremely important for getting good performance, and that requires some pretty sophisticated knowledge about the hardware and proper tuning is almost a black art. Once large numbers of cores are in use, scaling your software optimally is going to be very difficult. Don't delude yourself. Talented programmers are going to be very much in demand, and I suggest starting to learn everything you can about it now. For starters, Ulrich Drepper has written an incredibly detailed and helpful article available at http://people.redhat.com/drepper/cpumemory.pdf which should really help dispel any notions that this change to computing is going to be easy!

  37. Developers need to get out for lunch by dbIII · · Score: 2, Interesting

    Even a child's toy like the Nintendo DS from 2004 has two cores. Developers need to remember it isn't the early 1990s anymore and that they will have to deal with multiprocessor machines.

  38. threads are too high level by Zork+the+Almighty · · Score: 2, Insightful

    Perhaps I am the only person who thinks this, but it seems to me that threads are not a very good low-level primitive for concurrent programming. They inherently assume that whatever is running on the different processors is independent. As a result, writing a tightly coupled parallel algorithm is "hard".

    I would much rather the operating system switch 4 or 16 synchronized cores completely over to me. Add prefixes to the assembly instructions so that I can explicitly execute instructions on processor 1, 2, 3, etc, in a shared memory model. Add logic similar to simultaneous multithreading to keep unused cores saturated with instructions from other threads when possible. This would help the programmer extract parallelism from tightly coupled algorithms. There seems to be no real multithreaded analogue to assembly language, and I think that is a big part of the problem. If we had such a thing it would be much easier to write tightly coupled parallel code, and higher level parallelization (from compilers) would follow inevitably.

    Of course I'm not saying this is some sort of magic bullet. We would still need to split up computations and use threads as best as possible, but I think this is an obvious tool that we are missing.

    --

    In Soviet America the banks rob you!
  39. Re:Oh, wow by bladesjester · · Score: 2, Insightful

    We bash in attempt to convince those smart people to leave MS and work in a more open way.

    In doing so, you prove yourself a fool. It is a childish action that only hurts your cause, and Microsoft (as well as most people with any business or social sense) knows it.

    You see Microsoft as some great evil to be overcome without seeing that a large part of your problem is yourself.

    Companies see people like you bash anything that isn't open source or "free" and they quite rightly think that you haven't really thought things out or lack the business acumen to realize why all of the world can't work that way. (Not to mention the extreme lack of social skills that it shows)

    I like open source, I use it, I occasionally write it, and I've championed the cause in a sane way.

    What you are missing is that Microsoft is giving a lot of people and companies what they want - software that is relatively easy to use and which everyone else is already using ("best" doesn't matter most of the time, which a lot of you have problems understanding).

    At the same time, they treat their employees well, paying them well with good benefits (from what I've heard from people I know who work there), and maintain well-respected research labs.

    You do not draw good people from a good environment by telling them it's not a good environment because they don't make everything open source. You draw good people by being a better environment in terms of pay, benefits, culture, work-life balance etc *and* appealing to their sensibilities.

    If you can't do that, and instead simply bash anyone for associating with "the enemy", you are doomed to fail because, at best, people will work on it as a hobby. The lion's share of good open source software is done by people being paid to do it. Bashing the company of people you want to work for you does not help.

    Not all of the world cares about open source, and many of us who do are not fanatical about it and realize that, while it is good for some things, is absolutely horrible for other things from a business standpoint. We like working on things that we see as important, but we also like being able to pay our bills and having a life outside of work.

    --
    Everything I need to know I learned by killing smart people and eating their brains.
  40. Re:Me too by cecil_turtle · · Score: 2, Informative

    I'd love to say, "Core 1, you will convert DVDs (or mp3s, or some other processor-intensive task). Core 2, run everything else."

    You can, with processor affinity. Unless you're saying your processor isn't dual-core. Even still you can just set your process priority / nice level (whatever OS you run) so that it's a lower priority so your other programs run OK.
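
    For what it's worth, pinning a process to particular cores can be done from a script on Linux. The sketch below uses Python's `os.sched_setaffinity`, which is only available on some Unix platforms, so the hypothetical helper `pin_to_cores` returns None elsewhere rather than failing.

```python
import os


def pin_to_cores(cores):
    """Restrict the current process to the given CPU indices where the OS supports it.

    Returns the resulting set of allowed CPUs, or None on platforms without
    os.sched_setaffinity (for example Windows and macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    wanted = set(cores) & allowed
    if not wanted:
        return allowed  # none of the requested cores is available; change nothing
    os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)
```

    Calling `pin_to_cores([0])` at the top of a conversion script would keep it on the first core. On Unix, `os.nice(10)` is the matching call for lowering the script's priority instead.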

    I don't know, I haven't owned a computer since 2003 where the processor was really a bottleneck anyway. Unless you're doing something specific like converting media files or running a distributed application (seti, folding, etc.) then normally the bottleneck is disk access. Even on servers it's not much of an issue for me, it's pretty easy to throw more CPU horsepower at a machine nowadays, but again disk performance is killer expensive.
  41. Chip makers are at least 2 decades behind by ClosedSource · · Score: 2, Insightful

    "Multi-CPU systems started becoming common in the mid 1990s so developers being a decade behind the times is a little embarrassing and there are many situations where the task is not completly serial."

    So after a decade of poor adoption on the part of software developers, the chip makers have ignored the fact that the wisdom of the (programming) mob indicates that multi-processing is not an attractive solution. Chip makers have known for more than two decades that they were going to run into physical limits eventually using the current technology, but opted for milking the 1970's model as long as possible rather than developing new technologies that might lead to much better single-core performance.

  42. High level language by oliderid · · Score: 2, Interesting

    I guess it will be a dumb question but:

    Why can't a Java virtual machine take the burden of the multi-core adaptation?

    They have promised "write once run anywhere"!

    Lazy coder :-)