Faster Chips Are Leaving Programmers in Their Dust

Posted by CmdrTaco on Monday December 17, 2007 @05:42AM from the or-maybe-they've-already-wrapped-around-to-zero dept.

mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."

8 of 573 comments

Re:Thank god by zifn4b · 2007-12-17 06:11 · Score: 5, Informative

Apart from a more intuitive API, the only significant thing that managed languages make easier with regard to multithreading is garbage collection, so you don't have to worry about reference counting when passing pointers between multiple threads.

All of the same challenges that exist in C/C++ such as deadly embrace and dining philosophers still exist in managed languages and require the developer to be trained in multi-threaded programming.

Some things, like semaphores, can be more difficult to implement. You also have to be careful about which asynchronous methods and events you invoke, because those get queued up on the thread pool and it has a max count.

I would say managed languages are "easier" to use but to be used effectively you still have to understand the fundamental concepts of multithreaded programming and what's going on underneath the hood of your runtime environment.
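
To make the "deadly embrace" point concrete, here is a minimal C++ sketch of one standard way to avoid it when a thread needs two locks at once. The Account type and both function names are made up for illustration; std::lock and std::lock_guard are standard C++11.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type for illustration: a balance guarded by its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Move `amount` from one account to the other. std::lock acquires both
// mutexes with a deadlock-avoidance algorithm, so two opposite transfers
// cannot each hold one lock while waiting forever for the other.
void transfer(Account& from, Account& to, long amount) {
    assert(amount >= 0);
    if (&from == &to) return;  // never try to lock the same mutex twice
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Run transfers in opposite directions concurrently and return the combined
// balance, which should be unchanged.
long run_opposing_transfers(int rounds) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If each thread instead locked `from.m` and then `to.m` by hand, the two opposite transfers could each grab their first lock and wait on the second. A managed runtime does not remove that hazard; you still have to design the locking order.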

--
We'll make great pets
Shameless Plug: Qt 4.4 by scorp1us · 2007-12-17 06:19 · Score: 5, Informative

Full disclosure: I am a Qt developer (user); I do not work for Trolltech.

The new Qt 4.4 (due 1Q2008) has QtConcurrent, a set of classes that make multi-core processing trivial.

From the docs:

The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.

QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:

* QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
* QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
* QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
* QtConcurrent::filter() removes all items from a container based on the result of a filter function.
* QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
* QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
* QtConcurrent::run() runs a function in another thread.
* QFuture represents the result of an asynchronous computation.
* QFutureIterator allows iterating through results available via QFuture.
* QFutureWatcher allows monitoring a QFuture using signals-and-slots.
* QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
* QRunnable is an abstract class representing a runnable object.
* QThreadPool manages a pool of threads that run QRunnable objects.

This makes multi-core programming almost a no-brainer.
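
For readers without Qt at hand, the map-then-reduce idea behind mappedReduced() can be roughly sketched in standard C++. This is not Qt's API or implementation; map_reduce_sum is a hypothetical helper that splits a container into chunks, maps each chunk in its own std::async task, and adds up the partial results. The default task count is taken from std::thread::hardware_concurrency(), in the same spirit as the docs quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: apply map_fn to every item and sum the results,
// with the work split into chunks that run as separate asynchronous tasks.
template <typename T, typename MapFn>
long long map_reduce_sum(const std::vector<T>& items, MapFn map_fn,
                         unsigned tasks = 0) {
    // Default to one task per reported hardware thread (at least one).
    if (tasks == 0) tasks = std::max(1u, std::thread::hardware_concurrency());
    assert(tasks > 0);
    const std::size_t chunk = (items.size() + tasks - 1) / tasks;

    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < items.size(); begin += chunk) {
        const std::size_t end = std::min(items.size(), begin + chunk);
        // Each task only reads its own slice and writes a private local sum,
        // so no locking is needed in the map step.
        parts.push_back(std::async(std::launch::async,
                                   [&items, map_fn, begin, end] {
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += map_fn(items[i]);
            return local;
        }));
    }

    // Reduce step: combine the per-chunk results on the calling thread.
    long long total = 0;
    for (auto& part : parts) total += part.get();
    return total;
}
```

Calling `map_reduce_sum(values, [](int x) { return x * x; })` on a `std::vector<int>` returns the sum of squares. The caller never touches a mutex, which is the selling point of this style of API.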

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Re:2005 Called by caerwyn · 2007-12-17 06:24 · Score: 4, Informative

As you know, multiple threads in a program do not actually execute concurrently - processing is still serial, it's just so fast that threads can appear to execute simultaneously - and it's not just about queuing execution either.

That holds only for multithreaded programming on a single core. As soon as there are multiple cores available, processing does, in fact, happen simultaneously.

--
The ringing of the division bell has begun... -PF
Erlang by Niten · 2007-12-17 06:24 · Score: 4, Informative

Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang's model for concurrency. The slides are available here:

http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf

The presentation itself (OGG Theora video available here) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."
The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?

The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.
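
As a rough illustration of the message-passing style in C++ rather than Erlang, the sketch below gives a worker thread its own private state and lets other code talk to it only through a mailbox. Mailbox and sum_via_messages are hypothetical names, and the negative "stop" message is an assumption made for this example. Unlike Erlang's runtime, the mailbox here is built on an ordinary mutex and condition variable; the point is only that the worker's state is never shared.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mailbox: the only object the communicating threads share.
template <typename T>
class Mailbox {
public:
    // Asynchronous send: enqueue the message and return immediately.
    void send(T msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    // Block until a message is available, then remove and return it.
    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return !q_.empty(); });
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable ready_;
    std::queue<T> q_;
};

// The worker keeps its running total private. Callers influence it only by
// sending values; a negative value is the (assumed) stop signal.
long sum_via_messages(const std::vector<long>& values) {
    Mailbox<long> inbox;
    Mailbox<long> reply;
    std::thread worker([&] {
        long total = 0;  // private state, never touched by another thread
        for (;;) {
            const long v = inbox.receive();
            if (v < 0) break;
            total += v;
        }
        reply.send(total);
    });

    for (long v : values) {
        assert(v >= 0);  // negative values are reserved for the stop signal
        inbox.send(v);
    }
    inbox.send(-1);
    const long result = reply.receive();
    worker.join();
    return result;
}
```

All the locking lives inside the mailbox, so the application logic never has to reason about which of thousands of objects might be touched concurrently.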

This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.

For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie.
Re:Evolution that halted at 4 ghz.... by Animats · 2007-12-17 06:36 · Score: 4, Informative

I have for over 6 years been thinking..of a 3d-dimension processor that cross communicates over a diagonal matrix instead of the traditional serial and parallel communication model.
Six years, and you haven't discovered all the machines built to try that? This was a hot idea in the 1980s. Hypercubes, connection machines, and even perfect shuffle machines work something like that. There's a long history of multidimensional interconnect schemes. Some of them even work.
Re:2005 Called by chaboud · 2007-12-17 06:37 · Score: 3, Informative

Well, 2005 called...

it wants its reply back.

The parent is exactly how I would have replied a couple of years ago. I was doing lots of threading work, and I found it easy to the point of being frustrated with other programmers who weren't thinking about threading all of the time.

I was wrong in two ways:

1. It's not that easy to do threading in the most efficient way possible. There's almost always room for improvement in real-world software.

2. There are plenty of programmers who don't write thread-safe/parallel code well (or at all) that are still quite useful in a product development context. Some haven't bothered to learn and some just don't have the head for it. Both types are still useful for getting your work finished, and, if you're responsible for the architecture, you need to think about presenting threading to them in a way that makes it obvious while protecting the ability to reach in and mess with the internals.

The first point is probably the most important. There are several things that programmers will go through on their way to being decent at parallelization. This is in no strict order and this is definitely not a complete list:

- OpenMP: "Okay, I've put a loop in OpenMP, and it's faster. I'm using multiple processors!!! Oh.. wait, there's more?"
Now, to be fair, OpenMP is enough to catch the low-hanging fruit in a lot of software. It's also really easy to try out on your code (and can be controlled at run-time).

- OpenMP 2: "Wait... why isn't it any faster? Wait.. is it slower?"
Are you locking on some object? Did you kill an in-loop stateful optimization to break out into multiple threads? Are you memory bound? Blowing cache? It's time to crack out VTune/CodeAnalyst.

- Traditional threading constructs (mutices, semaphores): "Hey, sweet. I just lock around this important data and we're threadsafe."
This is also often enough in current software. A critical section (or mutex) protecting some critical data solves the crashing problem, but it injects the lock-contention problem. It can also add the cost of round-tripping to the kernel, thus making some code slower.

- Transactional data structures: "Awesome. I've cracked the concurrency problem completely."
Transactional mechanisms are great, and they solve the larger data problem with the skill and cleanliness of an interlocked pointer exchange. Still, there are some issues. Does the naive approach cleanly handle overlapping threads stomping on each other's write-changes? If so, does it do it without making life hell for the code changing the data? Does the copy/allocation/write strategy save you enough time through parallelism to make back its overhead?
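
The optimistic read/compute/publish pattern behind that last item can be sketched on a single word with std::atomic. This is only a toy version under stated assumptions: optimistic_update and run_optimistic_increments are hypothetical names, and a real transactional structure would publish a pointer to a freshly copied object and would also have to deal with reclaiming the old copy.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Optimistic update: take a snapshot, compute the new value from it, and
// publish the result only if nobody changed the shared value in between.
template <typename Fn>
long optimistic_update(std::atomic<long>& shared, Fn compute) {
    long seen = shared.load();
    long wanted = compute(seen);
    // compare_exchange_weak stores `wanted` only if `shared` still equals
    // `seen`. On failure it reloads `seen`, so we recompute and retry
    // instead of stomping on the other thread's write.
    while (!shared.compare_exchange_weak(seen, wanted)) {
        wanted = compute(seen);
    }
    return wanted;
}

// Several threads increment the same value without any mutex.
long run_optimistic_increments(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> value{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&value, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                optimistic_update(value, [](long v) { return v + 1; });
            }
        });
    }
    for (auto& th : pool) th.join();
    return value.load();
}
```

Every failed compare-exchange throws away the work done in `compute`, which is exactly the overhead question raised above: under heavy contention the retries can cost more than a plain lock.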

Should you just go back to a critical section for this code? Should you just go back to OpenMP? Should you just go back to single-threading for this section of code? (not a joke)
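
One way to explore those questions is to write the same job both ways and measure. The sketch below is a hypothetical example (parallel_sum and its lock_per_item flag are made up): with the flag set, every addition takes the shared lock, which is correct but makes the threads queue up; with it cleared, each thread accumulates privately and takes the lock once at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sum `data` on several threads. Both modes give the same answer; they
// differ only in how often the shared lock is taken.
long long parallel_sum(const std::vector<int>& data, unsigned threads,
                       bool lock_per_item) {
    assert(threads > 0);
    long long total = 0;  // shared result, guarded by `m`
    std::mutex m;
    const std::size_t chunk = (data.size() + threads - 1) / threads;

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&, begin, end] {
            long long local = 0;  // private to this thread
            for (std::size_t i = begin; i < end; ++i) {
                if (lock_per_item) {
                    // Safe, but every iteration contends for the same lock.
                    std::lock_guard<std::mutex> lock(m);
                    total += data[i];
                } else {
                    local += data[i];
                }
            }
            if (!lock_per_item) {
                // One lock acquisition per thread instead of one per item.
                std::lock_guard<std::mutex> lock(m);
                total += local;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Which mode wins, and whether either beats a plain single-threaded loop for a given data size, is something to confirm with a profiler on the target machine rather than assume.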

Perhaps as processors get faster by core-scaling instead of clock-scaling this will become less of a dilemma, but to say that "[to do multi-threaded programming effectively] is not that difficult" is akin to writing your first ray-tracer and saying that 3D is "not that difficult." Sometimes it is. At least at this point there are places where threading effectively is a delicate dance that not every developer need think about for a team to produce solid multi-threaded software.

That doesn't mean that I object to threading being a more tightly-integrated part of the language, of course.
Re:Wait for the new C++ standard before you switch by mariuszbi · 2007-12-17 07:10 · Score: 3, Informative

Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have Boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up? As for the compiler "optimizing" everything, there is a little keyword, volatile, that just tells the compiler not to optimize memory access for that variable. I think the real problem is working in a new programming paradigm: if you have a problem with sharing variables, code everything using pure functions.
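
A short sketch on the shared-variable point, using the C++11 facilities that were later standardized (so not something available in the 2007 standard library). Under the C++11 memory model, volatile keeps the compiler from optimizing away accesses but does not make an increment atomic or order it between threads; unsynchronized `++` on a `volatile long` from two threads is still a data race. std::atomic is the tool for a shared counter. count_with_atomic is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads bump one shared counter. fetch_add is a single atomic
// read-modify-write, so no increments are lost.
long count_with_atomic(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Relaxed ordering is enough for a pure counter: we only
                // need atomicity, and join() below orders the final read.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

The pure-function advice still holds: if each thread computes its own result from its own inputs and the results are combined afterwards, there is no shared counter to protect in the first place.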
Re:C++? by Yetihehe · 2007-12-17 08:26 · Score: 4, Informative

...while all the clever folks have already started writing their scalable applications in something reasonable, like Erlang?
From the Erlang site:
1.4. What sort of problems is Erlang not particularly suitable for?

People use Erlang for all sorts of surprising things, for instance to communicate with X11 at the protocol level, but, there are some common situations where Erlang is not likely to be the language of choice.

The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant-factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination.
That's why most applications are still in C/C++.

--
Extreme Programming - Redundant Array of Inexpensive Developers