Faster Chips Are Leaving Programmers in Their Dust
mlimber writes: "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to switch to the new paradigm: 'The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips -- processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs.' It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as 'The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software.' Sutter is part of the C++ standards committee, which is working hard to make multithreading a standard part of C++."
Some algorithms are inherently not amenable to parallelization. With eight cores instead of one, the speedup you get can be anywhere from eightfold to none at all.
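That range is what Amdahl's law describes (the name and formula are the standard textbook ones, not something from the comment above): if a fraction p of the work parallelizes perfectly and the rest stays serial, the best possible speedup on N cores is 1 / ((1 - p) + p/N). A quick sketch, assuming zero overhead for splitting and joining work:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work
    can run in parallel and the remainder is strictly serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (1.0, 0.9, 0.5, 0.0):
    print(f"parallel fraction {p:.1f}: {amdahl_speedup(p, 8):.2f}x on 8 cores")
```

Even at 90% parallel work, eight cores give under 5x, and no number of cores can push that case past 10x.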
So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
It's not just about making your app multithreaded; it's about completely changing your algorithms so they take advantage of multiple processors. I took a parallel programming course in university, so I'm by no means an expert, but I'll share what insight I have. You can't just take a standard sort algorithm and run it multithreaded. You have to change the entire algorithm. You end up with something that sorts faster than n log(n). However, this type of programming, where you break up the dataset, sort each piece, and then gather the results, can be very difficult. Many debuggers don't deal well with multiple threads, which adds an extra layer of difficulty to the whole problem. Granted, I don't think we really need this level of multithreading, but I think that's what the article is referring to. I think 10+ core CPUs will mainly help those of us who like to do multiple things at the same time. It might even be beneficial to keep most apps tied to a single CPU so that a runaway app couldn't take over the entire computer.
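The "break up the dataset, sort each piece, gather the results" pattern can be sketched roughly like this. It's a minimal illustration, not the algorithm from any particular course: the function name and chunking scheme are made up, and the gather step is a k-way merge via heapq.merge. Note that under CPython's global interpreter lock, threads won't actually sort the chunks in parallel; concurrent.futures.ProcessPoolExecutor has the same interface if you want real parallelism.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(data, workers=4):
    """Hypothetical helper: split `data` into `workers` chunks, sort each
    chunk in its own worker, then k-way merge the sorted chunks."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not data:
        return []
    chunk = -(-len(data) // workers)  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Scatter: each chunk is sorted independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Gather: heapq.merge combines already-sorted inputs in one pass.
    return list(heapq.merge(*sorted_chunks))


print(parallel_sort([5, 3, 9, 1, 7, 2], workers=3))
```

The merge at the end is inherently serial, which is one reason the wall-clock gain is less than the worker count.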
Neither does Microsoft's Outlook Express, but I don't think that was his point.
processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing....
Translation:
Code will get even more inefficient and bloated and require faster hardware to do the same things you are doing now. While I'm all for better, faster computer hardware, most, if not all, Jane and Joe Sixpack users will never need supercomputer power to surf the net, read e-mail, and watch videos.
A guy who's on the C++ standards committee AND works for Microsoft.
Actually, according to the latest Dr. Dobb's, Herb is the *chair* of the ISO C++ standards committee. (He had an article in that issue on using lock hierarchies to avoid deadlock.)
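For anyone who hasn't seen the idea: give every mutex a level and only allow a thread to acquire a lock whose level is strictly lower than every lock it already holds. Since all threads then take locks in the same global order, no cycle of waiting threads (and so no deadlock among these locks) can form. Here's a rough Python sketch of that rule; the class name, levels, and error messages are my own illustration, not code from the article.

```python
import threading


class HierarchicalLock:
    """Hypothetical mutex tagged with a level. A thread may only acquire
    a lock at a strictly lower level than the last one it acquired."""

    _held = threading.local()  # per-thread stack of levels currently held

    def __init__(self, level: int):
        self.level = level
        self._lock = threading.Lock()

    def _stack(self):
        if not hasattr(self._held, "levels"):
            self._held.levels = []
        return self._held.levels

    def acquire(self):
        stack = self._stack()
        if stack and self.level >= stack[-1]:
            # Checked before blocking, so a bad ordering fails fast.
            raise RuntimeError(
                f"lock hierarchy violated: level {self.level} after {stack[-1]}")
        self._lock.acquire()
        stack.append(self.level)

    def release(self):
        stack = self._stack()
        if not stack or stack[-1] != self.level:
            raise RuntimeError("locks must be released in reverse order")
        stack.pop()
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False


accounts = HierarchicalLock(level=10)
audit_log = HierarchicalLock(level=5)

with accounts:
    with audit_log:  # 5 < 10: allowed
        pass

try:
    with audit_log:
        with accounts:  # 10 after 5: rejected instead of risking deadlock
            pass
except RuntimeError as err:
    print(err)
```

The design choice here is to turn a rare, timing-dependent deadlock into an immediate, reproducible error.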
He's really going to know what he's talking about, then.
As chair of the committee, I'd say there's a pretty fair chance that he *does*.
I really love people who bash things just because Microsoft is involved. Contrary to what seems to be a popular belief here, they have some incredibly intelligent people who are very good at what they do there.
I have little hope for the C++ standards committee. It's dominated by people who think really l33t templates are really cool. Everything has to be a template feature. They're fooling around with a proposal for declaring variables atomic through something like atomic<int> n; This allows really l33t programmers to write really l33t code using really l33t lockless programming. But without the proofs of correctness needed to make that actually work reliably.
It has also long been Stroustrup's position that concurrency is a library problem: as long as the OS provides threads and locking, it's not a language problem. This isn't good enough.
The fundamental problem is that, as currently defined, a C++ compiler has no idea which variables are shared between threads, and which are never shared. The compiler has no notion of critical sections. Fixing this requires some fundamental changes to the language. It's known what to do; Modula, Ada, and Java all have synchronization and isolation built into the language. But there's nothing like that in C++, and the designers of C++ don't want to admit their mistakes.
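Python is in the same boat: the pairing of a lock with the data it guards is purely a convention that the interpreter knows nothing about. A minimal sketch (class and function names are hypothetical) of what "library-level" critical sections look like in practice. The lock is needed because `self._value += n` is a read-modify-write that two threads can interleave; nothing stops another method from touching `_value` without the lock.

```python
import threading


class SharedCounter:
    """Bundles shared state with the lock that guards it. The language does
    not enforce this pairing; it only holds because every method obeys it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self, n=1):
        with self._lock:  # critical section: read-modify-write must not interleave
            self._value += n

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```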
It's not just a C++ problem. Python has a similar issue. Python as a language doesn't deal with concurrency adequately. The main implementation, CPython, has a "global interpreter lock" that slows the thing down to single-CPU speed.
Oh, good grief--since the original poster didn't specify units, and since it's highly unlikely the running time would be exactly n log n for any choice of units, and since it's pretty common to leave out the O() in casual conversation, the only sensible interpretation is that the O() was implied....
It's not quite like that.
On modern systems, threads are themselves first-class constructs, and it runs somewhat like this:
A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).
A thread generally has a stack (in the host process's virtual address space, so every thread in the process can read it), some thread-local storage to make life easier for some APIs (you don't need to care about this in most cases), and lives in a process. This means that threads in the same process can use virtual addresses interchangeably.
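A small Python illustration of both halves of that: ordinary objects live in the shared address space, so every thread sees the same list, while `threading.local()` gives each thread its own private attributes (Python's take on thread-local storage). The names are just for the example.

```python
import threading

shared = []              # one object, visible to every thread in the process
tls = threading.local()  # each thread sees its own attributes on this object


def worker(name):
    tls.name = name       # private to this thread
    shared.append(name)   # the same list the main thread will read


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))        # ['t0', 't1', 't2']
print(hasattr(tls, "name"))  # False: the main thread never set its own copy
```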
Additionally, some operating systems support fibers. A fiber is like a thread except that it has to be explicitly or cooperatively (not quite the same thing) multi-tasked. Fibers use even less memory than threads, and you really don't have to care about them.
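Python generators aren't OS fibers (no separate stack, no OS API involved), but they show the cooperative part nicely: a task runs until it explicitly yields, and a scheduler decides who goes next. A toy round-robin scheduler, purely as an analogy:

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # explicit yield point: control returns to the scheduler here


def run(tasks):
    """Round-robin: resume each task until it yields, then move to the next."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(current)


log = []
run([task("a", 2, log), task("b", 3, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```

If a task never yields, nothing else runs, which is exactly the trade-off of cooperative multitasking.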
When you're in, say, Visual Studio, there's a "threads" window for all of the threads of the process that you are debugging. You can end up stepping through code on one thread while other threads are running.
Modern hardware designs lead to interesting performance side effects depending on where data sits in cache and memory. It's not quite as hard as programming systems with asymmetric access to resources (e.g. the PlayStation 2), but it makes for fun work.
For many large-scale software projects (I work in industry, so I have some experience with this), it is far easier to find more CPU power than more programmers.
Making code easy to read and maintain is critical to maximizing the efficiency of the programmer. The efficiency of the code is generally a secondary issue, and is only a factor if the code in question is found to be a bottleneck.
Brian Kernighan once said,
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
It's a shame that the single-threaded model became so ingrained in everything, including Linux. For example, why do I need to wait for my mail program to download all the headers from the IMAP server before I can compose a new message on initial startup?
I'm of the opposite opinion; it's a shame that so many people equate parallel processing with threads. When there's not much shared data, using multiple processes keeps memory protection between your parallel "things", decreasing coupling, increasing isolation, and generally resulting in a more stable system (and for certain things where you can avoid some cache coherency problems, a faster system). Your example is perfect; there's really no good reason to use a thread for such lookups. Another process would do, or even better just use select() and avoid all the pain (and bugs) of a multithreaded solution.
OS developers spent a lot of engineering time implementing protected memory. Threads throw out a huge portion of that; a good programmer won't do that without very good reasons. Some tasks, where there really are tons of complicated data structures to be shared, are good candidates for threading. More commonly, though, threads are used either because the programmer doesn't know any better or because they allow you to be a slacker about defining exactly what is shared and mediating access to it. The latter is especially dangerous; defining exactly what (and how) things are shared goes most of the way toward eliminating multiprocessing bugs, and threads make it easy to slack off on that and get a "mostly working" solution that occasionally deadlocks, fails to scale, etc.
Use processes or state machines when you can, and threads when you must.
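For the mail example, a single-threaded event loop might look roughly like this in Python, using the standard `selectors` module (which wraps select()/poll/epoll/kqueue depending on the platform). The two socket pairs are stand-ins for an IMAP connection and a UI channel; this is not a real IMAP client, and the data is made up.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# Hypothetical stand-ins: one pair plays the IMAP connection, one the UI.
imap_ours, imap_server = socket.socketpair()
ui_ours, ui_peer = socket.socketpair()
for sock, label in ((imap_ours, "imap"), (ui_ours, "ui")):
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, data=label)

# Simulate both sources producing input.
imap_server.sendall(b"header data")
ui_peer.sendall(b"compose")

received = {}
while len(received) < 2:
    events = sel.select(timeout=1.0)
    if not events:
        break  # nothing ready within the timeout; give up rather than spin
    for key, _mask in events:
        # Whichever source is ready gets handled; neither blocks the other.
        received[key.data] = key.fileobj.recv(1024)

for sock in (imap_ours, ui_ours):
    sel.unregister(sock)
    sock.close()
for sock in (imap_server, ui_peer):
    sock.close()
sel.close()

print(received)
```

One thread, no locks: the "what is shared" question never comes up because only one piece of code runs at a time.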
There's a wide variance in what "parallel computing" means. For multicore, you've essentially just got a cheaper version of SMP (symmetric multiprocessing). This is worlds away from what occurs in a parallel computer and what most parallel programming algorithms deal with. With multicore and SMP you program mostly like you're doing multithreading on a single CPU.
The algorithms programmers have to deal with here involve concurrency, and have been in use for decades by anyone writing an OS or device driver: the dining philosophers problem, readers-writers synchronization, and so on. These are used on what most people think of as single-processor computers and are essential. So I don't really think of these as "parallel programming", but as "parallel-light".
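As an example of the readers-writers kind of thing, here's a minimal Python sketch of a readers-writer lock built on a condition variable: any number of readers can hold it at once, but a writer needs it exclusively. This is the simple reader-preference variant (a steady stream of readers can starve writers); the class and method names are my own, not from any particular library.

```python
import threading


class ReadersWriterLock:
    """Many concurrent readers or one writer. Reader-preference version:
    writers can starve if readers never let the count drop to zero."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake readers and writers alike


rw = ReadersWriterLock()
shared = {"value": 0}
seen = []


def writer():
    for _ in range(1000):
        rw.acquire_write()
        try:
            shared["value"] += 1
        finally:
            rw.release_write()


def reader():
    for _ in range(1000):
        rw.acquire_read()
        try:
            seen.append(shared["value"])
        finally:
            rw.release_read()


threads = [threading.Thread(target=writer) for _ in range(2)]
threads += [threading.Thread(target=reader) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["value"])  # 2000
```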
Parallel programming to me means dealing with SIMD or MIMD machines. MIMD has multiple processors, each with its own memory and data, not multiple processors all sharing the same memory as in SMP. They may have high-speed connections to a subset of the other processors, for example arranged in a grid or cube. SIMD has multiple processors, each with its own data space but all executing the same instruction sequence; vector processors are perhaps the simplest form. The algorithms for these machines have very little in common with multithreading algorithms.
The parallel algorithms that require lots of sharing between processors will hit a bottleneck on the RAM with these multicore CPUs.