Faster Chips Are Leaving Programmers in Their Dust
mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."
....it wants its article back.
Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.
Just start a multithreaded process: 1 core for the program itself, the remaining 7 for the bugs...
II hhaavvee aann XX22 pprrocceessssoor? Ii ccaann ggooeess TTWWIICCEE aass ffaasstt nnooww?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
I remember learning to write software for OS/2 back in the early '90s. Multi-threaded programming was *the* model there, and had it been more popular, it would be pretty much standard practice today, making scaling to multiple cores pretty effortless, I'd think. It's a shame that the single-threaded model became so ingrained in everything, including Linux. For an example that comes to mind, why do I need to wait for my mail program to download all headers from the IMAP server before I can compose a new message on initial startup? Same with a lot of things in Firefox.
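To make the mail-client complaint concrete, here is a minimal sketch of the alternative: the slow download runs on a worker thread and hands results back through a queue, so the caller gets control back immediately. Everything here is hypothetical (`fetch_headers`, `start_mail_client`, and `drain` are made-up names, and the "download" is a simulated delay rather than real IMAP traffic).

```python
import queue
import threading
import time


def fetch_headers(count, results):
    """Stand-in for a slow IMAP header download (hypothetical; no network I/O)."""
    for i in range(count):
        time.sleep(0.01)  # simulate network latency per header
        results.put(f"header-{i}")
    results.put(None)  # sentinel: download finished


def start_mail_client(header_count=5):
    """Start the download in the background and return immediately."""
    results = queue.Queue()
    worker = threading.Thread(
        target=fetch_headers, args=(header_count, results), daemon=True
    )
    worker.start()
    return worker, results


def drain(results):
    """Collect headers until the sentinel arrives."""
    headers = []
    while True:
        item = results.get()
        if item is None:
            return headers
        headers.append(item)


if __name__ == "__main__":
    worker, results = start_mail_client()
    print("compose window is usable while headers load")
    print(drain(results))
```

A real GUI client would poll the queue from its event loop instead of blocking in `drain`, but the division of labour would be the same.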
Does anybody remember DeScribe?
Thank god that Java, C# and other piles of shit I hate do this quite intuitively and easily.
/me closes his eyes and embraces C++ for the last time before the inevitable doom
Guess I had it coming.
Bot Assisted Blogging
Some algorithms are inherently not amenable to parallelization. If you have eight cores instead of one, then the performance boost you can get can be anywhere from eight times faster to none at all.
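The "eight times faster to none at all" range is what Amdahl's law predicts (the comment does not name it, so treat the attribution as mine). A rough sketch, assuming the program splits cleanly into a serial part and a perfectly parallel part with no synchronization cost:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the work scales across `cores`.

    Assumes zero overhead for splitting and merging work, which real
    programs never quite achieve.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(fraction, round(amdahl_speedup(fraction, 8), 2))
```

With eight cores, a fully parallel program gets the full 8x, a fully serial one gets 1x, and a program that is half serial stays below 2x no matter how many cores are added.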
So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
The cake is a pie
"processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing"
Exactly what areas of "personal computing" are requiring this horsepower? The only two that come to mind are games and encoding video. The video encoding part is already covered - that scales nicely to multiple threads, and even free encoders will use the extra cores to their full potential. That leaves gaming, which is basically proprietary. The game engine must be designed so that AI, physics, and other CPU-bound algorithms can be executed in parallel. This has already been addressed.
So this begs the question, exactly how will the average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?
Dan East
Better known as 318230.
Full disclosure: I am a Qt developer (user); I do not work for Trolltech.
The new Qt 4.4 (due 1Q2008) has QtConcurrent, a set of classes that make multi-core processing trivial.
From the docs:
The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.
QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:
* QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
* QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
* QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
* QtConcurrent::filter() removes all items from a container based on the result of a filter function.
* QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
* QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
* QtConcurrent::run() runs a function in another thread.
* QFuture represents the result of an asynchronous computation.
* QFutureIterator allows iterating through results available via QFuture.
* QFutureWatcher allows monitoring a QFuture using signals-and-slots.
* QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
* QRunnable is an abstract class representing a runnable object.
* QThreadPool manages a pool of threads that run QRunnable objects.
This makes multi-core programming almost a no-brainer.
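For readers who have not seen the map/filter/reduce style described in the docs above, here is a rough analogue using only the Python standard library. It is not Qt's API: `parallel_map`, `parallel_filter`, and `parallel_map_reduce` are hypothetical helpers that merely mimic the shape of mapped(), filtered(), and mappedReduced(). Note also that on standard CPython builds a thread pool will not speed up CPU-bound functions; the sketch shows only the programming model.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_map(func, items, workers=None):
    """Apply func to every item on a pool sized to the core count."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))  # pool.map preserves input order


def parallel_filter(predicate, items, workers=None):
    """Evaluate the predicate in parallel, then keep the matching items."""
    items = list(items)
    keep = parallel_map(predicate, items, workers)
    return [item for item, ok in zip(items, keep) if ok]


def parallel_map_reduce(func, combine, items, initial, workers=None):
    """Map in parallel, then fold the results sequentially."""
    return reduce(combine, parallel_map(func, items, workers), initial)


if __name__ == "__main__":
    print(parallel_map(lambda x: x * x, [1, 2, 3]))
    print(parallel_filter(lambda x: x % 2 == 0, range(6)))
    print(parallel_map_reduce(lambda x: x * x, lambda a, b: a + b, [1, 2, 3], 0))
```

The appeal of this style is that the functions passed in touch no shared state, so no locks appear in user code.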
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing....
Translation:
Code will get even more inefficient / bloated and require faster hardware to do the same thing you are doing now. While I'm all for better / faster computer hardware, most if not all Jane and Joe Sixpack users never need Super Computer power to surf the net, read e-mail and watch videos.
"I bow to no man" - Riddick
Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang's model for concurrency. The slides are available here:
http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf
The presentation itself (OGG Theora video available here) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."
The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?
The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.
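A small sketch of the message-passing idea, written in Python rather than Erlang so it stays in one language with the other examples in this thread. The names (`counter_actor`, `run_demo`) and the message format are made up for illustration. One caveat: Python threads do share memory, so the isolation here is by convention only, whereas Erlang enforces it.

```python
import queue
import threading


def counter_actor(inbox, outbox):
    """Owns its state; the only way to affect it is to send a message."""
    total = 0  # local to this thread, never handed to anyone else
    while True:
        kind, payload = inbox.get()
        if kind == "add":
            total += payload
        elif kind == "stop":
            outbox.put(total)  # reply with the final state, then exit
            return


def run_demo(values):
    """Send each value to the actor and return the total it reports."""
    inbox, outbox = queue.Queue(), queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(inbox, outbox))
    actor.start()
    for value in values:
        inbox.put(("add", value))  # asynchronous send: returns immediately
    inbox.put(("stop", None))
    result = outbox.get()
    actor.join()
    return result


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

No lock appears anywhere in the user code, because nothing mutable is visible to more than one thread; the queue is the only meeting point.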
This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.
For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie.
A guy who's on the C++ standards committee AND works for Microsoft.
Actually, according to the latest Dr. Dobb's, Herb is the *chair* of the ISO C++ Standards committee. (He had an article on lock hierarchies being used to avoid deadlock.)
He's really going to know what he's talking about, then.
As chair of the committee, I'd say there's a pretty fair chance that he *does*.
I really love people who bash things just because Microsoft is involved. Contrary to what seems to be a popular belief here, they have some incredibly intelligent people who are very good at what they do there.
Everything I need to know I learned by killing smart people and eating their brains.
...richie - It is a good day to code.
This is very, very wrong. Data-set partitioning is certainly one way of achieving parallelism in programming, but it is hardly the only way, nor is it applicable to all domains, as many problems have solutions with too many inter-cell data dependencies. In addition, threads provide a wealth of benefits to application developers by allowing multiple unrelated tasks to be performed simultaneously.
There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization. It's not at all uncommon for overall performance to be improved by doing something *unrelated* at the same time, requiring less synchronization overhead.
Are threads perfect for everything? No. But calling them the second worst thing to happen to computing is, at best, disingenuous.
The ringing of the division bell has begun... -PF
I have for over 6 years been thinking of a 3-dimensional processor that cross-communicates over a diagonal matrix instead of the traditional serial and parallel communication model.
Six years, and you haven't discovered all the machines built to try that? This was a hot idea in the 1980s. Hypercubes, connection machines, and even perfect shuffle machines work something like that. There's a long history of multidimensional interconnect schemes. Some of them even work.
The fact is that programming by and large has gotten lazy, shiftless and sloppy over time and not any better or faster. Programmers really did rely on processing and memory architectures getting faster to overcome their coding bottlenecks. The words "optimized code" have little or no significance in today's programming shops because of budgets. Because of the push to get stuff out the door as quickly as possible, corners are cut all over the place on many things.
There once was a time when debugging was part of your job. Now someone else does that and at most, the better coders do some unit testing to ensure their code snippet does what it is supposed to. There generally isn't any "standard" with regard to processes except in some houses that follow *recommended coding guidelines*, but these are few and far between. Old school coders had a process in mind to fit a project as a whole and could see the end running program. Many times now, you are to code an algorithm without any regard or concept as to how it might be used. A lot of strange stuff going on out there in the business world with this!
If there is a fundamental change in the base for C++, et al., this could have a detrimental effect on the employment market, as there will be many who cannot conceptualize multi-threading methodologies, much less model some existing processing in this paradigm, and who will leave the market.
I left the programming markets because of the clash of bean counters vs quality, and maybe this will have a telling change in that curve. I always did enjoy some coding over the years and maybe this would make an interesting re-introduction. I have personally not coded in a multi-threading project but have the concepts down. Might be fun!
All content in this message is copyright (c) 2008. All rights reserved. RIAA is prohibited here.
I have little hope for the C++ standards committee. It's dominated by people who think really l33t templates are really cool. Everything has to be a template feature. They're fooling around with a proposal for declaring variables atomic through something like "atomic<int> n;". This allows really l33t programmers to write really l33t code using really l33t lockless programming, but without the proofs of correctness needed to make that actually work reliably.
It's also long been Stroustrup's position that concurrency is a library problem. As long as the OS provides threads and locking, it's not a language problem. This isn't good enough.
The fundamental problem is that, as currently defined, a C++ compiler has no idea which variables are shared between threads, and which are never shared. The compiler has no notion of critical sections. Fixing this requires some fundamental changes to the language. It's known what to do; Modula, Ada, and Java all have synchronization and isolation built into the language. But there's nothing like that in C++, and the designers of C++ don't want to admit their mistakes.
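To illustrate what "critical section" means in the paragraph above, here is a minimal sketch of a counter shared between threads. Python is used only for brevity; its locks are a library feature too, so this shows the hand-marked approach the comment is complaining about, not a language-level fix. `SharedCounter` and `hammer` are hypothetical names.

```python
import threading


class SharedCounter:
    """A counter whose read-modify-write step is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # critical section: one thread at a time
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def hammer(counter, threads=4, increments=10_000):
    """Increment the counter from several threads and wait for them all."""
    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value()


if __name__ == "__main__":
    print(hammer(SharedCounter()))
```

Nothing in the language stops another method from touching `_value` without taking the lock; keeping the two in step is entirely up to the programmer, which is the gap being described.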
It's not just a C++ problem. Python has a similar issue. Python as a language doesn't deal with concurrency adequately. The main implementation, CPython, has a "global interpreter lock" that slows the thing down to single-CPU speed.
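If you want to see the effect for yourself, a sketch along these lines should do. `count_primes` and `timed_run` are made-up helpers, and no timings are quoted here because they depend entirely on the machine and the interpreter build.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Deliberately CPU-bound: naive trial division over [2, limit)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_run(executor_cls, limits, workers=4):
    """Run count_primes over `limits` on the given executor and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [200_000] * 4
    results, seconds = timed_run(ThreadPoolExecutor, limits)
    print("threads:", results, f"{seconds:.2f}s")
```

Passing `concurrent.futures.ProcessPoolExecutor` instead of `ThreadPoolExecutor` runs the same work in separate interpreter processes, each with its own lock, which is the usual workaround for CPU-bound code.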
You may want to switch off the rapid-fire mode for your "."-key.
Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have Boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up? As for the compiler "optimizing" everything, there is a little keyword, volatile, that just tells the compiler not to optimize memory access for that variable. I think the real problem is working in a new programming paradigm: if you have a problem with sharing variables, code everything using pure functions.
No need for parallel computing: all cores are already used.
:-)
Core one: For the OS
Core two: Anti-virus
Core three: Anti-Spyware / Windows Defender
Core four: Firewall
Core five: Windows update notifications and installations
Core six: Windows Genuine advantage checks
Core seven: Eye Candy (Vista); with XP you get a bonus CPU
Core eight: Whatever the user wants to run, except when you get a virus, then you have to share it with the SPAM bot.
Guess we will be waiting for 16-core CPUs.
Oh and don't start me on memory requirements
For many large-scale software projects (I work in industry so I have some experience with this) it is far easier to find more CPU power than more programmers.
Making code easy to read and maintain is critical to maximizing the efficiency of the programmer. The efficiency of the code is generally a secondary issue, and is only a factor if the code in question is found to be a bottleneck.
Brian Kernighan once said,
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
Extreme Programming - Redundant Array of Inexpensive Developers