Windows and Linux Not Well Prepared For Multicore Chips
Mike Chapman points out this InfoWorld article, according to which you shouldn't expect much in the way of immediate performance gains under Windows 7 (or Linux) from the eight-core chips Intel is releasing this year. "For systems going beyond quad-core chips, the performance may actually drop. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores. Problem? The development tools aren't available and research is only starting."
With Linux "My Kung-Fu is Strong!"
Give us a year maybe two.
http://www.infoworld.com/archives/emailPrint.jsp?R=printThis&A=/article/09/03/20/Multicore_chips_pose_next_big_challenge_for_industry_1.html
So basically yet another tech writer finds out that a huge number of applications are still single threaded, and that it will be a while before we have applications that can take advantage of the cores that the OS isn't actively using at the moment. Well, assuming you're running a desktop and not a server.
This isn't a performance issue with Windows or Linux; they're quite adept at handling multiple cores. They just don't need that much themselves, and the applications people run these days don't individually need much more than that either.
So yes, applications need parallelization. The tools for it are rudimentary at best. We know this. Nothing to see here.
get a mac..
Is it just me, or is this a classic piece of non-news on a par with the one the post subject refers to?
I mean, isn't it a typical and completely rational technological modus operandi that hardware developments come first and software implementations take some time to emerge (with the possible exception of specialized applications)?
I mean, imagine software being developed for imaginary or speculatory hardware. Sounds like a big waste of time to me...
Is TFA saying that Linux or Windows threading and scheduling aren't good enough for 4+ cores (so your programs, no matter how well designed, won't benefit from more cores), that it's damn hard to split, thread, and join tasks, or both?
Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.
Multiple OSes sharing the same cores. Multiple apps running on the different OSes, and working together.
Which can also be used to provide fault tolerance... if one of the worker apps fails, or even one of the OSes fails, your processing capacity is reduced, but a worker app in a different OS takes over; with checkpointing and shared state, the apps don't even lose data.
You should even be able to shut down a virtual OS for Windows updates without impact, if the apps that emerge are designed properly...
...programmers are to blame for that
The development tools aren't available and research is only starting.
Stupid programmers! Not able to develop software without the tools! In my day we wrote our own tools - in the snow, uphill, both ways! We didn't need no stink'n vendor to do it for us - and we liked it that way!
Firstly, it's false on the face of it: Ubuntu is certified on the Sun T2000, a 32-thread machine, and Canonical is supporting it.
Secondly, it's the same FUD we heard from uniprocessor manufacturers when multiprocessors first came out: this new "symmetrical multiprocessing" stuff will never work, it'll bottleneck on locks.
The real problem is that some programs are indeed badly written. In most cases, you just run lots of individual instances of them. Others, for grid, are well-written, and scale wonderfully.
The ones in the middle are the problem, because they need to coordinate to some degree and don't do that well. It's a research area in computer science, and one of the interesting areas is transactional memory.
That's what the folks at the Multicore Expo are worried about: Linux itself is fine, and has been for a while.
--dave
davecb@spamcop.net
The article doesn't really say that Windows and Linux aren't "designed" for quad+ core chips; it just says that most software is still single threaded. No kidding.
Languages like PHP/Perl, as a rule, are not designed for threading - at ALL. This makes multi-core performance a non-starter. Sure, you can run more INSTANCES of the language with multiple cores, but you can't get any single instance of a script to run any faster than what a single core can do.
I have, so, so, SOOOO many times wished I could split a PHP script into threads, but it's just not there. The closest you can get is with (heavy, slow, painful) forking and multiprocess communication through sockets or (worse) shared memory.
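The fork-and-pipe plumbing described above looks roughly like the sketch below. It is written in Python rather than PHP, so it uses os.fork and os.pipe and not PHP's pcntl functions. It assumes a POSIX system, and fork_map is a hypothetical helper name. Each chunk of work runs in its own child process and sends its result back through a pipe. That per-task overhead is exactly what the commenter is complaining about.

```python
import os
import pickle

def fork_map(func, chunks):
    """Run func(chunk) in one forked child per chunk; collect results via pipes.

    POSIX-only sketch: os.fork() does not exist on Windows.
    """
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: compute, send the pickled result, then exit
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "wb") as w:
                    w.write(pickle.dumps(func(chunk)))
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(write_fd)  # parent keeps only the read end
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as r:
            payload = r.read()  # EOF once the child closes its write end
        _, status = os.waitpid(pid, 0)  # reap the child to avoid zombies
        if status != 0 or not payload:
            raise RuntimeError(f"worker {pid} failed")
        results.append(pickle.loads(payload))
    return results
```

For example, fork_map(sum, [[1, 2], [3, 4], [5]]) returns [3, 7, 5]. Every result has to be serialized and copied between processes. Threads sharing one address space would avoid that cost.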
Truth be told, there's a whole rash of security issues through race conditions that we'll soon have crawling out of nearly every pore as the development community slowly digests multi-threaded applications (for real!) in the newly commoditized multi-CPU environment.
"The development tools aren't available and research is only starting"
Hardly. Erlang's been around 20 years. Newer languages like Scala, Clojure, and F# all have strong concurrency. Haskell has had a lot of recent effort in concurrency (www.haskell.org/~simonmar/papers/multicore-ghc.pdf).
If you prefer books, there's Patterns for Parallel Programming, The Art of Multiprocessor Programming, and Java Concurrency in Practice, to name a few.
All of these are available now, and some have been available for years.
The problem isn't that tools aren't available, it's that the programmers aren't preparing themselves and haven't embraced the right tools.
Too bad BeOS died. One of the axioms the developers had was 'the machine is a multi processor machine', and everything was built to support that.
Seems like they were 15 years ahead of their time. But, on the other hand, they were too late to establish another OS in a saturated market. Pity, really.
get a mac..
I assume you're talking about Mac OS X 10.6 (Snow Leopard), whose Grand Central framework is supposed to add some tools to make Mac-exclusive multithreaded apps easier to program.
Yes, some problems lend themselves very well to multicore designs. Many others do not. Just because they are building multicore chips does not mean that multicore is the right answer. Current multicore designs have too little cache and too little memory bandwidth. If my problem is CPU bound, multicore can be a solution. If my problem is memory-access bound, multicore is only going to make it worse.
The idea that every program needs to support threading is kinda stupid. Most programs barely use any computational power; in fact, there are very few programs that require all that computing power to operate, and those are certainly well designed.
did you forget to take your meds?
imagine software being developed for imaginary or speculatory hardware.
I think Sun called it "Java". It was run on emulators long before ARM and others came out with hardware-assisted JVMs such as Jazelle.
Maybe I'm just not a multicore user. Ever thought of that?
The quote presented in the summary is nowhere to be found in the linked article. To make matters worse, the summary claims that Linux and Windows aren't designed for multicore computers, but the linked article only claims that some applications are not designed to be multi-threaded or to run multiple processes. Well, who said that every application under the sun must be heavily multi-threaded or spawn multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spawns a bunch of processes?
The article is bad and timothy should feel bad. Why is he still responsible for any news being posted on slashdot?
What good are multiple cores and threads when you are running event driven GUI application? Some applications, especially Java applications, already use too many unnecessary threads. Encouraging threads where they are unnecessary is stupid. There is only limited room for parallelism in any algorithm.
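The limit on parallelism mentioned above is usually formalized as Amdahl's law. The comment does not name it. Suppose only a fraction p of the work can run in parallel on n cores. Then the best possible speedup is 1 / ((1 - p) + p / n). The sketch below implements that formula. The function name is illustrative, and the formula is an upper bound that ignores synchronization overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction          # this part always runs on one core
    return 1.0 / (serial + parallel_fraction / cores)
```

Take a program where half the work is parallelizable. With 8 cores, amdahl_speedup(0.5, 8) gives 16/9, about 1.78. No number of cores can push that program past 2x.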
Also, "...research is only starting." What BS is this? Multiprocessing and multithreading issues have been researched and solved for several decades now.
Who would have ever guessed that most software is single-threaded rather than multi-threaded, and that the programmers of Linux and Windows don't really feel like optimizing everything for 8-core CPUs that won't be released for quite some time and won't end up in the average user's box for 3 or more years.
Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.
But with how many apps can one user interact? I understood the article to be referring to desktop applications, not server applications. In a desktop environment, most applications spend much of their time waiting for an event. For example, a virus scanner blocks until a file is modified or a removable medium is mounted. Or are you envisioning connecting four terminals to one desktop PC and binding one virtual machine to each terminal?
The /. summary of TFA is almost exquisitely bad. It's not Windows or Linux that's not ready for multicore (both have supported multi-processor machines for on the order of a decade or more), but rather the userspace applications. The reason is simple: parallel programming is rather hard, and historically most ISVs haven't wanted to invest in it because they could rely on processors getting faster every year or two... but no longer.
One area where I disagree with TFA is the claimed paucity of programming models and tools. Virtually every OS out there supports some kind of concurrent programming model, and often more than one depending on what language is used -- pthreads, Win32 threads, Java threads, OpenMP, MPI or Global Arrays on the high end, etc. Most debuggers (even gdb) also support debugging threaded programs, and if those don't have enough heft, there's always Totalview. The problem is that most ISVs have studiously avoided using any of these except when given no other choice.
--t
Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.
So what? If I had a 32 core system, at least each running process (even if single-threaded) could have a core just for itself. Only a few basic applications (such as a browser) really need to be designed for multiples threads.
Most programs barely use any computational power; in fact, there are very few programs that require all that computing power to operate, and those are certainly well designed.
Home users do use some apps that could benefit from multiple cores. Video encoding is one of them, but that one is embarrassingly parallel because the encoder could just split the video into quadrants and have each of four cores work on one quadrant.
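The quadrant idea can be sketched in Python with a process pool, one worker per quadrant. Several things here are assumptions. A frame is a plain list of pixel rows. encode_quadrant is a hypothetical stand-in that only sums pixel values. The "fork" start method requires a POSIX system. Real codecs usually predict across region boundaries, so splitting a frame this cleanly is a simplification.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def split_quadrants(frame):
    """Split a frame (list of rows) into four quadrants: TL, TR, BL, BR."""
    mh, mw = len(frame) // 2, len(frame[0]) // 2
    return [
        [row[:mw] for row in frame[:mh]],
        [row[mw:] for row in frame[:mh]],
        [row[:mw] for row in frame[mh:]],
        [row[mw:] for row in frame[mh:]],
    ]

def encode_quadrant(quadrant):
    """Hypothetical stand-in for real encoding work: just sums pixel values."""
    return sum(sum(row) for row in quadrant)

def encode_frame(frame, workers=4):
    # "fork" assumes POSIX and avoids needing an `if __name__ == "__main__"` guard.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # Each quadrant is independent, so the four jobs can run on four cores.
        return list(pool.map(encode_quadrant, split_quadrants(frame)))
```

A separate process per quadrant matters in Python, where the interpreter lock keeps CPU-bound threads from running in parallel.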
When I was studying Comp Sci, I recall that most assignments were to 'understand the concept' and program a solution.
Usually the programs were single-threaded. Maybe a section of a course was on concurrency (mutexes, threading), but not an entire course or courses.
As multi-core becomes more the norm, then perhaps there can be an entire course on concurrency and how to design/program with this thinking in mind.
This canard will not fly! In the 1980's we and many others were profitably using shared-memory processors (discrete versions of multi-core chips), 20 CPUs in a Sequent Balance, in my case. We used high-level languages and sophisticated library support.
The success then was because processor and memory speeds were "balanced"; now they are wildly imbalanced. It's an architectural problem that has not been solved by huge, multi-level caches. For a simple explanation of why, try the classic "Hitting the Memory Wall: Implications of the Obvious" (www.cs.virginia.edu/papers/Hitting_Memory_Wall-wulf94.pdf).
Split-phase memory operations have been shown to help, but that innovation must be tied to hardware-supported multithreading.
Where's the need for an email client to spawn 8 or 16 threads?
Message classification. An e-mail client could open a process for each message, and the process would analyze the message to see what labels (spam, work, personal, etc.) belong on the message. If you get a lot of mail, I imagine that classifying several hundred downloaded messages might take a while.
This is a problem, but one specific to certain programs. Pull up Task Manager and take a look at the processes list. Odds are, unless you're running something big in the background, you won't see any process taking up more than 50% CPU on your dual-core, or 25% CPU on your quad-core. In fact, odds are none will even be close to that.
Multi-threading can offer little speed increase there (there is theoretically some, since code executes simultaneously, but it's negligible and probably unnoticeable); its value is only truly seen when a program can actually make use of more processor power than any single core has. Video conversion is a good example: on my dual-core at home, most of my video conversion tools hit 50% CPU and stay there until done. Programs like this can take advantage of multi-threading and therefore have access to more raw processing power at once (double, in fact).
I agree that it would be nice to see more tools that make coding for multi-core processors easier, and to see those few CPU-intensive programs suddenly get double the processing power. But only a very specific selection of software requires it, and a lot of that software is not what the "average joe" would be using. So it's probably just not vital enough to hit the priority lists yet, especially since a few programs out there do successfully implement multi-threading and others mimic it to a lesser extent.
What good are multiple cores and threads when you are running event driven GUI application?
Mozilla Firefox is an event-driven GUI application. But if I open a page in a new tab, a big reflow or JavaScript run in that page can freeze the page I'm looking at. You can see this yourself: open this page in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this. Other applications need to spawn threads when calling an API that blocks, such as gethostbyname() or getaddrinfo(); otherwise, the part of the program that interacts with the user will freeze. But these are the kind of threads that are useful even on a single core, not multicore-specific optimizations.
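The "thread for a blocking call" pattern from the last part of that comment can be sketched in Python. This is a minimal sketch. slow_lookup is a hypothetical stand-in that sleeps instead of calling getaddrinfo(), and the loop counter stands in for repainting and input handling. The worker thread posts its answer to a queue, and the event loop polls the queue without ever blocking on it.

```python
import queue
import threading
import time

def slow_lookup(name):
    """Hypothetical stand-in for a blocking call such as socket.getaddrinfo()."""
    time.sleep(0.2)  # simulate waiting on the network
    return f"resolved {name}"

def start_lookup(name, results):
    """Run the blocking call on a worker thread and post the answer to a queue."""
    def work():
        results.put(slow_lookup(name))
    worker = threading.Thread(target=work, daemon=True)
    worker.start()
    return worker

def event_loop(name):
    """Keep 'handling events' while the lookup is in flight; return the answer
    together with the number of loop iterations that ran in the meantime."""
    results = queue.Queue()
    start_lookup(name, results)
    ticks = 0
    while True:
        try:
            return results.get_nowait(), ticks  # never blocks the loop
        except queue.Empty:
            ticks += 1        # stand-in for repainting / handling input
            time.sleep(0.01)
```

As the comment says, this keeps the interface responsive even on a single core. It is about latency, not about using extra cores.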
Wait, what? Sorry, the windows kernel might indeed not be prepared for large multicore systems (IIRC a 64-proc limit), but linux ALREADY RUNS most of the world's large HPC systems - some of which are clusters, but some of which are enormous SMP machines - linux runs 1024-proc SGI machines, for example. Linux has O(1) scheduling, good NUMA support, and can handle many, many cores already very well.
Of course the article has nothing to do with that, but rather userspace.
But even there, while multicore is new to the bulk of developers and there's a lot of wheel-reinvention going on, the HPC world has been doing parallel programming for DECADES. It's not new, the techniques are well known.
And just look at a typical linux (or windows) box - it's running quite a few concurrent processes already. Individual applications might not be parallelised, but having multicore sure helps when I want to leave amarok playing in the background while I play sauerbraten while I have bittorrent going.
There's not even a way in the C or C++ core language to start a new thread. And with many different third-party libraries, there'll never be a reliable standard way to do it. Multithreading is great and everything, but if even such a popular programming language doesn't allow it, how am I supposed to produce programs for 8-core CPUs?
Seriously, no one has brought up functional programming, LISP, Scala or Erlang? When you use functional programming, no data changes and so each call can happen on another thread, with the main thread blocking when (& not before) it needs the return value. In particular, Erlang and Scala are specifically designed to make the most of multiple cores/processors/machines.
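The "run the call elsewhere, block only when you need the value" idea maps onto futures. The sketch below uses Python's concurrent.futures, which is not a functional language. Python does not enforce purity, so the sketch relies on count_primes having no side effects. The "fork" start method assumes a POSIX system.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """Pure function: the result depends only on `limit`, with no shared state.
    Counts primes <= limit using a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Mark every multiple of i from i*i onward as composite.
            sieve[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return sum(sieve)

def compute_both():
    # "fork" assumes a POSIX system and avoids needing a __main__ guard here.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(count_primes, 100_000)  # runs in another process
        small = count_primes(1_000)                  # main keeps working meanwhile
        big = future.result()                        # block only when the value is needed
    return small, big
```

compute_both() returns (168, 9592), the number of primes up to 1,000 and up to 100,000. Because the function touches no shared data, no locks are needed.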
See also map-reduce and multiprocessor database techniques like BSD and CouchDB (http://books.couchdb.org/relax/eventual-consistency).
"... and research is only starting."
Hmmm... I remember people doing research on this subject at the University of Illinois when I was a graduate student there in the 1980s.
If you spend more time assigning blame than you do describing the problem, then clearly you don't have anything insightful to say.
However, there is an issue of overhead with switching, and it seems like running specific processes on specific cores would do enough to help here. I don't see why the average application needs to run on more than one core. It seems like the OS could assign each process a core, and there would be no issue beyond the current multithreading.
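On Linux a process can already be pinned to a specific core by hand. The taskset utility does this from the shell; normally the scheduler decides. Below is a minimal Python sketch using os.sched_setaffinity, which is Linux-only. pin_to_core is a hypothetical helper.

```python
import os

def pin_to_core(core):
    """Restrict the calling process to one CPU core; return the previous set
    so the caller can restore it. os.sched_setaffinity is Linux-only."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not exposed by the os module on this platform")
    previous = os.sched_getaffinity(0)   # 0 means "the calling process"
    if core not in previous:
        raise ValueError(f"core {core} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {core})
    return previous
```

A caller would do previous = pin_to_core(0), run its work, then call os.sched_setaffinity(0, previous) to undo it. Pinning may help cache locality, but the scheduler already spreads runnable processes across cores without it.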
Now, like the stuff written for the Cray, there are some applications that could take advantage of parallel processing, but I don't see a general need for this. It would be like the original Mac, where certain processes were shifted to the graphics processor by the OS. Not that programs won't be written differently, but this will happen over time. DOS applications did not become full-fledged Windows applications overnight.
'In fact, TFA doesn't even use the words "Linux" or "Windows."'
Yup. There may be a reason for that too.
The initial SMP support was added to Linux 1.3.42 on 15 Nov 1995. Linux is clearly well adapted to multicore CPUs. That is one of the reasons why Linux dominates over Windows on www.top500.org. The other argument is cost.
If you don't believe me, pull out a profiler and run it on one of your programs; it will show you where things can be easily sped up.
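As one concrete example of "pull out a profiler", here is a sketch using Python's built-in cProfile and pstats. The comment names no particular tool, and the prime-counting workload is only an illustrative hotspot. The report lists functions by cumulative time, which shows where effort, serial or parallel, would actually pay off.

```python
import cProfile
import io
import pstats

def is_prime(n):
    """Trial division: deliberately slow so it shows up as the hotspot."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def count_primes_below(limit):
    return sum(1 for n in range(limit) if is_prime(n))

def profile_report(limit=10_000, top=10):
    """Profile one call and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = count_primes_below(limit)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```

The result for the default limit is 1229, the number of primes below 10,000. The report shows is_prime accounting for nearly all of the time.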
Now, given that the performance of most programs is not processor bound
That's a pretty big leap, I think.
Yes, a lot of today's apps are more user bound than anything. But there are plenty of real-world apps that people use that are still pretty processor bound - Photoshop, and image processing in general, is a big one. So is video, which starts out disk bound but becomes heavily processor bound as you apply effects.
Even Javascript apps are processor bound, hence Chrome...
So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.
Being able to burn a DVD while encoding a video and playing a game, all at the same time, is something that benefits from extra cores and does not require the apps themselves to know about the cores. Of course, this is not the most common situation. Perhaps the IT world is starting to realize that home and office users don't really need that much power? For the past five years MS has been the only force driving us toward more power upgrades, but now, with even them focusing on performance in Windows 7, perhaps this is going to be the "year of Moore's law no longer being relevant on the desktop".
So here we are at the multiprocessing dilemma again. The summary gets it all wrong. It is referring to operating systems, which are fine with this kind of stuff. UNIX and derivative (Linux) systems have been fine with multiprocessing for decades. Most of the big iron in the Top500 is running multi-core just fine. Even Windows got the hang of it lately (I guess).
The problem is that most application developers have not learned to wrap their minds around the multiprocessing paradigm. No tool can magically redesign your single-threaded application to work multi-threaded. The developer needs to analyze the program flow, move computationally expensive operations to separate threads, and get the coordination between them right (locks, balanced threading). It's a design paradigm that has to be learned.
The problem is that you can't get developers who are not used to multiprocessing paradigms to switch. Another problem is that exactly this group of people is also teaching the new generation, so it is not going to change that fast either.
It's a bit like a chicken-and-egg problem: until multi-core systems are widely distributed, no one will feel the urge to switch. That's why it is a good thing that these new CPUs are coming out. Once they are there, developers will need to utilize them to stay competitive. Kind of like natural selection and adaptation to new environments.
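The "locks" part of that design work can be illustrated with a shared counter updated by several threads. This is a minimal Python sketch, and the Counter class is hypothetical. Note that CPython's interpreter lock keeps these threads from executing Python code in parallel. So the sketch shows the locking discipline, not a multicore speedup.

```python
import threading

class Counter:
    """A shared counter; the lock makes each read-modify-write atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self, times):
        for _ in range(times):
            with self._lock:
                self.value += 1  # without the lock, updates could be lost

def run_counter(threads=4, times=10_000):
    counter = Counter()
    workers = [threading.Thread(target=counter.increment, args=(times,))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait for all workers before reading the total
    return counter.value
```

With the defaults, run_counter() returns 40000 every time. Holding the lock around each tiny update is also a good example of the "balanced threading" problem: the threads spend much of their time waiting on each other.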
Isn't exactly rocket science (well, except if you are writing a rocket guidance system).
He says Windows and Linux but spuriously leaves out Mac?
mac suffers from the same damn problem. The OS and most apps weren't written for multi-core processors.
That's why any true multi-core app is distributed; hence rendering farms, not rendering servers.
They're using their grammar skills there.
You need to establish/prove purity to the compiler so it can actually make use of it.
Lisp, Scala and Erlang don't have that property.
Haskell does.
Haskell and other pure languages are where the future of parallelism might lie.
I've been programming for 30 years or so, and I've been feeling ashamed. I've been feeling like I've done something wrong and that I haven't structured my programs right. That if only I was smart enough I would be able to take advantage of these multicore systems.
But I think I'm feeling better about myself. If I write rational multithreaded programs and use scalable patterns like producer / consumer, then I'll be pretty much ready to go.
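Producer/consumer can be sketched with Python's thread-safe queue.Queue. In this sketch, squaring each item is a placeholder for real work, and the sentinel-per-consumer shutdown is one common convention, not the only one. The queue is bounded, so the producer waits whenever consumers fall behind.

```python
import queue
import threading

STOP = object()  # unique sentinel telling a consumer to exit

def producer(q, items, consumers):
    for item in items:
        q.put(item)           # blocks if the bounded queue is full
    for _ in range(consumers):
        q.put(STOP)           # one stop signal per consumer

def consumer(q, results, lock):
    while True:
        item = q.get()
        if item is STOP:
            break
        value = item * item   # placeholder for real work
        with lock:
            results.append(value)

def run_workers(items, consumers=3):
    q = queue.Queue(maxsize=16)
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=consumer, args=(q, results, lock))
               for _ in range(consumers)]
    for w in workers:
        w.start()
    producer(q, items, consumers)
    for w in workers:
        w.join()
    return sorted(results)  # completion order is nondeterministic
```

run_workers(range(10)) returns the squares 0 through 81 in order. Adding cores means adding consumers, without changing the structure.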
And it seems like a lot of this isn't really relevant for desktop applications. I mean, there's some amount of keeping the main event thread moving so that your application is responsive, and you do time consuming operations on separate threads. But the only time I've really used a whole lot of threading is in server apps where you have a whole bunch of incoming connections that you're processing concurrently.
I understand that there is a branch of computer science that surrounds parallel computing, and there are some applications that might benefit from this (image processing being the canonical example). But I think it's another tool in the toolbox. Another way to approach a problem like map / reduce or whatever is in vogue. Some problems will benefit from being solved this way. Some won't. Use the right tool.
And I don't understand why we need to beat the drum for more efficient use of multicore. It's cool, we'll figure out what to do with all these cores. And then we'll put that in our toolbox and use it when appropriate.
So you're watching a movie and writing a Slashdot comment. How many other things can you do at once that would require a core? Even if you have 30 other processes open, most of them would just be waiting for input. There's a limit on the tasks that a single human being can care about at once, and Microsoft doesn't appear ready to bring terminal servers to the home market.
Part of the problem is that tools do very little to help break programs down into parallelizable tasks. That has to be done by the programmer, they have to take a completely different view of the problem and the methods to be used to solve it. Tools can't help them select algorithms and data structures. One good book related to this was one called something like "Zen of Assembly-Language Optimization". One exercise in it went through a long, detailed process of optimizing a program, going all the way down to hand-coding highly-bummed inner loops in assembly. And it then proceeded to show how a simple program written in interpreted BASIC(!) could completely blow away that hand-optimized assembly-language just by using a more efficient algorithm. Something similar applies to multi-threaded programming: all the tools in the world can't help you much if you've selected an essentially single-threaded approach to the problem. They can help you squeeze out fractional improvements, but to really gain anything you need to put the tools down, step back and select a different approach, one that's inherently parallelizable. And by doing that, without using any tools at all, you'll make more gains than any tool could have given you. Then you can start applying the tools to squeeze even more out, but you have to do the hard skull-sweat first.
And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept, so teachers leave it as a 1-week "Oh, and you can theoretically do this, now let's move on to the next subject." thing.
"And I don't understand why we need to beat the drum for more efficient use of multicore."
Huh? It is really simple. Because the industry wish to perpetuate a need for new products, whether we need them for the moment or not.
In the meantime, maybe some dude may discover the next killer-application which could actually harness the power at hand.
Very few pc programs can make the latest quad cores crawl. They typically handle anything you throw at them. Even most 3D games are swallowed.
So, accept it. The progress is there. If not for the need, so at least because of marketing and market shares.
Who in their right mind would buy an inferior product, e.g. a CPU, if the competitor was cheaper and faster and consumed less power?
In the '80s Fortran, which stayed alive and healthy by working in the vector processor community, got all sorts of constructs for blocks that can naturally run out of order. For example, for-loop and where-loop declarations that say the loop counter or loop array can be iterated in any order. It has matrix parallel operation declarations.
Sun's Fortran variant Fortress (sort of Java meets Fortran) is designed from the start for thread safety, so operations don't explicitly have to lock and unlock around expressions.
And the new PGI fortran compiler has all sorts of compiler directives for automatic parallelization.
Intel releases a new processor this year and the author is surprised that existing software applications aren't immediately taking advantage of it? This isn't a matter of changing a compiler setting or modifying a few methods, parallel computing requires major refactoring and fundamental redesign. And how are Windows 7 and Linux not well prepared? The development tools and applications aren't prepared, not the operating systems.
I disbelieve this entirely. UNIX/Linux is well designed for multiple-core CPUs. Just take the whole single-program, single-small-job approach of a pipeline command and you have your multicore solution ready. Programs whose tasks are IO bound are frequently written with threading in mind. qmail and Apache are both well written for multiple-core CPUs. I don't see what the article is trying to say. It's clearly wrong.
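Below is the pipeline approach, driven from Python with subprocess. It assumes the standard tr and sort utilities are on the PATH, and upper_sorted is a hypothetical helper. Each stage is its own process, so the kernel is free to run them on different cores while data streams between them through pipes.

```python
import subprocess

def upper_sorted(text):
    """Equivalent of: printf '%s' "$text" | tr a-z A-Z | sort"""
    tr = subprocess.Popen(["tr", "a-z", "A-Z"],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=tr.stdout, stdout=subprocess.PIPE)
    tr.stdout.close()          # let tr receive SIGPIPE if sort exits early
    tr.stdin.write(text.encode())
    tr.stdin.close()           # EOF tells tr (and then sort) the input is done
    out, _ = sort.communicate()
    tr.wait()
    return out.decode()
```

upper_sorted("cherry\napple\nbanana\n") returns "APPLE\nBANANA\nCHERRY\n". Neither tr nor sort contains any threading code. The parallelism comes entirely from the OS running separate processes.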
Why UNIX?
I think that linux has been used successfully in massive multiprocessor computers (unless most of top 500 computers are mostly single processor ones).
In a desktop PC, the OS will take care of the multiple CPUs to run the different apps. The exception is heavily CPU-intensive apps, and yes, you can put the blame on those specific apps (at least for Linux, most apps aren't OS specific)... but not on the OS.
I am, unfortunately, not an expert in functional languages. I do remember that LISP isn't pure functional.
The main point still stands - functional languages do already address this issue. You're absolutely right that LISP doesn't do all it needs to out of the box to address the issue properly.
I honestly have no idea if Erlang, Scala or Haskell do allow the compiler to identify pure functional calls, although I tend to believe the other AC response that Haskell, at least, does.
The tools have been here for some 10 years now. Multithreading has existed a reaaally long time; documentation was still lacking in the late 90s, but running multiple threads is child's play now.
Like someone else stated, most programs aren't CPU bound; they spend most of their time waiting for data from the HDD, etc.
Applications benefitting from multiple cores have been multithreaded, or a lot of them. It's not a software paradigm limiting scalability.
Furthermore, Windows 7 is MORE than capable of handling 8 cores; in fact, Windows 7 probably starts to shine at 16 cores with its SMP capabilities. Microsoft spent A LOT of time making sure there's that kind of scalability in the Windows kernel.
I can't express enough how misinformed TFA writer is, and how clueless and ignorant he is. I'm SHOCKED that this kind of garbage is on Slashdot! Come on, even half-witted self-respecting geeks know about this stuff already better.
I championed it here but there is no software that utilizes it and programming for it is difficult as mentioned here in many articles.
New languages aren't being used to help out multicore or parallel processing with graphics chips.
A graphics chip computer built for gaming and general use would be amazing. It would cost as much as an entry level general chip using pc but could do 3d GAMES!
But it would need a parallel processing language.
http://www.gpgpu.org/
http://en.wikipedia.org/wiki/GPGPU
A computer that is used efficiently will at any given time either do nothing (and hopefully switch to standby/hibernate after a couple of minutes) or do several things in parallel. While I read Slashdot, my computer is mostly downloading mail, uploading files to a web server, defragging the disk, encoding a video, doing a background backup, etc. Or if it isn't, it can fold proteins. Modern browsers will also soon be multithreaded (some already are), so every tab, plugin, etc. can run on its own core.
Apps that lack multithreading can also be a blessing - less overhead, and they are restricted to one core, so no matter how bad an app behaves, there will always be a core that isn't affected by the CPU hog so the machine stays responsive. Responsiveness is much more important than raw computing speed.
Mac FTW!
I like to differentiate myself with threading as a developer but this article is over the line.
It's absolutely absurd to say that multicore chips won't benefit users of any modern Windows or Linux installation. I think I have something like 20 windows open, and quite a few processes. Some of them are active, and some are not. The fact is, these systems, both Windows and, if anything, Linux, are designed to serve multiple threads for multiple users on multiple processors. They -are- mainframe operating systems in a consumer role.
Apple has no systems with more than 2 cores under $2500, so we need to hope that 10.6 will work on any PC / work with today's hacks.
it is the answer to the question that no one asked...
In a real-world application, as others have mentioned, pretty much all of a program's time is spent in an idle loop waiting for something to happen, and in almost all circumstances that something is input from the user in whatever form: mouse, keyboard, etc.
So let's say it is something like Final Cut. To be sure, when someone kicks off a render, that is an operation that can be spun off on its own thread or its own process, freeing up the main process loop to respond to other things the user might be doing. But user input is where the rubber really hits the road. The user could do something that affects the work that was just spun off, whether as a separate thread or process on the same core or on any number of other cores. So you have to keep track of what the user is doing in the context of everything that has been farmed out to other cores/processes/threads.
Enter the OS. Take your pick, since it really doesn't matter which OS we are talking about; they all do the same basic things, perhaps differently, but they do. How does an OS designer make sure that, say, 16 cores (dual 8-core processors) are actually well and fairly utilized? Should it dedicate a core to each of the main functions of the OS? Let's say drive access, the comms stack (pick your protocol here), video processing, etc. Or should it just run a scheduler like the ones in use now, which farm out thread processing based on priority? Is there really any priority scheme for multiple cores that could run, say, hundreds of threads/processes each? And what about memory? A single-core machine that is, say, truly 64-bit can handle a very large amount of memory, and that single core controls and has access to all that RAM at its whim (DMA notwithstanding). But what do you do now that you have 16 cores all wanting to use that memory? Do we create a scheduler to arbitrate access from 16 demanding stand-alone processors? Or do we simply give each core a finite memory space and then control the movement of data from each memory space to another? After all, a single thread (handling the main UI for a program) has to know when something is finished on one core. It then has to get access to that memory to present the results, either as data written to, say, a file or written into video memory for display.
I submit that the current paradigm of SMP is inadequate for these tasks and must be rethought to take advantage of this new hardware. I think a more efficient approach, as a place to start, is for each detected core to be fired up with its own monitor stack, so that scheduling is based on feedback from each core. The monitor program would be able to ensure that the core it is responsible for is optimized for the kind of work that is presented. This concept, while complicated, could be implemented and serve as a basis for further development in this very complex space.
In the world of supercomputers this has been dealt with, but with a very different methodology that I do not think lends itself to general computing. Deep Blue, Crays and the like aren't really relevant here, since those are mostly custom designs built for a single purpose and optimized for things like chess, weather modeling, or nuclear weapons study, where the problems are already discretely chunked out with a known set of algorithms and processes. General-purpose computing, on the other hand, is like trying to herd cats from the OS point of view, since you never really know what is going to be demanded and how.
OS designers and user-space software designers need to really break this down and think it all the way through before we get much further, or all this silicon is not going to be used well or efficiently.
Hey KID! Yeah you, get the fuck off my lawn!
Not all linear problems can be solved with parallel processing.
It takes 1 woman 9 months to produce a baby - but 9 women cannot produce a baby in a single month...
Software operates primarily in a linear fashion: process A needs to be done before B, B before C, and so forth. The real issue is that dividing a linear process across parallel processors is notoriously difficult. Task "D" is sent to processor 2, but the data it needs to process is already sitting in the cache of processor 0. This slows things down, E finishes before D, and the app crashes.
This is where the design of Microsoft's Hyper-V platform shows real promise. By placing a virtualization layer (Hypervisor) between the OS and the processor, the added abstraction layer can distribute dissimilar or unrelated processes to different cores. It can also assist with non-linear computing tasks that work well with parallel processing and even provide the framework by which
Look at it this way: there is no way that Microsoft is going to leave spare resources sitting idle. They'll figure out some way to consume every single one. It's the Microsoft way!
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.
The entire idea of multi-core is not that your performance increases, but that performance doesn't decrease.
I want every thread to run simultaneously instead of timesharing. Imagine all your apps are divided into multiple threads; then you're all timesharing again, and boy, don't you just hate it when your entire computer slows to a crawl?
I mean, look at the success of 3D window management: you lose a little performance overall, but when a single process jumps to 100% CPU, at least there's no 2D WM lockup.
Here be signatures
Wouldn't it be much simpler, though, to split video work along groups of B-frames, from the perspective of not needing to deal with complicated motion estimation algorithms and such? It seems that as long as the video was more than a few frame groups in length, you would get just as much gain without needing to reimplement much, if any, of your existing codec algorithms.
In these discussions about parallelism, I used to recommend splitting a 2-hour video into four 30-minute parts and feeding each to a single-threaded encoder. But that would need more cache and memory bandwidth, something that a lot of PCs with multicore CPUs lack, and it wouldn't work at all for live streaming. Splitting at group-of-picture boundaries might work better, but it would still add more latency to a live stream.
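Here is a minimal sketch of the GOP-boundary split discussed above, written in Python because the thread has no code of its own. It assumes closed GOPs, meaning no frame references a frame in a neighbouring group, and it represents the stream only as a string of frame types. `split_at_keyframes` and `assign_round_robin` are hypothetical helpers, not part of any real encoder's API.

```python
def split_at_keyframes(frame_types):
    """Return half-open (start, end) frame ranges, each starting at an 'I' frame.

    Assumes closed GOPs: no frame in one range references a frame in another,
    so each range could go to a separate single-threaded encoder instance.
    """
    starts = [i for i, kind in enumerate(frame_types) if kind == "I"]
    if not starts or starts[0] != 0:
        raise ValueError("stream must begin with a keyframe")
    ends = starts[1:] + [len(frame_types)]
    return list(zip(starts, ends))


def assign_round_robin(ranges, workers):
    """Deal GOP ranges out to workers so each gets a similar share."""
    jobs = [[] for _ in range(workers)]
    for index, gop in enumerate(ranges):
        jobs[index % workers].append(gop)
    return jobs


gops = split_at_keyframes("IBBPBBIBBPIBB")
print(gops)                         # [(0, 6), (6, 10), (10, 13)]
print(assign_round_robin(gops, 2))  # [[(0, 6), (10, 13)], [(6, 10)]]
```

Round-robin keeps the load roughly even when GOPs are similar in length. The latency concern for live streams still applies, since a worker cannot start on a GOP until that whole group has arrived.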
The idea of an OS and/or support tools handling the SMP problem is nothing more than a crutch for bad programming.
In fact, anyone who grew up with a real multithreaded, multitasking OS is already writing code that will scale just dandy to 8 cores and beyond. When you accept that a thread is nothing more or less than a typical programming construct, you simply write better code. This is no more or less amazing than when regular programmers embraced subroutines or structures.
This was S.O.P. back in the late 80s under AmigaOS, and enhanced in the early/mid 90s under BeOS. This is not new, and not even remotely tied to the advent of multicore CPUs.
The problem here is simple: UNIX and Windows. Windows had fake multitasking for so long, Windows programmers barely knew what you could do when you had "thread" in the same toolkit as "subroutine", rather than it being something exotic. UNIX, as a whole, didn't even have lightweight preemptive threads until fairly recently, and UNIX programmers are only slowly catching up.
However, neither of these is even slightly an OS problem... it's an application-level problem. If programmers continue to code as if they had a 70s-vintage OS, they're going to think in single threads and suck on 8-core CPUs. If programmers update themselves to state-of-the-1980s thinking, they'll scale to 8-cores and well beyond.
-Dave Haynie
I understand how, on a single CPU, the interrupt line is set low and the device puts a unique number on the data bus. How are interrupts handled on these multicore chips?
No one's going to learn how to really work around using multiple cores until they're really out there in the wild where developers can work with them.
It really is the next logical step, and no one ought to be bitching that no one knows how to dig holes when we've only just found out how to make shovels.
"Most people, I think, don't even know what a rootkit is, so why should they care about it?"
The simple fact is that most programming tasks are inherently linear. Sure, you can design programs that are better, and you can offload work to other CPUs in clever ways, but at the end of the day you can't do much better than a couple of major threads per program, each running on an otherwise idle CPU.
In Office apps, you can't "offload" anything at all, really. Possibly a spellcheck or grammar check on the side, but you're not going to make *any* gains over the simplistic setups. Why? Because 99% of the program is spent waiting for the user to do something and, when they do, 99% of the time you can complete that task in a matter of microseconds.
In games, you can offload AI, physics, pathfinding, graphics drawing, etc. but at the end of the day you still have to limit interaction to what the user does (i.e. shoots, moves, etc.) and/or the FPS limit. You can get slightly more done by parallelising in that time, purely because the AI is not reliant on the graphics drawing etc., but every 1/60th of a second you have to bring everything to a halt and pass it off to be drawn in order.
In database apps, you can pass off I/O and tricky queries to other threads and so make gains, but you're just introducing a lot of locks, callbacks and everything else to be able to do that. You can scale with that, but you can't scale that far. And at some point, you've got to read the same data off the same disk as on a single-CPU system and pass that, with *all* its results, to the user.
In operating systems, you can offload a lot of tasks, but again, most of the time you are looking at waiting for user input to actually do something.
It's an inherent limitation of the machinery and the uses, not the design of a particular operating system. Sure, you can make gains over what we have now, but the simple fact is that at some point you have to manage and collate all those separate tasks into a result, and you can't do that until everything's finished. To use games as an example (because they are a mass-market, hardware-pushing, performance-critical application that will routinely make use of multiple CPUs/GPUs to the full extent), you can't necessarily do the AI until the physics has been done (otherwise bots would walk into moving objects that weren't there a second ago). You can't do the graphics until the AI and physics are both done. And on top of all that, you have to do SOMETHING every 1/60th of a second whether the other threads have finished or not. And there are only so many ways you can split up tasks. You can do graphics rendering in blocks of pixels, as proposed, but at what point do the locking of memory and random bus access killing the memory cache actually make it *less* efficient than just running from 0 to 1024 and then from 0 to 768 (or whatever)?
A lot of applications *don't* thread things that they should. On the desktop, the lack of asynchronous DNS is a major culprit in my opinion: I should not be able to hang file manager windows, firewalls, browsers, FTP clients, etc. just because my DNS server has gone down or is momentarily inaccessible. And when I click the god-damn Cancel button, you should CANCEL the other thread as quickly as possible BUT also let me just get on with whatever else I want to do with this app. However, this has nothing to do with multi-core or operating systems; it's to do with single-threaded apps still being made on systems that have reliably handled multi-threaded apps (even on single-core machines) for decades. Ideally, EVERY tab in my browser window should be a different thread. That would mean a tab with particularly heavy JavaScript or a particularly slow Flash movie would not slow down the browser itself. It's quite a simple job, but a lot of browsers don't do it; there's a reason for that, and it's not because GCC or the operating system doesn't include a "pass this off to another thread" function.
The problem is not new, it's not exciting, it's not revolutionary, it's not going to lead to a whole new way to pr
Question: Is this looking at a single app or multiple? It seems fairly straightforward to me that most individual apps aren't going to see a huge boost from a hefty amount of cores, but multiple apps or threads/instances would probably see plenty.
Servers especially should be able to take advantage of this, where individual cores, just like multiple CPUs before, can handle multiple instances of a server daemon.
From what I've seen thus far, Unixy OSes handle SMP fairly well. I haven't touched Windows web servers in a while, but I'd imagine they might do well enough in that scenario too.
Translation: Not a big increase for your game/spreadsheet, but still some extra bang for multitasking. IO is still going to be a big bottleneck though.
I am writing this from my 8 core Intel box running Linux with 8GB of memory. This is the FASTEST computer I've ever had and the first time I've noticed a big leap forward. I normally don't care about cpu speeds, graphics cards, etc. Hardware tends to be fast enough for the current generation of software (I run Linux) and that's usually all you need. But this 8 core thing is different.
I develop and run very heavy graphics applications, where cpu tends to be the bottleneck. In my world, you used to rely on extra cpu from render farms or clusters to get the job done.
This world is changing. Shorter kind of jobs that require a quick turnaround, can now be done locally instead of sending jobs to the render farm. This is massive. As people start doing more jobs locally, it also frees up space for the longer running batch jobs, so they get done faster too.
When I first got the machine, it had Windows installed and it felt just as slow as a regular (single or double cpu/core) box. That should be of no surprise to anyone around here. But Linux sure knows how to use the multi core magic.
This is what I hear: "waa waaa, please use more cores so people see a need to get a new 8-way CPU!"
I don't think this is a problem that programmers should solve. Sure, it's nice if they utilize multiple cores as much as possible, but I don't want cores to be used just for the sake of it. If adding another core to an application gives a 30% gain, I'm pretty sure I could use that power for better things in most cases.
The problem in my mind is that core speed has hit a brick wall, and tossing more cores at the problem is just a desperate attempt to keep the upgrade treadmill going. Beyond four cores I personally wouldn't see any performance gains, except on rare occasions when I browse, watch movies, encode movies and unzip some large file at the same time. In those cases I would also hit the hard disk and I/O much more than the CPU.
HTTP/1.1 400
--dave
davecb@spamcop.net
Apple have no 2 core intel systems. Period.
Even the lowly Mac mini is a dual-core system. Every laptop is a dual-core system. The Mac Pro is either 4-core (with hyperthreading for a virtual 8-core) or 8-core (with hyperthreading for a virtual 16-core) system.
"Better to keep silent and look the fool, rather than speak and remove all doubt"
Simon.
Physicists get Hadrons!
This article is incorrect on two counts:
1) Linux scales just fine for SMP. Windows doesn't handle the caches on multicore systems quite optimally, but it is also fine on multicore systems (other than M$ having to update licensing, I suppose, since they usually license for 4 cores max now on "normal" Windows versions). If your applications don't keep cores busy, that is NOT the OS's problem.
2) Amdahl's Law. From Wikipedia: "The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program." It is NOT sensible to just say "Oh, most people have 8 cores, let's just turn OpenOffice (for example) into a bunch of threads." It just doesn't make sense. It may be *possible* to do, but at the expense of bugs (due to race conditions) and high synchronization overhead. The thread overhead isn't significant if each thread does a ton of work, as is the case with most current multithreaded apps, but if you're splitting an app into threads just because, the overhead could be quite high. AMD and Intel might hate it, but most people just won't have a use for more than 1 or 2 cores. Even me, and I run TV capture and playback.
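Amdahl's law, as quoted above, is usually written S(n) = 1 / ((1 - p) + p/n). Here p is the fraction of run time that parallelizes perfectly and n is the number of cores. This small Python sketch (the function name is just for illustration) shows why eight cores barely help a program that is half serial.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when a fraction p of the run time
    parallelizes perfectly and the rest stays strictly serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.5, n), 2))
# prints speedups 1.0, 1.33, 1.6, 1.78, 1.88 for 1, 2, 4, 8, 16 cores
```

With p = 0.5 the bound approaches 1 / (1 - p) = 2 but never reaches it, no matter how many cores are added. This holds even before counting the synchronization overhead mentioned above.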
Presently I have entirely single-core systems (Athlon XP 2200+, a 2100+, a Sempron 2500, a P4-3.0, and a Celeron M-1.5). On my busiest system, if I upgraded to a quad or 8-core, 1 core would go for mythbackend (it's got a BT878 so CPU usage is high), Xorg maybe another core, and mythfrontend maybe a 3rd. But the 2nd and 3rd core would not be so busy, I would probably set the kernel core handling to powersave* and only actually use 1-2 cores.
*"echo 1 > /sys/devices/system/cpu/sched_mc_power_savings" should turn on the multicore power saving scheduler option, which gets a core near 100% busy before powering up the next core instead of dividing up threads evenly between all cores, saving power on systems that can power down cores individually.
Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.
How many times do we have to tell that Linux *IS* the fscking kernel??
Given that, including Linux and Windows in the same bag doesn't make sense. Which makes the entire post m00t.
Solutions:
1) s/Windows/Windows NT kernel/
2) s/Linux/GNU\/Linux/
Nice try to get a battle though.
Unix has run on multi-CPU systems for ages, and it does this well. And with an easy tool you can harvest the power of all CPUs: the pipe.
Every part of the pipe can run on another CPU.
I recently came across fslint, which is an example of heavily piped shell code.
In short (leaving out the parameters and options) it runs
find | sort | tr | sort | bash | merge_hardlinks| uniq | sort | cut | tr | bash | xargs | sort | uniq | cut | sort | tr | xargs | sort | uniq | cut | sort |tr | xargs | sort | uniq | cut | bash | sort
That's a lot of CPUs :-)
OK, it's not a great example of CPU-hungry programs. But the progress of modern programming languages, which tend to be monolithic beasts that do everything (perl, php, java), leads to programs not using pipes or other kinds of inter-process communication, because it's just cumbersome.
The pipe concept enables multi CPU programming without even thinking about how to put tasks on different processors.
Unfortunately I have not found a language that makes such a simple concept its fundamental programming principle.
See the unix shell, without the pipe you can't really do much.
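The pipe idea can also be mimicked inside one program. Each stage runs in its own thread and queues join the stages, so `lower | sort | uniq` becomes three concurrent workers. This is a Python sketch and the helper names are hypothetical. CPython's global interpreter lock keeps pure-Python stages from running on separate cores at the same time, so the sketch shows the structure rather than the speedup. Separate OS processes, as in the shell, are what actually spread across CPUs.

```python
import queue
import threading

_END = object()  # sentinel that marks end-of-stream, like EOF on a pipe


def _drain(q):
    """Yield items from a queue until the end-of-stream sentinel arrives."""
    while True:
        item = q.get()
        if item is _END:
            return
        yield item


def _run_stage(transform, inbox, outbox):
    """Feed one stage from its inbox and forward everything it yields."""
    for item in transform(_drain(inbox)):
        outbox.put(item)
    outbox.put(_END)


def pipeline(source, *transforms):
    """Run transforms as concurrent stages, roughly like `source | t1 | t2`.

    Sketch only: if a stage raises, downstream stages wait forever.
    """
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    stages = [
        threading.Thread(target=_run_stage, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for stage in stages:
        stage.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(_END)
    result = list(_drain(queues[-1]))
    for stage in stages:
        stage.join()
    return result


def lower(lines):
    return (line.lower() for line in lines)


def sort(lines):
    return sorted(lines)  # like sort(1), this stage must read all input first


def uniq(lines):
    previous = _END
    for line in lines:
        if line != previous:
            yield line
        previous = line


print(pipeline(["b", "A", "a", "c", "B"], lower, sort, uniq))  # ['a', 'b', 'c']
```

Streaming stages such as `lower` and `uniq` start work before their input is complete. A stage like `sort` is a barrier, just as it is in a shell pipeline.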
Atari rules... ermm... ruled.
Silicon Graphics did systems with 1024 processors around 2002.
Running their unix and running Linux.
compilers to use them at 99% of peak exist,
programmers exist too,
The problem is the casual programmers have no clue.
Now Linux kernel can schedule 8 big programs and each uses a cpu (basic scheduling here). What it brings is you can be encoding a video for your ipod, playing a game, having a video conference, etc all at the same time.
Now a good programmer would do video encoding in parallel: split a film into 8 segments and encode each in a separate thread. I want a program like this on an 8-core machine.
Plus I want fast memory and fast disk.
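Here is a sketch in Python of the segment-per-worker approach. `encode_segment` stands in for a real encoder call and is hypothetical, as is `fake_encode`. With a thread pool, the work only spreads over several cores if that call releases Python's GIL, for example while waiting on an external encoder process. Otherwise `ProcessPoolExecutor` would be the drop-in substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_in_segments(frames, encode_segment, segments=8):
    """Split frames into up to `segments` contiguous runs, encode them
    concurrently, and concatenate the encoded output in original order."""
    if not frames:
        return []
    size = -(-len(frames) // segments)  # ceiling division
    runs = [frames[i:i + size] for i in range(0, len(frames), size)]
    with ThreadPoolExecutor(max_workers=segments) as pool:
        # Executor.map yields results in input order, even if a later
        # segment finishes first, so the stitched output stays in sequence.
        encoded_runs = list(pool.map(encode_segment, runs))
    return [packet for run in encoded_runs for packet in run]


def fake_encode(run):
    """Hypothetical stand-in for a real encoder: tags each frame."""
    return ["enc(%d)" % frame for frame in run]


print(encode_in_segments(list(range(10)), fake_encode, segments=4))
# enc(0) through enc(9), in order
```

For real video, the segment boundaries should fall on keyframes so that no segment depends on frames held by another worker.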
How about rewriting the standard libraries for many procedural languages (this includes OO languages, since OO is really just a style of procedural programming) to use multi-threading whenever appropriate? For instance, any array sorter should use a multi-threaded heapsort instead of a quicksort if the array is above a certain size. The program flow would still be procedural, the average programmer would not have to deal with parallel programming very often, and the parallel specialists could handle the libraries where it's needed. Of course this won't work in every circumstance, but it would be a great way to get the most out of the code we already have.
Most programs barely use any computational power. In fact, there are very few programs that require all that computing power to operate, and those are certainly well designed.
Home users do use some apps that could benefit from multiple cores. Video encoding is one of them, but that one is embarrassingly parallel because the encoder could just split the video into quadrants and have each of four cores work on one quadrant.
Or n-sections even where n = number of available cores. Why break into quadrants if you have 8 or 16 cores available?
Ummmm.....Grand Central anyone?
http://www.apple.com/macosx/snowleopard/
... how dare you not focus your efforts on the 0.0002% of your users who run your desktop app on their server big iron!
========
CINC, 4th Penguin Legion
Doing a multithreaded GUI is hard, and very prone to bugs, and on top of that there are quite a number of assumptions made by JS code that force monothreading.
But anyway, FF does do multithreading; in fact, in 3.1+ you can spawn a new worker thread from unprivileged JavaScript. However, that code won't be able to touch the DOM, only pass and receive JSON-encoded messages.
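In Python terms, the arrangement described here looks roughly like the sketch below. The worker never touches the caller's live objects (the stand-in for the DOM) and only exchanges JSON text over queues. This is an analogy, not Firefox's Worker API, which uses `postMessage` and `onmessage`. The message fields are made up for illustration.

```python
import json
import queue
import threading


def sum_worker(inbox, outbox):
    """Worker that sees only JSON text, never the caller's live objects
    (the analogue of a Web Worker that cannot touch the DOM)."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown signal
            return
        request = json.loads(message)
        reply = {"id": request["id"], "total": sum(request["numbers"])}
        outbox.put(json.dumps(reply))


inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=sum_worker, args=(inbox, outbox))
thread.start()
inbox.put(json.dumps({"id": 1, "numbers": [1, 2, 3]}))
reply = json.loads(outbox.get())  # {'id': 1, 'total': 6}
inbox.put(None)
thread.join()
```

Because only serialized text crosses the boundary, the worker cannot race with the UI thread on shared state. That is what makes this model safe for untrusted script, at the cost of copying data.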
Personally, I run workloads which currently consist of many, many, many minimally threaded applications running on systems with lots of cores. However, if those applications suddenly became heavily threaded, I might have an entirely different expectation placed on me. Instead of giving each virtual machine a single thread, I'd have to give them a minimum of two, and instead of running dual quad-core servers, I'd be looking at quad quad-core servers.
I guess you could say that I can't have my cake and eat it too -- expect the manufacturers to keep spitting out higher core densities, and still expect my users to require no more computing power than they have currently. Yet, it would be nice that if I had twice the number of cores, that I could run twice as many applications, rather than only being able to run the same number of applications that have been "enhanced" to abuse more of my cores.
If Intel comes out with 8-core processors with hyperthreading, supports quad-socket motherboards, and actually has a decent memory bus behind that, I'll be happy for a while ;-)
Windows 7 will support 256 cores in the 64-bit version. Microsoft has made significant tweaks to the thread dispatcher code to make this possible. A good discussion can be found at http://channel9.msdn.com/shows/Going+Deep/Mark-Russinovich-Inside-Windows-7/
I sometimes despair at what I read here. sin(x) and cos(x) are both very smooth, i.e. infinitely differentiable, periodic functions, so why would it surprise you that interpolating from a table, with the spacing chosen to give the desired accuracy, is quicker than evaluating the Taylor series? And since the function is periodic, the table size is bounded.
Try that with any _ROUGH_ non-periodic function and see where you get,
e.g. 1/(1-e^x).
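Here is a small Python sketch of the table approach for sin(x): one period is tabulated and values in between are linearly interpolated. The 1024-entry table size is an assumption. For linear interpolation the error is bounded by h²/8 · max|f''|, which for sine with step h = 2π/1024 comes to roughly 4.7e-6. In Python this will not beat `math.sin`, which already calls the C library. The point is only that periodicity keeps the table finite. A function like 1/(1-e^x), with its pole at x = 0, defeats any fixed spacing.

```python
import math

TABLE_SIZE = 1024  # assumed; more entries means a smaller interpolation error
STEP = 2 * math.pi / TABLE_SIZE
# One extra entry so that index i + 1 is always valid at the end of the period.
SIN_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]


def table_sin(x):
    """Approximate sin(x) by linear interpolation in a one-period table."""
    x = math.fmod(x, 2 * math.pi)  # periodicity: every x maps into one period
    if x < 0:
        x += 2 * math.pi
    position = x / STEP
    i = min(int(position), TABLE_SIZE - 1)  # guard against x == 2*pi after rounding
    frac = position - i
    return SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])


worst = max(abs(table_sin(k * 0.37 - 20) - math.sin(k * 0.37 - 20)) for k in range(200))
print(worst < 1e-5)  # True
```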
It wasn't C. If it was BCPL when David Barron developed the Transputer at Southampton, I would be surprised, but the world turns and now we have the AMD Hyperchannel and Infini-x.
Three Cores for the MAC kings under the sky
Seven for the Windows-lords in their halls of stone
Nine for Linux users doomed to die virgins
One for the Dark Lord on his dark throne
In the Land of MS where the Shadows lie.
One Core to rule them all, One Core to find them,
One Core to bring them all and in the darkness bind them (with restrictive licensing)
In the Land of MS where the Shadows lie.
Sorry if this is straightforward to the hardcore programmers. I'm just a business programming sort of guy. Lots of lists and mailmerges.
One of most common tasks in web programming is
Couldn't a whole bunch of these be farmed off onto different processors?
Yay me!
So what the author is blathering and foaming about are problems found and solved 20+ years ago. Instead of programmers studying anything, the author should study some. NUMA has been in Linux for close to 10 years. It solves the memory bus problem. Multi-threaded applications solve the problem of using more than 1 core. I do it all the time. Did it yesterday, will likely do it tomorrow. Not every program takes advantage of multiple cores. Quite a few do. Those that scream the need for parallel computing use all of the cores (on my Nehalem system it shows up as 8 cores). I do wish authors would do the tiniest squeak of research before describing how the world is going to end. Oh well.
Knuth's maxim is sufficiently pithy to have become, over time, self referential, as evidenced by your misunderstanding.
The root of all evil used to be deep and singular, now it is broad and shallow. I guarantee you that Knuth did not include choosing the best fundamental algorithm under the label "premature" unless it involves squabbling over log log N terms or stray digits in the exponent term.
http://www.siam.org/pdf/news/174.pdf
An unpacked (deoptimized) version of Knuth's maxim is that the transition from program structure and notation which maximizes readability, comprehension, and conviction (concerning its correctness and merit) to one which favours performance should be delayed as long as possible. Ideally until performance becomes the sole remaining success factor.
(Taking into account the human mind's special capacity to imprint upon evil, Knuth's formulation remains the better one.)
Originally Knuth meant manually hoisting loop constant expressions (often in ways that later turn out to not be fully general) or manually evaluating constant expressions or manually fusing nested function calls and the kind of rot that a good compiler these days will do on your behalf. Anyone used the "register" keyword lately? Once upon a time it seemed like a good idea.
While the principle remains the same, the temptations have changed. Such as parallelizing a bad implementation of a poor algorithm in the misguided belief that the underlying task is not sequentially bound.
That said, projects which do *no* evil typically fail to impress anyone. The ideal is to wrap a large amount of cleanly structured and accessible source code around a nugget of pure, smoldering evil, coked to the last clock cycle.
Perversely, the worst example of this is TeX itself. The smoldering nugget of pure evil is the single pass parsing regime and data packing eight bit character values.
I suspect the literature on parallel programming would roughly equal the literature on electro-chemical storage cells. Sheesh, if only those guys were paying attention, we'd have watch batteries powering small cities by now.
On second thought, how much literature could there really be if you can summon the majority of it onto your screen in 4/10ths of a second for any combination of keywords?
Parallel programming is a lot like fuel cells. You get some pretty impressive results on selected applications involving pristine apparatus in a controlled setting, dating back to the Apollo program (in both cases).
Reality on the ground is rarely so forgiving.
If we hadn't already achieved a pixel processing speed-up between 1980 and 2008 best approximated by a sideways 8, Javascript wouldn't even have entered the conversation.
It boils down to this: ignoring everything you guys have already accomplished, you've pretty much done nothing. I worked for that kind of company once. The guy in charge put on a Cirque du Soleil of intestinal recursion. That's how I feel about the claim that software developers haven't been paying attention to parallelism for elephant years.
It's not really a problem. If you can't split a single task over multiple CPUs, you can just run multiple separate tasks (like Erlang does):
- burning DVDs
- playing videos
- generating sweet fractals
- web browser (recently more CPU intensive)
- Bit torrent
- BOINC
- Indexing files for searching later
- Rendering frames
- Compiling updates
- Compressing backups
- While still having enough spare resources to remain responsive.
...and that is all I have to say about that.
http://jessta.id.au
Optimizing applications for multiple threads is like unrolling loops. Programmers are writing logic, not implementation; compilers should take care of implementing logic in a way that is optimal for the hardware.
Now, get back to me when -fuse_threads is a compiler option and implicit when choosing -O3.
It's really quite frustrating to see posts like this: posts that don't take into account what is needed and focus on what we are incapable of doing, even when they don't need to.
So let's look at reality for a second. First, most modern OSes scale very, very far past 4 CPUs (not sure what Windows scales to, but Linux certainly has no limitation based on current CPU reality). So the kernels are just dandy for multi-core CPUs, bring it on! 128 cores, we're ready for ya!
The same is not true at the application level, and that is a fair comment. But don't confuse Linux and Windows with their apps, for crying out loud! From an application point of view we are capable of parallel coding, but it's non-trivial. It's also not something we need a lot of the time.
For instance, we now buy servers (our cheapest models) with dual CPUs and quad cores, and we tend to virtualise them into several machines with 1 or 2 CPUs each. Whether you do this because you assume the OS will use one CPU and the apps another (as one person told me) is irrelevant. Suffice it to say, having 2 CPUs is usually quite nice.
But what requires more than that in reality? Well, your desktop might; after all, there are a lot of things going on at once, right? In some cases that's true (there are quite a number of very heavy applications out there, and surprise, surprise, they can multitask *GASP*).
Same with servers: not many things require that many CPUs, and even at the application level we've gotten good at spreading heavily loaded applications across multiple servers (we call it load balancing; was that too sarcastic?). Take mail (whether it's Exchange or Postfix or Sendmail or whatever), or web servers, etc. Those server applications that do require heavy grunt tend to already be coded with "parallel" in mind, even across multiple servers (think Oracle RAC).
As for cache contention - well it sounds like the hardware makers are finally fess'ing up to the fact they have a problem, Houston!
make -j bitches
optimized for as many cores as you want
But how many humans do you know that only run one or two windows on their PCs at the same time?
Your average XP user has 4 or 5 windows open at the same time.
At the moment, I have three windows open on this Windows XP machine: Firefox, Command Prompt, and Windows Explorer. All three are waiting for user input, including the Firefox window that I'm typing this post into (between keystrokes). If all the apps you have running in the background don't bring your load average above 1, you aren't likely to benefit from multicore optimization.
And also, through the API achieve reliability. For example, if one of the running copies of Windows crashes, the Apps with multiple worker threads keep some threads running, and they can detect and recover from that failure
Congratulations: you've reinvented VM/CMS. It'll work at least until the CP crashes.
This allows applications to work around issues like blue screens in certain OSes, or kernel panics in others
Blue screens and kernel panics are often caused by defective device drivers. Would your design run these in a VM?
so long as [applications] perform sufficient checkpointing
Third parties paying attention to checkpointing? NBL.
First and foremost, if they are going to go with more CPUs, they might as well sort out the problem of the extra heat output.
On a hot day (say, 35C+ or 95F+), I wince when I try to run a multi-threaded application on my dual core machine (Intel Pentium-D 2.67 GHz); or some background process runs at the same time as my foreground process.
Why is this, you may be asking? Well, it sounds like old-style CD recording gone wrong. It starts off with a low hum and gradually gets louder and louder at increasing pitch, with seemingly no end.
It's fine and dandy on a normal or cool day, but unbearable on hot days. I just wonder how many other people have to use CPU limiters to play certain games for a few weeks a year. Alternatively, I realise I could have damaged the thermal paste over the processor when installing it (I have 3 standard fans inside a roomy case).
This OS will be the included OS before Apple starts selling 4- and 8-core consumer machines, and it will have been out long enough for developers to use Grand Central to leverage those cores.
"That would be a big change, but the efficiency and total throughput gained would be huge."
It would be, except in secure environments where one has to guarantee that information can't cross boundaries, even at the CPU level.
The real problems with exploiting parallelism are (a) a solution is needed, since Moore's law has run into a brick wall, barring major process improvements that the semiconductor manufacturers don't foresee, and (b) all current algorithm design, going right back to John von Neumann in 1948, has been essentially serial. Threads and multi-cores are an essentially serial solution to a parallel problem.
... 1048576 will need new algorithms, for the first time in 60 years of computing and 500 of mathematics.
Parallel computation is very hard; compare how many kernel (OS) developers we have with how many application developers. This is because of problems related to timing and computational order, which in turn produce problems with data sharing and correctness.
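A shared counter is the easiest place to see the data-sharing problem described above. The following is a minimal Python sketch with hypothetical names. `value += 1` is a read-modify-write, so two threads can interleave and lose an update, and the lock serializes that step. In CPython the GIL makes the lost update rarer, but it does not rule it out.

```python
import threading

class Counter:
    """Shared counter whose increment is protected by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads can both read the same old value,
        # both add 1, and one of the updates is lost: a data race.
        with self._lock:
            self.value += 1

def hammer(counter, n_threads=4, per_thread=10_000):
    """Increment `counter` from several threads and return the final value."""
    def work():
        for _ in range(per_thread):
            counter.increment()
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```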
Then there is the problem space. In some problems (easily constructed artificially), each step depends on the completion of all earlier steps, and the solution cannot be parallelized, e.g. Fibonacci, where you can show a trivial parallel decomposition that wins nothing. In other, more interesting cases, e.g. the partial differential equations of mathematical physics, routing, and finite element systems, some to large amounts of parallelism may be possible, particularly with well-thought-out analysis that capitalizes on special features of the problem or the known solution, e.g. multimode solutions and boundary-condition matching in Navier-Stokes or elasticity.
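Here is a small Python sketch that makes the contrast concrete. It is an illustration, not the commenter's code. `fib` has a loop-carried dependency, because every step needs the previous one. A Jacobi sweep for the 1-D Laplace equation is the opposite case. Each new value reads only the previous sweep, so the interior can be cut into independent chunks. Threads are used only to show the decomposition. Under CPython's GIL this pure-Python arithmetic will not actually run faster, and a real solver would use processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Loop-carried dependency: step k needs the result of step k-1,
    # so there is nothing to hand to a second core.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def jacobi_chunk(u, lo, hi):
    # New values depend only on the *previous* sweep `u`, so any range of
    # interior points can be computed independently of every other range.
    return [(u[i - 1] + u[i + 1]) / 2 for i in range(lo, hi)]

def jacobi_step(u, workers=4):
    """One Jacobi sweep for the 1-D Laplace equation with fixed end values."""
    n = len(u)
    if n < 3:
        return list(u)
    chunk = max(1, (n - 2) // workers)
    bounds = [(lo, min(lo + chunk, n - 1)) for lo in range(1, n - 1, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: jacobi_chunk(u, *b), bounds)
        interior = [x for part in parts for x in part]
    return [u[0]] + interior + [u[-1]]
```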
The point is that this is at the algorithmic level. It is not about code optimization or other programming paradigms, so the Kool-Aid of "we need a better tool chain" or "parallelizing compilers" is hope, hand-waving, optimism, and of course product support.
Don't get me wrong, I see this as a very good thing: seas of CPUs (core = 1 CPU) will fully solve lots of problems and improve robustness, but they will not help with many problems. 4-16 cores will be generally handy, but 1024
A last thought: when we get to ~1073741824 cores, we may start to make progress with AI and need to worry about the Singularity.
Seriously, I'll never understand article after article about multiple CPUs being wasted when I have 37 processes in Task Manager.
Peter predicted that you would "deliberately forget" creation 2000 years ago...
As has already been explained, non-sequential thinking is hard. You postulate double speed, BUT consider the producer thread: the app has finished and handed off the buffer to the OS to send to the GPU, and you say the OS threads this. Well, fine, so the threaded part can run on another core, but then the hardware DMAs the data and waits for a GPU interrupt/done-queue ack, so how does this speed things up on multicore? Not at all. Someone has to set up the DMA and wait, not run, while it completes, so unless all cores are at 100% you have saved nothing, and you have created the additional overhead of spawning a new thread.
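A toy timing experiment reproduces this argument. The following is a Python sketch in which `time.sleep` stands in for a hypothetical DMA transfer that the CPU merely waits on. Handing the wait to another thread does not shorten it when the caller still needs the transfer to finish. Starting the thread is pure overhead.

```python
import threading
import time

def fake_dma(seconds: float) -> None:
    # Stand-in for a device transfer: the CPU has nothing to do but wait.
    time.sleep(seconds)

def submit_inline(seconds: float) -> float:
    start = time.perf_counter()
    fake_dma(seconds)
    return time.perf_counter() - start

def submit_on_thread(seconds: float) -> float:
    start = time.perf_counter()
    worker = threading.Thread(target=fake_dma, args=(seconds,))
    worker.start()
    worker.join()   # the caller still needs the transfer done, so it still waits
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"inline:    {submit_inline(0.2):.3f} s")
    print(f"on thread: {submit_on_thread(0.2):.3f} s")  # same wait, plus thread start-up
```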
Duh, Marketing Departments
I think I agree with you, BUT... don't fall into the old trap: If ten machines can do the job in 1 month, 1 machine can do the job in 10 months. But it doesn't necessarily follow that if one machine can do the job in 10 months, 10 machines can do the job in 1 month.
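The "ten machines" trap is usually formalized as Amdahl's law. That is a standard result named here, not something the commenter cited. If a fraction p of a job parallelizes perfectly and the rest is serial, N workers give a speedup of at most 1 / ((1 - p) + p / N). Here is a short Python sketch. The 90% figure is chosen purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: best-case speedup when only `parallel_fraction`
    of the work can be spread across `workers`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

if __name__ == "__main__":
    # A 10-month job that is 90% parallelizable, on 10 machines:
    months = 10 / amdahl_speedup(0.9, 10)
    print(f"{months:.1f} months")   # about 1.9, not 1
```

No matter how many machines you add, that job never gets more than 10x faster, because the serial 10% remains.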
Also, the problem with runtime interpreters is not that they don't generate assembly code. The problem is that it is harder to get at the underlying code that is really executing. That code could be optimized if you could see it. But seeing it is just more difficult.
Behold, this dreamer cometh. Come now, and let us slay him... and we shall see what will become of his dreams.
About the quoted paragraph in the SlashDot article: does it appear in the InfoWorld article? I can't see it. The link goes to the article with no problems, but where is this quote? Words like "blame" don't even appear!
Am I missing something? Is the link to InfoWorld incorrect?
The reason I wanted to read the original article was because the SlashDot teaser (quote) mentions Windows and Linux performance, but not Mac OS X, and I wanted to see if the original article mentioned that or not.
Help?
Problem? The development tools aren't available and research is only starting.
Nonsense. Here are a couple of portable tools and libraries that will solve many developers' problems.
http://www.threadingbuildingblocks.org/ (c++)
http://developers.sun.com/sunstudio/downloads/ssx/tha/tha_getting_started.html
Research is mature and ongoing.
Education, however, is only starting to reach the mainstream.
i wish i could stop
Insofar as the language proper is defined by the language standards, the I/O libraries are part of C, because they're specified in the ANSI C and C99 reports. Any conforming C implementation must have the standard I/O functions, and they must behave in the way the standard specifies. That differs quite a bit from the situation with networking libraries, which are third-party and not covered by the C standard.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Most Linux programs are small and do one small part; larger apps usually call down to smaller apps. Doesn't this in itself let the OS balance the work across multiple processors? It seems that for anything that is very intensive (like compiling or video conversion), the work of making the apps run across many processors has already been done. I have used gcc to compile across 6 different machines at one point. Transcode uses all available processors. So does Folding@home! The only thing I really think is missing is a software-based GLX engine that can use all available cores; perhaps a higher-end video card isn't needed for basic 3D anymore.
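Chaining small programs together is pipeline parallelism. Each stage can run at the same time as the others, on a different core. The following is a minimal Python sketch of the same shape, in which each stage is a thread and the stages are connected by queues. The names are hypothetical. CPython's GIL serializes pure-Python stages. A shell pipeline of separate processes has no such limit, which is why the Unix approach spreads across cores with no extra work.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def stage(func, inbox, outbox):
    """Apply func to every item, like one program in a shell pipeline."""
    while (item := inbox.get()) is not _DONE:
        outbox.put(func(item))
    outbox.put(_DONE)

def pipeline(items, *funcs):
    # Unbounded queues, so feeding all input before reading output cannot deadlock.
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `pipeline(range(5), lambda x: x * 2, lambda x: x + 1)` returns `[1, 3, 5, 7, 9]`. Order is preserved because each stage is a single FIFO consumer.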
The fact that all we do is sequential tasks on our computer means we are still pretty stupid when it comes to "computing". If you look outside your CPU, you'll see the rest of the computers on this planet are massively parallel and do tons and tons of very complex operations far quicker than the computer running on either one of our desks.
Most of the computers on the planet are organic ones inside critters of all shapes and sizes. I don't see those guys running around with some context-switching, mega-fast CPU, do you?** All the critters I see are using parallel computers, with each "core" being a rather slow set of neurons.
Basically, evolution of life on earth seems to suggest that the key to success is going parallel. Perhaps we should take the hint from nature.
** unless you count whatever the hell consciousness itself is... "thinking" seems to be single-threaded, but uses a bunch of interrupt hooks triggered by lord knows what running under the hood.
For HD DVD and Blu-ray authoring, the CineVision PSE system we designed for VC-1 used a hybrid spatial/temporal model.
First, the codec itself was 4-way threaded, encoding each 1920x1080 frame as four slices. Then the file was distributed across multiple blades, each processing a section of the video. Since this was for disc-authoring, we knew where chapters were going to be in advance, and so split by chapter; ideally you'd have at least 2x as many chapters as workers.
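The "at least 2x as many chapters as workers" guideline follows from load balancing. With only one chunk per worker, the longest chapter sets the finish time. Below is a sketch of one simple way to hand out chapters: greedy longest-first assignment. The scheduler is an assumption for illustration, not the one CineVision PSE actually used, and the chapter lengths are made up.

```python
import heapq

def assign_chapters(chapter_seconds, workers):
    """Greedy longest-first assignment: each chapter, longest first, goes to
    whichever worker currently has the least video assigned.
    Returns one list of chapter indices per worker."""
    heap = [(0, w) for w in range(workers)]        # (assigned seconds, worker id)
    heapq.heapify(heap)
    plan = [[] for _ in range(workers)]
    order = sorted(range(len(chapter_seconds)),
                   key=lambda i: chapter_seconds[i], reverse=True)
    for i in order:
        load, w = heapq.heappop(heap)
        plan[w].append(i)
        heapq.heappush(heap, (load + chapter_seconds[i], w))
    return plan
```

Take six chapters of [600, 300, 300, 200, 200, 200] seconds and two workers. The workers end up with 1000 and 800 seconds. More, smaller chapters would let the two totals converge further.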
The key to avoiding the "chunk transitions" was aligning along chapters, since they almost always start at a scene change or a black frame, so it'd be easy to spot a problem. Also, there is extensive 3rd-pass support to manually tweak a transition that could go wrong. There was a fair amount of workflow that had to get baked in to get the full advantage of the parallelization, like prepopulating each worker with the source during the 1st pass and keeping it cached for the 2nd and potentially 3rd passes.
Anyway, it works nicely; that product was used for 90% or so of HD DVD titles and about a third of Blu-ray titles so far. Last I heard, the record for a 2-hour movie encode was about 6 hours for 2 passes. I'm sure it'd be faster yet with more recent processors. That scaled up to 64-128 cores pretty well, given source chapters. With overlapping scene detection in the first pass, it could scale well beyond that for long-form content. Of course, with short content you're not so worried about end-to-end encoding time, but about total throughput.
As suggested earlier, live streaming is the hard stuff, since you can't do significant temporal slicing without adding a whole lot of latency.
We have a similar kind of issue with Smooth Streaming for Silverlight, where we encode the same source at multiple bitrates and need to make sure GOPs are aligned across all the data rates for seamless switching. For an example of that:
http://on10.net/blogs/benwagg/Behind-the-Scenes-at-SmoothHDcom-Encoding-Big-Buck-Bunny/
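One way to meet the GOP-alignment requirement is to choose keyframe positions once, from the source, and hand the same list to every rendition. This minimal sketch assumes that keyframes are forced at scene cuts and at a maximum GOP length. The function names and bitrates are hypothetical, and this is not how Smooth Streaming's encoder is actually implemented.

```python
def aligned_keyframes(total_frames, scene_cuts, max_gop):
    """Pick keyframe positions once, from the source: at every scene cut,
    and otherwise no more than `max_gop` frames apart."""
    cuts = set(scene_cuts)
    keyframes = []
    last = None
    for frame in range(total_frames):
        if last is None or frame in cuts or frame - last >= max_gop:
            keyframes.append(frame)
            last = frame
    return keyframes

def rendition_plan(bitrates_kbps, keyframes):
    # Every rendition reuses the same keyframe list, so GOP boundaries line
    # up across data rates and a player can switch at any of them.
    return {rate: list(keyframes) for rate in bitrates_kbps}

if __name__ == "__main__":
    kf = aligned_keyframes(total_frames=10, scene_cuts=[4], max_gop=3)
    print(rendition_plan([300, 1500, 4000], kf))
```

Here `aligned_keyframes(10, [4], 3)` gives `[0, 3, 4, 7]`, and every bitrate gets exactly that list.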
My video compression blog
Since I see little evidence that timothy or Mr. Chapman read the article, I'll do them a favor so they don't have to click:
< article paycheck="undeserved" >
Hi, I'm Agam Shah and I'm writing an article about multicore processors, but these concepts are so new to me that I'm putting quotes around "race conditions" like it's frickin' sharks with lasers.
So then I did a Google search on "parallel programming tools" and it helped me get another paragraph out of the way.
Oh, and I quote some lamer analyst who has never heard of NUMA or libhoard, so I'll try to fabricate some crisis that the problems they address might never be solved.
Parallel programming is hard, WAH! WAH!
Oh, except when it's not, as in that trivial application named Photoshop. I'll write one of those next weekend.
for the last 2.5 years. I frequently run multithreaded applications across all 8.
Some languages, like Erlang, have been built around lots of threads for years. And event-based designs almost completely solve this problem. Some things like Xlib and GLib still run as a big ugly loop, but there are alternatives like XCB, which at least one window manager (awesome) uses.
The two things that currently peg a single core for me are Firefox's unified JavaScript loop (this changes a bit in 3.1) and ffmpeg for high-def video (multithreading is in the works). The fact that people use a lot of different programs at once, and that most programs are not very demanding, also makes this not that big of a problem. Few applications need single-threaded programming (all I can think of are compressors like 7-Zip, video encoders, etc. in their top-quality modes, and certain resource allocators); most things would never hit that single-processor limit if they were written decently. I think it's just a legacy application problem.
Programmers like myself were waiting for a clear direction in terms of language and compiler support for multi-core development, and of course multi-core debugging is a challenge.
Now we have quad processors from multiple vendors and there are plenty of choices for hardware, but there is still not a clear winner when it comes to development tools and methodology. Intel has a threaded toolbox, and beyond that we can roll our own. The only support I have seen that made me smile was the multi-core support in Python, which only exists in the more recent versions, and those versions are not ubiquitous yet.
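The multi-core support in recent Python versions is the `multiprocessing` module, added in Python 2.6. It sidesteps the GIL by using separate processes. This minimal sketch uses the "fork" start method, which keeps it short on Linux. That is an assumption of the sketch: "fork" is not available on Windows, where the default "spawn" method requires an `if __name__ == "__main__":` guard around pool creation.

```python
import multiprocessing as mp

def cpu_heavy(n):
    # Pure-Python arithmetic: threads would serialize on the GIL here,
    # separate processes do not.
    return sum(i * i for i in range(n))

def parallel_sums(sizes, processes=None):
    # "fork" is POSIX-only; processes=None means one worker per CPU.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(cpu_heavy, sizes)

if __name__ == "__main__":
    print(parallel_sums([10**6] * 8))
```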
It is really easy for Intel to unilaterally decide to stop processor development at 3GHz and put it on the programmers to reorganize their code in a parallel manner. It is something else again for each software engineer to choose how to do this and commit their clients to those decisions, and to the fallout that will last the lifetime of this code. Companies that paid to migrate their applications to hyper-threading only got to benefit for a year or two before the environment went away. I am frightened to make a decision today about multi-core that depends on Intel (and AMD) keeping multi-core stable far enough into the future to make development worthwhile.
It is fairly obvious at this point that multi-core is here to stay, but it will be nothing more than a way to sell more expensive hardware until the powers that be provide a cohesive set of tools and methodologies that make multi-core usable for our current problems. A friend of mine told me of his experiments configuring a multicore Windows box for gaming using process affinities. He indicated that the Windows operating system used about 1.5 cores itself, which in the case of a dual-core machine left about half a core for the game. My experience has shown that we have little control over the way tasks are assigned to specific cores, and multi-core seems to do more for the operating system and environment than for the threads of a specific application. After years of effort addressing this problem, it is still not clear to me which tools and methods will be the most stable over time. It looks to me that there has been very little progress on the software side in the last two years.
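Affinity, as in the gaming experiment above, can also be set from code. This minimal sketch uses Python's Linux-only `os.sched_getaffinity` and `os.sched_setaffinity`. On Windows the corresponding Win32 call is `SetProcessAffinityMask`. The helper `run_pinned` is hypothetical, and it restores the original CPU set afterwards.

```python
import os

def run_pinned(func, cpus):
    """Call func() restricted to the CPU ids in `cpus`, then restore the
    previous affinity (Linux-only APIs)."""
    # pid 0 means the caller; strictly, Linux applies the mask to the calling
    # thread, and threads it creates afterwards inherit it.
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    try:
        return func()
    finally:
        os.sched_setaffinity(0, original)

if __name__ == "__main__":
    first = min(os.sched_getaffinity(0))
    print(run_pinned(lambda: os.sched_getaffinity(0), {first}))
```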
Assume we develop affordable 32 bit quantum computers. How does that change this parallelism problem?
Running with Linux for over 20 years!
OK, so it points out a flaw with Windows 7 and Linux, but completely fails to give praise to the efforts Apple is making with Mac OS X and Snow Leopard!!! OS X is incorporating incredible efforts to leverage GPU and multi-core solutions for developers. Ignoring these pieces is incredibly ignorant of the "personal computer" and "distributed computing" markets.
http://www.apple.com/macosx/snowleopard/
From an application standpoint, how is hyperthreading any different from multi-core?
I would agree that your typical email client or word processor is not going to benefit much from multiple cores. Most business applications running on Windows aren't likely to need more than two cores to get their work done. (I suppose one would be using the other two to run the anti-virus software to keep that OS reasonably healthy, eh?) But OSes like Linux tend to have users that are doing more of a variety of tasks simultaneously. They'll have an email client frequently checking for new mail, an audio player running, a window where they're downloading patches or new source code onto their system, an editor window or two open, windows to other systems on the local network, a browser with multiple tabs being updated frequently, and who knows what else. Can you run all the same applications simultaneously on Windows? Maybe, though without multiple desktops it's unlikely. Alt-Tabbing through a list of multiple programs makes switching from one program to another incredibly clumsy, so most people I know avoid running more than 2-3 applications at once, making more than dual-core chips mostly overkill. If the extra cores are going to be useful at all in a business environment, I suspect it'll be to run a slew of additional tools used to enslave^Wmanage the desktops centrally. Servers may be a different story, but I believe the extra cores would be used more advantageously by Linux, since the servers running it are, more often than not, tasked with running more than a single application at a time, something which Windows servers are still not asked to do in most situations.
CUR ALLOC 20195.....5804M
I have a dual quad-core Xeon server and it keeps all cores busy and is definitely faster than a single or dual-core system. Nothing fancy going on. I run several virtual machines and each of them runs normal software such as web servers.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
It's more a language problem. C(++) was never meant to run on systems with several processors. Programs are meant to execute in a single thread of execution. If you actually want to use multiple processors, it's quite hard to do.
Object oriented programming might solve some of the problems.
At least with Windows, it is probably due to the fact that each window is connected to a message queue, and message queues are thread-owned resources.
That means that window messages are always executed by the thread that created the window. If a synchronous message is sent from one thread to another, both threads rendezvous until the message has been executed.
If anything a multi-process design forces the programmer to use asynchronous messages.
Martin
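The thread-owned queue and the send/post distinction that Martin describes can be imitated in a few lines. The following Python sketch shows the semantics only. It is loosely modelled on Win32 `SendMessage`, which blocks until the owner thread has handled the message, and `PostMessage`, which returns immediately. The class `WindowThread` is hypothetical and has nothing to do with the real Windows message loop.

```python
import queue
import threading

class WindowThread:
    """Hypothetical window owner: only this thread ever runs the handler."""
    def __init__(self, handler):
        self._inbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._pump, daemon=True)
        self._thread.start()

    def _pump(self):
        while True:
            msg, reply = self._inbox.get()
            if msg is None:                 # None is reserved as "shut down"
                break
            result = self._handler(msg)     # always runs on the owner thread
            if reply is not None:
                reply.put(result)

    def send(self, msg):
        """Synchronous: the caller blocks until the owner has handled msg."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply))
        return reply.get()                  # the rendezvous

    def post(self, msg):
        """Asynchronous: enqueue and return immediately."""
        self._inbox.put((msg, None))

    def close(self):
        self._inbox.put((None, None))
        self._thread.join()
```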
Java does not support multithreading - java.lang.Thread, a library class, does. Have a look here:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
See, not a library but language keywords. Note that I am fluent in Java and Ada and have designed and implemented larger projects in both.
But there is a difference in the "ease of use" between a language feature and a library feature. That is unless you use a language like smalltalk where everything is library.
Think of how error-prone printf is. If the parameters and the little % specifiers do not match, all hell breaks loose.
And more so in multithreading. Here the bugs are often sporadic and extremely difficult to find, and a programming language which supports it natively is a great help. See:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
So you know what I mean by "natively".
Great - there is still only one compiler that supports "export" - almost 10 years after the standard was defined - and you speak about the next standard. So when will we see the first compiler to support your new library?
Martin
PS: I know a language where all generics are "export" and all compiler support it - since 1983. So it is possible to implement.
The study data to date suggests "thinking" is an illusory, emergent property of all those parallel brain/body processes. There is no sequential state machine in there, but just the fiction of one in the introspective processes of the mind. In fact, our perceptual processes have been shown to reorder events and stimuli quite frequently; it is almost a necessity in that our complex parallel cognitive processes mask their own latencies so we can comprehend and act within a somewhat real-time world.
It is a burden of this illusion that we conceive of computation being serial and needing parallelization, when the physical reality is parallel and requires serialization only to better fit our limited comprehension.
Sure, devs are able to cram more crap into spare CPU cycles. But looking at the trends now, single core is mostly the way to go for the end user, with light OSes and optimized software (netbooks). For more niche markets that find a use in high-performance computing (Slashdot readers, scientists), I'm pretty sure that multiprocessor is a better way to go than multicore (which is a way to do a cheap multiprocessor design anyway). Anyway, OS HALs have been ready for SMP for a long time, but multicore broke the SMP abstraction because of the resources shared on the die between the cores. And again, the processor vendors create the supply by putting out faster processors, and the software vendors create the demand by writing bloated software.
...in a few years when this replaces electronics as the standard method of switching for computation. There are working 30GHz photonic processors, and it won't be uncommon to see 10 times that in a CPU.
Most experienced developers (that use lower level languages like C/C++), do indeed know how to write multi-threaded applications, DESPITE the poor support for doing so by the compiler. This is usually done through threading libraries, rather than native language features supported by the language preprocessor and compiler and linker.
This is not actually the problem. The problem is that most applications simply don't need to be multi-threaded, and in fact adding threading frequently introduces more problems than it solves. Most multi-threaded applications would actually perform better as multi-instance applications, where each instance runs on a separate core, in virtual isolation.
It seems to me that multi-core systems facilitate this paradigm with almost no effort required by the developer. As it should be.
Lots of engineers have said "it's a compiler problem" and failed. Cydra, Tera, and most recently Itanium have all bet on compiler breakthroughs and lost. The problem is that most programs are not automatically parallelizable; expecting a compiler to rewrite a program so that it is more parallel is tantamount to expecting the compiler to be as smart as a human programmer. It may happen some day, but don't hold your breath waiting for it.
XMOS have been experimenting in this area already. Their language, an extension to C, supports code for parallel processing on multi-core chips. See http://www.xmos.com/
That sounds a lot like hyperthreading: a CPU with multiple sets of registers. The idea was that when a thread of execution hit a latency stall, the CPU could still do useful work by switching to another thread. Since the CPU had several sets of registers, the switching was fast. Some models of the Pentium 4 had this capability.
Core utilization has nothing to do with how many threads and processes you have. It has to do with how many threads/processes you have which are active and compute bound from moment to moment. I have 137 processes (ps aux|wc) running in linux right now, and in toto they are consuming 0.8% of two cores (top).
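On Linux, `/proc/loadavg` shows the difference between tasks that exist and tasks that are runnable. Its fourth field is "runnable/total" kernel scheduling entities, and threads are included in that count. Here is a small parsing sketch with a hypothetical function name.

```python
def runnable_vs_total(loadavg_line: str) -> tuple[int, int]:
    """Parse a Linux /proc/loadavg line into (runnable, total) scheduling
    entities. Fields: 1/5/15-minute load, runnable/total, last PID."""
    fields = loadavg_line.split()
    runnable, total = fields[3].split("/")
    return int(runnable), int(total)

if __name__ == "__main__":
    with open("/proc/loadavg") as f:      # Linux-specific file
        print(runnable_vs_total(f.read()))
```

For a line such as `0.08 0.03 0.01 1/137 12345`, it returns `(1, 137)`: 137 tasks exist, but only one wants a CPU (usually the process doing the reading).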
100 tabs in Firefox should take as much CPU altogether as the one tab you are viewing. That this is not completely so (some of the background tabs are animating CPU-sapping Adobe Flash that no one can see) is a design problem. Even so, I often have more than 100 tabs open with little effect on overall system performance, other than Firefox's (and other browsers') absurdly gigantic memory usage.
How many programs do you run at once which are actually doing serious computing other than the one you are interacting with? Sure, there are times you are doing database jobs and such, but it isn't much for the typical desktop user.
Not until you've read the replies that have a clue.
If we are talking about technology... The Linux operating system (the monolithic kernel is the operating system) works great on CPUs that have more than 4 cores. In case the article writer did not know, Linux powers almost all supercomputers. The problem is that applications aren't developed to use that many threads. The OS works fine, but if the applications cannot use multiple threads, you do not gain anything, unless you run multiple instances of them.
If we are talking about marketing lies and misinformation, the "operating system" (actually a _software system_) does not work at all, because usually this "operating system" cannot use multicore CPUs well. Who should we blame?
Seriously, Linux just works on multicore CPUs, but that is just an operating system. The software systems like Ubuntu, Fedora and Mandriva just aren't working so well.
I only know of one Cray model that had Linux, actually... Can I get some kind of citation from a trustworthy source on this, as I can't find it on Google?
Change is certain; progress is not obligatory.
Will that be implemented by the same vendors which implemented "export" in last 10 years?
I'll believe it when MSC++ and G++ have a fully working implementation.
Martin
Do I have an 8-core machine? No. Will I have one at some point in the future? Probably. Will I be happy if support has improved by then? Yes.
Seriously, until I have an 8-core machine, I'd probably prefer other improvements (stability, for example) to arrive before more efficient 8-core support. Also, given the problems with trying to program for too many cores, is it fair to say that Intel are pushing the tech before the software is ready, or possibly even pushing the wrong tech?
I saw talk earlier in the comments of instruction sets with inherent support of multiple cores. Wouldn't it be better to get something like that out, presumably some form of SIMD-like additions, before pushing the processors with >4 cores?
mysql> SELECT * FROM `places` WHERE `place` LIKE 'home'; Empty set (0.00 sec)
They're more plumbing-complex.
So you have to wait for the main pipes to be laid before you can start putting the utilities on the end (bath/bog/basin...), and to some extent those can be parallelised. But you're still going to have to wait until the basic pipework is ready, and that will run at the slow single-CPU pace. The bath, bog and basin in one room cannot all be installed at once if there isn't room for all the workers and the space they need to work. And you cannot really do them in another room, so you can't scale.
SMP? Drop it. AMP (asymmetric multiprocessing) is more worthwhile: a big CPU for most tasks, a few smaller CPUs to take on threads (or to be used instead of the big CPU if you're not running much), and a few dedicated processors (trading versatility for power and simplicity).
Pure functional programming with only implicit parallelism (no message passing) might be relatively straightforward and it's true that parallelism is easier to extract automatically than with procedural languages ... but this only allows for a subset of parallel algorithms.
Transactional memory already allows for a little more.
With message passing (i.e. Erlang) you finally have the full deal. Removing aliasing from the equation removes a lot of very nasty problems, but some remain: deadlock, starvation, livelock (with priorities making all those problems more likely to occur too). In fact, Erlang is really too lax a language to be automatically checked for those problems (mostly because of its use of asynchronous message passing).
Modern Occam is better in that regard, although I wouldn't say that makes parallel programming easy either ... it's just as good as it gets.
That's an application of Parkinson's law
"The problem my dear programmer, as you so elequently put, is one of choice.."
Seriously. I have been involved with software development, from 8-bit PICs to clusters spanning WANs and everything in between, for the past 20 years or so.
Multiprocessing involves coordination between the processes. It doesn't matter (too much) whether it's separate cores or separate silicon. On any given modern OS there are plenty of examples of multiprocessor execution: Hard drives each have a processor, video cards each have a processor, USB controllers have a processor. All of these work because there is a well-defined API between them and the OS - a.k.a device drivers. People that write good device drivers (and kernel code) understand how an OS works. This is not generally true of the broader developer population.
Developers keep blaming the CPU manufacturers, saying it's their fault. It's not. What prevents parallel processing from becoming mainstream is the lack of a standard inter-process communication mechanism (at the language level) that abstracts away a lot of the dirty little details that are needed. Once the mechanism is in place, people will start using it. I am not referring to semaphores and mutexes. These are synchronization mechanisms, NOT (directly) communication mechanisms... I am not talking about queues either - too much leeway in their use. Sockets would be closer, but most people think of sockets for "network" applications. They should be thinking of them as "distributed applications", as in distributed across cores. As an example, Microsoft only recently started to demonstrate that they "get it": the next release of VS will have a messaging library.
choice:
At this time there are too many different ways to implement multi-threaded/multi-processor aware software. Each implementation has possible bugs: race conditions, lockups, priority inversion, etc. The choices need to be narrowed.
Having a standard (language and OS) API is the key to providing a framework for developers to use, while still allowing them the freedom to customize for specific needs. So the OS needs an interface for setting CPU/core preferences, and the language needs to provide the API. Once there is an API, developers can "wrap their minds" around the concept and then things will "take off". As I stated previously, I prefer the "message box" mechanisms simply because they port easily, are easy to understand and provide for a very loosely coupled interaction. All good tenets of a multi-threaded/multi-processor implementation.
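Here is a minimal sketch of the "message box" style in Python. Each worker keeps its state private and can be reached only through its own mailbox, so user code needs no mutexes or semaphores. A `queue.Queue` serves strictly as a per-worker mailbox, and the locking lives inside it. The names `worker` and `run_workers` and the summing task are hypothetical.

```python
import queue
import threading

def worker(mailbox: queue.Queue, results: queue.Queue):
    """Owns its state privately; talks to the world only through messages."""
    total = 0                        # private: no other thread can touch it
    while True:
        msg = mailbox.get()
        if msg == "stop":
            results.put(total)       # report back by message, too
            return
        total += msg

def run_workers(numbers, n_workers=3):
    mailboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(mb, results))
               for mb in mailboxes]
    for t in threads:
        t.start()
    for i, n in enumerate(numbers):
        mailboxes[i % n_workers].put(n)   # round-robin distribution
    for mb in mailboxes:
        mb.put("stop")
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))
```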
Danger Will Robinson:
One thing that I fear is that once the concept catches on, it will be overused or abused. People will start writing threads and processes that don't do enough work to justify the overhead. Everyone who starts writing programs will "advertise" that it's "multi-threaded", as if this somehow automatically indicates quality and/or "better" software...Not.
I noticed that in the later versions of Java 6, it takes care of multiple cores automatically: when executing a loop or something else intensive, both cores are loaded almost equally. Well, I have only 2 cores, so I don't know if Java works as well on more than 2 cores.
I mean Java on Ubuntu; I haven't tested on Windows.
... because in such languages, multicore usage is already included from the very beginning.
In Haskell, you have to explicitly state that you do not want something to be spread to more than one core.
With the included total type safety and lazy evaluation, I call that a winner. :)
At least, if you do not want to program hardware directly.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
I don't know about Firefox in particular, but many browsers slow or stop Flash in hidden tabs. So you'd have to split those tabs into windows and tile them across the screen to get your CPU working harder.
Is this really a concern? How many people are tapping out their CPU? Honestly, 95% of people will never actively use more than 75% of their dual-core 2.0 GHz CPUs. RAM has been and will be the limiting factor on most PCs.
Also, on the Red Hat servers I admin, we don't seem to have much trouble with 4x4 CPUs. Are people really saying there is a difference between 16 procs and 4 quad cores, as the OS sees them?
Odd to even be concerned...
Two 2.26GHz Quad-Core Intel Xeon "Nehalem" processors
6GB (six 1GB) memory
640GB hard drive 1
18x double-layer SuperDrive
NVIDIA GeForce GT 120 with 512MB
Ships: Within 24hrs
Free Shipping
$3,299.00
Taken directly from the Apple Store Mac Pro section
There are two quad-core CPUs inside the 8-core systems. Check before you post and criticize. Also, the two dual core system mentioned was just an example.
"Better to keep silent and look the fool, rather than speak and remove all doubt"
The article doesn't mention Win7 or Linux; whoever wrote the Slashdot headline invented that part.
Take a look at top500.org and tell me Linux can't handle more than 4 cores. I don't know much about Win7 but I doubt it will have a problem either.
True, that many _applications_ don't thread well, but that has nothing to do with the OS.
I expect a performance increase when I get my 8-way CPU, especially in situations where my 2 cores are maxed out now.
This stupid generalization does not take into account that not all people use a computer in the same way.
"Windows and Linux aren't designed for PCs beyond quad-core chips" - Flat wrong, not in the article, and misinformation.
I call shenanigans on this article.
Clearly, more applications need to be (correctly) multi-threaded. I'm not talking about World of Warcraft or CMU calculation projects here, but more common applications like IE, Office, etc. As polished as Microsoft software is (shielding head against thrown fruit), often the user is still forced to wait while UI rendering is waiting on some other task (Outlook you fat slow pig). Every time the Visual Studio IDE turns white while loading my project, every time Outlook is half rendered and has locked all my input devices, every time some office app "appears" to be idle, yet is locking my mouse (AAARRRGH!), I am reminded of how little (or poorly written) multi-threading there is for mainline software. I assure my boss that my cubicle produces more than just profanity and desk banging.... I have noticed that Mac software appears to be quite a bit better in this regard (shielding head from raging mac haters from earlier posts), as I am not often pounding my fist on the table while using my Mac. I'm not sure if a better architecture, or more "thread aware" programming is the cause.
According to Mark Russinovich "Technical Fellow and Windows Kernel guru", the dispatch scheduler in Windows 7 was reworked to support up to 256 cores (with logical processor groups). Skip to 8:45 for the details.
http://channel9.msdn.com/shows/Going+Deep/Mark-Russinovich-Inside-Windows-7/
Windows and Linux aren't designed for PCs beyond quad-core chips [CC], and programmers are to blame for that.
Developers are not the problem. The problem lies further upstream, with whoever is creating the functional and technical requirements. Developers develop against those requirements, and if there wasn't a specification for 8 cores, then don't expect it.
What the article is suggesting is that we implement some sort of car-sharing initiative, so we stop taking so many cars to the same destination. Or a bus!
But everything's already being transferred on a bus!
Those aren't the only language tools that can do that (older ones can as well):
C, C++, and Borland Delphi compilers have had access to the CreateThread API call -> http://msdn.microsoft.com/en-us/library/ms682453(VS.85).aspx since they and the Win32 API came out!
(and/or, even
SetProcessorAffinity -> http://msdn.microsoft.com/en-us/library/microsoft.xna.net_cf.system.threading.thread.setprocessoraffinity.aspx
+
SetThreadAffinity -> http://www.delphipraxis.net/topic134206.html
affinity calls (in plain Win32 the functions are SetProcessAffinityMask and SetThreadAffinityMask), and that's ALL a body needs, in addition to actual tasks to "spread around" the available physical CPUs and/or cores once they are detected, IF you want to go about this manually, that is.)
You CAN do this yourself, OR let the OS's process scheduler do it for you; your choice...
HOWEVER: just by coding with multiple threads, you CAN let the OS process scheduler take care of it for you. The scheduler dispatches each runnable thread of a process to an idle or less-loaded CPU core, so the threads end up spread across the processors/cores present.
E.G.-> The process scheduler in Microsoft's Windows NT-based OS family (Windows NT 3.5x-4.x, 2000/XP/Server 2003/Vista/Server 2008/Windows 7) is aware of how many threads (the unit of execution it schedules on Microsoft OSes) an application has, and that is all it needs!
I.E.-> Even taskmgr.exe can show anyone how many independent threads of execution an app has...
(The OS and its scheduler have to track an application's threads in order to dispatch them across the least-loaded CPUs, physical or core, present.)
APK
P.S.=> This is done by the OS for ANY multithreaded application, and no "SetProcessAffinity"-type API calls are required (i.e., explicitly multithreaded apps that detect the CPUs present and schedule their own threads across them as needed). Apps designed with multiple threads of execution (what I call "implicit multithreaded design") are really all that is required here, though you can do the processor detection yourself (routines abound online for this) and then send the threads you have to different CPUs/cores manually, as noted above, IF need be...
Well, that's "all you need", plus GOOD logic (PLUS applications that actually require more than a single thread to do a particular job; NOT ALL DO, and that is "part of the problem", as many others note here).
IMO? Well, the article is misleading (it's more about the apps riding on the OS than about the OS itself); however, I have 17 processes running here, and not a single one is single-threaded (on a dual-core CPU, as I look at this)...
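A minimal Python sketch of the "implicit" approach described above: start one worker thread per CPU the process may use and let the OS scheduler decide where each runs. os.sched_getaffinity is only available on some Unix systems (e.g. Linux), so the sketch falls back to os.cpu_count() elsewhere. One caveat: in CPython the global interpreter lock keeps pure-Python code from running on several cores at once, so this pattern pays off for I/O or native code that releases the lock; C code calling CreateThread or pthread_create has no such limit.

```python
import os
import threading

def usable_cpus():
    """CPU ids this process may run on (Linux), else a count-based guess."""
    if hasattr(os, "sched_getaffinity"):  # only available on some Unix systems
        return set(os.sched_getaffinity(0))
    return set(range(os.cpu_count() or 1))

def run_workers(n_workers, work):
    """Start n_workers threads running work(i); the OS decides which core runs each."""
    results = [None] * n_workers

    def target(i):
        results[i] = work(i)  # each thread writes only its own slot

    threads = [threading.Thread(target=target, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    cpus = usable_cpus()
    print(f"{len(cpus)} usable CPUs: {sorted(cpus)}")
    print(run_workers(len(cpus), lambda i: i * i))
```

Explicit pinning is rarely needed; on Linux, os.sched_setaffinity(0, cpus) restricts the current process to a chosen set of CPUs if you really want the manual route.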
I use Windows Server OSes, which can use 1-8 cores outta the box ->
CPU and memory scalability for Exchange Server 2003 and for Exchange 2000 Server
http://support.microsoft.com/kb/827281
(Read carefully to the bottom: it lists which OSes it applies to, and the article deals with this in the context of Exchange Server)
Windows Server 2003 and/or Windows Server 2008 can use up to 8 CPUs/cores outta the box, and install as "WorkStation/Pro" models by default (meaning you can install server-class/back-office apps like IIS onto them later on, IF ever needed)... apk
Linux runs well
Every time I read one of these 'boo hoo more cores don't make things fasterer' stories I find it strange, since the problem domains with which I'm familiar -- Image Processing and Audio Software -- can and do already take advantage of multiprocessing.
In the audio world, you're pushing samples through a directed graph from inputs to outputs, and it's straightforward to split the processing into threads that keep the CPUs fairly busy.
In Image Processing, and particularly in the Insight Toolkit that I work with daily, image filters are written to run separate threads on regions of the images. It isn't even particularly hard for most tasks, which iterate over one pixel at a time and need only read-only access to the input image.
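The directed-graph idea can be sketched with Python's standard graphlib.TopologicalSorter (Python 3.9+), whose get_ready()/done() interface exists for exactly this: every node whose inputs are finished can be handed to a worker at once. The node names and the process() function below are hypothetical, and a real audio engine would run its DSP in native code rather than in Python threads, since CPython's GIL limits pure-Python parallelism.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

def run_graph(graph, process, max_workers=4):
    """Run process(node, inputs) for every node, in parallel where possible.

    graph maps each node to the set of upstream nodes it reads from; inputs
    are passed in sorted upstream-name order.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError on feedback loops
    outputs, pending = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for node in sorter.get_ready():  # all inputs of these nodes are done
                inputs = [outputs[up] for up in sorted(graph.get(node, ()))]
                pending[pool.submit(process, node, inputs)] = node
            finished, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in finished:
                node = pending.pop(future)
                outputs[node] = future.result()
                sorter.done(node)  # may unlock downstream nodes
    return outputs

def process(node, inputs):
    """Hypothetical DSP nodes operating on one block of samples."""
    if node == "in":
        return [0.5, -0.25, 1.0]
    if node == "gain":
        return [2 * s for s in inputs[0]]
    if node == "invert":
        return [-s for s in inputs[0]]
    if node == "mix":
        return [a + b for a, b in zip(*inputs)]
    raise ValueError(f"unknown node {node!r}")

if __name__ == "__main__":
    graph = {"gain": {"in"}, "invert": {"in"}, "mix": {"gain", "invert"}}
    print(run_graph(graph, process)["mix"])  # prints [0.5, -0.25, 1.0]
```

Here "gain" and "invert" both depend only on "in", so they become ready together and can run concurrently before "mix" combines them.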
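A minimal sketch of region-based filtering in Python; this is not the Insight Toolkit's API, just the same idea. The image (a list of pixel rows) is split into horizontal bands, and each worker reads only the shared input and writes only its own rows of the output, so no locking is needed. The invert filter and helper names are hypothetical. Under CPython's GIL these pure-Python pixel loops will not actually run on several cores; the decomposition is the part that carries over to C++ or to a process pool.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(height, n_regions):
    """Split rows [0, height) into at most n_regions contiguous bands."""
    step = max(1, -(-height // n_regions))  # ceiling division, at least 1 row
    return [(start, min(start + step, height)) for start in range(0, height, step)]

def invert_rows(src, dst, rows, max_value=255):
    """Hypothetical per-pixel filter: reads src only, writes only its rows of dst."""
    for y in range(*rows):
        dst[y] = [max_value - p for p in src[y]]

def parallel_invert(image, n_threads=4):
    """Apply invert_rows to each band of an image (a list of pixel rows)."""
    out = [None] * len(image)  # workers fill disjoint slots, so no lock needed
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(invert_rows, image, out, band)
                   for band in split_rows(len(image), n_threads)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out

if __name__ == "__main__":
    image = [[0, 10, 255], [100, 200, 50], [1, 2, 3]]
    print(split_rows(len(image), 2))  # prints [(0, 2), (2, 3)]
    print(parallel_invert(image, n_threads=2))
```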
And for software development, where you run builds and rebuilds all day, make -j 8 makes a hell of a difference in how long you wait to do something.
Computer games could really use more cores as well, because the view on screen has the same property as most image processing -- each pixel on screen is an independent computation. If you do parallel ray tracing, doubling the cores can nearly double the frame rate. That's why hardcore gamers pay the big bucks for multi-card solutions -- the graphics cards are rendering in parallel.
Now if you're talking about a spreadsheet or a web browser, it's hard to see the benefit. That's why so many people buy pokey little Atom netbooks -- nothing they do would have particularly taxed a 1 GHz PIII ten years ago.
Unfortunately, this effect just makes the programmer's job worse. It means that if he can only get the complexity estimate to within a factor of 100 for CPU usage, by the time Amdahl's law is done, his estimate will only be good to within a factor of 1000. To me, this screams that if you really need multi-core capability, you probably need a cluster too.
If a programmer shows a user some code and the feedback is that the code is too slow, how likely is it that the user will be satisfied with a 2:1 or a 4:1 speedup?
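For reference, Amdahl's law gives the ideal speedup on N cores when a fraction p of the run time can be parallelized: speedup = 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p) no matter how many cores you add. A small Python helper to put numbers on the 2:1 and 4:1 figures discussed here; the p values in the demo are illustrative, not measurements.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup on n_cores when parallel_fraction of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 8, 16))
        print(f"p = {p}: {row}")
```

With p = 0.9, going from 4 to 8 cores only moves the ideal speedup from about 3.1x to about 4.7x, and the ceiling is 10x.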
Bull. They scale nicely up to about 16 cores. The problem you are going to have on Windows is licensing: for XP SP3 and Vista, Microsoft limits the number of physical processors (sockets), not cores.
On Linux, you can rebuild the kernel with a few mods. Unless you are building a cluster and need a fiber backbone, it is not an issue.
As far as "most programs not ready..." goes, that may be true, but as mentioned elsewhere here, it isn't needed for a lot of applications. Learn pthreads if you're using C/C++, or Java threads if you're doing Java, and you'll be fine.
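The core of what pthreads (pthread_create, pthread_mutex_lock, pthread_join) or Java threads give you is: start threads, protect shared state, join. Python's threading module follows the same model, so here is a minimal sketch of that shape. The counter workload is hypothetical and, under CPython's GIL, illustrates correct locking rather than speed.

```python
import threading

def count_in_parallel(n_threads=4, increments=10_000):
    """Each thread increments a shared counter; a lock keeps updates atomic."""
    counter = 0
    lock = threading.Lock()  # plays the role of a pthread mutex

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # read-modify-write must not interleave
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()  # like pthread_create / Thread.start()
    for t in threads:
        t.join()  # like pthread_join / Thread.join()
    return counter

if __name__ == "__main__":
    print(count_in_parallel())  # prints 40000
```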
Most supercomputers these days aren't single machines, they're clusters. Google "Beowulf" for examples. See http://www.cbronline.com/news/linux_x86_clusters_take_over_top_500_supercomputer_ranking; they noticed the trend back in 2004.
2:1 is probably only just noticeable, assuming it isn't an actual timed test. Anything that highly depends on user responsiveness (i.e. gaming and simulations) needs pretty dramatic speedups before the user will categorically agree it is better.
Even the more reasonable ones will want a "meaningful" increase, so the time saved has to be enough that they could do something useful with it, e.g. shoot another bad guy, beat the market to a good deal, go out for a smoke break, get home an hour earlier, etc.
Behold, this dreamer cometh. Come now, and let us slay him... and we shall see what will become of his dreams.
Well, who said that every application under the sun must be heavily multi-threaded or spawn multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spans a bunch of processes?
Have you ever used Outlook or Thunderbird when accessing multiple IMAP accounts? No number of cores will make that tolerable.
Twenty years ago processors were slow, but some UNIX boxes had more than one. Where this was the case, pipes and named pipes could be used to keep more than one CPU busy. Such techniques were often used for chaining troff, eqn, etc. together. The skill required was not much more than the ability to break a task down into large units that could work independently. Of course, not all tasks are amenable to such an approach, but many are.
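The pipe approach still works and costs almost nothing to set up: each stage is its own process, so the OS can run the stages on different CPUs while data streams between them. A minimal Python sketch with two hypothetical stages (a number producer and a squarer, standing in for something like eqn feeding troff), wiring subprocess pipes together the way a shell pipeline does.

```python
import subprocess
import sys

# Hypothetical stages, standing in for something like eqn feeding troff.
PRODUCER = "for i in range(5): print(i)"
CONSUMER = "import sys\nfor line in sys.stdin: print(int(line) ** 2)"

def run_pipeline():
    """Equivalent of the shell pipeline `producer | consumer`."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen([sys.executable, "-c", CONSUMER],
                                stdin=producer.stdout, stdout=subprocess.PIPE,
                                text=True)
    producer.stdout.close()  # lets the producer get SIGPIPE if the consumer exits early
    out, _ = consumer.communicate()
    producer.wait()
    return [int(x) for x in out.split()]

if __name__ == "__main__":
    print(run_pipeline())  # prints [0, 1, 4, 9, 16]
```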
I run a 2.4 GHz quad-core (Phenom II) with 4 GB of ECC RAM, on openSUSE 11.0.
Firefox ENDLESSLY freezes.
Flash video *almost* ALWAYS fails to play back smoothly.
Typing into ANY web form *always* has the text appear intermittently later than I type it.
I tried running KSysGuard to discover what the hell's going on, and found out that
a) you have to tell KSysGuard that the scale for
CPU0-sys
CPU1-sys
CPU2-sys
CPU3-sys
CPU0-user
CPU1-user
CPU2-user
CPU3-user
CPU0-nice
CPU1-nice
CPU2-nice
CPU3-nice
is *400%*, because that's how it thinks.
(100% for each core, times 4 cores.)
Stupidly, I'd assumed that 100% would mean 100% of the CPU being used...
WHEN I figured that out, though (by recompiling the kernel with "make -j5"), then
b) I SAW that Firefox *never* uses a second core, except during startup.
Why the hell can't it send plugins to a separate core?
Why the hell can't it send tab-open to a separate core?
Obviously, I'm not a coder, but to have a 2.4 GHz quad-core proc *stuttering* on a BROWSER that has a few windows/tabs open, EVERY TIME I open a new tab, every time I view a video, etc., seems incompetent.
It's like enforcing 1-wheel drive, in a crew-cab pickup-truck!
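What's being asked for is roughly plugin isolation: run plugin work outside the process that draws the UI, so a hung or crashing plugin can't take the whole browser down with it. A minimal Python sketch, assuming a hypothetical plugin given as a snippet of Python code, that runs each plugin call in a separate process with a timeout. It illustrates the idea only and has nothing to do with how Firefox is actually built; a real browser would also avoid waiting on the plugin from its UI thread at all.

```python
import subprocess
import sys

def run_plugin(code, timeout):
    """Run hypothetical plugin code in its own process.

    Returns the plugin's stdout, or None if it failed or ran past the timeout,
    so the caller waits at most roughly `timeout` seconds.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # run() has already killed and reaped the hung child
    if done.returncode != 0:
        return None  # plugin crashed; the calling process is unaffected
    return done.stdout.strip()

if __name__ == "__main__":
    print(run_plugin("print('video frame decoded')", timeout=5))
    print(run_plugin("while True: pass", timeout=0.5))  # hung plugin -> None
```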