Windows and Linux Not Well Prepared For Multicore Chips
Mike Chapman points out this InfoWorld article, according to which you shouldn't immediately expect much in the way of performance gains from Windows 7 (or Linux) from eight-core chips that come out from Intel this year. "For systems going beyond quad-core chips, the performance may actually drop beyond quad-core chips. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores. Problem? The development tools aren't available and research is only starting."
Give us a year, maybe two.
http://www.infoworld.com/archives/emailPrint.jsp?R=printThis&A=/article/09/03/20/Multicore_chips_pose_next_big_challenge_for_industry_1.html
So basically yet another tech writer finds out that a huge number of applications are still single threaded, and that it will be a while before we have applications that can take advantage of the cores that the OS isn't actively using at the moment. Well, assuming you're running a desktop and not a server.
This isn't a performance issue with regard to Windows or Linux; they're quite adept at handling multiple cores. They just don't need that many cores themselves, and the applications people run these days don't, individually, need more than that either.
So yes, applications need parallelization. The tools for it are rudimentary at best. We know this. Nothing to see here.
Is this just me, or is this a classic piece of non-news on a par with the one the post subject is in reference to?
I mean, isn't it a typical and completely rational technological modus operandi that hardware developments come first and software implementations take some time to emerge (with the possible exception of specialized applications)?
I mean, imagine software being developed for imaginary or speculative hardware. Sounds like a big waste of time to me...
Is TFA talking about the Linux or Windows threading and scheduling not being good enough for 4+ cores (so your programs, no matter how well designed, will not benefit from more cores), about it being damn hard to split, thread and join tasks, or both?
Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.
Multiple OSes sharing the same cores. Multiple apps running on the different OSes, and working together.
Which can also be used to provide fault tolerance... if one of the worker apps fails, or even one of the OSes fails, your processor capability is reduced, a worker app in a different OS takes over, use checkpointing procedures, and shared state, so the apps don't even lose data.
You should even be able to shut down a virtual OS for Windows updates without impact, if the apps that arise get designed properly...
...programmers are to blame for that
The development tools aren't available and research is only starting.
Stupid programmers! Not able to develop software without the tools! In my day we wrote our own tools - in the snow, uphill, both ways! We didn't need no stink'n vendor to do it for us - and we liked it that way!
Firstly, it's false on the face of it: Ubuntu is certified on the Sun T2000, a 32-thread machine, and Canonical is supporting it.
Secondly, it's the same FUD as we heard from uniprocessor manufacturers when multiprocessors first came out: this new "symmetrical multiprocessing" stuff will never work, it'll bottleneck on locks.
The real problem is that some programs are indeed badly written. In most cases, you just run lots of individual instances of them. Others, for grid, are well-written, and scale wonderfully.
The ones in the middle are the problem, as they need to coordinate to some degree, and don't do that well. It's a research area in computer science, and one of the interesting areas is in transactional memory.
That's what the folks at the Multicore Expo are worried about: Linux itself is fine, and has been for a while.
--dave
davecb@spamcop.net
The article doesn't really say that Windows and Linux aren't "designed" for quad+ core chips; it just says that most software is still single threaded. No kidding.
Languages like PHP/Perl, as a rule, are not designed for threading - at ALL. This makes multi-core performance a non-starter. Sure, you can run more INSTANCES of the language with multiple cores, but you can't get any single instance of a script to run any faster than what a single core can do.
I have, so, so, SOOOO many times wished I could split a PHP script into threads, but it's just not there. The closest you can get is with (heavy, slow, painful) forking and multiprocess communication through sockets or (worse) shared memory.
Truth be told, there's a whole rash of security issues through race conditions that we'll soon have crawling out of nearly every pore as the development community slowly digests multi-threaded applications (for real!) in the newly commoditized multi-CPU environment.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
"The development tools aren't available and research is only starting"
Hardly. Erlang's been around 20 years. Newer languages like Scala, Clojure, and F# all have strong concurrency. Haskell has had a lot of recent effort in concurrency (www.haskell.org/~simonmar/papers/multicore-ghc.pdf).
If you prefer books there's: Patterns for Parallel Programming, the Art of Multiprocessor Programming, and Java Concurrency in Practice, to name a few.
All of these are available now, and some have been available for years.
The problem isn't that tools aren't available, it's that the programmers aren't preparing themselves and haven't embraced the right tools.
Too bad BeOS died. One of the axioms the developers had was 'the machine is a multi processor machine', and everything was built to support that.
Seems like they were 15 years ahead of their time. But, on the other hand, too late to establish another OS in a saturated market. Pity, really.
get a mac..
I assume you're talking about Mac OS X 10.6 (Snow Leopard), whose Grand Central framework is supposed to add some tools to make Mac-exclusive multithreaded apps easier to program.
Yes, some problems lend themselves very well to multicore designs. Many others do not. Just because they are building multicore chips does not mean that multicore is the right answer. Current multicore designs have too little cache and too little memory bandwidth. If my problem is CPU bound, multicore can be a solution. If my problem is memory access bound, multicore is only going to make it worse.
"To those who are overly cautious, everything is impossible. "
The idea that every program needs to support threading is kinda stupid. Most programs barely use any computational power; in fact, there are very few programs that require all that computing power to operate, and those are certainly well designed.
did you forget to take your meds?
imagine software being developed for imaginary or speculative hardware.
I think Sun called it "Java". It was run on emulators long before ARM and others came out with hardware-assisted JVMs such as Jazelle.
Maybe I'm just not a multicore user. Ever thought of that?
The quote presented in the summary is nowhere to be found in the linked article. To make matters worse, the summary claims that Linux and Windows aren't designed for multicore computers, but the linked article only claims that some applications are not designed to be multi-threaded or to run as multiple processes. Well, who said that every application under the sun must be heavily multi-threaded or spawn multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spawns a bunch of processes?
The article is bad and timothy should feel bad. Why is he still responsible for any news being posted on slashdot?
Who would have ever guessed that most software is single-threaded rather than multi-threaded, and that the programmers of Linux and Windows don't really feel like optimizing everything for 8-core CPUs that won't be released for quite some time and won't end up in the average user's box for 3 or more years?
Taxation is legalized theft, no more, no less.
Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.
But with how many apps can one user interact? I understood the article to be referring to desktop applications, not server applications. In a desktop environment, most applications spend much of their time waiting for an event. For example, a virus scanner blocks until a file is modified or a removable medium is mounted. Or are you envisioning connecting four terminals to one desktop PC and binding one virtual machine to each terminal?
The /. summary of TFA is almost exquisitely bad. It's not Windows or Linux that's not ready for multicore (as both have supported multi-processor machines for on the order of a decade or more), but rather the userspace applications that aren't ready. The reason is simple: Parallel programming is rather hard, and historically most ISVs haven't wanted to invest in it because they could rely on the processors getting faster every year or two... but no longer.
One area where I disagree with TFA is the claimed paucity of programming models and tools. Virtually every OS out there supports some kind of concurrent programming model, and often more than one depending on what language is used -- pthreads, Win32 threads, Java threads, OpenMP, MPI or Global Arrays on the high end, etc. Most debuggers (even gdb) also support debugging threaded programs, and if those don't have enough heft, there's always Totalview. The problem is that most ISVs have studiously avoided using any of these except when given no other choice.
--t
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.
So what? If I had a 32-core system, at least each running process (even if single-threaded) could have a core just for itself. Only a few basic applications (such as a browser) really need to be designed for multiple threads.
Most programs barely use any computational power; in fact, there are very few programs that require all that computing power to operate, and those are certainly well designed.
Home users do use some apps that could benefit from multiple cores. Video encoding is one of them, but that one is embarrassingly parallel because the encoder could just split the video into quadrants and have each of four cores work on one quadrant.
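The quadrant split described above can be sketched roughly like this in Python. Everything here is an assumption for illustration: the "frame" is just a list of rows of integers, and `encode_quadrant` is a made-up stand-in for real encoder work (it only sums pixel values). Real encoders may have dependencies that cross quadrant boundaries, and CPython threads share a global interpreter lock, so this shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_quadrants(frame):
    """Split a 2D list (rows of pixel values) into four quadrants."""
    mid_row = len(frame) // 2
    mid_col = len(frame[0]) // 2
    top, bottom = frame[:mid_row], frame[mid_row:]
    return [
        [row[:mid_col] for row in top],     # top-left
        [row[mid_col:] for row in top],     # top-right
        [row[:mid_col] for row in bottom],  # bottom-left
        [row[mid_col:] for row in bottom],  # bottom-right
    ]


def encode_quadrant(quadrant):
    """Hypothetical stand-in for per-quadrant encoding: sums the pixels."""
    return sum(sum(row) for row in quadrant)


def encode_frame(frame, workers=4):
    """Hand each quadrant to its own worker; results come back in order."""
    quadrants = split_into_quadrants(frame)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_quadrant, quadrants))


# A tiny 4x4 "frame" holding the values 0..15.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
results = encode_frame(frame)
```

The four workers never touch each other's data, which is what makes this kind of job easy to spread over cores.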
Hey, at least we aren't dealing with the lovely world of Cyrix anymore... those were truly fun times with respect to compiler optimizations (or lack thereof, as it turned out). That and the, um, heat "issues."
When I was studying Comp Sci, I recall that most assignments were to 'understand the concept' and program a solution.
Usually the programs were single-threaded. Maybe a section of a course was on concurrency (mutexes, threading), but not an entire course or courses.
As multi-core becomes more the norm, then perhaps there can be an entire course on concurrency and how to design/program with this thinking in mind.
Uh, Linux geek since 1999.
Where's the need for an email client to spawn 8 or 16 threads?
Message classification. An e-mail client could open a process for each message, and the process would analyze the message to see what labels (spam, work, personal, etc.) belong on the message. If you get a lot of mail, I imagine that classifying several hundred downloaded messages might take a while.
This is a problem, but one specific only to certain programs. Pull up Task Manager and take a look at the process list. Odds are that unless you're running something big in the background, you won't see any process taking up more than 50% CPU on your dual-core, or 25% CPU on your quad-core. In fact, odds are none will be even close to that.
Multi-threading can offer little speed increase there (there is theoretically some as code is executed simultaneously, but it's negligible and probably unnoticeable); its value is truly seen only when a program can actually make use of more processor power than any single core has. Video conversion is a good example -- on my dual core at home, most of my video conversion tools hit 50% CPU and run at that until done. It's programs like this that can take advantage of multi-threading and therefore have access to more raw processing power at once (double, in fact).
I agree that it would be nice to see more tools out there to add ease to coding for multi-core processors, and to see those few, CPU intense programs suddenly see double the processing power. But given that only a very specific selection of software requires it, and moreover a lot of the time that is not software the "average joe" would be using, it's probably just not vital enough to hit the priority lists yet (especially given that there are a few programs out there that do successfully implement multi-threading, and others that mimic it to a lesser extent).
What good are multiple cores and threads when you are running an event-driven GUI application?
Mozilla Firefox is an event-driven GUI application. But if I open a page in a new tab, a big reflow or JavaScript run in that page can freeze the page I'm looking at. You can see this yourself: open this page in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this. Other applications need to spawn threads when calling an API that blocks, such as gethostbyname() or getaddrinfo(); otherwise, the part of the program that interacts with the user will freeze. But these are the kind of threads that are useful even on a single core, not multicore-specific optimizations.
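To make the blocking-API point concrete, here is a minimal Python sketch of pushing a name lookup onto a worker thread so the user-facing thread never waits on it. `socket.getaddrinfo` is the real standard-library call; the helper name `resolve_in_background`, the `resolver` parameter, and the use of a queue to hand results back are assumptions made for the example.

```python
import queue
import socket
import threading


def resolve_in_background(host, results, resolver=socket.getaddrinfo):
    """Start a thread that resolves `host` and puts the outcome on `results`.

    The caller returns immediately; the (possibly slow) lookup runs on the
    worker thread. Failures are delivered as the exception object.
    """
    def worker():
        try:
            results.put((host, resolver(host, None)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results.put((host, exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

An event loop would then check the queue with `results.get_nowait()` on each pass and carry on handling input if nothing has arrived yet. As the comment says, this pays off even on a single core.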
Seriously, no one has brought up functional programming, LISP, Scala or Erlang? When you use functional programming, no data changes and so each call can happen on another thread, with the main thread blocking when (& not before) it needs the return value. In particular, Erlang and Scala are specifically designed to make the most of multiple cores/processors/machines.
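The "block only when you need the return value" idea can be approximated in Python with futures, as a rough sketch. Python does not enforce purity, so it is only an assumption here that `fib` and `square` have no side effects; the function names are made up for the example, and Erlang or Scala would express this quite differently.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Pure function: the result depends only on n."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def square(n):
    """Pure function: the result depends only on n."""
    return n * n


def compute(a, b):
    """Start both calls on worker threads, then block only at the point
    where the return values are actually needed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_fib = pool.submit(fib, a)
        future_square = pool.submit(square, b)
        # The main thread is free to do other work here.
        return future_fib.result() + future_square.result()
```

Because neither function modifies shared data, the two calls can run in either order, or at the same time, without changing the answer.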
See also map-reduce and multiprocessor database techniques like BSD and CouchDB (http://books.couchdb.org/relax/eventual-consistency).
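For anyone who hasn't seen map-reduce, a toy word count in Python gives the flavour. This is a single-machine sketch under obvious assumptions (threads instead of a cluster, made-up function names), not how any particular database does it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(document):
    """Map step: count the words in one document, independently of the rest."""
    return Counter(document.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def word_count(documents, workers=4):
    """Run the map step on each document in parallel, then fold the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, documents))
    return reduce(reduce_phase, partials, Counter())
```

The map step needs no coordination at all, and the merge is associative, so the partial counts can be combined in any grouping.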
"... and research is only starting."
Hmmm... I remember people doing research on this subject at the University of Illinois when I was a graduate student there in the 1980s.
If you spend more time assigning blame than you do describing the problem, then clearly you don't have anything insightful to say.
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925
However, there is an issue of overhead with switching, and it seems like running specific processes on specific cores would do enough to help here. I don't see why the average application needs to run on more than one core. It seems like the OS can assign a core to a process, and there would be no issue beyond the current multithreading.
Now, like the stuff written for the Cray, there are some applications that could take advantage of the parallel processing, but I don't see a general need for this. It would be like the original Mac, where certain processes were shifted to the graphics processor by the OS. Not that programs are not going to be written differently, but this will happen over time. DOS applications did not become full-fledged Windows applications overnight.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
'In fact, TFA doesn't even use the words "Linux" or "Windows."'
Yup. There may be a reason for that too.
The initial SMP support was added to Linux 1.3.42 on 15 Nov 1995. Linux is clearly well adapted to multicore CPUs. That is one of the reasons why Linux dominates over Windows on www.top500.org. The other argument is cost.
If you don't believe me, pull out a profiler and run it on one of your programs, it will show you where things can be easily sped up.
Now, given that the performance of most programs is not processor bound
That's a pretty big leap, I think.
Yes, a lot of today's apps are more user bound than anything. But there are plenty of real-world apps that people use that are still pretty processor bound - Photoshop, and image processing in general, is a big one. So can be video, which starts out disk bound but is heavily processor bound as you apply effects.
Even Javascript apps are processor bound, hence Chrome...
So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Being able to burn a DVD while encoding a video and playing some game, all at the same time, is something that benefits from extra cores and does not require the apps themselves to know about the cores. Of course, this is not the most common situation. Perhaps the IT world is starting to realize that home and office users don't really need that much power? For the past 5 years or so MS has been the only force driving us to require more powerful upgrades, but now that even they are focusing on performance in Windows 7, perhaps this is going to be the "year of Moore's law no longer being relevant on the desktop".
Copyright infringement is "piracy" in the same way DRM is "consumer rape"
So here we are at the multiprocessing dilemma again. The summary gets it all wrong. It is referring to operating systems, which are fine with this kind of stuff. UNIX and derivative (Linux) systems have been fine with multiprocessing for decades. Most of the big irons in the top 500 are running multi-core just fine. Even Windows got the hang of it lately (I guess).
The problem is that most application developers did not learn to wrap their minds around the multiprocessing paradigm. No tool can magically redesign your single-threaded application to work multi-threaded. The developer needs to analyze the program flow, move computationally expensive operations to separate threads, and manage to get good junction control (locks, balanced threading). It's a design paradigm that has to be learned.
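As a small illustration of that design work (expensive part on separate threads, a lock at the junction), here is a hedged Python sketch. The function name and the sum-of-squares workload are invented for the example; the point is only that the heavy loop runs without the lock and the shared total is touched once per thread.

```python
import threading


def parallel_sum_of_squares(numbers, num_threads=4):
    """Split the input into chunks, process each chunk on its own thread,
    and merge the partial results under a lock."""
    total = 0
    lock = threading.Lock()
    # Ceiling division so every element lands in some chunk.
    chunk = max(1, (len(numbers) + num_threads - 1) // num_threads)

    def worker(part):
        nonlocal total
        partial = sum(n * n for n in part)  # expensive part, no lock held
        with lock:                          # junction: short critical section
            total += partial

    threads = [
        threading.Thread(target=worker, args=(numbers[i:i + chunk],))
        for i in range(0, len(numbers), chunk)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Taking the lock inside the inner loop instead would still give the right answer, but the threads would spend most of their time waiting on each other.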
The problem is that you can't get developers who are not used to the idea of multiprocessing paradigms to switch. Another problem is that exactly this group of people is also teaching the new generation, so it is not going to change that fast either.
It's a bit like a chicken and egg problem: until there is a large distribution of multi-core systems, no one will have the urge to switch. So that's why it is a good thing that these new CPUs are coming out. Once they are there, developers will feel the need to utilize them to stay competitive. Kind of like natural selection and adaptation to new environments.
Isn't exactly rocket science (well, except if you are writing a rocket guidance system).
He says Windows and Linux but spuriously leaves out Mac?
mac suffers from the same damn problem. The OS and most apps weren't written for multi-core processors.
that's why any true multi-core app is distributed. hence rendering farms, not rendering server.
They're using their grammar skills there.
You need to establish/prove purity to the compiler so it can actually make use of it.
Lisp, Scala and Erlang don't have that property.
Haskell does.
Haskell and other pure languages are where the future of parallelism might lie.
I've been programming for 30 years or so, and I've been feeling ashamed. I've been feeling like I've done something wrong and that I haven't structured my programs right. That if only I was smart enough I would be able to take advantage of these multicore systems.
But I think I'm feeling better about myself. If I write rational multithreaded programs and use scalable patterns like producer / consumer, then I'll be pretty much ready to go.
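For reference, the producer / consumer pattern mentioned above looks roughly like this in Python. It is a minimal sketch: the sentinel-based shutdown, the queue size of 8, and the names `run_pipeline` and `handle` are assumptions chosen for the example.

```python
import queue
import threading

SENTINEL = object()  # marker telling a consumer to stop


def producer(items, work_queue, num_consumers):
    """Put every item on the queue, then one stop marker per consumer."""
    for item in items:
        work_queue.put(item)
    for _ in range(num_consumers):
        work_queue.put(SENTINEL)


def consumer(work_queue, results, handle):
    """Take items off the queue until a stop marker arrives."""
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            break
        results.put(handle(item))


def run_pipeline(items, handle, num_consumers=3):
    """Run one producer (the calling thread) against several consumers."""
    work_queue = queue.Queue(maxsize=8)  # bounded, so the producer can't run away
    results = queue.Queue()
    consumers = [
        threading.Thread(target=consumer, args=(work_queue, results, handle))
        for _ in range(num_consumers)
    ]
    for t in consumers:
        t.start()
    producer(items, work_queue, num_consumers)
    for t in consumers:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Scaling up is then mostly a matter of changing `num_consumers`, which is why the pattern holds up reasonably well as core counts grow. Results come back in completion order, not input order.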
And it seems like a lot of this isn't really relevant for desktop applications. I mean, there's some amount of keeping the main event thread moving so that your application is responsive, and you do time consuming operations on separate threads. But the only time I've really used a whole lot of threading is in server apps where you have a whole bunch of incoming connections that you're processing concurrently.
I understand that there is a branch of computer science that surrounds parallel computing, and there are some applications that might benefit from this (image processing being the canonical example). But I think it's another tool in the toolbox. Another way to approach a problem like map / reduce or whatever is in vogue. Some problems will benefit from being solved this way. Some won't. Use the right tool.
And I don't understand why we need to beat the drum for more efficient use of multicore. It's cool, we'll figure out what to do with all these cores. And then we'll put that in our toolbox and use it when appropriate.
So you're watching a movie and writing a Slashdot comment. How many other things can you do at once that would require a core? Even if you have 30 other processes open, most of them would just be waiting for input. There's a limit on the tasks that a single human being can care about at once, and Microsoft doesn't appear ready to bring terminal servers to the home market.
Part of the problem is that tools do very little to help break programs down into parallelizable tasks. That has to be done by the programmer, they have to take a completely different view of the problem and the methods to be used to solve it. Tools can't help them select algorithms and data structures. One good book related to this was one called something like "Zen of Assembly-Language Optimization". One exercise in it went through a long, detailed process of optimizing a program, going all the way down to hand-coding highly-bummed inner loops in assembly. And it then proceeded to show how a simple program written in interpreted BASIC(!) could completely blow away that hand-optimized assembly-language just by using a more efficient algorithm. Something similar applies to multi-threaded programming: all the tools in the world can't help you much if you've selected an essentially single-threaded approach to the problem. They can help you squeeze out fractional improvements, but to really gain anything you need to put the tools down, step back and select a different approach, one that's inherently parallelizable. And by doing that, without using any tools at all, you'll make more gains than any tool could have given you. Then you can start applying the tools to squeeze even more out, but you have to do the hard skull-sweat first.
And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept, so teachers leave it as a 1-week "Oh, and you can theoretically do this, now let's move on to the next subject." thing.
"And I don't understand why we need to beat the drum for more efficient use of multicore."
Huh? It is really simple. Because the industry wishes to perpetuate a need for new products, whether we need them at the moment or not.
In the meantime, maybe some dude may discover the next killer-application which could actually harness the power at hand.
Very few pc programs can make the latest quad cores crawl. They typically handle anything you throw at them. Even most 3D games are swallowed.
So, accept it. The progress is there. If not because of need, then at least because of marketing and market share.
Who in their right mind would buy an inferior product, e.g. a CPU, if the competitor was cheaper and faster and consumed less power?
I believe Boost supports multithreading, and is considered a semi-standard for C++ development these days; in fact, I understand that the next version of the C++ standard will incorporate a number of libraries from Boost. Not sure if the threading library is one of them, though.
In the 80's Fortran, which stayed alive and healthy by working in the vector processor community, got all sorts of instructions that are naturally out-of-order block processes. For example, for-loop and where-loop declarations that say the loop counter or loop array can be iterated in any order. It has matrix parallel operation declarations.
Sun's Fortran variant Fortress (sort of Java meets Fortran) is designed from the start for thread safety, so operations don't explicitly have to lock and unlock before expression.
And the new PGI fortran compiler has all sorts of compiler directives for automatic parallelization.
Some drink at the fountain of knowledge. Others just gargle.
Intel releases a new processor this year and the author is surprised that existing software applications aren't immediately taking advantage of it? This isn't a matter of changing a compiler setting or modifying a few methods; parallel computing requires major refactoring and fundamental redesign. And how are Windows 7 and Linux not well prepared? The development tools and applications aren't prepared, not the operating systems.
I disbelieve this entirely. UNIX/Linux is well designed for multiple core CPUs. Just take the whole single program, single small job approach of a pipeline command and you have your multicore solution ready. Programs that can make use of tasks that are IO bound are frequently written with threading in mind. qmail/apache are both well written for multiple core CPUs. I don't see what the article is trying to say. It's clearly wrong.
Why UNIX?
I think that linux has been used successfully in massive multiprocessor computers (unless most of top 500 computers are mostly single processor ones).
In a desktop PC, the OS will take care of the multiple CPUs to run the different apps, unless you are talking about heavily CPU-intensive apps, and yes, you can put the blame on those specific apps (at least for Linux, most apps aren't OS specific)... but not on the OS.
I am, unfortunately, not an expert in functional languages. I do remember that LISP isn't pure functional.
The main point still stands - functional languages do already address this issue. You're absolutely right that LISP doesn't do all it needs to out of the box to address the issue properly.
I honestly have no idea if Erlang, Scala or Haskell do allow the compiler to identify pure functional calls, although I tend to believe the other AC response that Haskell, at least, does.
The tools have been here for some 10 years now. Multithreading has existed a reaaally long time; documentation was still lacking in the late 90s, but running multiple threads is child's play now.
Like someone else stated, most programs aren't CPU bound; they spend most of their time waiting for data from the HDD etc.
Applications that benefit from multiple cores have been multithreaded, or a lot of them have. It's not a software paradigm limiting scalability.
Furthermore, Windows 7 is MORE than capable of handling 8 cores; in fact, Windows 7 probably starts to shine at 16 cores with its SMP capabilities. Microsoft spent A LOT of time making sure there's that kind of scalability in the Windows kernel.
I can't express enough how misinformed the TFA writer is, and how clueless and ignorant he is. I'm SHOCKED that this kind of garbage is on Slashdot! Come on, even half-witted self-respecting geeks already know this stuff better.
I championed it here but there is no software that utilizes it and programming for it is difficult as mentioned here in many articles.
New languages aren't being used to help out multicore or parallel processing with graphics chips.
A graphics chip computer built for gaming and general use would be amazing. It would cost as much as an entry level general chip using pc but could do 3d GAMES!
But would need parallel processing language.
http://www.gpgpu.org/
http://en.wikipedia.org/wiki/GPGPU
A computer that is used in an efficient way will at any time either do nothing (and hopefully switch to standby/hibernate after a couple of minutes) or do several things in parallel. While I read Slashdot, my computer is mostly downloading mail, uploading files to a web server, defragging the disk, encoding a video, doing a background backup, etc. Or if it isn't, it can fold proteins. Modern browsers will also soon be multithreaded, some already are, so every tab, plugin etc. can run on its own core.
Apps that lack multithreading can also be a blessing - less overhead, and they are restricted to one core, so no matter how badly an app behaves, there will always be a core that isn't affected by the CPU hog, so the machine stays responsive. Responsiveness is much more important than raw computing speed.
I like to differentiate myself with threading as a developer but this article is over the line.
It's absolutely absurd to say that multicore chips won't benefit users of any modern Windows or Linux installation. I think I have like 20 windows open, and quite a few processes. Some of them are active, and some are not. The fact is, these systems, both Windows and Linux, and if anything, Linux, are designed to serve up multiple threads with multiple users on multiple processors. They -are- mainframe operating systems in a consumer role.
This is my sig.
it is the answer to the question that no one asked...
In a real-world application, as others have mentioned, pretty much all of a program's time is spent in an idle loop waiting for something to happen, and in almost all circumstances that is input from the user in whatever form: mouse, keyboard, etc.
So let's say it is something like Final Cut. Now to be sure, when someone kicks off a render, this is an operation that can be spun off on its own thread or its own process, freeing up the main process loop to respond to other things that the user might be doing. But where the rubber really hits the road is user input. The user could do something that affects the work that was just spun off, either as a separate thread or process on the same core or on any number of other cores, so you have to keep track of what the user is doing in the context of things that have been farmed out to other cores/processes/threads.
Enter the OS. Take your pick, since it really does not matter which OS we are talking about; they all do the same basic things, perhaps differently, but they do. How does an OS designer make sure that all of, say, 16 cores (dual 8-core processors) are actually well and fairly utilized? Would it be designed to use a core for each of the main functions of the OS (let's say drive access, the comm stack for whatever protocol you pick, video processing, etc.), or should it just run a scheduler like those in use now, which farm out thread processing based on priority? Is there really any priority scheme for multiple cores that could each run, say, hundreds of threads or processes? And what about memory? A single-core machine that is truly 64-bit can handle a very large amount of memory, and that single core controls and has access to all that RAM at its whim (DMA notwithstanding). But what do you do now that you have 16 cores all wanting to use that memory? Do we create a scheduler to schedule access from 16 different demanding standalone processors, or do we simply give each core a finite memory space and then control the movement of data from one memory space to another? After all, a single thread (the one handling the main UI for a program) has to be aware of when something is finished on another core and then get access to that memory to present results, either as data written to, say, a file or written into video memory for display.
I submit that the current paradigm of SMP is inadequate for these tasks and must be rethought to take advantage of this new hardware. I think a more efficient approach is that each core detected would be fired up with its own monitor stack as a place to start so that the scheduling is based upon the feedback from each core. The monitor program would be able to ensure that the core it is responsible for is optimized for the kind of work that is presented. This concept while complicated could be implemented and serve as a basis for further development in this very complex space.
In terms of "supercomputers", this has been dealt with, but with a very different methodology that I do not think lends itself to general computing. Deep Blue, Crays, and things like that aren't really relevant in this case, since those are mostly very custom designs built for a single purpose and optimized for things like chess, weather modeling, or nuclear weapons study, where the problems are already discretely chunked out with a known set of algorithms and processes. General-purpose computing, on the other hand, is like trying to herd cats from the OS point of view, since you never really know what is going to be demanded and how.
OS designers and user space software designers need to really break this down and think it all the way through before we get much further, or all this silicon is not going to be used well or efficiently.
Hey KID! Yeah you, get the fuck off my lawn!
Not all linear problems can be solved with parallel processing.
It takes 1 woman 9 months to produce a baby - but 9 women cannot produce a baby in a single month...
Software operates primarily on a linear function: Process A needs to be done before B, and B before C and so forth. The real issue is that dividing a linear process across parallel processors is notoriously difficult: Task "D" is sent to processor 2, however the data it needs to process is already sitting in the cache of processor 0... this slows things down and E finishes before D and the app crashes.
This is where the design of Microsoft's Hyper-V platform shows real promise. By placing a virtualization layer (Hypervisor) between the OS and the processor, the added abstraction layer can distribute dissimilar or unrelated processes to different cores. It can also assist with non-linear computing tasks that work well with parallel processing and even provide the framework by which
Look at it this way: there is no way that Microsoft is going to leave spare resources sitting idle. They'll figure out some way to consume every single one. It's the Microsoft way!
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.
The entire idea of multi-core is not that your performance increases, but that performance doesn't decrease.
I want every thread to run simultaneously instead of timesharing. Imagine all your apps are divided into multiple threads; then you're all timesharing again, and boy, don't you just hate it when your entire computer slows down to a crawl?
I mean, look at the success of 3D window management: you lose a little performance overall, but when a single process jumps to 100% CPU reservation, at least there's no 2D WM lockup.
Here be signatures
Would it not be much simpler, though, from the perspective of not needing to deal with complicated motion estimation algorithms and such, just to split video work along groups of B-frames? It seems that as long as the video is more than a few frame groups in length, you would get just as much gain without even needing to reimplement much, if any, of your existing codec algorithms.
In these discussions about parallelism, I used to recommend splitting a 2-hour video into four 30-minute parts and feeding each to a single-threaded encoder. But that would need more cache and memory bandwidth, something that a lot of PCs with multicore CPUs lack, and it wouldn't work at all for live streaming. Splitting at group-of-picture boundaries might work better, but it would still add more latency to a live stream.
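A rough sketch of the chunk-splitting idea, in Python. Everything named here is an assumption for illustration: `encode_chunk` is a hypothetical stand-in for launching a real single-threaded encoder, and the frame counts and GOP size are made up. The only point is that the split happens on group-of-picture boundaries and the results come back in order.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(total_frames, gop_size, parts):
    """Return (start, end) frame ranges, each starting on a GOP boundary."""
    gops = -(-total_frames // gop_size)        # ceiling division: number of GOPs
    per_part = max(1, -(-gops // parts))       # GOPs handed to each worker
    ranges = []
    for first_gop in range(0, gops, per_part):
        start = first_gop * gop_size
        end = min((first_gop + per_part) * gop_size, total_frames)
        ranges.append((start, end))
    return ranges

def encode_chunk(frame_range):
    # Hypothetical placeholder: a real version would run an external
    # single-threaded encoder process on this frame range and wait for it.
    start, end = frame_range
    return f"encoded[{start}:{end}]"

def encode_parallel(total_frames, gop_size, parts):
    ranges = split_into_chunks(total_frames, gop_size, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() returns results in input order, so the pieces
        # can be concatenated without re-sorting.
        return list(pool.map(encode_chunk, ranges))
```

Threads are enough here only because each worker would spend its time waiting on an external encoder process; the cache and memory bandwidth cost mentioned above still applies, and none of this helps a live stream.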
The idea of an OS and/or support tools handling the SMP problem is nothing more than a crutch for bad programming.
In fact, anyone who grew up with a real multithreaded, multitasking OS is already writing code that will scale just dandy to 8 cores and beyond. When you accept that a thread is nothing more or less than a typical programming construct, you simply write better code. This is no more or less an amazing thing than when regular programmers embraced subroutines or structures.
This was S.O.P. back in the late 80s under the AmigaOS, and enhanced in the early/mid 90s under BeOS. This is not new, and not even remotely tied to the advent of multicore CPUs.
The problem here is simple: UNIX and Windows. Windows had fake multitasking for so long, Windows programmers barely knew what you could do when you had "thread" in the same toolkit as "subroutine", rather than it being something exotic. UNIX, as a whole, didn't even have lightweight preemptive threads until fairly recently, and UNIX programmers are only slowly catching up.
However, neither of these is even slightly an OS problem... it's an application-level problem. If programmers continue to code as if they had a 70s-vintage OS, they're going to think in single threads and suck on 8-core CPUs. If programmers update themselves to state-of-the-1980s thinking, they'll scale to 8-cores and well beyond.
-Dave Haynie
I understand how, on a single CPU, the interrupt line is set low and the device puts a unique number on the data bus. How are interrupts handled on these multicore chips?
Multithreading is a system-level thing, not a language level thing.
Sure, there have been languages that make threading ubiquitous, but they've never caught on, and it's hardly necessary.
You'll notice that internet, graphics, and many other programming necessities are not built into C/C++ either. They are higher-level functions, and thousands of programmers have no problem understanding C's role here. People have been writing multithreaded code in C/C++ for decades... I've personally done it from the 80s until now, under a dozen or so OSs.
Don't use your chosen language as a crutch for sticking to the level of programming practiced when that language debuted. The whole point of C was not to define much of anything in C itself... in truth, the language proper doesn't even do I/O... that's handled via a library. So is threading, so is graphics, etc.
-Dave Haynie
No one's going to learn how to really work around using multiple cores until they're really out there in the wild where developers can work with them.
It really is the next logical step, and no one ought to be bitching that no one knows how to dig holes when we just found out how to make shovels.
"Most people, I think, don't even know what a rootkit is, so why should they care about it?"
The simple fact is that most programming tasks are inherently linear. Sure, you can design programs that are better, and you can offload work to other CPU's in clever ways, but at the end of the day, you can't do that much better than a couple of major threads per program, with all of them running on an empty CPU.
In Office apps, you can't "offload" anything at all, really. Possibly a spellcheck or grammar check on the side, but you're not going to make *any* gains over the simplistic setups. Why? Because 99% of the program is spent waiting for the user to do something and, when they do, 99% of the time you can complete that task in a matter of microseconds.
In games, you can offload AI, physics, pathfinding, graphics drawing, etc. but at the end of the day you still have to limit interaction to what the user does (i.e. shoots, moves, etc.) and/or the FPS limit. You can get slightly more done by parallelising in that time, purely because the AI is not reliant on the graphics drawing etc., but every 1/60th of a second you have to bring everything to a halt and pass it off to be drawn in order.
In database apps, you can pass off I/O and tricky queries to other threads and so make gains, but you're just introducing a lot of locks, callbacks and everything else to be able to do that. You can scale with that, but you can't scale that far. And at some point, you've got to read the same data off the same disk as on a single-CPU system and pass that, with *all* its results, to the user.
In operating systems, you can offload a lot of tasks, but again, most of the time you are looking at waiting for user input to actually do something.
It's an inherent limitation of the machinery and the uses, not the design of a particular operating system. Sure, you can make gains over what we have now, but the simple fact is that at some point you have to manage and collate all those separate tasks into a result, and you can't do it until everything's finished. To use games as an example (because they are a mass-market, hardware-pushing, performance-critical application that will routinely make use of multiple CPUs/GPUs to the full extent), you can't necessarily do the AI until the physics has been done (otherwise bots would walk into moving objects that weren't there a second ago). You can't do the graphics until the AI and physics are both done. And over all that, you have to do SOMETHING every 1/60th of a second whether the other threads have finished or not. And there are only so many ways you can split up tasks. You can do graphics rendering in blocks of pixels, as proposed, but at what point do the locking of memory and the random bus access killing the memory cache actually make it *less* efficient than just running from 0 to 1024 and then from 0 to 768 (or whatever)?
A lot of applications *don't* thread things that they should. On the desktop, asynchronous DNS is a major culprit in my opinion - I should not be able to hang file manager windows, firewalls, browsers, FTP clients, etc. just because my DNS server has gone down or is momentarily inaccessible. And when I click the god-damn Cancel button, then you should CANCEL the other thread as quickly as possible BUT also let me just get on with whatever else I want to do with this app. However, this has nothing to do with multi-core or operating systems, it's to do with single-threaded apps still being made on systems that have reliably handled multi-thread apps (even on single-core machines) for decades. Ideally, EVERY tab in my browser window should be a different thread. It would mean that a tab with a particularly heavy Javascript or particularly slow flash movie will not slow the operation of the browser itself down. It's quite a simple job but a lot of browsers don't do it - there's a reason for that and it's not because GCC or the operating system doesn't include a "pass this off to another thread" function.
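To make the DNS complaint concrete, here is a minimal Python sketch of a lookup that cannot hang its caller. The helper names (`resolve_async`, `result_or_none`) and the pool size are my own assumptions, not anything from a real browser or file manager; `socket.gethostbyname` is the ordinary blocking standard-library call being pushed onto a worker thread.

```python
import socket
import concurrent.futures as cf

# Small shared pool of worker threads for blocking lookups (size is arbitrary).
_pool = cf.ThreadPoolExecutor(max_workers=4)

def resolve_async(host, resolver=socket.gethostbyname):
    """Start the lookup on a worker thread and return a Future at once."""
    return _pool.submit(resolver, host)

def result_or_none(future, timeout):
    """Wait at most `timeout` seconds, then let the caller get on with life."""
    try:
        return future.result(timeout=timeout)
    except cf.TimeoutError:
        # cancel() only succeeds if the lookup has not started running yet;
        # a lookup already in progress is simply abandoned.
        future.cancel()
        return None
    except OSError:
        # Resolution failures (socket.gaierror is an OSError) are not fatal here.
        return None
```

A UI would call `resolve_async` when the user presses Enter and poll or time out on the Future, so the Cancel button stays responsive even while the DNS server is down. Nothing here needs more than one core.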
The problem is not new, it's not exciting, it's not revolutionary, it's not going to lead to a whole new way to pr
Question: Is this looking at a single app or multiple? It seems fairly straightforward to me that most individual apps aren't going to see a huge boost from a hefty amount of cores, but multiple apps or threads/instances would probably see plenty.
Servers especially should be able to take advantage of this, where individual cores - just like multiple CPU's before - can handle multiple instances of a server daemon.
From what I've seen thus far, Unixy OS's handle SMP fairly well. I haven't touched windows webservers in awhile, but I'd imagine they might do well enough in that scenario too.
Translation: Not a big increase for your game/spreadsheet, but still some extra bang for multitasking. IO is still going to be a big bottleneck though.
I am writing this from my 8 core Intel box running Linux with 8GB of memory. This is the FASTEST computer I've ever had and the first time I've noticed a big leap forward. I normally don't care about cpu speeds, graphics cards, etc. Hardware tends to be fast enough for the current generation of software (I run Linux) and that's usually all you need. But this 8 core thing is different.
I develop and run very heavy graphics applications, where cpu tends to be the bottleneck. In my world, you used to rely on extra cpu from render farms or clusters to get the job done.
This world is changing. Shorter kind of jobs that require a quick turnaround, can now be done locally instead of sending jobs to the render farm. This is massive. As people start doing more jobs locally, it also frees up space for the longer running batch jobs, so they get done faster too.
When I first got the machine, it had Windows installed and it felt just as slow as a regular (single or double cpu/core) box. That should be of no surprise to anyone around here. But Linux sure knows how to use the multi core magic.
Actually, you should get a perm.
Couldn't stand the weather
This is what i hear, "waa waaa please use more cores so people see a need to get a new 8 way CPU!"
I don't think this is a problem that programmers should solve. Sure, it's nice if they utilize multiple cores as much as possible, but I don't want cores to be used just for the sake of it. If adding another core to an application gives a 30% gain, I'm pretty sure I could use that power for better things in most cases.
The problem in my mind is that core speed has hit a brick wall, and tossing more cores at the problem is just a desperate attempt to keep the upgrade treadmill going. Beyond four cores I personally wouldn't see any performance gains other than on rare occasions where I browse, watch movies, encode movies and unzip some large file all at once. In those cases I would also hit the HD and I/O much more than the CPU.
HTTP/1.1 400
--dave
davecb@spamcop.net
The last time I checked, pthreads weren't exactly non-standard. Every reasonable system has had them built in for over a decade, and there's only one system where you need to get them as an add-on. Guess which...
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Apple have no 2 core intel systems. Period.
Even the lowly Mac mini is a dual-core system. Every laptop is a dual-core system. The Mac Pro is either 4-core (with hyperthreading for a virtual 8-core) or 8-core (with hyperthreading for a virtual 16-core) system.
"Better to keep silent and look the fool, rather than speak and remove all doubt"
Simon.
Physicists get Hadrons!
Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.
How many times do we have to tell that Linux *IS* the fscking kernel??
Given that, including Linux and Windows in the same bag doesn't make sense. Which makes the entire post m00t.
Solutions:
1) s/Windows/Windows NT kernel/
2) s/Linux/GNU\/Linux/
Nice try to get a battle though.
There's not even a way in the C or C++ core language to start a new thread. And with many different third party libraries, there'll never be a reliable standard way to do it.
Never? A standard, reliable way to do it will be part of C++0x - so that's hardly "never"...
Unix has for ages run on multi CPU systems. And it does this well. And with easy tools you can harvest the power of all CPUs: the pipe
Every part of the pipe can run on another CPU.
I recently came across fslint, which is an example of a heavily piped shell script.
In short (leaving out the parameters and options) it runs
find | sort | tr | sort | bash | merge_hardlinks | uniq | sort | cut | tr | bash | xargs | sort | uniq | cut | sort | tr | xargs | sort | uniq | cut | sort | tr | xargs | sort | uniq | cut | bash | sort
That's a lot of CPUs :-)
OK, it's not a great example of CPU-hungry programs. But the progress of modern programming languages, which tend to be monolithic beasts that do everything (Perl, PHP, Java), leads to programs not using pipes or other types of inter-process communication because it's just cumbersome.
The pipe concept enables multi-CPU programming without even thinking about how to put tasks on different processors.
Unfortunately I have not found a language which sets such a simple concept as the fundamental programming principle.
See the unix shell, without the pipe you can't really do much.
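For what it's worth, the pipe idea can be imitated inside one program. The following Python sketch is only an illustration of the structure, with made-up names (`stage`, `run_pipeline`); it uses threads and bounded queues where the shell would use processes and kernel pipes, so in CPython it shows the shape of the design rather than a real multi-core speedup.

```python
import threading
import queue

_DONE = object()  # sentinel marking end of stream, like EOF on a pipe

def stage(func, inbox, outbox):
    """Run func on each item from inbox, like one command in a shell pipe."""
    def worker():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)   # pass EOF downstream and stop
                return
            outbox.put(func(item))
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def run_pipeline(items, funcs):
    """Push items through funcs in order; every stage runs concurrently."""
    # Bounded queues give back-pressure, as a full pipe buffer would.
    queues = [queue.Queue(maxsize=16) for _ in range(len(funcs) + 1)]
    for i, func in enumerate(funcs):
        stage(func, queues[i], queues[i + 1])

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)
    # Feed from a separate thread so the caller can drain the output
    # while input is still being produced (avoids deadlock on full queues).
    threading.Thread(target=feed).start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            return results
        results.append(item)
```

Each stage only sees its own input and output, which is the property that makes the shell version trivially spreadable over CPUs; swapping the threads for processes would be the step that actually buys cores.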
Atari rules... ermm... ruled.
How about rewriting the standard libraries for many procedural languages (this includes OO languages, since OO is really just a style of procedural programming) to use multi-threading whenever appropriate? For instance, any array sorter should use a multi-threaded heapsort instead of a quicksort if the array is above a certain size. The program flow would still be procedural, the average programmer would not have to deal with parallel programming very often, and the parallel specialists could handle the libraries where it's needed. Of course this won't work for every circumstance, but it would be a great way to get the most out of the code we already have.
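A sketch of what such a library routine might look like, in Python. Note the swap: instead of a multi-threaded heapsort this sorts chunks in parallel and merges them, which is easier to show in a few lines. The function name, the threshold and the `executor_cls` parameter are all assumptions for illustration, not an existing standard-library API.

```python
import heapq
from concurrent.futures import ProcessPoolExecutor

def parallel_sort(data, workers=4, threshold=100_000,
                  executor_cls=ProcessPoolExecutor):
    """Sort like sorted(), but farm large inputs out to several workers."""
    data = list(data)
    if len(data) < threshold or workers < 2:
        return sorted(data)                 # small input: plain serial sort
    size = -(-len(data) // workers)         # ceiling division: items per chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with executor_cls(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # k-way merge of the already-sorted chunks; this step is serial.
    return list(heapq.merge(*sorted_chunks))
```

The caller's code stays procedural: it calls `parallel_sort(xs)` exactly as it would call `sorted(xs)`. Whether it is actually faster depends on the cost of shipping the chunks to the workers, which is why a size threshold is there at all. On platforms that start worker processes by re-importing the main script, the call needs to sit under an `if __name__ == "__main__":` guard.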
... how dare you not focus your efforts on the 0.0002% of your users who run your desktop app on their server big iron!
========
CINC, 4th Penguin Legion
Doing a multithreaded GUI is hard, and very prone to bugs, and on top of that there are quite a number of assumptions made by JS code that force monothreading.
But anyway, FF does multithreading; in fact, in 3.1+ you can spawn a new worker thread from unprivileged JavaScript. However, that code won't be able to touch the DOM, only pass and receive JSON-encoded messages.
The advantage comes when you are using applications like image processing, 3D rendering, video decompression, or downloading, where tasks can be run in the background. Scripts and macros that apply to multiple sets of data can be run in the background. Auto-save functionality has always been desirable, but it is annoying when the entire application freezes because it is single-threaded.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Windows 7 will support 256 cores in the 64-bit version. Microsoft has made significant tweaks to the thread dispatcher code to make this possible. A good discussion can be found here: http://channel9.msdn.com/shows/Going+Deep/Mark-Russinovich-Inside-Windows-7/
I sometimes despair at what I read here. sin(x) and cos(x) are both very smooth, i.e. infinitely differentiable, periodic functions, so why would it surprise you that interpolating off a table, with the spacing chosen to give the desired accuracy, is quicker than the Taylor series? And since the function is periodic, the table size is bounded.
Try that with any _ROUGH_ non periodic function and see where you get
eg 1/(1-e^x)
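The table method for sin(x) can be sketched in a few lines of Python. The table size of 1024 is an arbitrary choice for illustration; with linear interpolation the error is bounded by roughly h^2/8 for spacing h, which at this size is about 5e-6. This shows the method only; it is not a claim that the Python version beats `math.sin`.

```python
import math

TABLE_SIZE = 1024                       # assumption: picked for illustration
STEP = 2 * math.pi / TABLE_SIZE         # table spacing h
# One extra entry so the last interval can be interpolated without wrapping.
SIN_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]

def table_sin(x):
    """Approximate sin(x) by linear interpolation in a precomputed table."""
    x = x % (2 * math.pi)               # periodicity keeps the table bounded
    pos = x / STEP
    # Clamp: floating-point rounding can make x land exactly on 2*pi.
    i = min(int(pos), TABLE_SIZE - 1)
    frac = pos - i
    return SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])
```

The range reduction in the first line of `table_sin` is exactly what is missing for a rough, non-periodic function: there is no fixed interval to tabulate, and near a singularity no fixed spacing gives the desired accuracy.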
It wasn't C, if it was BCPL when David Barron developed the Transputer, at Southampton I would be surprised, but the world turns and now we have the AMD Hyperchannel and Infini-x
Three Cores for the MAC kings under the sky
Seven for the Windows-lords in their halls of stone
Nine for Linux users doomed to die virgins
One for the Dark Lord on his dark throne
In the Land of MS where the Shadows lie.
One Core to rule them all, One Core to find them,
One Core to bring them all and in the darkness bind them (with restrictive licensing)
In the Land of MS where the Shadows lie.
Sorry if this is straightforward to the hardcore programmers. I'm just a business programming sort of guy. Lots of lists and mailmerges.
One of most common tasks in web programming is
Couldn't a whole bunch of these be farmed off onto different processors?
Yay me!
So what the author is blathering and foaming about are problems found and solved 20+ years ago. Instead of programmers studying anything, the author should study some. NUMA has been in Linux for close to 10 years. It solves the memory bus problem. Multi-threaded applications solve the problem of using more than 1 core. I do it all the time. Did it yesterday, will likely do it tomorrow. Not every program takes advantage of multiple cores. Quite a few do. Those that scream the need for parallel computing use all of the cores (on my Nehalem system it shows up as 8 cores). I do wish authors would do the tiniest squeak of research before describing how the world is going to end. Oh well.
Knuth's maxim is sufficiently pithy to have become, over time, self referential, as evidenced by your misunderstanding.
The root of all evil used to be deep and singular, now it is broad and shallow. I guarantee you that Knuth did not include choosing the best fundamental algorithm under the label "premature" unless it involves squabbling over log log N terms or stray digits in the exponent term.
http://www.siam.org/pdf/news/174.pdf
An unpacked (deoptimized) version of Knuth's maxim is that the transition from program structure and notation which maximizes readability, comprehension, and conviction (concerning its correctness and merit) to one which favours performance should be delayed as long as possible. Ideally until performance becomes the sole remaining success factor.
(Taking into account the human mind's special capacity to imprint upon evil, Knuth's formulation remains the better one.)
Originally Knuth meant manually hoisting loop constant expressions (often in ways that later turn out to not be fully general) or manually evaluating constant expressions or manually fusing nested function calls and the kind of rot that a good compiler these days will do on your behalf. Anyone used the "register" keyword lately? Once upon a time it seemed like a good idea.
While the principle remains the same, the temptations have changed. Such as parallelizing a bad implementation of a poor algorithm in the misguided belief that the underlying task is not sequentially bound.
That said, projects which do *no* evil typically fail to impress anyone. The ideal is to wrap large amount of cleanly structured and accessible source code around a nugget of pure, smoldering evil, coked to the last clock cycle.
Perversely, the worst example of this is TeX itself. The smoldering nugget of pure evil is the single pass parsing regime and data packing eight bit character values.
I suspect the literature on parallel programming would roughly equal the literature on electro-chemical storage cells. Sheesh, if only those guys were paying attention, we'd have watch batteries powering small cities by now.
On second thought, how much literature could there really be if you can summon the majority of it onto your screen in 4/10'ths of a second for any combination of keywords?
Parallel programming is a lot like fuel cells. You get some pretty impressive results on selected applications involving pristine apparatus in a controlled setting, dating back to the Apollo program (in both cases).
Reality on the ground is rarely so forgiving.
If we hadn't already achieved a pixel processing speed-up between 1980 and 2008 best approximated by a sideways 8, Javascript wouldn't even have entered the conversation.
It boils down to this: ignoring everything you guys have already accomplished, you've pretty much done nothing. I worked for that kind of company once. The guy in charge put on a Cirque du Soleil of intestinal recursion. That's how I feel about the claim that software developers haven't been paying attention to parallelism for elephant years.
It's not really a problem. If you can't split a single task over multiple CPUs, you can just run multiple separate tasks (like Erlang does).
- burning DVDS
- playing videos
- generating sweet fractals
- web browser(recently being more CPU intensive)
- Bit torrent
- BOINC
- Indexing files for searching later
- Rendering frames
- Compiling updates
- Compressing backups
- While still having enough spare resources to remain responsive.
...and that is all I have to say about that.
http://jessta.id.au
Optimizing applications for multiple threads is like unrolling loops. Programmers are writing logic not implementation, compilers should be taking care of implementing logic in a way that is optimal for the hardware.
Now, get back to me when -fuse_threads is a compiler option and implicit when choosing o3
Java never caught on? That's news to me....
Oolite: Elite-like game. For Mac, Linux and Windows
It's really quite frustrating to see posts like this. Posts that don't take into account what is needed and focus on what we are incapable of doing, even when they don't need to.
So let's look at reality for a second. First, most modern OSes scale very, very far past 4 CPUs (not sure what Windows scales to, but Linux certainly has no limitation based on current CPU reality). So the kernels are just dandy for multi-core CPUs, bring it on! 128 cores, we're ready for ya!
The same is not true at the application level, and that is a fair comment. But don't confuse Linux and Windows with their apps, for crying out loud! From an application point of view we are capable of parallel coding, but it's non-trivial. It's also not something we need a lot of the time.
For instance, we now buy servers (our cheapest models) with dual CPUs and quad cores, and we tend to virtualise each one into several machines with 1 or 2 CPUs each. Whether you do this because you assume the OS will utilise one CPU and the apps will utilise another (as one person told me) is irrelevant. Suffice it to say, having 2 CPUs is usually quite nice.
But what requires more than that in reality? Well, your desktop might; after all, there are a lot of things going on at once, right? In some cases that's true (there are quite a number of very heavy applications out there, and surprise, surprise, they can multitask *GASP*).
Same at the server: not many things require that many CPUs, and even at the application level we've gotten good at spreading heavily loaded applications across multiple servers (we call it load balancing; was that too sarcastic?). Take mail (whether it's Exchange or Postfix or Sendmail or whatever), or web servers, etc. Those server applications that do require heavy grunt tend to already be coded with "parallel" in mind, even across multiple servers (think Oracle RAC).
As for cache contention - well it sounds like the hardware makers are finally fess'ing up to the fact they have a problem, Houston!
make -j bitches
optimized for as many cores as you want
But how many humans do you know that only run one or two windows on their PCs at the same time?
Your average XP user has 4 or 5 windows open at the same time.
At the moment, I have three windows open on this Windows XP machine: Firefox, Command Prompt, and Windows Explorer. All three are waiting for user input, including the Firefox window that I'm typing this post into (between keystrokes). If all the apps you have running in the background don't bring your load average above 1, you aren't likely to benefit from multicore optimization.
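The load-average rule of thumb above fits in a couple of lines of Python. The function names are made up for illustration, and the threshold of 1 is simply the one stated above, not a tuned value. `os.getloadavg()` is a real standard-library call but exists only on Unix-like systems.

```python
import os

def multicore_would_help(load_average, cores):
    """Crude heuristic: if load stays at or below 1, one core already keeps up."""
    return load_average > 1.0 and cores > 1

def current_load():
    # Unix-only: returns the 1-, 5- and 15-minute load averages; take the first.
    return os.getloadavg()[0]
```

On a box with three idle windows, `multicore_would_help(current_load(), os.cpu_count())` would be expected to come out False most of the time, which is the point being made.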
And also, through the API achieve reliability. For example, if one of the running copies of Windows crashes, the Apps with multiple worker threads keep some threads running, and they can detect and recover from that failure
Congratulations: you've reinvented VM/CMS. It'll work at least until the CP crashes.
This allows applications to work around issues like blue screens in certain OSes, or kernel panics in others
Blue screens and kernel panics are often caused by defective device drivers. Would your design run these in a VM?
so long as [applications] perform sufficient checkpointing
Third parties paying attention to checkpointing? NBL.
First and foremost, if they are going to go with more CPU's they might as well sort out the problem with the extra heat output.
On a hot day (say, 35C+ or 95F+), I wince when I try to run a multi-threaded application on my dual core machine (Intel Pentium-D 2.67 GHz); or some background process runs at the same time as my foreground process.
Why is this, you may be asking? Well, it's because it sounds like old-style CD recording gone wrong. It starts off with a low hum and gradually gets louder and louder at increasing pitch, with seemingly no end.
It's fine and dandy on a normal or cool day, but unbearable on hot days. I just wonder how many other people have to use CPU limiters to play certain games for a few weeks a year. Alternatively, I realise I could have damaged the thermal paste over the processor when installing it (I have 3 standard fans inside a roomy case).
This OS will be the included OS before Apple starts selling 4- and 8-core consumer machines, and it will have been out long enough for developers to use Grand Central to leverage those cores.
The real problems with exploiting parallelism are (a) a solution is needed since Moore's law has run into a brick wall, excepting major process improvements that the semiconductor manufacturers don't see, and (b) all current algorithm design, going right back to 1948 and John von Neumann, has been essentially serial. Threads and multi-cores are an essentially serial solution to a parallel problem.
.. 1048576 will need new algorithms, for the first time in 60 years of computing and 500 of Mathematics.
Parallel computation, is very hard, see how many kernel (OS) developers we have v app. developers. This is because of problems related to timing and computational order. This produces problems with data sharing and correctness.
Then there is the problem space. In some problems, easily artificially constructed, the next step depends on the completion of all earlier steps, and the solution cannot be parallelized, e.g. Fibonacci, where you can show a trivial parallel decomposition that wins nothing. In other, more interesting cases, e.g. the partial differential equations of mathematical physics, routing, and finite element systems, some to large amounts of parallelism may be possible, particularly with well-thought-out analysis that capitalizes on special features of the problem or the known solution, e.g. multimode solution and boundary condition matching in Navier-Stokes or elasticity.
The point is that this is at the algorithmic level, it is not about code optimization or other programming paradigms so the kool aid of we need a better tool chain or parallelizing compilers is hope, hand-waving and optimism and of course product support.
Dont get me wrong, I see this as a very good thing, seas of CPU (core = 1 CPU), will fully solve lots of problems, and improve robustness, but it will not help with many problems, 4-16 cores will be generally handy, but 1024
A last thought, when we get to ~1073741824 cores we may start to make progress with AI and need to worry about the Singularity.
Seriously, I'll never understand article after article about multiple CPUs being wasted when I have 37 processes in Task Manager.
Peter predicted that you would "deliberately forget" creation 2000 years ago...
As has already been explained, non-sequential thinking is hard. You postulate double speed, BUT the producer thread, the app, has finished and handed off the buffer to the OS to send to the GPU, and you say it threads this. Well, fine, so the threaded part can run on another core, but then the hardware DMAs the data and waits for a GPU interrupt/done-queue ack, so how does this speed things up on multicore? Not at all: someone has to set up the DMA and wait, not run, while it completes, so unless all cores are at 100% you have saved nothing, and you have created additional overhead spawning a new thread.
Duh, Marketing Departments
I think I agree with you, BUT... don't fall into the old trap: If ten machines can do the job in 1 month, 1 machine can do the job in 10 months. But it doesn't necessarily follow that if one machine can do the job in 10 months, 10 machines can do the job in 1 month.
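One common way to put numbers on that trap is Amdahl's law, which is not named above and is brought in here only as an illustration. If a fraction p of the job can be spread over n machines and the rest cannot, the speedup is 1 / ((1 - p) + p / n). A small Python sketch, with the 90% figure below being an assumed example value:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a partly parallel job."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

def months_needed(serial_months, parallel_fraction, workers):
    """Wall-clock time once the parallel part is spread over `workers`."""
    return serial_months / amdahl_speedup(parallel_fraction, workers)
```

With a fully parallel job, 10 machines do turn 10 months into 1. If only 90% of it is parallel, the speedup on 10 machines is about 5.3, so the 10-month job still takes roughly 1.9 months.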
Also, the problem with runtime interpreters is not that they don't generate assembly code. The problem is that it is harder to get at the underlying code that is really executing. That code could be optimized if you could see it. But seeing it is just more difficult.
Behold, this dreamer cometh. Come now, and let us slay him... and we shall see what will become of his dreams.
Exceptions and lockouts for security, timeliness, reliability, etc. can always be made. The general solution is for general purpose computing AND special case applications.
Behold, this dreamer cometh. Come now, and let us slay him... and we shall see what will become of his dreams.
The quoted paragraph in the SlashDot article. Does it appear in the InfoWorld article? I can't see it. The link goes to the article no problems, but where is this quote? Words like "blame" don't even appear!
Am I missing something? Is the link to InfoWorld incorrect?
The reason I wanted to read the original article was because the SlashDot teaser (quote) mentions Windows and Linux performance, but not Mac OS X, and I wanted to see if the original article mentioned that or not.
Help?
Problem? The development tools aren't available and research is only starting.
Nonsense. Here are a couple of portable tools and libraries that will solve many developers' problems.
http://www.threadingbuildingblocks.org/ (c++)
http://developers.sun.com/sunstudio/downloads/ssx/tha/tha_getting_started.html
Research is mature and ongoing.
Education, however, is only starting to reach the mainstream.
i wish i could stop
Insofar as the language proper is defined by the language standards, the I/O libraries are part of C, because they're specified in the ANSI C and C99 reports. Any conforming C implementation must have the standard I/O functions, and they must behave in the way the standard specifies. That differs quite a bit from the situation with networking libraries, which are third-party and not covered by the C standard.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
(other than M$ having to update licensing I suppose, since they usually license for 4 cores max now on "normal" windows versions.)
MS limits Windows processor support based on the number of "processor units", not the number of cores.
1 processor unit for Home, 2 for Professional (Business/Enterprise/Ultimate if you are using Vista), 4 for Server Standard, 8 for Server Enterprise, and even more (I don't remember the exact number) for Server Datacenter (note: the names of the server editions vary slightly from release to release, but it's usually pretty obvious which maps to which).
Note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register.
Most Linux programs are small and do one small part. Larger apps usually call down to smaller apps. Doesn't this in itself let the OS balance the work across multiple processors? It seems that for anything very intensive (like compiling or video conversion), the work of making the apps run across many processors has already been done. I have used gcc to compile across 6 different machines at one point. Transcode uses all available processors. So does Folding@home! The only thing I really think is missing is a software-based GLX engine that can use all available cores. Perhaps a higher-end video card isn't needed for basic 3D anymore.
The fact that all we do is sequential tasks on our computer means we are still pretty stupid when it comes to "computing". If you look outside your CPU, you'll see the rest of the computers on this planet are massively parallel and do tons and tons of very complex operations far quicker than the computer running on either one of our desks.
Most of the computers on the planet are organic ones inside of critters of all shapes and sizes. I don't see those guys running around with some context-switching, mega-fast CPU, do you?** All the critters I see are using parallel computers with each "core" being a rather slow set of neurons.
Basically, evolution of life on earth seems to suggest that the key to success is going parallel. Perhaps we should take the hint from nature.
** unless you count whatever the hell consciousness itself is... "thinking" seems to be single-threaded, but uses a bunch of interrupt hooks triggered by lord knows what running under the hood.
For HD DVD and Blu-ray authoring, the CineVision PSE system we designed for VC-1 used a hybrid spatial/temporal model.
First, the codec itself was 4-way threaded, encoding each 1920x1080 frame as four slices. Then the file was distributed across multiple blades, each processing a section of the video. Since this was for disc-authoring, we knew where chapters were going to be in advance, and so split by chapter; ideally you'd have at least 2x as many chapters as workers.
The key to avoiding the "chunk transitions" was aligning along chapters, since they almost always start at a scene change or a black frame, so it'd be easy to see the problem. Also, there is extensive 3rd pass support to manually tweak a transition that could go wrong. There was a fair amount of workflow that had to get baked in to get full advantage of the parallelization, like prepopulating each worker with the source during the 1st pass and keeping it cached for the 2nd and potentially 3rd passes.
Anyway, it works nicely; that product was used for 90% or so of HD DVD titles and about a third of Blu-ray titles so far. Last I heard, the record for a 2 hour movie encode was about 6 hours for 2 passes. I'm sure it'd be faster yet with more recent processors. That scaled up to 64-128 cores pretty well, given source chapters. With overlapping scene detection in the first pass, it could be scalable well beyond that for long-form content. Of course, with short content you're not so worried about end-to-end encoding time, but full throughput.
As suggested earlier, live streaming is the hard stuff, since you can't do significant temporal slicing without adding a whole lot of latency.
We have a similar kind of issue with Smooth Streaming for Silverlight, where we encode the same source in multiple bitrates, and need to make sure GOPs are aligned across all the data rates for seamless switching. For an example of that:
http://on10.net/blogs/benwagg/Behind-the-Scenes-at-SmoothHDcom-Encoding-Big-Buck-Bunny/
My video compression blog
Since I see little evidence that timothy or Mr. Chapman read the article, I'll do them a favor so they don't have to click:
< article paycheck="undeserved" >
Hi, I'm Agam Shah and I'm writing an article about multicore processors, but these concepts are so new to me that I'm putting quotes around "race conditions" like it's frickin' sharks with lasers.
So then I did a Google search on "parallel programming tools" and it helped me get another paragraph out of the way.
Oh, and I quote some lamer analyst who has never heard of NUMA or libhoard, so I'll try to fabricate some crisis that the problems they address might never be solved.
Parallel programming is hard, WAH! WAH!
Oh, except when it's not, as in that trivial application named Photoshop. I'll write one of those next weekend.
So why doesn't the difference between the native graphics APIs on Windows and Mac or Linux similarly prevent you or anyone else from writing graphical applications? You either decide what platform you're targeting and use the APIs for that platform, or if you want cross-platform you choose cross-platform (incl. threading) tools. Doh!
From a conceptual point of view, how difficult is it anyway to switch from one platform's implementation of the standard "threads + mutexes + condition variables" model to the exact same model on another platform?!
Some languages, like Erlang, have been built around a bunch of threads for years. And event-based designs almost completely solve this problem. Some things like Xlib and GLib still run as a big ugly loop, but there are alternatives like XCB, which at least one desktop manager uses (awesome wm).
The two things that currently peg a single core for me are Firefox's unified JavaScript loop (this changes a bit in 3.1) and ffmpeg for high-def video (multi-threading is in the works). The fact that people use a lot of different programs at once, and that most programs are not very demanding, also makes this not that big of a problem. Few applications are inherently single-threaded (all I can think of are compressors such as 7zip, video encoders, etc. in their top-quality modes, and certain resource allocators); most things would never hit that single-processor ceiling if they were written decently. I think it's just a legacy application problem.
Programmers like myself were waiting for a clear direction in terms of language and compiler support for multi-core development, and of course multi-core debugging is a challenge.
Now we have quad processors from multiple vendors and there are plenty of choices for hardware, but there is still not a clear winner when it comes to development tools and methodology. Intel has a threaded toolbox, and beyond that we can roll our own. The only support I have seen that made me smile was the multi-core support in Python, which only exists in the more recent versions, and those versions are not ubiquitous yet.
It is really easy for Intel to unilaterally make a decision to stop processor development at 3GHz and put it on the programmers to reorganize their code in a parallel manner. It is something else again for each software engineer to choose how to do this and commit their clients to those decisions, and the fall out that will last the lifetime of this code. Companies that paid to migrate their applications to hyper threading only got to benefit for a year or two before the environment went away. I am frightened to make a decision today about multi-core that depends on Intel (and AMD) to keep multi-core stable far enough into the future to make development worthwhile.
It is fairly obvious at this point that multi-core is here to stay, but it will be nothing more than a way to sell more expensive hardware until the powers that be provide a cohesive set of tools and methodologies that make multi-core usable to address our current problems. A friend of mine told me of his experiments configuring a multicore Windows box for gaming using process affinities. He indicated that the Windows operating system used about 1.5 cores itself, which in the case of a dual-core machine left about half a core for the game. My experience has shown that we have little control over the way tasks are assigned to specific cores, and multi-core seems to do more for the operating system and environment than for the threads of a specific application. After years of effort addressing this problem, it is still not clear to me which tools and methods will be the most stable over time. It looks to me that there has been very little progress on the software side in the last two years.
Assume we develop affordable 32 bit quantum computers. How does that change this parallelism problem?
Running with Linux for over 20 years!
Ok, so it points out a flaw with Windows 7 and Linux but completely fails to give the praise to the efforts that Apple is doing with Mac OSX and Snow Leopard!!! OSX is incorporating incredible efforts to leverage GPU and Multi-core solutions for developers. Ignoring these pieces is incredibly ignorant of the "personal computer" and "distributed computing" markets.
http://www.apple.com/macosx/snowleopard/
From an application standpoint, how is hyperthreading any different from multi-core?
I would agree that your typical email client or word processor is not going to benefit much from multiple cores. Most business applications running on Windows aren't likely to need even more than two cores to get their work done. (I suppose one would be using the other two to run the anti-virus software to keep that OS reasonably healthy, eh?) But OSes like Linux tend to have users who are doing more of a variety of tasks simultaneously. They'll have an email client frequently checking for new mail, an audio player running, a window where they're downloading patches or new source code onto their system, an editor window or two open, windows to other systems on the local network, a browser with multiple tabs being updated frequently, and who knows what else. Can you run all the same applications simultaneously on Windows? Maybe, though without multiple desktops it's unlikely. Alt-Tabbing through a list of multiple programs makes switching from one program to another incredibly clumsy, so most people I know avoid running more than 2-3 applications at once, making more than dual-core chips mostly overkill. If the extra cores are going to be useful at all in a business environment, I suspect it'll be to run a slew of additional tools used to enslave^Wmanage the desktops centrally. Servers may be a different story, but I believe the extra cores would be used more advantageously by Linux, since the servers running it are, more often than not, tasked with running more than a single application at a time; something which Windows servers are still not asked to do in most situations.
CUR ALLOC 20195.....5804M
I have a dual quad-core Xeon server and it keeps all cores busy and is definitely faster than a single or dual-core system. Nothing fancy going on. I run several virtual machines and each of them runs normal software such as web servers.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
It's more a language problem. C(++) was never meant to run on systems with several processors. The programs are meant to be executed in a single thread of execution. If you actually want to use multiple processors, it's quite hard to do.
Object oriented programming might solve some of the problems.
I have done multithreading in C / C++ and a programming language which supports it natively (Ada). And there was a huge difference in the amount of bugs (aka deadlocks) I produced.
It is your line of argument which hinders multithreading. Not that your argument is wrong. You are right. But that makes it even more dangerous.
The point is: there is nothing wrong with making your life easy. And if a programming language which makes multithreading easy and less error-prone had caught on, we would be a lot further along now.
Java does not support multithreading; java.lang.Thread, a library class, does. Have a look here:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
See, not a library but language keywords. Note that I am fluent in Java and Ada and have designed and implemented larger projects in both.
But there is a difference in the "ease of use" between a language feature and a library feature. That is, unless you use a language like Smalltalk where everything is a library.
Think of how error-prone printf is. If the parameters and the little % stuff do not match, havoc ensues.
And more so in multithreading. Here the bugs are often sporadic and extremely difficult to find. And a programming language which supports it natively is a great help. See:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
So you know what I understand in "natively".
Great - there is still only one compiler to support "export" - almost 10 years after the standard was defined - and you speak about the next standard. So when will we see the first compiler to support your new library?
Martin
PS: I know a language where all generics are "export" and all compilers support it - since 1983. So it is possible to implement.
Sure, devs are able to cram more crap into spare CPU cycles. But looking at the trends now, single core is mostly the way to go with light OSes and optimized software (netbooks) for the end user. As for the more niche markets that find a use for high-performance computing (Slashdot readers, scientists), I'm pretty sure that multiprocessor is a better way to go than multicore (which is a way to do cheap multiprocessor design anyway). OS HALs have been ready for SMP for a long time, but multicore broke the SMP abstraction because of the resources shared on the die between the cores. And again, the processor vendors create the supply by shipping faster processors, and the software vendors create the demand by writing bloated software.
...in a few years when this replaces electronics as the standard method of switching for computation. There are working 30GHz photonic processors, and it won't be uncommon to see 10 times that in a CPU.
C has built in operators to add, subtract, multiply and divide numbers. You can use the CPU and the RAM memory. You don't need a library for that. I think the ability to do computations on the different cores and using multiple threads is just as basic as the computation and memory and should be part of the core language.
Most experienced developers (that use lower level languages like C/C++), do indeed know how to write multi-threaded applications, DESPITE the poor support for doing so by the compiler. This is usually done through threading libraries, rather than native language features supported by the language preprocessor and compiler and linker.
This is not actually the problem. The problem is that most applications simply don't need to be multi-threaded, and in fact adding threading frequently introduces more problems than it solves. Most multi-threaded applications would actually perform better as multi-instance applications, where each instance runs on a separate core, in virtual isolation.
It seems to me that multi-core systems facilitate this paradigm with almost no effort required by the developer. As it should be.
XMOS have been experimenting in this area already. Their language which is an extension to C supports code for parallel processing on multi core chips. See http://www.xmos.com/
Core utilization has nothing to do with how many threads and processes you have. It has to do with how many threads/processes you have which are active and compute bound from moment to moment. I have 137 processes (ps aux|wc) running in linux right now, and in toto they are consuming 0.8% of two cores (top).
100 tabs in Firefox should take as much CPU altogether as the one tab you are viewing. That this is not completely so (some of the background ones are animating CPU-sapping Adobe Flash that no one can see) is a design problem. Even so, I often have more than 100 tabs open with little effect on overall system performance other than Firefox's (and other browsers') absurdly gigantic memory usage.
How many programs do you run at once which are actually doing serious computing other than the one you are interacting with? Sure, there are times you are doing database jobs and such, but it isn't much for the typical desktop user.
Not until you've read the replies that have a clue.
If we are talking about technology... The Linux operating system (the monolithic kernel is the operating system) works great on CPUs that have more than 4 cores. In case the article writer did not know, the Linux OS powers almost all supercomputers, etc. The problem is that applications aren't developed to use so many threads. The OS works just fine, but if the applications cannot use multiple threads, you do not gain anything unless you run multiple instances of them.
If we are talking about marketing lies and misinformation, the "operating system" (actually a _software system_) does not work at all, because usually this "operating system" cannot use multicore CPUs well. Who should we blame?
Seriously, Linux just works on multicore CPUs, but that is just an operating system. Software systems like Ubuntu, Fedora and Mandriva just aren't working so well.
I only know of one Cray model that actually had Linux... Can I get some kind of citation from a trustworthy source on this, as I can't find it on Google?
Change is certain; progress is not obligatory.
Will that be implemented by the same vendors which implemented "export" in the last 10 years?
I believe it when MSC++ and G++ have a fully working implementation.
Martin
Do I have an 8-core machine? No. Will I have one at some point in the future? Probably. Will I be happy if support has improved by then? Yes.
Seriously, until I have an 8-core machine I'd probably prefer other improvements (stability, for example) arrived before more efficient 8-core support. Also, given the problems with trying to program for too many cores, is it possibly fair to say that Intel are pushing the tech before the software is ready, or possibly even the wrong tech?
I saw talk earlier in the comments of instruction sets with inherent support of multiple cores. Wouldn't it be better to get something like that out, presumably some form of SIMD-like additions, before pushing the processors with >4 cores?
mysql> SELECT * FROM `places` WHERE `place` LIKE 'home'; Empty set (0.00 sec)
When I said "by the compiler" perhaps a better explanation of what I'm thinking of is "by the compiler tools".
This would be less compilation and more analysis. The compiler has already gone and done its job, although tweaks to the compiler (some nonintuitive... not using inline functions) would help. This would be adding metadata to the binary.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Pure functional programming with only implicit parallelism (no message passing) might be relatively straightforward and it's true that parallelism is easier to extract automatically than with procedural languages ... but this only allows for a subset of parallel algorithms.
Transactional memory already allows for a little more.
With message passing (ie. Erlang) you finally have the full deal. Removing aliasing from the equation removes a lot of very nasty problems, but some remain. Deadlock, starvation, livelock (with priorities making all those problems more likely to occur too). In fact Erlang is really too lax a language to be automatically checked for those problems (mostly because of the use of asynchronous message passing).
Modern Occam is better in that regard, although I wouldn't say that makes parallel programming easy either ... it's just as good as it gets.
The 6x86 was a great design that gave us much to be thankful for, in many ways - even though the FPU sucked (Funny enough because Cyrix used to make some magnificent FPUs).
In most cases the heat issues could be minimized by using a better cooler, as long as it was manufactured by IBM - even better sold under the IBM brand
Live long and prosper...
That's an application of Parkinson's law
I would rather describe the Nano as the grandchild of the WinChip, not the 6x86
Live long and prosper...
Why break into quadrants if you have 8 or 16 cores available?
Because it's often impolite to assume that nothing else is going on on the machine at the same time? (This actually means that the video encoder needs to know how many cores are assigned to it so that it knows how many ways to partition the data. Let the OS worry about keeping some resources back for other activity...)
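A minimal Python sketch of that idea: ask the OS how many cores this process may actually use, then partition the frame rows that many ways. The function names and the 1080-row frame are illustrative assumptions, not taken from any real encoder. `os.sched_getaffinity` is only available on some platforms (Linux, for example), so the sketch falls back to `os.cpu_count()` elsewhere.

```python
import os


def usable_cores():
    """Number of cores the OS has assigned to this process."""
    try:
        # Respects affinity masks, where the platform exposes them.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # No affinity API here: fall back to the machine-wide count.
        return os.cpu_count() or 1


def partition(n_items, n_parts):
    """Split range(n_items) into at most n_parts contiguous (start, end) ranges."""
    n_parts = max(1, min(n_parts, n_items))
    base, extra = divmod(n_items, n_parts)
    ranges = []
    start = 0
    for i in range(n_parts):
        # The first `extra` ranges each take one leftover item.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Hypothetical 1080-row frame, sliced once per usable core.
    print(partition(1080, usable_cores()))
```

With four usable cores this gives four 270-row slices; with 8 or 16 it slices finer without any change to the encoder logic.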
"Little does he know, but there is no 'I' in 'Idiot'!"
"The problem, my dear programmer, as you so eloquently put it, is one of choice.."
Seriously. I have been involved with software development from 8-bit PICs to clusters spanning WANs and everything in between for the past 20 years or so.
Multiprocessing involves coordination between the processes. It doesn't matter (too much) whether it's separate cores or separate silicon. On any given modern OS there are plenty of examples of multiprocessor execution: Hard drives each have a processor, video cards each have a processor, USB controllers have a processor. All of these work because there is a well-defined API between them and the OS - a.k.a device drivers. People that write good device drivers (and kernel code) understand how an OS works. This is not generally true of the broader developer population.
Developers keep blaming the CPU manufacturers, saying it's their fault. It's not. What prevents parallel processing from becoming mainstream is the lack of a standard inter-process communications mechanism (at the language level) that abstracts a lot of the dirty little details that are needed. Once the mechanism is in place, people will start using it. I am not referring to semaphores and mutexes. These are synchronization mechanisms, NOT (directly) communication mechanisms... I am not talking about queues either - too much leeway on their use. Sockets would be closer, but most people think of sockets for "network" applications. They should be thinking of them as "distributed applications". As in distributed across cores. As an example, Microsoft just recently started to demonstrate that they "get it" with the next release of VS, which will have a messaging library.
choice:
At this time there are too many different ways to implement multi-threaded/multi-processor aware software. Each implementation has possible bugs - race conditions, lockups, priority inversion, etc. The choices need to be narrowed.
Having a standard (language & OS) API is the key to providing a framework for developers to use, yet still allowing them the freedom to customize for specific needs. So the OS needs an interface for setting CPU/core preferences and the language needs to provide the API. Once there is an API, developers can "wrap their minds" around the concept and then things will "take off". As I stated previously, I prefer the "message box" mechanisms simply because they port easily, are easy to understand and provide for a very loosely coupled interaction. All good tenets of a multi-threaded/multi-processor implementation.
Danger Will Robinson:
One thing that I fear is that once the concept catches on, it will be overused or abused. People will start writing threads and processes that don't do enough work to justify the overhead. Everyone who starts writing programs will "advertise" that it's "multi-threaded", as if this somehow automatically indicates quality and/or "better" software...Not.
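The "message box" style described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the newline-delimited JSON message format and the names `mailbox_worker` and `ask` are made up for the example, and a thread stands in for the second process to keep the sketch self-contained. The two sides share nothing but the socket, so the same worker could sit in another process or on another machine.

```python
import json
import socket
import threading


def mailbox_worker(conn):
    """Answer each request message with the sum of its values."""
    with conn:
        with conn.makefile("r", encoding="utf-8") as inbox:
            with conn.makefile("w", encoding="utf-8") as outbox:
                for line in inbox:  # one JSON message per line
                    request = json.loads(line)
                    reply = {"id": request["id"], "sum": sum(request["values"])}
                    outbox.write(json.dumps(reply) + "\n")
                    outbox.flush()


def ask(batches):
    """Send each batch as a message and collect the replies in order."""
    parent, child = socket.socketpair()
    worker = threading.Thread(target=mailbox_worker, args=(child,))
    worker.start()
    replies = []
    with parent:
        with parent.makefile("r", encoding="utf-8") as inbox:
            with parent.makefile("w", encoding="utf-8") as outbox:
                for i, values in enumerate(batches):
                    outbox.write(json.dumps({"id": i, "values": values}) + "\n")
                    outbox.flush()
                    replies.append(json.loads(inbox.readline()))
    # Closing our end gives the worker end-of-file, so it exits its loop.
    worker.join()
    return replies


if __name__ == "__main__":
    print(ask([[1, 2, 3], [10, 20]]))
```

No locks appear anywhere in the sketch; the only coordination is the message exchange itself, which is the loose coupling the comment argues for.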
... because in such languages, multicore-usage is already included from the very beginning.
In Haskell, you have to explicitly state that you do not want something to be spread to more than one core.
With the included total type safety and lazy evaluation, I call that a winner. :)
At least, if you do not want to program hardware directly.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
I don't know about Firefox in particular, but many browsers slow or stop Flash in hidden tabs. So you'd have to split those tabs into windows and tile them across the screen to get your CPU working harder.
Is this really a concern? How many people are tapping out their CPU? Honestly, 95% of people will never actively use more than 75% of their dual-core 2.0 GHz CPUs. RAM has been and will be the limiting factor on most PCs.
Also, on the Red Hat servers I admin, we don't seem to have much trouble with 4x4 CPUs. Are people really saying there is a difference between 16 procs and 4 quad cores? As the OS sees them...
Odd to even be concerned...
Clearly, more applications need to be (correctly) multi-threaded. I'm not talking about World of Warcraft or CMU calculation projects here, but more common applications like IE, Office, etc. As polished as Microsoft software is (shielding head against thrown fruit), often the user is still forced to wait while UI rendering is waiting on some other task (Outlook you fat slow pig). Every time the Visual Studio IDE turns white while loading my project, every time Outlook is half rendered and has locked all my input devices, every time some office app "appears" to be idle, yet is locking my mouse (AAARRRGH!), I am reminded of how little (or poorly written) multi-threading there is for mainline software. I assure my boss that my cubicle produces more than just profanity and desk banging.... I have noticed that Mac software appears to be quite a bit better in this regard (shielding head from raging mac haters from earlier posts), as I am not often pounding my fist on the table while using my Mac. I'm not sure if a better architecture, or more "thread aware" programming is the cause.
Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that.
Developers are not the problem. The problem lies further upstream with whoever is creating the functional and technical requirements. Developers develop against those requirements, and if there wasn't a specification for 8 cores, then don't expect it.
What the article is suggesting is that we implement some sort of car-sharing initiative, we stop taking so many cars to the same destination. Or a bus!
But everything's already being transferred on a bus!
Every time I read one of these 'boo hoo more cores don't make things fasterer' stories I find it strange, since the problem domains with which I'm familiar -- Image Processing and Audio Software -- can and do already take advantage of multiprocessing.
In the audio world, you're pushing samples through a directed graph from inputs to outputs, and it's unambiguous to split the processing into threads that can keep the CPU fairly busy.
In Image Processing, and particularly in the Insight Toolkit that I work with daily, image filters are written to run separate threads on regions of the images. It isn't even particularly hard for most tasks that iterate through the image a pixel at a time, requiring only read-only access to an input image.
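The region-splitting pattern is easy to sketch. The following Python is not ITK code; it is a minimal illustration, with made-up function names, of a per-pixel filter (inversion of an 8-bit grayscale image stored as a list of rows) run over horizontal bands, where every worker only reads the shared input and produces its own band of output. Note the caveat that in CPython, threads will not actually speed up a pure-Python pixel loop because of the global interpreter lock; the sketch shows the structure, which pays off when the per-band work is done in native code.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_band(image, row_start, row_end):
    """Filter one band of rows; the input image is only read, never written."""
    return [[255 - px for px in row] for row in image[row_start:row_end]]


def invert_parallel(image, workers=4):
    """Split the image into horizontal bands and filter each band in a worker."""
    height = len(image)
    step = max(1, -(-height // workers))  # ceiling division: rows per band
    bands = [(s, min(s + step, height)) for s in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in band order, so reassembly is a plain concat.
        parts = list(pool.map(lambda band: invert_band(image, *band), bands))
    result = []
    for part in parts:
        result.extend(part)
    return result


if __name__ == "__main__":
    tiny = [[0, 10], [20, 30], [255, 100], [1, 2], [3, 4]]
    print(invert_parallel(tiny, workers=2))
```

Because no band depends on another band's output, no locking is needed, which is the property the comment relies on.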
And for software development, where you run builds and rebuilds all day, make -j 8 makes a hell of a difference in how long you wait to do something.
Computer games could really use more cores as well, because the view on screen has the same property as most image processing -- each pixel on screen is an independent computation. If you do parallel ray tracing, doubling the cores can nearly double the frame rate. That's why hardcore gamers pay the big bucks for multi-card solutions -- the graphics cards are rendering in parallel.
Now if you're talking about a spreadsheet or a web browser, it's hard to see the benefit. That's why so many people buy pokey little Atom netbooks -- nothing they do would have particularly taxed a 1 GHz PIII ten years ago.
Agreed on all points, especially the irony of how badly Cyrix 6x86 processors sucked at floating point... they got their start designing math coprocessors, after all :).
512 MB RAM, 20 GB disk, 200 GB transfer, five datacenters. $19.95/month.
Unfortunately, this effect just makes the programmer's job worse. It means that if he can only get the complexity estimate to within a factor of 100 for CPU usage, by the time Amdahl's law is done, his estimate will only be good within a factor of 1000. To me, this screams: if you really need multi-core capability, you probably need a cluster too.
How likely is it that if a programmer shows a user some code, and the feedback is the code is too slow, that the user will be satisfied with a 2:1 or a 4:1 speedup?
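For reference, Amdahl's law as it is usually stated: if a fraction p of the work can be spread over n cores and the rest stays serial, the best possible speedup is 1 / ((1 - p) + p / n). A small Python sketch follows; the function name and the sample fractions are illustrative assumptions, not figures from this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload that is 90% parallelizable.
    for n in (2, 4, 8, 64):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With 90% of the work parallelizable, 8 cores give roughly 4.7x, and no number of cores can exceed 10x, which is why a 2:1 or 4:1 result is often the realistic outcome.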
Most supercomputers these days aren't single machines, they're clusters. Google "beowulf" for examples. See http://www.cbronline.com/news/linux_x86_clusters_take_over_top_500_supercomputer_ranking, they noticed the trend back in 2004.
2:1 is probably only just noticeable, assuming it isn't an actual timed test. Anything that highly depends on user responsiveness (i.e. gaming and simulations) needs pretty dramatic pickups before the user will categorically agree it is better.
Even the more reasonable ones will want a "meaningful" increase. So the time saved has to be enough that they could do something useful with it. i.e. shoot another bad guy, beat the market to a good deal, go out for a smoke break, get home an hour earlier, etc.
Twenty years ago processors were slow, but some UNIX boxes had more than one. Where this was the case, pipes and named pipes could be used to keep more than one CPU busy. Such techniques were often used for linking troff, eqn, etc. The skill required was not much more than the ability to break a task down into large-sized units that could work independently. Of course not all tasks are amenable to such an approach, but many are.
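The same pipe technique still works today. As a rough Python sketch (the two tiny stage programs are made up for illustration; they stand in for something like eqn feeding troff), two separate processes are connected by an OS pipe, so the scheduler is free to run each stage on its own core:

```python
import subprocess
import sys

# Hypothetical stage programs: the first emits numbers, the second squares them.
PRODUCER = "for i in range(1, 6): print(i)"
CONSUMER = "import sys\nfor line in sys.stdin: print(int(line) ** 2)"


def run_pipeline():
    """Run producer | consumer as two processes and return the output lines."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Drop the parent's copy of the pipe so the consumer sees end-of-file
    # when the producer exits.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.split()


if __name__ == "__main__":
    print(run_pipeline())
```

Neither stage knows anything about threads or locks; the only skill needed is cutting the job into stages that communicate through a stream.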